Would you be interested in BigInt support?
I did some programming exercises in Mojo and noticed that BigInt support is missing. Mojo should probably support BigInt at some point, seeing as Python uses arbitrary-precision integers by default.
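For anyone unfamiliar with the Python behavior being referenced, a quick illustration: Python ints grow without bound, unlike a fixed 64-bit machine integer.

```python
import math

# Python ints have no overflow; they grow as needed.
n = 2 ** 64            # already past the range of a 64-bit machine integer
print(n)               # 18446744073709551616

# factorial(21) exceeds a signed 64-bit int (max 2**63 - 1),
# but Python computes it exactly.
print(math.factorial(21))   # 51090942171709440000
```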
Are there any other users missing this?
Would it make sense for me to start working on this as a Mojo stdlib PR or is this the sort of thing Mojo would rather have as a 3rd party library for now?
Can you elaborate on what you mean by BigInt? The Int type in Mojo is arbitrary precision if I am interpreting your question correctly; are you referring to a BigInt DType?
Int is currently not arbitrary precision. It's a pointer-width integer (32 or 64 bits). IntLiteral is arbitrary precision, but cannot be used at runtime.

One of the cool things about Mojo is you can literally just write a struct with some inline MLIR and make your own BigInt that works exactly the way an official one would. Not saying there should never be a stdlib version, but I doubt it's going to be a major blocker to anyone for this reason.
Of course it's not going to be a blocker. You can probably write an efficient one without inline MLIR as well, but any implementation will take a bit of time.
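To make the scale of the work concrete, here is a minimal sketch of the classic limb-based representation such an implementation would start from, written in Python for brevity (a Mojo version would store limbs in an owned buffer). All names here are illustrative, not from any actual library.

```python
# Minimal unsigned bignum sketch: little-endian base-2**32 limbs.
BASE = 2 ** 32

def to_limbs(n: int) -> list[int]:
    """Split a nonnegative int into base-2**32 limbs, least significant first."""
    limbs = []
    while n:
        limbs.append(n % BASE)
        n //= BASE
    return limbs or [0]

def add_limbs(a: list[int], b: list[int]) -> list[int]:
    """Schoolbook addition with carry propagation, one limb at a time."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % BASE)
        carry = s // BASE
    if carry:
        out.append(carry)
    return out

def from_limbs(limbs: list[int]) -> int:
    """Recombine limbs into a Python int (for checking the sketch)."""
    return sum(l << (32 * i) for i, l in enumerate(limbs))

x, y = 2 ** 100 + 7, 2 ** 90 + 5
assert from_limbs(add_limbs(to_limbs(x), to_limbs(y))) == x + y
```

The real effort is in signed arithmetic, multiplication/division algorithms, and memory management, which is why "a bit of time" is an understatement for an efficient version.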
If there was a way to make a package easily usable by the ecosystem I would just do that instead, but as things are I'm significantly less motivated to build a third party library for something like this than make a PR to the stdlib. Either way I'd like to know which approach makes more sense for Modular.
I would like it because Rust interop is hard without 128-bit ints, and small-width ints are the easiest way to support binding to C bitfields.
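As a sketch of what a fixed 128-bit type has to guarantee, the wrapping semantics can be modeled on top of arbitrary-precision ints by masking. This is only an illustration of the semantics, not how a native UInt128 would be implemented.

```python
# Modeling unsigned 128-bit wrapping arithmetic by masking to 128 bits.
MASK128 = (1 << 128) - 1

def add_u128(a: int, b: int) -> int:
    """Addition modulo 2**128, as a hardware-backed UInt128 would wrap."""
    return (a + b) & MASK128

def mul_u128(a: int, b: int) -> int:
    """Multiplication modulo 2**128."""
    return (a * b) & MASK128

assert add_u128(MASK128, 1) == 0          # wraps around to zero
assert mul_u128(2 ** 127, 2) == 0         # high bit shifted out
```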
Ideally, this can ride on top of the _BitInt(N) and unsigned _BitInt(N) support already in LLVM due to C23.
what would be really cool is Int[Width] / Float[Width] / UInt[Width]
Is there any reason to do bigint as fixed length types instead of using const generics? Truly arbitrary bit widths would give a bit more freedom for transformations later that don't respect high bits
Float should ideally be separate sign/exponent/mantissa parts.
nice idea actually, might try and implement later
That's something different from what I'm talking about. You only need a family of types with increasing precision. I'm talking about a single type that can store values of arbitrary precision (perhaps "unbounded dynamic precision" would be a better term). Python doesn't require you to select the precision before running the code.
Runtime dynamic precision ints are messy. They get very slow because they're very difficult to make fast; if you allow choosing a "large enough" fixed-width integer, you get better code gen. IIRC, Java's BigInteger spends most of its time branching rather than actually doing number crunching.
It's sounding (to me) more like you want a library for symbolic math, maybe with some automatic simplification
Sure, you can lose a lot of time to cache misses, and selecting algorithms based on precision also costs a bit. It seems like the slowness in Java specifically is not inherent though: https://codeforces.com/blog/entry/17235
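The linked post's point is largely about algorithm selection: a fast bignum switches algorithms by operand size. A sketch of that idea in Python, using Karatsuba multiplication above an illustrative (untuned) cutoff; the base case delegates to Python's own multiply to stand in for schoolbook limb multiplication.

```python
# Size-based algorithm selection: schoolbook multiply for small operands,
# Karatsuba above a cutoff. The cutoff here is illustrative; real
# implementations tune it per platform.
KARATSUBA_CUTOFF = 64  # bits

def karatsuba(x: int, y: int) -> int:
    if x.bit_length() < KARATSUBA_CUTOFF or y.bit_length() < KARATSUBA_CUTOFF:
        return x * y  # "schoolbook" base case (delegated to Python here)
    m = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> m, x & ((1 << m) - 1)   # split x = xh * 2**m + xl
    yh, yl = y >> m, y & ((1 << m) - 1)
    a = karatsuba(xh, yh)
    b = karatsuba(xl, yl)
    c = karatsuba(xh + xl, yh + yl) - a - b   # three multiplies, not four
    return (a << (2 * m)) + (c << m) + b

assert karatsuba(2 ** 300 + 123, 2 ** 280 + 7) == (2 ** 300 + 123) * (2 ** 280 + 7)
```

Each such switch point is a branch, which is part of why dynamic-precision ints pay overhead even when the algorithms themselves are good.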
Still, because Python uses runtime dynamic precision ints by default, Mojo should probably do so as well for compatibility and flexibility. Deciding on an appropriate precision is an optimization that can be done when desired.
Outside of scientific code, how many people are actually dealing with numbers over 2^63-1? Defaulting int to 64 bit is probably enough for most people and brings massive perf gains.
Not at all. I want an integer type that matches the semantics of the python integer type. I think in python it makes sense mostly for flexibility reasons.
Having an “int” type that’s a bigint is fine, but mojo currently defines Int as essentially ptrdiff_t.
We can use the capitalization to make the distinction.
Ideally, you could implicitly convert to any of the integer types from IntLiteral, which means that doing the optimization would amount to adding a type annotation, which I think fits very nicely with Mojo's philosophy.

I want to make sure the big perf hit is explicitly opt-in.
How often do people get bitten by ints "only" being 64 bits?
(this is one of the things that contributed to python being as painfully slow as it is, so doing it by default would be a huge perf hit for a lot of people)
Even if people hit 64 bits, add 128 bit ints, that number becomes much smaller.
The only people who I could see using 256 and 512 bit are cryptographers.
That would be a neat use case to add them for...
But cryptographers want larger and larger fields over time, so arbitrarily compile-time precision seems like the best option here.
For instance, I don’t think I’ve ever seen something that would wrap a UInt4096 even in runaway code.
It is my understanding that Mojo is not supposed to be reevaluating the semantics of Python, but rather provide ways to easily optimize it.
The perf hit can be mitigated by using tagged enums to avoid requiring indirections for small integers.
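As an illustration of the tagged-representation idea, here is a toy model in Python, with tuples standing in for a stolen tag bit and a list standing in for a heap box. This is entirely a sketch of the data layout, not an actual implementation.

```python
# Toy model of a tagged int: values that fit a fixed range are stored
# "inline"; anything larger is "boxed". A real implementation would steal
# one tag bit from a machine word instead of using tuples.
SMALL_MIN, SMALL_MAX = -(2 ** 62), 2 ** 62 - 1  # 63-bit payload, 1 tag bit

def make(n: int):
    if SMALL_MIN <= n <= SMALL_MAX:
        return ("small", n)      # fast path: no allocation, no indirection
    return ("big", [n])          # slow path: heap "box"

def add(a, b):
    # The fast path is one branch plus a machine add; only when the result
    # leaves the small range does it fall back to the boxed form.
    va = a[1] if a[0] == "small" else a[1][0]
    vb = b[1] if b[0] == "small" else b[1][0]
    return make(va + vb)

assert add(make(1), make(2)) == ("small", 3)
assert add(make(2 ** 62), make(2 ** 62))[0] == "big"
```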
Enums mean no SIMD unless the platform has scatter/gather, which would mean tossing out all of the SIMD in Mojo right now for ARM.
Neon scatter/gather comes with a big list of issues.
And SVE support for Mojo is not a thing.
Even using tagged enums, you need affordances for the various container sizes, and you would stop being able to pass around bare ints in GP registers quite as easily.
That's true. An optimized implementation would want to select integer precision manually, but in that case 64 bits might not always be the right choice either.
And you are forced to pass on the stack by the System V ABI.
(and a lot of optimizations nowadays come from allowing reordering, both at compile time and at runtime)
Stack engines nowadays should keep up relatively well, but yeah...that too
GitHub: [BUG] Mojo object integer representation is not arbitrary precision

Bug description:

```python
def factorial(n):
    if n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n - 1)

print(factorial(20))  # Expect 2432902008176640000
# prints: 2432902008176640000
print(factorial(21))  # Expect 5...
```
Whether the default "integer type" should be arbitrary precision is technically still up for debate, but at least object needs to match the Python semantics, so there has to be an arbitrary precision integer there somewhere. It's possible to rely on PythonObject for this, but that means that you need an embedded or external Python runtime.

So you want an IntLiteral -> PythonObject cast?
I don't think that's good enough for the long term. It seems unnecessary to require the presence of python here.
An IntLiteral -> object cast using PythonObject could be a workable temporary solution though.

Just using object would also work. Anyone using def has already declared they don't care about perf, so they can have BigInt as a default for all I care.
I'm fine with this. I think it's better for the ecosystem to settle on 64-bit integers as the de facto default, though I think we should require type annotations rather than having a compiler-blessed default in fns. I also don't like the Int naming, but not because I think dynamic integers should be called Int. Those can be called DynInt for all I care.

I think I agree on that one, actually: I'm of the mind people should be explicitly choosing their concrete types, including with explicitly sized scalars
(akin to Rust's iN or C/C++'s intN_t)
Int is the largest size that fits in a register, which I think is important to have.
iNative or something? Sounds like isize with extra confusion
Sure, but I don't think it should be called Int, as though it is the canonical integer type. I do like how Rust does it, but I'm obviously biased as a Rust user.

This is probably for C/C++ devs, since int is supposed to be register-sized, until Intel decided it wouldn't be because of back-compat.
Oh hey, I was just reading up on this completely independently and it came up in the search today...
It's come up in a couple of bugs recently, this one is two bugs in one. There's a bitcasting bug as well as a mojo-python object vs. python-mojo object conversion issue https://github.com/modularml/mojo/issues/3146
GitHub: [BUG] numpy type problem · Issue #3146 · modularml/mojo

Bug description: This really surprises me --- the example below shows using "2 * np.pi" and "np.pi * 2" yield different results. Changing the multiplier from 2 to 2.0 and the pro...