Would you be interested in BigInt support?

I did some programming exercises in Mojo and noticed that BigInt support is missing, and Mojo should probably support BigInt at some point, seeing as Python uses arbitrary precision integers by default. Are there any other users missing this? Would it make sense for me to start working on this as a Mojo stdlib PR or is this the sort of thing Mojo would rather have as a 3rd party library for now?
42 Replies
benny
benny5mo ago
Can you elaborate on what you mean by BigInt? The Int type in Mojo is arbitary precision if I am interpreting your question correctly, are you refering to a BigInt DType?
bowlerman
bowlermanOP5mo ago
Int is currently not arbitrary precision. It's a pointer width integer (32 or 64 bits). IntLiteral is arbitrary precision, but cannot be used at runtime.
Ryulord
Ryulord5mo ago
One of the cool things about Mojo is you can literally just write a struct with some inline MLIR and make your own BigInt that works exactly the way an official one would. Not saying there should never be a stdlib version but I doubt it's going to be a major blocker to anyone for this reason
bowlerman
bowlermanOP5mo ago
Of course it's not going to be a blocker. You can probably write an efficient one without inline MLIR as well, but any implementation will take a bit of time. If there was a way to make a package easily usable by the ecosystem I would just do that instead, but as things are I'm significantly less motivated to build a third party library for something like this than make a PR to the stdlib. Either way I'd like to know which approach makes more sense for Modular.
Darkmatter
Darkmatter5mo ago
I would like it because rust interop is hard without 128 bit ints, and small width ints are the easiest way to support binding to C bitfields. Ideally, this can ride on top of the _BitInt(N) and unsigned _BitInt(N) support already in LLVM due to C23.
benny
benny5mo ago
what would be really cool is Int[Width] / Float[Width] / UInt[Width]
sb
sb5mo ago
Is there any reason to do bigint as fixed length types instead of using const generics? Truly arbitrary bit widths would give a bit more freedom for transformations later that don't respect high bits
Darkmatter
Darkmatter5mo ago
Float should ideally be separate sign/exponent/mantissa parts.
benny
benny5mo ago
nice idea actually, might try and implement later
bowlerman
bowlermanOP5mo ago
That's something different from what I'm talking about. You only need a family of types with increasing precision. I'm talking about a single type that can store values of arbitrary precision (perhaps "unbounded dynamic precision" would be a better term). Python doesn't require you to select the precision before running the code.
Darkmatter
Darkmatter5mo ago
Runtime dynamic precision ints are messy. They start to get very slow because they’re very difficult to make fast, if you allow choosing a “large enough” fixed-width integer, you can get better code gen. IIRC, java’s BitInt spends most of its time branching rather than actually doing number crunching.
sb
sb5mo ago
It's sounding (to me) more like you want a library for symbolic math, maybe with some automatic simplification
bowlerman
bowlermanOP5mo ago
Sure, you can lose a lot of time to cache misses, and selecting algorithms based on precision also costs a bit. It seems like the slowness in Java specifically is not inherent though: https://codeforces.com/blog/entry/17235 Still, because python uses runtime dynamic precision ints by default Mojo should probably do so as well for compatibility and flexibility. Deciding on an appropriate precision is an optimization that can be done when desired.
Darkmatter
Darkmatter5mo ago
Outside of scientific code, how many people are actually dealing with numbers over 2^63-1? Defaulting int to 64 bit is probably enough for most people and brings massive perf gains.
bowlerman
bowlermanOP5mo ago
Not at all. I want an integer type that matches the semantics of the python integer type. I think in python it makes sense mostly for flexibility reasons.
Darkmatter
Darkmatter5mo ago
Having an “int” type that’s a bigint is fine, but mojo currently defines Int as essentially ptrdiff_t. We can use the capitalization to make the distinction.
bowlerman
bowlermanOP5mo ago
Ideally, you could implicitly convert to any of the integer types from IntLiteral, which means that doing the optimization would amount to adding a type annotation, which I think fits very nicely with Mojo's philosophy.
Darkmatter
Darkmatter5mo ago
I want to make sure the big perf hit is explicitly opt-in.
sb
sb5mo ago
How often do people get bitten by ints "only" being 64 bits? (this is one of the things that contributed to python being as painfully slow as it is, so doing it by default would be a huge perf hit for a lot of people)
Darkmatter
Darkmatter5mo ago
Even if people hit 64 bits, add 128 bit ints, that number becomes much smaller. The only people who I could see using 256 and 512 bit are cryptographers.
sb
sb5mo ago
That would be a neat use case to add them for...
Darkmatter
Darkmatter5mo ago
But cryptographers want larger and larger fields over time, so arbitrarily compile-time precision seems like the best option here. For instance, I don’t think I’ve ever seen something that would wrap a UInt4096 even in runaway code.
bowlerman
bowlermanOP5mo ago
It is my understanding that Mojo is not supposed to be reevaluating the semantics of Python, but rather provide ways to easily optimize it. The perf hit can be mitigated by using tagged enums to avoid requiring indirections for small integers.
Darkmatter
Darkmatter5mo ago
Enums mean no SIMD unless the platform has scatter/gather, which would mean tossing out all of the SIMD in Mojo right now for ARM. Neon scatter/gather comes with a big list of issues. And SVE support for Mojo is not a thing.
sb
sb5mo ago
Even using tagged enums, you need affordances for the various container sizes and would stop being about to pass around bare ints in GP registers quite as easily
bowlerman
bowlermanOP5mo ago
That's true. An optimized implementation would want to select integer precision manually, but in that case 64 bits might not always be the right choice either.
Darkmatter
Darkmatter5mo ago
And you are forced to pass on the stack by the SYS-V ABI.
sb
sb5mo ago
(and a lot of optimizations nowadays come from allowing reordering, both at compile time and at runtime) Stack engines nowadays should keep up relatively well, but yeah...that too
ModularBot
ModularBot5mo ago
Congrats @sb, you just advanced to level 1!
bowlerman
bowlermanOP5mo ago
GitHub
[BUG] Mojo object integer representation is not arbitrary precisi...
Bug description def factorial(n): if n == 0 or n == 1: return 1 else: return n * factorial(n-1) print(factorial(20)) # Expect 2432902008176640000 2432902008176640000 print(factorial(21)) # Expect 5...
bowlerman
bowlermanOP5mo ago
Whether the default "integer type" should be arbitrary precision is technically still up for debate, but at least object needs to match the python semantics, so there has to be an arbitrary precision integer there somewhere. It's possible to rely on PythonObject for this, but that means that you need an embedded or external python runtime.
Darkmatter
Darkmatter5mo ago
So you want an IntLiteral -> PythonObject cast?
bowlerman
bowlermanOP5mo ago
I don't think that's good enough for the long term. It seems unnecessary to require the presence of python here. A IntLiteral -> object cast using PythonObject could be a workable temporary solution though.
Darkmatter
Darkmatter5mo ago
Just using object would also work. Anyone using def has already declared they don’t care about perf so they can have BigInt as a default for all I care.
bowlerman
bowlermanOP5mo ago
I'm fine with this. I think that it's better for the ecosystem to settle on 64 bit integers as being the de facto default, though I think that we should require type annotations rather than having a compiler blessed default in fns. I also don't like the Int naming, but not because I think dynamic integers should be called Int. Those can be called DynInt for all I care.
sb
sb5mo ago
I think I agree on that one actually--I'm of the mind people should be explicitly choosing their concrete types, including with explicitly sized scalars (akin to rust's iN or C/++ intN_t)
Darkmatter
Darkmatter5mo ago
Int is the largest size that fits in a register, which I think is important to have.
sb
sb5mo ago
iNative or something? Sounds like isize with extra confusion
bowlerman
bowlermanOP5mo ago
Sure, but I don't think it should be called Int, as though it is the canonical integer type. I do like how Rust does it, but I'm obviously biased as a Rust user.
Darkmatter
Darkmatter5mo ago
This is probably for C/C++ devs, since int is supposed to be register sized until Intel decided it wouldn’t be because of back-compat.
Darin Simmons
Darin Simmons5mo ago
Oh hey, I was just reading up on this completely independently and it came up in the search today...
Darin Simmons
Darin Simmons5mo ago
It's come up in a couple of bugs recently, this one is two bugs in one. There's a bitcasting bug as well as a mojo-python object vs. python-mojo object conversion issue https://github.com/modularml/mojo/issues/3146
GitHub
[BUG] numpy type problem · Issue #3146 · modularml/mojo
Bug description This really surprises me --- the example below shows using "2 * np.pi" and "np.pi * 2" yield different results. Changing the multiplier from 2 to 2.0 and the pro...
Want results from more Discord servers?
Add your server