Would you be interested in BigInt support?
I did some programming exercises in Mojo and noticed that BigInt support is missing. Mojo should probably support BigInt at some point, seeing as Python uses arbitrary-precision integers by default.
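For anyone unfamiliar with the Python behavior being referenced, a quick illustration: Python ints grow without bound, unlike a fixed 64-bit machine integer.

```python
import math

# Python ints have no overflow; they grow as needed.
n = 2 ** 64            # already past the range of a 64-bit machine integer
print(n)               # 18446744073709551616

# factorial(21) exceeds a signed 64-bit int (max 2**63 - 1),
# but Python computes it exactly.
print(math.factorial(21))   # 51090942171709440000
```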
Are there any other users missing this?
Would it make sense for me to start working on this as a Mojo stdlib PR or is this the sort of thing Mojo would rather have as a 3rd party library for now?
Can you elaborate on what you mean by BigInt? The Int type in Mojo is arbitrary precision if I am interpreting your question correctly; are you referring to a BigInt DType?
Int is currently not arbitrary precision. It's a pointer-width integer (32 or 64 bits). IntLiteral is arbitrary precision, but cannot be used at runtime.

One of the cool things about Mojo is you can literally just write a struct with some inline MLIR and make your own BigInt that works exactly the way an official one would. Not saying there should never be a stdlib version, but I doubt it's going to be a major blocker to anyone for this reason.
Of course it's not going to be a blocker. You can probably write an efficient one without inline MLIR as well, but any implementation will take a bit of time.
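To make the scale of the work concrete, here is a minimal sketch of the classic limb-based representation such an implementation would start from, written in Python for brevity (a Mojo version would store limbs in an owned buffer). All names here are illustrative, not from any actual library.

```python
# Minimal unsigned bignum sketch: little-endian base-2**32 limbs.
BASE = 2 ** 32

def to_limbs(n: int) -> list[int]:
    """Split a nonnegative int into base-2**32 limbs, least significant first."""
    limbs = []
    while n:
        limbs.append(n % BASE)
        n //= BASE
    return limbs or [0]

def add_limbs(a: list[int], b: list[int]) -> list[int]:
    """Schoolbook addition with carry propagation, one limb at a time."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % BASE)
        carry = s // BASE
    if carry:
        out.append(carry)
    return out

def from_limbs(limbs: list[int]) -> int:
    """Recombine limbs into a Python int (for checking the sketch)."""
    return sum(l << (32 * i) for i, l in enumerate(limbs))

x, y = 2 ** 100 + 7, 2 ** 90 + 5
assert from_limbs(add_limbs(to_limbs(x), to_limbs(y))) == x + y
```

The real effort is in signed arithmetic, multiplication/division algorithms, and memory management, which is why "a bit of time" is an understatement for an efficient version.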
If there was a way to make a package easily usable by the ecosystem I would just do that instead, but as things are I'm significantly less motivated to build a third party library for something like this than make a PR to the stdlib. Either way I'd like to know which approach makes more sense for Modular.
I would like it because Rust interop is hard without 128-bit ints, and small-width ints are the easiest way to support binding to C bitfields.
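As a sketch of what a fixed 128-bit type has to guarantee, the wrapping semantics can be modeled on top of arbitrary-precision ints by masking. This is only an illustration of the semantics, not how a native UInt128 would be implemented.

```python
# Modeling unsigned 128-bit wrapping arithmetic by masking to 128 bits.
MASK128 = (1 << 128) - 1

def add_u128(a: int, b: int) -> int:
    """Addition modulo 2**128, as a hardware-backed UInt128 would wrap."""
    return (a + b) & MASK128

def mul_u128(a: int, b: int) -> int:
    """Multiplication modulo 2**128."""
    return (a * b) & MASK128

assert add_u128(MASK128, 1) == 0          # wraps around to zero
assert mul_u128(2 ** 127, 2) == 0         # high bit shifted out
```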
Ideally, this can ride on top of the _BitInt(N) and unsigned _BitInt(N) support already in LLVM due to C23.
what would be really cool is Int[Width] / Float[Width] / UInt[Width]
Is there any reason to do bigint as fixed length types instead of using const generics? Truly arbitrary bit widths would give a bit more freedom for transformations later that don't respect high bits
Float should ideally be separate sign/exponent/mantissa parts.
nice idea actually, might try and implement later
That's something different from what I'm talking about. You only need a family of types with increasing precision. I'm talking about a single type that can store values of arbitrary precision (perhaps "unbounded dynamic precision" would be a better term). Python doesn't require you to select the precision before running the code.
Runtime dynamic precision ints are messy. They get very slow because they're very difficult to make fast; if you allow choosing a "large enough" fixed-width integer, you get better code gen. IIRC, Java's BigInteger spends most of its time branching rather than actually doing number crunching.
It's sounding (to me) more like you want a library for symbolic math, maybe with some automatic simplification
Sure, you can lose a lot of time to cache misses, and selecting algorithms based on precision also costs a bit. It seems like the slowness in Java specifically is not inherent though: https://codeforces.com/blog/entry/17235
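The linked post's point is largely about algorithm selection: a fast bignum switches algorithms by operand size. A sketch of that idea in Python, using Karatsuba multiplication above an illustrative (untuned) cutoff; the base case delegates to Python's own multiply to stand in for schoolbook limb multiplication.

```python
# Size-based algorithm selection: schoolbook multiply for small operands,
# Karatsuba above a cutoff. The cutoff here is illustrative; real
# implementations tune it per platform.
KARATSUBA_CUTOFF = 64  # bits

def karatsuba(x: int, y: int) -> int:
    if x.bit_length() < KARATSUBA_CUTOFF or y.bit_length() < KARATSUBA_CUTOFF:
        return x * y  # "schoolbook" base case (delegated to Python here)
    m = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> m, x & ((1 << m) - 1)   # split x = xh * 2**m + xl
    yh, yl = y >> m, y & ((1 << m) - 1)
    a = karatsuba(xh, yh)
    b = karatsuba(xl, yl)
    c = karatsuba(xh + xl, yh + yl) - a - b   # three multiplies, not four
    return (a << (2 * m)) + (c << m) + b

assert karatsuba(2 ** 300 + 123, 2 ** 280 + 7) == (2 ** 300 + 123) * (2 ** 280 + 7)
```

Each such switch point is a branch, which is part of why dynamic-precision ints pay overhead even when the algorithms themselves are good.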
Still, because Python uses runtime dynamic precision ints by default, Mojo should probably do so as well for compatibility and flexibility. Deciding on an appropriate precision is an optimization that can be done when desired.
Outside of scientific code, how many people are actually dealing with numbers over 2^63-1? Defaulting int to 64 bit is probably enough for most people and brings massive perf gains.
Not at all. I want an integer type that matches the semantics of the python integer type. I think in python it makes sense mostly for flexibility reasons.
Having an “int” type that’s a bigint is fine, but mojo currently defines Int as essentially ptrdiff_t.
We can use the capitalization to make the distinction.
Ideally, you could implicitly convert to any of the integer types from IntLiteral, which means that doing the optimization would amount to adding a type annotation, which I think fits very nicely with Mojo's philosophy.

I want to make sure the big perf hit is explicitly opt-in.
How often do people get bitten by ints "only" being 64 bits?
(this is one of the things that contributed to python being as painfully slow as it is, so doing it by default would be a huge perf hit for a lot of people)
Even if people hit 64 bits, add 128 bit ints, that number becomes much smaller.
The only people who I could see using 256 and 512 bit are cryptographers.
That would be a neat use case to add them for...
But cryptographers want larger and larger fields over time, so arbitrarily compile-time precision seems like the best option here.
For instance, I don’t think I’ve ever seen something that would wrap a UInt4096 even in runaway code.
It is my understanding that Mojo is not supposed to be reevaluating the semantics of Python, but rather provide ways to easily optimize it.
The perf hit can be mitigated by using tagged enums to avoid requiring indirections for small integers.
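As an illustration of the tagged-representation idea, here is a toy model in Python, with tuples standing in for a stolen tag bit and a list standing in for a heap box. This is entirely a sketch of the data layout, not an actual implementation.

```python
# Toy model of a tagged int: values that fit a fixed range are stored
# "inline"; anything larger is "boxed". A real implementation would steal
# one tag bit from a machine word instead of using tuples.
SMALL_MIN, SMALL_MAX = -(2 ** 62), 2 ** 62 - 1  # 63-bit payload, 1 tag bit

def make(n: int):
    if SMALL_MIN <= n <= SMALL_MAX:
        return ("small", n)      # fast path: no allocation, no indirection
    return ("big", [n])          # slow path: heap "box"

def add(a, b):
    # The fast path is one branch plus a machine add; only when the result
    # leaves the small range does it fall back to the boxed form.
    va = a[1] if a[0] == "small" else a[1][0]
    vb = b[1] if b[0] == "small" else b[1][0]
    return make(va + vb)

assert add(make(1), make(2)) == ("small", 3)
assert add(make(2 ** 62), make(2 ** 62))[0] == "big"
```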
Enums mean no SIMD unless the platform has scatter/gather, which would mean tossing out all of the SIMD in Mojo right now for ARM.
Neon scatter/gather comes with a big list of issues.
And SVE support for Mojo is not a thing.
Even using tagged enums, you need affordances for the various container sizes, and you would stop being able to pass around bare ints in GP registers quite as easily.
That's true. An optimized implementation would want to select integer precision manually, but in that case 64 bits might not always be the right choice either.
And you are forced to pass on the stack by the System V ABI.
(and a lot of optimizations nowadays come from allowing reordering, both at compile time and at runtime)
Stack engines nowadays should keep up relatively well, but yeah...that too
GitHub: [BUG] Mojo object integer representation is not arbitrary precision

Bug description:

```python
def factorial(n):
    if n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n - 1)

print(factorial(20))  # Expect 2432902008176640000
# prints: 2432902008176640000
print(factorial(21))  # Expect 5...
```
Whether the default "integer type" should be arbitrary precision is technically still up for debate, but at least object needs to match the Python semantics, so there has to be an arbitrary precision integer there somewhere. It's possible to rely on PythonObject for this, but that means that you need an embedded or external Python runtime.

So you want an IntLiteral -> PythonObject cast?
I don't think that's good enough for the long term. It seems unnecessary to require the presence of python here.
An IntLiteral -> object cast using PythonObject could be a workable temporary solution though.

Just using object would also work. Anyone using def has already declared they don't care about perf, so they can have BigInt as a default for all I care.
I'm fine with this. I think it's better for the ecosystem to settle on 64-bit integers as the de facto default, though I think we should require type annotations rather than having a compiler-blessed default in fns. I also don't like the Int naming, but not because I think dynamic integers should be called Int. Those can be called DynInt for all I care.

I think I agree on that one, actually: I'm of the mind people should be explicitly choosing their concrete types, including with explicitly sized scalars
(akin to Rust's iN or C/C++'s intN_t)
Int is the largest size that fits in a register, which I think is important to have.
iNative or something? Sounds like isize with extra confusion
Sure, but I don't think it should be called Int, as though it is the canonical integer type. I do like how Rust does it, but I'm obviously biased as a Rust user.

This is probably for C/C++ devs, since int is supposed to be register-sized, until Intel decided it wouldn't be because of back-compat.
Oh hey, I was just reading up on this completely independently and it came up in the search today...
It's come up in a couple of bugs recently, this one is two bugs in one. There's a bitcasting bug as well as a mojo-python object vs. python-mojo object conversion issue https://github.com/modularml/mojo/issues/3146
GitHub: [BUG] numpy type problem · Issue #3146 · modularml/mojo

Bug description: This really surprises me --- the example below shows using "2 * np.pi" and "np.pi * 2" yield different results. Changing the multiplier from 2 to 2.0 and the pro...