M
Modular6mo ago
Ethan

Random123: Splittable pseudorandom number generators

I made a package similar to Jax.random.https://github.com/YichengDWu/random123
GitHub
GitHub - YichengDWu/random123: Splittable pseudorandom number gener...
Splittable pseudorandom number generators. Contribute to YichengDWu/random123 development by creating an account on GitHub.
13 Replies
Martin Dudek
Martin Dudek6mo ago
Very interesting. Do you have any concrete applications in mind for a random as pure function in Mojo? I just started to learn about JAX, so interestingly different to Pytorch and TF.
Ethan
EthanOP6mo ago
It's important for reproducibility in the async world. Jax's docs explaint it pretty well so please refer to it.
sora
sora6mo ago
I noticed that you used the following pattern in several place places:
fn ...:
var bits = bits[DType.uint64](key, num)
var keys = List[PRNGKey](capacity=num) # (*)
keys.resize(num)
keys.data = bits.steal_data().bitcast[PRNGKey]()
return keys
fn ...:
var bits = bits[DType.uint64](key, num)
var keys = List[PRNGKey](capacity=num) # (*)
keys.resize(num)
keys.data = bits.steal_data().bitcast[PRNGKey]()
return keys
which causes unnecessary allocation on line (*). resize could also allocate, not in this case though. Could have written
var bs = bits[DType.uint64](key, num)
return List(
unsafe_pointer=bs.steal_data().bitcast[PRNGKey](),
size=num,
capacity=num,
)
var bs = bits[DType.uint64](key, num)
return List(
unsafe_pointer=bs.steal_data().bitcast[PRNGKey](),
size=num,
capacity=num,
)
Ethan
EthanOP6mo ago
Yeah I tested it and it showed no impact on the performance. Allocation seems to be very cheap without actual copy.
sora
sora6mo ago
I think your way of writing can cause memory leak the keys.data allocated by List[PRNGKey](capacity=num) is never freed.
Ethan
EthanOP6mo ago
Good point. I will fix it. Thank you for pointing it out!
tkeitt
tkeitt6mo ago
I don't know if the use cases are similar. I ported the xoshiro256 prngs to mojo. You can jump the streams to produce multiple independent sequences. These can be computed in parallel using SIMD. https://github.com/Mojo-Numerics-and-Algorithms-group/MojoSci/blob/main/src/stochasticity/xoshiro.mojo
GitHub
MojoSci/src/stochasticity/xoshiro.mojo at main · Mojo-Numerics-and-...
Numerics for Mojo. Contribute to Mojo-Numerics-and-Algorithms-group/MojoSci development by creating an account on GitHub.
Ethan
EthanOP6mo ago
It's different. Xoshiro256 is not splittable. You still need to explicitly keep tracking of each PRNG instance and and its state to ensure they remain independent. Splittable PRNGs are more scalable and flexible. Suppose you have a function f that calls g1 and g2, g1 then calls h1 and h2. g1 somehow needs to know the existence of g2. So you still need a global state that keeps track of everything.
tkeitt
tkeitt6mo ago
That does sound different. I've used xoshoro et al. in large MPI applications where each process gets its own prng, all seeded from a common global seed, but each process jumps according to its MPI rank, thus providing reproducible results with independent streams running in parallel.
ModularBot
ModularBot6mo ago
Congrats @tkeitt, you just advanced to level 3!
Ethan
EthanOP6mo ago
Yeah it's in general all good. But you need to know the number of prngs before lauching it. In some cases you might not know how many is needed.
tkeitt
tkeitt6mo ago
Interesting. Splittable sounds ideal for a lot of uses. Do you generate a deterministic sequence of keys from an initial seed in each code block? I ran across this: https://github.com/ElsevierSoftwareX/SOFTX-D-23-00704
GitHub
GitHub - ElsevierSoftwareX/SOFTX-D-23-00704: Reproducible random nu...
Reproducible random number generation for parallel computations - ElsevierSoftwareX/SOFTX-D-23-00704
Ethan
EthanOP6mo ago
Yes. More specifically each key splits before use.
for i in range(num_steps):
newkey, subkey = random123.split(key)
f(subkey)
key = newkey
for i in range(num_steps):
newkey, subkey = random123.split(key)
f(subkey)
key = newkey
I was under the impression that all counter-based PRNGs can be splittable.
Want results from more Discord servers?
Add your server