Mohamed Mabrouk Comments - Answer Overflow

Mohamed Mabrouk

•Created by duck_tape on 12/10/2024 in #community-showcase

ExtraMojo

Then, AFAIK, this is currently blocked on having basic IO traits that allow a buffered reader/writer to wrap a generic object implementing, for example, a Reader-like trait. We will need to settle that first.
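
For comparison, Rust's standard library already has this shape: std::io::BufReader<R> wraps any R: Read. The sketch below is a minimal, hypothetical SimpleBufReader. It is not ExtraMojo or Mojo stdlib code. It shows why a shared reader trait comes first: the buffering logic is written once against the trait and then works for files, sockets, or in-memory slices.

```rust
use std::io::{self, Read};

/// Hypothetical minimal buffered reader, generic over anything implementing `Read`.
pub struct SimpleBufReader<R: Read> {
    inner: R,
    buf: Vec<u8>,
    pos: usize,    // index of the next unread byte in `buf`
    filled: usize, // number of valid bytes currently in `buf`
}

impl<R: Read> SimpleBufReader<R> {
    pub fn with_capacity(capacity: usize, inner: R) -> Self {
        assert!(capacity > 0, "buffer capacity must be non-zero");
        SimpleBufReader { inner, buf: vec![0; capacity], pos: 0, filled: 0 }
    }

    /// Returns the next byte, refilling the buffer from `inner` only when it is exhausted.
    /// `Ok(None)` signals end of input.
    pub fn next_byte(&mut self) -> io::Result<Option<u8>> {
        if self.pos == self.filled {
            // One call to the wrapped reader per refill (a syscall when `inner` is a file)
            self.filled = self.inner.read(&mut self.buf)?;
            self.pos = 0;
            if self.filled == 0 {
                return Ok(None);
            }
        }
        let byte = self.buf[self.pos];
        self.pos += 1;
        Ok(Some(byte))
    }
}
```

The same type works unchanged over a File, a Cursor<Vec<u8>>, or a plain &[u8], because all of them implement Read.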

83 replies

Modular

•Created by duck_tape on 12/10/2024 in #community-showcase

ExtraMojo

I personally think it should be added to the stdlib with an explicit method, such as: for line in FileHandle().BufferedLineIterator[buf_size]()
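
Rust's standard library exposes roughly what is proposed here through BufReader::with_capacity(buf_size, file).lines(). Below is a short sketch for comparison. The name count_lines_buffered is illustrative only, and the BufferedLineIterator method above is still a proposal, not an existing Mojo API.

```rust
use std::io::{BufRead, BufReader, Read};

/// Counts lines from any reader, reading through a buffer of `buf_size` bytes.
pub fn count_lines_buffered<R: Read>(inner: R, buf_size: usize) -> std::io::Result<usize> {
    let reader = BufReader::with_capacity(buf_size, inner);
    let mut count = 0;
    for line in reader.lines() {
        // Each item is a freshly allocated String (newline stripped); invalid UTF-8 is an error
        let _line: String = line?;
        count += 1;
    }
    Ok(count)
}
```

With a real file, this would be called as count_lines_buffered(File::open("big.tsv")?, 64 * 1024). lines() allocates a new String for every line and rejects invalid UTF-8. That is one reason a byte-oriented, zero-copy iterator can be faster.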

83 replies

•Created by Ivo Balbaert on 1/7/2025 in #questions

Try finally when opening a file

The error would occur when opening the file and initializing the file handle; in that case, there are no resources to free.
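
The same reasoning in Rust terms is an analogy, not Mojo code. When File::open fails, the ? operator returns early before any handle exists, so there is nothing to close. On success, the handle is closed automatically when it goes out of scope. The helper name is hypothetical.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Hypothetical helper: read a whole file, propagating any error.
pub fn read_to_string_or_err(path: &str) -> io::Result<String> {
    // If opening fails, `?` returns here: no handle exists, so nothing needs closing
    let mut file = File::open(path)?;
    let mut contents = String::new();
    // If reading fails, `?` returns and `file` is still closed by its destructor
    file.read_to_string(&mut contents)?;
    // On success, `file` is closed when it goes out of scope at the end of the function
    Ok(contents)
}
```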

13 replies

•Created by Ivo Balbaert on 1/7/2025 in #questions

Try finally when opening a file

The try block is scoped, so the initialization only happens within that scope; outside it, file is still uninitialized. You can move the close statement into the try block instead.
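
Here is a Rust analogue of moving the close into the try block. It is an illustration, not Mojo syntax. The handle exists only inside the Ok arm, so it is used and closed there, and the Err arm has nothing to clean up. first_byte is a hypothetical helper.

```rust
use std::fs::File;
use std::io::Read;

/// Returns the first byte of the file at `path`, or `None` if it cannot be opened or is empty.
pub fn first_byte(path: &str) -> Option<u8> {
    match File::open(path) {
        Ok(mut file) => {
            // The handle is initialized only in this scope, so it is used and closed here
            let mut byte = [0u8; 1];
            let n = file.read(&mut byte).unwrap_or(0);
            drop(file); // explicit close; leaving the scope would also close it
            if n == 1 { Some(byte[0]) } else { None }
        }
        // Opening failed: no handle was created, so there is nothing to close
        Err(_) => None,
    }
}
```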

13 replies

•Created by duck_tape on 1/3/2025 in #community-showcase

A Benchmark with Files and Bytes

Here are the corresponding results on Intel.

hyperfine --warmup 3 '< big.tsv ./mojo/count_lines' '< big.tsv ./rust/target/release/count_lines'
Benchmark 1: < big.tsv ./mojo/count_lines
  Time (mean ± σ):     893.3 ms ±   8.0 ms    [User: 793.7 ms, System: 101.4 ms]
  Range (min … max):   883.1 ms … 911.5 ms    10 runs
 
Benchmark 2: < big.tsv ./rust/target/release/count_lines
  Time (mean ± σ):      1.012 s ±  0.004 s    [User: 0.897 s, System: 0.116 s]
  Range (min … max):    1.004 s …  1.020 s    10 runs

  < big.tsv ./mojo/count_lines ran
    1.13 ± 0.01 times faster than < big.tsv ./rust/target/release/count_lines

27 replies

•Created by duck_tape on 1/3/2025 in #community-showcase

A Benchmark with Files and Bytes

Yes, I used the latest ExtraMojo and the script in the records directory. I can't say whether it is Rust itself or specifically the memchr crate that uses a subpar algorithm on Arm. Based on my previous experiments, I can match the performance of memchr on x86 through careful optimization: removing unnecessary function calls and allocations, and tuning the buffer size. I can try to see whether using my hand-rolled buffered reader makes a difference.
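
Here is a minimal sketch of the tuning described above, assuming the task is a plain newline count like the benchmark's count_lines. It reads the input in fixed-size chunks into one reused buffer, so there is no per-line allocation, and it makes the buffer size a parameter. The inner byte scan is the part memchr replaces with SIMD, for example through its memchr_iter function. The naive filter below is only a placeholder for that scan, and count_newlines is a hypothetical name.

```rust
use std::io::{self, Read};

/// Counts b'\n' bytes by reading `inner` in chunks of `buf_size` into one reused buffer.
pub fn count_newlines<R: Read>(mut inner: R, buf_size: usize) -> io::Result<usize> {
    assert!(buf_size > 0, "buffer size must be non-zero");
    let mut buf = vec![0u8; buf_size];
    let mut count = 0;
    loop {
        let n = match inner.read(&mut buf) {
            Ok(0) => return Ok(count), // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue, // retry on EINTR
            Err(e) => return Err(e),
        };
        // Placeholder scan: this is where a SIMD search (e.g. memchr) would go
        count += buf[..n].iter().filter(|&&b| b == b'\n').count();
    }
}
```

For the benchmark's stdin setup, this would be called as count_newlines(std::io::stdin().lock(), 64 * 1024), and the second argument is the knob to sweep.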

27 replies

•Created by duck_tape on 1/3/2025 in #community-showcase

A Benchmark with Files and Bytes

Here are the hyperfine results after fixing a small import bug in the Rust script (Rust 1.83, Mojo 24.6).

Benchmark 1: < big.tsv ./mojo/count_lines
  Time (mean ± σ):      1.216 s ±  0.007 s    [User: 1.123 s, System: 0.095 s]
  Range (min … max):    1.209 s …  1.225 s    10 runs
 
Benchmark 2: < big.tsv ./rust/target/release/count_lines
  Time (mean ± σ):      1.006 s ±  0.008 s    [User: 0.901 s, System: 0.105 s]
  Range (min … max):    0.997 s …  1.016 s    10 runs
 
Summary
  < big.tsv ./rust/target/release/count_lines ran
    1.21 ± 0.01 times faster than < big.tsv ./mojo/count_lines

As expected, the Rust implementation performs better on Intel x86 than on an ARM Mac, and it now beats Mojo.

27 replies

•Created by duck_tape on 1/3/2025 in #community-showcase

A Benchmark with Files and Bytes

I can give the code a try on an Intel machine and report the results here.

27 replies

•Created by duck_tape on 1/3/2025 in #community-showcase

A Benchmark with Files and Bytes

The memchr crate already uses SIMD and goes to great lengths to provide vectorized, architecture-specific implementations. I am not sure there is a better solution in the Rust ecosystem.
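
Here is a sketch of what "vectorized" means in this context, without depending on memchr itself. It uses the portable word-at-a-time (SWAR) idea: test 8 bytes per step with plain u64 arithmetic. This is a simplified illustration in the spirit of a non-SIMD fallback, not memchr's actual code. The architecture-specific paths use real SIMD registers instead.

```rust
const LOW7: u64 = 0x7F7F_7F7F_7F7F_7F7F;
const NEWLINES: u64 = 0x0A0A_0A0A_0A0A_0A0A; // b'\n' repeated in every byte

/// Number of bytes in `x` that are exactly zero.
fn zero_bytes(x: u64) -> u32 {
    // Per byte: the high bit of `t` is set iff the low 7 bits are non-zero (no carries cross bytes)
    let t = (x & LOW7) + LOW7;
    // After OR-ing in `x` and negating, the high bit remains only for bytes that are entirely zero
    (!(t | x | LOW7)).count_ones()
}

/// Counts b'\n' in `buf`, examining 8 bytes per step.
pub fn count_newlines_swar(buf: &[u8]) -> usize {
    let mut chunks = buf.chunks_exact(8);
    let mut count = 0;
    for chunk in &mut chunks {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        // XOR turns every '\n' byte into 0x00, so counting newlines = counting zero bytes
        count += zero_bytes(u64::from_ne_bytes(word) ^ NEWLINES) as usize;
    }
    // Fewer than 8 trailing bytes: handle them one at a time
    count + chunks.remainder().iter().filter(|&&b| b == b'\n').count()
}
```

Real SIMD widens the same idea to 16 or 32 bytes per step and adds careful alignment and tail handling. That is the engineering memchr already does.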

27 replies

•Created by duck_tape on 12/10/2024 in #community-showcase

ExtraMojo

I will probably keep experimenting for a bit to see how this works. One obvious advantage is that references into the mapped file remain valid for as long as the file stays mapped, which would simplify having multiple threads working on different stages of a multi-step pipeline.
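
Rust's standard library has no mmap, so this sketch uses an in-memory byte slice as a stand-in for the mapped region. Every line handed to a worker is a borrowed slice. The compiler accepts this because std::thread::scope guarantees the workers finish before the underlying bytes can go away. With a memory-mapping crate, the mapping would play the role of data. The function name and the two-way split are illustrative assumptions.

```rust
use std::thread;

/// Sums the lengths of the non-empty lines in `data`, splitting the work across two scoped threads.
/// Every line is a `&[u8]` borrowed from `data`; no bytes are copied.
pub fn parallel_line_bytes(data: &[u8]) -> usize {
    let lines: Vec<&[u8]> = data
        .split(|&b| b == b'\n')
        .filter(|line| !line.is_empty())
        .collect();
    let (first, second) = lines.split_at(lines.len() / 2);
    // The scope guarantees both workers finish before `data` and `lines` can be dropped,
    // so borrowing from them inside the threads is allowed.
    thread::scope(|s| {
        let a = s.spawn(|| first.iter().map(|line| line.len()).sum::<usize>());
        let b = s.spawn(|| second.iter().map(|line| line.len()).sum::<usize>());
        a.join().unwrap() + b.join().unwrap()
    })
}
```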

83 replies

•Created by duck_tape on 12/10/2024 in #community-showcase

ExtraMojo

It's user-specific; most people won't use 50 GB files on Windows. I am just porting a cross-platform, industry-standard tool from my field to Mojo, and I want good cross-platform sequential IO as the first step. Ideally, I would like to support multithreading at some point. mmap is much more ergonomic than a buffered reader on Linux, but it seems less predictable and less portable on Mac and Windows.

83 replies

•Created by duck_tape on 12/10/2024 in #community-showcase

ExtraMojo

Would the huge-page configuration differ between macOS and Linux, or is it a POSIX thing? Also, how portable would this setup be to Windows?

83 replies

•Created by duck_tape on 12/10/2024 in #community-showcase

ExtraMojo

That would be great; the cost of page faults becomes non-trivial as file size increases.
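
A quick back-of-the-envelope estimate shows why this grows with file size. Touching every page of a mapping costs roughly one page fault per page, so the fault count scales with file size divided by page size. The sketch assumes 4 KiB base pages, the common x86-64 Linux default (Apple Silicon macOS uses 16 KiB), versus 2 MiB huge pages. It ignores kernel optimizations such as fault-around and readahead, so treat the numbers as rough upper bounds.

```rust
pub const KIB: u64 = 1024;
pub const MIB: u64 = 1024 * KIB;
pub const GIB: u64 = 1024 * MIB;

/// Number of pages (and so, roughly, first-touch page faults) needed to map `file_bytes`.
pub fn pages_needed(file_bytes: u64, page_bytes: u64) -> u64 {
    (file_bytes + page_bytes - 1) / page_bytes // ceiling division
}
```

For a 50 GiB file, pages_needed(50 * GIB, 4 * KIB) is 13,107,200, while pages_needed(50 * GIB, 2 * MIB) is 25,600. That is a 512× reduction.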

83 replies

•Created by duck_tape on 12/10/2024 in #community-showcase

ExtraMojo

In my experiments, my zero-copy buffered line iterator runs at around 5 GB/s from an NVMe disk. I will play around with mmap to see how it works and whether I can extract a bit more performance from that strategy (at least by avoiding the syscall overhead).
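
For a buffered reader, the syscall overhead is easy to quantify: each refill is one read call on the underlying file. This sketch wraps any reader in a hypothetical CountingReader and counts those calls for a given buffer size. With mmap, the per-refill calls disappear and are replaced by page faults on first access.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper counting how many `read` calls reach `inner` (one syscall each for a file).
pub struct CountingReader<R> {
    inner: R,
    pub calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Drains `inner` through a buffer of `buf_size` bytes and returns how many read calls it took.
pub fn reads_needed<R: Read>(inner: R, buf_size: usize) -> io::Result<usize> {
    assert!(buf_size > 0, "buffer size must be non-zero");
    let mut reader = CountingReader { inner, calls: 0 };
    let mut buf = vec![0u8; buf_size];
    // Keep reading until a call returns 0 bytes (end of input); that final call is counted too
    while reader.read(&mut buf)? > 0 {}
    Ok(reader.calls)
}
```

For 1 MiB of in-memory data, a 64 KiB buffer needs 17 calls (16 full reads plus the final zero-length one), while a 4 KiB buffer needs 257.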

83 replies

•Created by duck_tape on 12/10/2024 in #community-showcase

ExtraMojo

It would be great if you could share some example code. I tried this in early Mojo (before 0.6) and would like to circle back to it now.

83 replies

•Created by duck_tape on 12/10/2024 in #community-showcase

ExtraMojo

I got so many segfaults when I tried to use the mmap API through early Mojo FFI that it kept me away 😅

83 replies

•Created by duck_tape on 12/10/2024 in #community-showcase

ExtraMojo

In general, I think buffered IO is a nice interface to have alongside the fancier IO strategies. Coming from other languages and domains, it is one of the tools you intuitively reach for first.

83 replies

•Created by duck_tape on 12/10/2024 in #community-showcase

ExtraMojo

I think reading the whole file at once would not work for some use cases. I regularly work with files of tens to hundreds of GB each, and most systems may not even have that much memory. mmap could be used in this case, but I haven't tried it yet. Also, in my early experiments I found that buffered IO and parsing with a smaller buffer (64 KB) is as efficient as, or even slightly more efficient than, reading the whole file at once (at least on my hardware) for file sizes around 1 GB.

For the zero-copy iterator, I currently have a version that returns byte spans instead of doing additional memory allocation, which at least minimizes the allocation overhead. The main problem with this approach is the lifetime of each span and how it should be invalidated when the buffer is refilled.
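
Here is a Rust sketch of the span-lifetime problem described above, using the standard BufRead::fill_buf / consume pair. Lines that lie entirely inside the buffer are passed to a callback as borrowed slices (zero-copy). Only a line that straddles a refill is copied into a small carry buffer. Each slice borrows the reader's buffer, so the borrow checker rejects any attempt to keep it past the next refill, which is exactly the invalidation rule in question. A callback is used because Rust's Iterator trait cannot yield items that borrow from the iterator itself. for_each_line is a hypothetical name.

```rust
use std::io::{self, BufRead};

/// Calls `f` once per line, without the trailing '\n'.
/// Lines that fit inside the reader's buffer are passed as borrowed slices (zero-copy);
/// only a line that straddles a buffer refill is copied into `carry`.
pub fn for_each_line<R: BufRead, F: FnMut(&[u8])>(mut reader: R, mut f: F) -> io::Result<()> {
    let mut carry: Vec<u8> = Vec::new();
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            // EOF: emit a final line that had no trailing newline
            if !carry.is_empty() {
                f(carry.as_slice());
            }
            return Ok(());
        }
        let mut start = 0;
        while let Some(pos) = buf[start..].iter().position(|&b| b == b'\n') {
            let end = start + pos;
            if carry.is_empty() {
                // Zero-copy: this slice borrows the reader's buffer and cannot outlive the next refill
                f(&buf[start..end]);
            } else {
                // The line began in a previous buffer: complete the copied prefix
                carry.extend_from_slice(&buf[start..end]);
                f(carry.as_slice());
                carry.clear();
            }
            start = end + 1;
        }
        // Bytes after the last newline form an incomplete line; copy them before refilling
        carry.extend_from_slice(&buf[start..]);
        let used = buf.len();
        reader.consume(used);
    }
}
```

With a 64 KB buffer, as in the experiments above, the call would look like for_each_line(BufReader::with_capacity(64 * 1024, File::open(path)?), |line| { /* parse line */ }).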

83 replies

•Created by duck_tape on 12/10/2024 in #community-showcase

ExtraMojo

It is nice that you found the buffered line reader useful. I originally hand-rolled it for my needs and have made several modifications to it over time. It is also not a straightforward implementation, as it returns tensors of bytes rather than strings (it was started in the pre-List era). I was planning on splitting a more complex version of the buffered reader (and a companion buffered writer) into its own package. If you are interested in the buffered IO part, we can work on a more Pythonic implementation, which, if useful, could eventually find its way into the stdlib (Python does buffered reads by default in some cases).
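
For the companion writer, Rust's std::io::BufWriter shows the usual design. Small writes accumulate in memory and reach the underlying sink only when the buffer fills or is flushed. The sketch counts how many writes reach a hypothetical CountingSink to make the batching visible; the names and sizes are illustrative only.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink that records every `write` call reaching it.
pub struct CountingSink {
    pub data: Vec<u8>,
    pub writes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `n` short lines through a `BufWriter` with the given capacity and returns the sink.
pub fn write_digits_buffered(n: usize, capacity: usize) -> io::Result<CountingSink> {
    let mut writer = BufWriter::with_capacity(capacity, CountingSink { data: Vec::new(), writes: 0 });
    for i in 0..n {
        // Each line is one digit plus '\n'; these tiny writes land in the in-memory buffer
        writeln!(writer, "{}", i % 10)?;
    }
    // `into_inner` flushes whatever is still buffered, then hands back the sink
    writer.into_inner().map_err(|e| e.into_error())
}
```

With 1,000 two-byte lines and a 64-byte buffer, the 2,000 bytes reach the sink in a few dozen writes instead of thousands.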

83 replies
