Firebolt: In-progress implementation of Apache Arrow in Mojo
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format.
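To make "columnar memory format" concrete, here is a minimal Python sketch of how the Arrow spec lays out a nullable int32 column. It uses a validity bitmap with one bit per slot (least-significant bit first, 1 = valid) plus a contiguous buffer of fixed-width values. The helper names are hypothetical, and the sketch ignores the buffer alignment and padding the spec recommends.

```python
import struct


def build_int32_column(values):
    """Encode a list of Optional[int] as an Arrow-style validity bitmap plus int32 values buffer.

    Sketch only: follows the primitive layout (LSB-ordered bitmap, 1 = valid)
    but does not pad buffers to the alignment the spec recommends.
    """
    length = len(values)
    validity = bytearray((length + 7) // 8)  # one bit per slot, all 0 (null) initially
    data = bytearray(4 * length)             # one little-endian int32 slot per value
    null_count = 0
    for i, v in enumerate(values):
        if v is None:
            null_count += 1                  # bit stays 0; the value slot is left as zeros
            continue
        validity[i // 8] |= 1 << (i % 8)     # least-significant-bit numbering
        struct.pack_into("<i", data, 4 * i, v)
    return length, null_count, bytes(validity), bytes(data)


def read_slot(validity, data, i):
    """Return the value at slot i, or None if its validity bit is 0."""
    if not (validity[i // 8] >> (i % 8)) & 1:
        return None
    return struct.unpack_from("<i", data, 4 * i)[0]


length, null_count, validity, data = build_int32_column([1, None, 3])
```

Because every value has a fixed width, slot `i` lives at byte offset `4 * i`, so random access and vectorized scans need no per-row pointers.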
You should put the test folder outside of the source folder.
The test runner can discover the test cases there as well, and it has been my preference for Python projects.
I mean the test folder should not be inside the library folder, so people can reduce the package size when they use it.
Well, the implementation is not there yet.
Looking forward to seeing all the things go brrrr. I hope that the Apache folks agree with you and make Mojo first-class. Props on all the Mojo contributions.
Oh yea, the name writes itself, very nice 🙂
One comment: it would help to say something about PyArrow requirements in the README, or even in a requirements.txt. For example, if I don't use the C Data Interface, is PyArrow optional or mandatory?
Entirely optional, it is only used for testing the zero copy exchange interface.
I am a maintainer of apache/arrow. Once Mojo is adopted widely enough and the Arrow implementation is mature enough, it will make sense to push it upstream, though that is a long-term goal.
@kszucs cool, and interesting! I am curious what you think about this proposal which I opened: https://github.com/modularml/mojo/issues/1515
Because it seems to me that Mojo will need to be enhanced to allow zero-copy interaction with Arrow-formatted data, and to have any kind of interoperability with the rest of the Arrow ecosystem.
Linked issue (GitHub): [Feature Request] memoryview builtin and support for python buffer ...
And I also quote from the Arrow documentation in that issue.
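As a rough illustration of the Python buffer protocol the proposal is about (shown in Python itself, not Mojo), `memoryview` exposes another object's memory together with format and size metadata without copying it. Writes through a view, or through a slice of a view, land in the original buffer. The variable names are illustrative.

```python
from array import array

# An int32-like "column" backed by one contiguous buffer
# (typecode "i" is a C int, 32 bits on mainstream platforms).
values = array("i", [10, 20, 30, 40])

view = memoryview(values)   # exports the array's buffer; no copy is made
tail = view[1:]             # slicing a memoryview is also zero-copy
tail[0] = 99                # writes through to values[1]
raw = view.cast("B")        # reinterpret the same memory as unsigned bytes

# Metadata a consumer can inspect: element format, item size, total bytes.
print(view.format, view.itemsize, view.nbytes)
```

The C Data Interface follows a similar idea, handing over raw buffer pointers plus a schema describing them. It adds an explicit release callback to manage ownership across languages.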
I do need to read through it all, but it sounds like you may have solved the zero-copy use case.
The Python buffer protocol is pretty similar to the Arrow C Data Interface. I think both are really important.
Partially. It only works in one direction for now, with Mojo as the consumer, because Mojo callbacks cannot be passed to the C side. Also, the C layout of the structs used is not guaranteed, but hopefully these issues will be sorted out in Mojo soon enough.
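The structs involved are `ArrowSchema` and `ArrowArray` from the C Data Interface spec. Below is a sketch of their layout in Python `ctypes` (not Mojo). The `release` member is the producer-supplied callback, which is the part that requires passing a function to the C side. The example data and variable names are hypothetical. `release` is deliberately left NULL, which the spec treats as an already-released struct, so a real producer would have to set it.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    pass


class ArrowArray(ctypes.Structure):
    pass


# Callback types for the release members: void (*release)(struct ArrowSchema*) etc.
ReleaseSchemaFn = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))
ReleaseArrayFn = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))

# Field order and types mirror the C declarations in the C Data Interface spec.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ReleaseSchemaFn),
    ("private_data", ctypes.c_void_p),
]

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ReleaseArrayFn),
    ("private_data", ctypes.c_void_p),
]

# Hypothetical data: int32 column [1, null, 3] (validity bits 0b101).
validity = (ctypes.c_uint8 * 1)(0b101)
values = (ctypes.c_int32 * 3)(1, 0, 3)
buffer_ptrs = (ctypes.c_void_p * 2)(ctypes.addressof(validity), ctypes.addressof(values))

# "i" is the int32 format string; flag value 2 is ARROW_FLAG_NULLABLE.
schema = ArrowSchema(format=b"i", name=b"x", flags=2)
arr = ArrowArray(
    length=3,
    null_count=1,
    offset=0,
    n_buffers=2,  # primitive arrays: [validity, values]
    n_children=0,
    buffers=ctypes.cast(buffer_ptrs, ctypes.POINTER(ctypes.c_void_p)),
)
# release is still NULL here; a real producer must point it at a cleanup callback.
```

Declaring the layout explicitly like this is what makes the exchange zero-copy. The consumer reads the producer's buffer pointers directly, which only works if both sides agree on the exact struct layout.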
@Maxim worked on generating the FlatBuffers schema files for Arrow in Mojo, as part of a separate Arrow-in-Mojo effort, here: https://github.com/mojo-data/arrow-schema
Just wanted to let you know in case it's useful to you. I'm also open to collaborating if you're open to it as well!
That will be required for the IPC format, along with JSON support in Mojo for the integration tests. Yes, ideally we should join efforts.
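For context on where those schema files fit: in the IPC format, each message is a FlatBuffers-encoded `Message` wrapped in a small framing. The framing consists of a 0xFFFFFFFF continuation marker, a little-endian int32 metadata size (which counts the padding to an 8-byte boundary), the metadata itself, the padding, and then the message body. Below is a minimal Python sketch of just that framing. It treats the flatbuffer bytes as opaque input, and the `frame_message` helper is hypothetical.

```python
import struct

CONTINUATION = b"\xff\xff\xff\xff"


def frame_message(metadata: bytes, body: bytes = b"") -> bytes:
    """Wrap already-serialized Message flatbuffer bytes in the encapsulated IPC framing.

    `metadata` is assumed to come from generated FlatBuffers bindings; it is opaque here.
    """
    # Pad so that marker (4) + size field (4) + metadata ends on an 8-byte boundary.
    padding = (-(8 + len(metadata))) % 8
    metadata_size = len(metadata) + padding  # the size field includes the padding
    return CONTINUATION + struct.pack("<i", metadata_size) + metadata + b"\x00" * padding + body


# A stream ends with the continuation marker followed by a zero metadata size.
END_OF_STREAM = CONTINUATION + struct.pack("<i", 0)
```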
🙏
Awesome, DMing you to coordinate
Hi @Krisztian Szucs, if I can make a humble suggestion: would you consider publishing release versions that track Mojo's tagged releases, e.g. v24.4? This is similar to what https://github.com/endia-org/Endia does, for example, and it makes it easier to get started as a package user. I am interested in using Firebolt to build a GeoArrow (https://geoarrow.org) integration and, after that, a rasterization package that converts geospatial vector data into Mojo Tensors.
Hi @guidorice ! Sorry, I haven't had the time to work on the library lately. Sure, though it uses some features only available in the nightly, so we would need to pin a more recent version.
Also, integrating the new Magic package manager would be nice.
Hey, I was looking at the codebase yesterday and noticed a comment about Mojo not being able to use functions in CFFI. You can pass Mojo functions directly as CFFI callbacks; you just can't pass null pointers.
You don't need UnsafePointer; just pass the function in directly, and if the ABI matches, it works.
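The point about matching ABIs can be illustrated by analogy in Python `ctypes` rather than Mojo. A Python function wrapped in a `CFUNCTYPE` whose signature matches `qsort`'s comparator, `int (*)(const void *, const void *)`, can be passed straight to libc. This sketch assumes a POSIX libc reachable through `CDLL(None)` (or `msvcrt` on Windows). `sort_ints` and `py_compare` are illustrative names.

```python
import ctypes
import sys

# POSIX: dlopen(NULL) exposes libc symbols; on Windows fall back to msvcrt.
libc = ctypes.cdll.msvcrt if sys.platform == "win32" else ctypes.CDLL(None)

# Comparator type matching qsort's int (*)(const void *, const void *);
# typed int pointers have the same calling convention as void pointers.
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int))

libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


def py_compare(a, b):
    # Negative, zero, or positive, without risking int overflow from a subtraction.
    return (a[0] > b[0]) - (a[0] < b[0])


def sort_ints(items):
    buf = (ctypes.c_int * len(items))(*items)
    callback = CMPFUNC(py_compare)  # keep a reference alive while C may call it
    libc.qsort(buf, len(buf), ctypes.sizeof(ctypes.c_int), callback)
    return list(buf)
```

The C side only sees a function pointer with the expected signature. That is why the declared callback type, not any pointer wrapping, is what has to match.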
Thanks @lukas for the extra context. I can't recall what was the exact issue there, but I am going to take another look.