Firebolt: In-progress implementation of Apache Arrow in Mojo

18 Replies
Krisztian Szucs (OP), 5mo ago
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format.
Apache Arrow
A cross-language development platform for in-memory analytics
GnU So Cute, 5mo ago
You should move the test folder out of the source folder.
Krisztian Szucs (OP), 5mo ago
The test runner is able to discover the test cases there as well, and this layout has been my preference for Python projects.
GnU So Cute, 5mo ago
I mean the test folder should not be inside the library folder, so that people can reduce the size when they use the library.
Krisztian Szucs (OP), 5mo ago
Well, the implementation is not there yet.
Darin Simmons, 5mo ago
Looking forward to seeing all the things go brrrr. I hope that the Apache folks agree with you and make Mojo first-class. Props on all the Mojo contributions. Oh yeah, the name writes itself, very nice 🙂 One comment: the README, or even a requirements.txt, could say something about the PyArrow requirement. For example, if I don't use the C Data Interface, is PyArrow optional or mandatory?
Krisztian Szucs (OP), 5mo ago
Entirely optional; it is only used for testing the zero-copy exchange interface. I am a maintainer of apache/arrow. Once Mojo gets adopted enough and the Arrow implementation gets mature enough, then it will make sense to push it upstream. That is a long-term goal, though.
guidorice, 5mo ago
@kszucs cool, and interesting! I am curious what you think about this proposal which I opened: https://github.com/modularml/mojo/issues/1515 It seems to me that Mojo will need to be enhanced to allow zero-copy interactions with Arrow-formatted data and to have any kind of interoperability with the rest of the Arrow ecosystem.
GitHub
[Feature Request] memoryview builtin and support for python buffer ...
guidorice, 5mo ago
I also quote from the Arrow documentation in that issue. I still need to read through it all, but it sounds like you may have solved the zero-copy use case.
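As background on the zero-copy idea behind the Python buffer protocol referenced in the linked issue, here is a minimal Python sketch using only the standard library. It is not code from Firebolt or from the issue; the names `values`, `view`, and `window` are illustrative. A `memoryview` exposes the memory of an existing object without copying it, so a write through the view shows up in the original container.

```python
from array import array

# An array of signed 64-bit integers; it supports the buffer protocol.
values = array("q", [1, 2, 3, 4])

# memoryview wraps the existing memory instead of copying it.
view = memoryview(values)

# Slicing a memoryview yields another view over the same memory.
window = view[1:3]

# Writing through the slice modifies the original array in place.
window[0] = 20

# Collected for inspection: element format, bytes per element, total bytes.
summary = (view.format, view.itemsize, view.nbytes)
```

After this runs, `values` holds the modified element even though only `window` was written to, which is the property that makes zero-copy exchange possible.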
Krisztian Szucs (OP), 5mo ago
The Python buffer protocol is pretty similar to the Arrow C data interface, and I think both are really important. For now the exchange only works partially, in one direction where Mojo is the consumer, because Mojo callbacks cannot be passed to the C side. Also, the C layout of the structs used is not guaranteed, but hopefully these issues are going to be sorted out in Mojo soon enough.
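To illustrate why callbacks matter on the producer side, here is a hedged Python `ctypes` sketch of the `ArrowArray` struct from the Arrow C data interface. The field order follows the published C definition; the `release_array` callback and the `released_lengths` list are hypothetical helpers for this example, and none of this is Firebolt code. A producer has to fill in `release` with a function pointer that C code can call, and both sides have to agree on the exact struct layout.

```python
import ctypes


class ArrowArray(ctypes.Structure):
    """Mirror of `struct ArrowArray` from the Arrow C data interface."""


# void (*release)(struct ArrowArray*)
ReleaseFn = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))

# Fields are assigned after the class exists because the struct refers to itself.
ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ReleaseFn),
    ("private_data", ctypes.c_void_p),
]

# Hypothetical bookkeeping so the example can observe the callback firing.
released_lengths = []


@ReleaseFn
def release_array(array_ptr):
    """Producer-side release callback (illustrative only)."""
    released_lengths.append(array_ptr.contents.length)
    # The interface marks a struct as released by setting `release` to NULL.
    array_ptr.contents.release = ReleaseFn()


# No data buffers are attached here; a real producer would point
# `buffers` at memory it owns before handing the struct to a consumer.
arr = ArrowArray(length=3, null_count=0, offset=0, n_buffers=0, n_children=0)
arr.release = release_array

# A consumer that is done with the data calls the producer's callback.
arr.release(ctypes.byref(arr))
```

A consumer only needs to read the fields and call `release` once, which is why the consumer direction is the easier one to support first.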
sa-code, 5mo ago
@Maxim worked on generating the FlatBuffers schema files for Arrow in Mojo, as part of a different effort for Arrow in Mojo: https://github.com/mojo-data/arrow-schema Just wanted to let you know in case it's useful to you. I'm also open to collaborating if you are!
Krisztian Szucs (OP), 5mo ago
That will be required for the IPC format along with mojo json for the integration tests. Yes, ideally we should join efforts.
Maxim, 5mo ago
🙏
sa-code, 5mo ago
Awesome, DMing you to coordinate
guidorice, 4mo ago
Hi @Krisztian Szucs, if I can make a humble suggestion: would you consider publishing release versions that track the tagged Mojo releases, e.g. v24.4? This is similar to how https://github.com/endia-org/Endia does it, for example. It makes it easier to get started as a package user. I am interested in using Firebolt to build a GeoArrow ( https://geoarrow.org ) integration, and after that, to create a rasterization package that converts geo vector data into Mojo Tensors.
Krisztian Szucs (OP), 4mo ago
Hi @guidorice ! Sorry, I haven't had the time to work on the library lately. Sure, though it uses some features only available in the nightly, so we would need to pin a more recent version. Also integrating the new magic package manager would be nice.
lukas, 3mo ago
Hey, I was looking at the codebase yesterday and noticed a comment about Mojo not being able to use functions in CFFI. You can pass Mojo functions right to CFFI callbacks; you just can't pass null pointers. You don't use UnsafePointer, just pass the function in directly, and if the ABI matches then it works.
Krisztian Szucs (OP), 2mo ago
Thanks @lukas for the extra context. I can't recall what the exact issue was there, but I am going to take another look.