Arena Allocated Coroutines
I was watching the talk "Efficient Coroutine Implementation in MLIR", and it seems like there isn't any room in that design for arena-allocating the frames, nor any place to handle the allocation of a coroutine frame failing. This is somewhat concerning to me: being able to move frames to stack allocations is nice, but being able to grab a right-sized allocation from an arena allocator is nicer, especially when you need to ensure you have enough memory for the coroutine. For frequently allocated coroutines (consider the `handle_request` top-level function of an HTTP server), this means that instead of going through all of the machinery in tcmalloc, you may just be performing a dequeue operation on a ring buffer of free frames, which is substantially faster.
Would it be possible to have the coroutine take an `alloc: Allocator[CoroutineFrameType] = DefaultMojoAllocator` parameter in some way, or otherwise inject an allocator into the coroutine? I'm still thinking over how I would want custom allocators to behave, but I know that this is a feature I and others will want.
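To make the "dequeue from a ring buffer of free frames" idea concrete, here is a rough sketch in C++ (not Mojo, and not anything from the talk). `FramePool`, `FrameSize`, and `Capacity` are names I made up for illustration, and it assumes every frame has the same size and that a single thread uses the pool.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity pool of equally sized coroutine frames.
// FrameSize and Capacity are illustrative; a real frame size would come
// from the compiler. Not thread-safe.
template <std::size_t FrameSize, std::size_t Capacity>
class FramePool {
    static_assert(Capacity > 0, "pool must hold at least one frame");
    static_assert(FrameSize % alignof(std::max_align_t) == 0,
                  "FrameSize must keep every slot suitably aligned");

public:
    FramePool() {
        // Initially every slot is free, so the ring holds all slot indices.
        for (std::size_t i = 0; i < Capacity; ++i) free_ring_[i] = i;
    }

    // Dequeue a free frame. Returns nullptr when the pool is exhausted,
    // so the caller decides how to react instead of the process aborting.
    void* allocate() noexcept {
        if (free_count_ == 0) return nullptr;
        std::size_t slot = free_ring_[head_];
        head_ = (head_ + 1) % Capacity;
        --free_count_;
        return storage_ + slot * FrameSize;
    }

    // Enqueue a frame back onto the ring. The pointer must have come from
    // allocate() on this pool and must not be returned twice.
    void deallocate(void* frame) noexcept {
        assert(free_count_ < Capacity);
        auto offset = static_cast<unsigned char*>(frame) - storage_;
        std::size_t slot = static_cast<std::size_t>(offset) / FrameSize;
        std::size_t tail = (head_ + free_count_) % Capacity;
        free_ring_[tail] = slot;
        ++free_count_;
    }

    std::size_t available() const noexcept { return free_count_; }

private:
    alignas(std::max_align_t) unsigned char storage_[FrameSize * Capacity];
    std::array<std::size_t, Capacity> free_ring_{};
    std::size_t head_ = 0;              // index of the oldest free entry
    std::size_t free_count_ = Capacity; // number of free entries in the ring
};
```

Both `allocate` and `deallocate` are a handful of integer operations with no system calls or size-class lookups, which is the kind of fast path I would like an injected allocator to be able to provide for coroutine frames.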
As for my specialty of databases: a database is likely the largest memory consumer on any system it runs on and typically does a lot of caching, so it can actually do something about an allocation failure. Not being able to handle allocation failures means that you can't use the feature in production code, because it could lead to unnecessary crashes.
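As a rough illustration of "doing something about it", here is a C++ sketch of the pattern I have in mind: when a frame allocation fails, evict cached pages and retry. `BudgetedAllocator`, `PageCache`, and `allocate_frame_or_evict` are hypothetical names invented for this example, and the byte budget is a stand-in for whatever memory limit a real database would enforce.

```cpp
#include <cstddef>
#include <cstdlib>
#include <deque>

// Hypothetical allocator with a byte budget. It reports failure by
// returning nullptr rather than aborting the process.
class BudgetedAllocator {
public:
    explicit BudgetedAllocator(std::size_t budget) : remaining_(budget) {}

    void* allocate(std::size_t bytes) noexcept {
        if (bytes > remaining_) return nullptr;
        void* p = std::malloc(bytes);
        if (p != nullptr) remaining_ -= bytes;
        return p;
    }

    void deallocate(void* p, std::size_t bytes) noexcept {
        std::free(p);
        remaining_ += bytes;
    }

    std::size_t remaining() const noexcept { return remaining_; }

private:
    std::size_t remaining_;
};

// Hypothetical page cache that draws from the same budget and can give
// memory back by evicting its oldest page.
class PageCache {
public:
    explicit PageCache(BudgetedAllocator& alloc) : alloc_(alloc) {}
    ~PageCache() { while (evict_oldest()) {} }

    bool insert(std::size_t bytes) {
        void* p = alloc_.allocate(bytes);
        if (p == nullptr) return false;
        pages_.push_back(Page{p, bytes});
        return true;
    }

    bool evict_oldest() noexcept {
        if (pages_.empty()) return false;
        Page page = pages_.front();
        pages_.pop_front();
        alloc_.deallocate(page.data, page.bytes);
        return true;
    }

    std::size_t size() const noexcept { return pages_.size(); }

private:
    struct Page { void* data; std::size_t bytes; };
    BudgetedAllocator& alloc_;
    std::deque<Page> pages_;
};

// Try to allocate a coroutine frame. On failure, evict cached pages one
// at a time and retry; return nullptr only if the cache is empty and the
// allocation still fails.
void* allocate_frame_or_evict(BudgetedAllocator& alloc, PageCache& cache,
                              std::size_t frame_bytes) {
    for (;;) {
        if (void* frame = alloc.allocate(frame_bytes)) return frame;
        if (!cache.evict_oldest()) return nullptr;
    }
}
```

The point is only that the failure has to be visible to the caller. If frame allocation aborts internally, there is no place to hook in the eviction-and-retry loop.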
2024 LLVM Dev Mtg - Efficient Coroutine Implementation in MLIR
2024 LLVM Developers' Meeting: https://llvm.org/devmtg/2024-10/
Speaker: Steffi Stumpos
Slides: https://llvm.org/devmtg/2024-10/slides/techtalk/Stumpos-EfficientCoroutineImplementatio-inMLIR.pdf