io_uring

Hi all, I just published my implementation of an io_uring library in pure Mojo (https://github.com/dmitry-salin/io_uring). The design is similar to rustix (https://github.com/bytecodealliance/rustix). Currently there is only a linux_raw backend, which could potentially be used to create binaries that do not depend on libc. The library is at an early stage and lacks documentation, tests, and many operations, but the basic functionality is implemented.
86 Replies
toasty
toasty5mo ago
@Darkmatter I've seen you reference io_uring a few times; this might be of interest to you!
Darkmatter
Darkmatter5mo ago
I’ve seen it but was waiting to collect my thoughts. Jens Axboe (io_uring maintainer) recommends going through liburing because of how tricky the memory ordering is. I’m not sure this is fully sound on ARM due to the weaker memory model there.
Darkmatter
Darkmatter5mo ago
I was waiting for C interop so I could take that route and build my own, somewhat higher-level bindings.
Dmitry Salin
Dmitry SalinOP5mo ago
Logically, the implementation mostly matches liburing; for instruction-level matching we need this: https://github.com/modularml/mojo/issues/3162 I have a similar library implemented in Rust, and it doesn't cause any problems for my workloads. Most Rust and Zig projects I've seen use native implementations rather than bindings. I profiled with valgrind, and the native implementation performed better.
GitHub
[Feature Request] Add basic atomic instructions: atomic_load, `at...
Review Mojo's priorities I have read the roadmap and priorities and I believe this request falls within the priorities. What is your request? There is already __mlir_op.pop.load[alignment](addr...
Darkmatter
Darkmatter5mo ago
The lack of fences was what I was commenting on. ARM’s new “big machine” extensions do fun things on multi-socket systems if you don’t fence properly and have a queue that spans NUMA domains. You might be able to guess why they’re on my mind 😭 I ran into this issue with rustix on an ARM server last week. One hell of a debugging session. It’s probably fine on x86, ARM v8, and single-socket ARM v9, which covers most people.
Dmitry Salin
Dmitry SalinOP5mo ago
If I understand correctly, Rust relies on LLVM for all atomic instructions. If so, the problem would not be limited to io_uring. An explicit fence is required for SQPOLL mode, which is a pretty specialized thing. Jens has always said that you have to think before you use it.
Darkmatter
Darkmatter5mo ago
SQPOLL has its share of footguns, but allowing the user to avoid syscalls entirely is very powerful. I tend to use SQPOLL in most of what I write with io_uring because it makes benchmark numbers look good. It’s a bit harder in less controlled environments, but for managed services it’s also very useful.
Dmitry Salin
Dmitry SalinOP5mo ago
My Rust library has fence in the same place as liburing, but I haven't tested it on ARM.
Darkmatter
Darkmatter5mo ago
I’ll double check, phone code review may have failed me.
Dmitry Salin
Dmitry SalinOP5mo ago
GitHub
Fix memory ordering in sq_ring_needs_enter · axboe/liburing@744f415
A full memory barrier is required between the store to the SQ tail in __io_uring_flush_sq and the load of the flags in sq_ring_needs_enter to prevent a situation where the kernel thread goes to sle...
Darkmatter
Darkmatter5mo ago
Yes, that fence. Until you can put it there in Mojo, I'd comment out the SQPOLL flag and leave a comment linking to that fix, explaining that SQPOLL is unsound until fences can be added. On x86 it might happen to work because of how the ring is designed, but you're at the mercy of the cache coherence algorithm deciding it has nothing better to do except sync that cache line.
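For readers following the fence discussion, the check that the liburing commit fixes decides whether io_uring_enter(2) must still be called after new SQEs have been published. Below is a logic-only Python model of that decision. It assumes the flag values from the Linux io_uring UAPI header (IORING_SETUP_SQPOLL = 1 << 1, IORING_SQ_NEED_WAKEUP = 1 << 0, IORING_ENTER_SQ_WAKEUP = 1 << 1). Python cannot express the barrier itself, so a comment marks where liburing places it. The function and parameter names are hypothetical and are not taken from the Mojo library.

```python
from __future__ import annotations

from typing import Callable

# Flag values from the Linux io_uring UAPI header (linux/io_uring.h).
IORING_SETUP_SQPOLL = 1 << 1     # ring was created with a kernel SQ polling thread
IORING_SQ_NEED_WAKEUP = 1 << 0   # kernel sets this in the SQ ring flags when the poller sleeps
IORING_ENTER_SQ_WAKEUP = 1 << 1  # io_uring_enter(2) flag asking the kernel to wake the poller


def sq_ring_needs_enter(setup_flags: int, load_sq_flags: Callable[[], int]) -> tuple[bool, int]:
    """Hypothetical model of the check run after the SQ tail has been stored.

    Returns (must_call_io_uring_enter, enter_flags).
    """
    if not setup_flags & IORING_SETUP_SQPOLL:
        # Without SQPOLL the kernel only picks up SQEs inside io_uring_enter(2).
        return True, 0
    # liburing places a full memory barrier (io_uring_smp_mb) at this point. It orders
    # the earlier store to the SQ tail before the load of the SQ flags below. Without
    # it, the flags can be read too early: the poller may go to sleep without seeing
    # the new tail, and nothing wakes it up.
    if load_sq_flags() & IORING_SQ_NEED_WAKEUP:
        return True, IORING_ENTER_SQ_WAKEUP
    # The poller is awake and will see the new entries, so no syscall is needed.
    return False, 0
```

The store-then-load ordering is why a full barrier is needed rather than a release store alone. A release store does not stop a later load from being satisfied before the store becomes visible.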
Dmitry Salin
Dmitry SalinOP5mo ago
Yes, that makes sense. I think I will add a constrained check that prevents compilation in SQPOLL mode.
Darkmatter
Darkmatter5mo ago
I think there will need to be later work on a high-level API on top of io_uring for general use, since we need something that expresses the ownership transfer of the buffers properly. io_uring as a primary io API is something I want to push for, but it will mean convincing people to break from the traditional io model of "you provide the buffer" for everything.
Dmitry Salin
Dmitry SalinOP5mo ago
Yeah, that is the hardest part. One example of higher level abstraction - https://tigerbeetle.com/blog/a-friendly-abstraction-over-iouring-and-kqueue
A Programmer-Friendly I/O Abstraction Over io_uring and kqueue
The financial transactions database to power the next 30 years of Online Transaction Processing.
Dmitry Salin
Dmitry SalinOP5mo ago
Personally, I only use Linux, and I use the event loop without async, which is quite unusual.
Darkmatter
Darkmatter5mo ago
But we also want it to be safe, which makes it harder. "Programmer friendly" is also a relative term. That model is systems-programmer friendly, not "half our community primarily uses Python" friendly. They want .for_each_line(lambda: ...).await. So, we need something that can easily support things like that while letting people who care drop down the layers of abstraction.
Dmitry Salin
Dmitry SalinOP5mo ago
I like the idea of efficient low-level building blocks that can be combined to create different solutions.
Darkmatter
Darkmatter5mo ago
Hopefully we will be able to build up something like Rust's streams, since those are actually very efficient.
Dmitry Salin
Dmitry SalinOP4mo ago
I added sockets and basic network operations. Reduced example:
var ring = IoUring[](sq_entries=8)

var fd = socket(AddressFamily.INET, SocketType.STREAM)
bind(fd, SocketAddressV4(0, 0, 0, 0, port=1111))
listen(fd, backlog=64)

# Queue a single Accept request on the first free SQE.
for sqe in ring.sq():
    _ = Accept(sqe, fd)
    break

_ = ring.submit_and_wait(wait_nr=1)
# Obtain a file descriptor for the new connection
# from the completion queue.

# End fd's lifetime explicitly here so the listening socket is not closed early.
_ = fd^
Darkmatter
Darkmatter4mo ago
Is there a reason you didn’t include the protocol field in socket(2)? Overall it looks like a good start. I’ll write an echo app and run it on hardware where I know what the perf should be so we can start teasing out any inefficiencies.
Dmitry Salin
Dmitry SalinOP4mo ago
The socket function has 3 overloads, and one of them also takes SocketFlags (https://github.com/dmitry-salin/io_uring/blob/main/mojix/net/socket.mojo). What's missing is setsockopt. That's the next thing to add.
Darkmatter
Darkmatter4mo ago
Thank you! I might be the only person to use that for a bit, but I’ll make heavy use of it. I wish you luck making a safe interface to that.
zxenok
zxenok4mo ago
This is excellent. I was trying to learn about this stuff on my own and wanted to stay in mojo/python. I was looking at trying out https://unixism.net/loti/tutorial/webserver_liburing.html to learn. I was missing how to do the syscall in mojo:
from sys._assembly import inlined_assembly
from sys.intrinsics import _mlirtype_is_eq
Where do I find this and read more about it? I don't see it in the docs. Thanks a bunch, I can't wait to play around with this.
Martin Vuyk
Martin Vuyk4mo ago
Hi @Dmitry Salin, before you get any deeper into building your socket interface, would you mind trying to build something with the interface and types I lay out here? I'm trying to set it up so that anyone can plug their implementation of the (hopefully) "stdlib" socket interface into the socket struct. Any feedback would be amazing. Right now I haven't gotten around to making it flexible enough for real injection, and I haven't implemented any sync TCP, but the plumbing is there. io_uring would be an amazing thing to support, but the stdlib team might not want it. Still, I'd like it to be possible to integrate that and other edge-case platforms while taking the development and maintenance burden off the core team 😄
Dmitry Salin
Dmitry SalinOP4mo ago
Hi, my library is low level and will generally follow the rustix design. I think it's too early for higher-level abstractions because the trait system is not mature enough yet; as a result, generic types and io_uring-specific types are tightly coupled. As for higher-level abstractions, the main problem is that there is no feature parity between APIs. For example, io_uring is one of the most advanced: it has registered file descriptors and buffers, and multi-shot operations. Other APIs would need no-op implementations for these functions, and, on the other hand, they may have specific features of their own.
Darkmatter
Darkmatter4mo ago
One issue we may run into is that the recommended way of using io_uring and the only way to use sockets are in contention: a recv from io_uring should give you a buffer id to use, but sockets make you supply a buffer. I think it would be better to leave sockets as the Unix sockets API and create another I/O abstraction on top once we have the basic APIs (socket, io_uring, kqueue) in place. What we probably want is a “raw” interface that sits directly on top of the syscalls, and then one that uses coroutines to tie it all together as a unified API.
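To make the contention concrete: with io_uring provided buffers, the application hands the kernel a pool of buffers up front, and each completed recv reports which buffer id it filled. The application then owns that buffer until it gives it back. The Python sketch below models only that ownership handoff, entirely in user space. BufferPool, kernel_recv, and recycle are hypothetical names and are not io_uring, liburing, or Mojo APIs.

```python
from collections import deque


class BufferPool:
    """Hypothetical user-space model of an io_uring provided-buffer group."""

    def __init__(self, count: int, size: int):
        self.buffers = [bytearray(size) for _ in range(count)]
        # Buffer ids currently lent to the "kernel" and available for receives.
        self.available = deque(range(count))

    def kernel_recv(self, payload: bytes):
        """Simulate a completed recv: the kernel picks a buffer and reports its id."""
        if not self.available:
            return None  # real io_uring fails the request (ENOBUFS) when the group is empty
        bid = self.available.popleft()
        n = min(len(payload), len(self.buffers[bid]))
        self.buffers[bid][:n] = payload[:n]
        return bid, n

    def recycle(self, bid: int) -> None:
        """The application is done with the data and returns the buffer to the pool."""
        self.available.append(bid)


pool = BufferPool(count=2, size=8)
bid, n = pool.kernel_recv(b"hello")
data = bytes(pool.buffers[bid][:n])  # caller reads from the buffer it was handed
pool.recycle(bid)                    # ownership goes back to the pool
```

In the sockets model the caller passes its own buffer into every recv. Here the caller receives a buffer id instead, so a unified API has to decide who owns the bytes at each step.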
Martin Vuyk
Martin Vuyk4mo ago
The idea would be for you to implement it however you want underneath, but provide a BSD-ish (preferably async) socket interface (or any other sufficiently generic contract we set up as we develop further) for the socket struct to interact with your implementation. We could also develop several "low-level socket standards", e.g. one async (kqueue, io_uring, IOCP) and another sync. Yes, that kind of problem is exactly what I think we need to sort out at the "overarching API" level if we want this to be generic enough. The raw interface in this case would be the io_uring impl, and the one that ties all impls together should be the socket struct. If I'm reading this correctly, then yes, this is exactly what I'm aiming at.
Dmitry Salin
Dmitry SalinOP4mo ago
I don't mind; such an interface is certainly necessary and useful. But there are many things to consider.

struct Socket[
    sock_family: SockFamily = SockFamily.AF_INET,
    sock_type: SockType = SockType.SOCK_STREAM,
    sock_protocol: SockProtocol = SockProtocol.TCP,
    sock_platform: SockPlatform = _get_current_platform(),
]:
    ...

This struct has over 750 parameter combinations, and even taking into account that not all of them are valid, that's generic bloat. Restricting and checking combinations at compile time is difficult, and the kernel will do it at runtime anyway. Arc is extra runtime overhead in this case; if someone needs it, they can easily write such a wrapper. In my opinion, it's better to look at modern systems languages such as Rust and Zig: https://doc.rust-lang.org/src/std/net/tcp.rs.html#50 https://github.com/ziglang/zig/blob/master/lib/std/net.zig#L1788 What they have in common is that both use a zero-cost wrapper for the socket file descriptor, and both are unaware of io_uring registered file descriptors.
Darkmatter
Darkmatter4mo ago
I think having a “nicer” layer on top of socket(2) and associated operations that looks like this is fine, but the synchronous network I/O (because sometimes you just don’t need async, or setting up async isn’t worth it) should be TcpSocket, UDPSocket, SctpSocket, and possibly QuicSocket (I’ll probably yank Cloudflare’s implementation to write that, since it’s Rust that exposes a C interface and it’s sans-IO). I think that many people will actually like SCTP once it’s introduced to them.
Dmitry Salin
Dmitry SalinOP4mo ago
The synchronous variant can still be used with an io_uring-based event loop. For this I had to reimplement Rust's TcpStream, since it is useless with a registered socket file descriptor. Here Mojo can try to do better. Async is more complicated, and in many cases the runtime has its own implementations of these wrappers.
Darkmatter
Darkmatter4mo ago
What I’m hoping for is a cross-platform runtime that lets you use separate code paths to leverage the strengths of each implementation. For instance, io_uring gets to use registered buffers; kqueue will still use registered buffers to unify the API, it will just recv into those buffers and then pass them up to the caller. Ideally we should start with io_uring and then bend kqueue to fit the good ways of doing things in io_uring. This lets it function on macOS and BSD, while Linux, where 99% of people will deploy, gets the best performance. Then, if you REALLY care, you can special-case by OS to make slightly better use of kqueue or IOCP. My thought is that the sync variant is for programs that don’t want to set up io_uring or kqueue and can eat a giant performance penalty due to the impedance mismatch, because they’re just fetching a config file or something like that.
Dmitry Salin
Dmitry SalinOP4mo ago
The wrapper can be generic over a file descriptor trait, with the regular owned file descriptor as the default parameter. This way the user doesn't need to know about io_uring if they don't want to.
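Here is a rough Python analogue of that suggestion, with typing.Protocol standing in for a Mojo trait. TcpStream only needs something that says what goes into an SQE's fd field, so either a plain owned fd or an io_uring registered-file index can be plugged in. For a registered file, the SQE carries the table index together with IOSQE_FIXED_FILE (1 << 0 in the UAPI header). Python before 3.13 cannot declare a default for a type parameter the way Mojo can, so the "default" here is only a convention. AsFd, OwnedFd, RegisteredFd, TcpStream, and recv_sqe are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, Protocol, TypeVar

IOSQE_FIXED_FILE = 1 << 0  # SQE flag: fd field is an index into the registered-file table


class AsFd(Protocol):
    def sqe_target(self) -> tuple[int, int]:
        """Return (value for the SQE fd field, extra SQE flags)."""
        ...


@dataclass
class OwnedFd:
    fd: int  # ordinary kernel file descriptor (the default choice)

    def sqe_target(self) -> tuple[int, int]:
        return self.fd, 0


@dataclass
class RegisteredFd:
    index: int  # slot in the ring's registered-file table

    def sqe_target(self) -> tuple[int, int]:
        return self.index, IOSQE_FIXED_FILE


F = TypeVar("F", bound=AsFd)


class TcpStream(Generic[F]):
    """Hypothetical stream wrapper that is generic over how its socket is referenced."""

    def __init__(self, handle: F):
        self.handle = handle

    def recv_sqe(self, length: int) -> dict:
        # Build a description of the SQE a recv would need; no syscall happens here.
        fd_field, flags = self.handle.sqe_target()
        return {"opcode": "RECV", "fd": fd_field, "flags": flags, "len": length}


plain = TcpStream(OwnedFd(7))
fixed = TcpStream(RegisteredFd(3))
```

The user who never touches io_uring only ever constructs TcpStream(OwnedFd(...)). The registered variant changes two SQE fields and nothing else in the wrapper.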
Darkmatter
Darkmatter4mo ago
I think that’s a good path. Ideally we want registered Fds, but for some programs (like a find or a recursive grep) registering all files makes no sense. I think we may want to make the abstraction shown to the user be a bit higher level, like an AsyncFile or an AsyncSocket type.
Martin Vuyk
Martin Vuyk4mo ago
I'll read a bit of those links you sent
And both are unaware of io_uring registered file descriptors.
The idea for the Arc[FileDescriptor] is to have a way to duplicate the socket from that FD, I don't know exactly how one would do that with io_uring or any other implementation for that matter. It was just an adaptation I thought of after reading Python's stdlib code.
This struct has over 750 parameter combinations, and even taking into account that not all of them are valid, it's a generic bloat.
The main goal at the API level is to be generic and each implementation to be able to implement what it says it implements, the constraining is something that as Mojo evolves I think will get better tools
the synchronous network io (because sometimes you just don’t need async or setting up async isn’t worth it) should be TcpSocket, UDPSocket, SctpSocket and possibly QuicSocket ... I think we may want to make the abstraction shown to the user be a bit higher level, like an AsyncFile or an AsyncSocket type.
I'd really, really rather not split into an async and a sync API, since AFAIK Mojo allows async code to be awaited quite easily in sync contexts. We could just use a pseudo-async API for something that is sync, and I really see no problem with that, as long as we can figure out a way to abstract all of their interactions and have the implementations be pluggable into the Socket struct.
Martin Vuyk
Martin Vuyk4mo ago
We could, however, add some new APIs on top of that which are documented as completion-model based, etc. But my main goal is to get a basic BSD-ish API to bind, listen, accept connections, and send and receive buffers that remains exactly the same regardless of platform, protocol, etc.
Darkmatter
Darkmatter4mo ago
The issue is that the POSIX API doubles the memory bandwidth required to do TCP. That's not an insignificant tax.
Dmitry Salin
Dmitry SalinOP4mo ago
You do not need Arc to duplicate socket, it's an unnecessary runtime overhead. https://doc.rust-lang.org/src/std/net/tcp.rs.html#237
Martin Vuyk
Martin Vuyk4mo ago
Could you give me a link? I'd like to read more on that. Even so, a default config for maximum compatibility does not need to be maximally performant. We can provide ways to customize and take advantage of completion models later on. The only problem is: how does the destructor know not to close the socket when another one is still using it after it got duplicated?
Dmitry Salin
Dmitry SalinOP4mo ago
It duplicates into an independent owned object, so there is no problem with calling close in its destructor.
Martin Vuyk
Martin Vuyk4mo ago
/// After creating a TcpListener by [bind]ing it to a socket address, it listens /// for incoming TCP connections. These can be accepted by calling [accept] or by /// iterating over the [Incoming] iterator returned by [incoming][TcpListener::incoming].
From the bottom of the README: what this all will allow is building higher-level Pythonic syntax for servers of any protocol, and injecting whatever implementation a platform-specific use case needs when the user does not find it in the stdlib but it exists in an external library. Examples:

from forge_tools.socket import Socket


async def main():
    with Socket.create_server(("0.0.0.0", 8000)) as server:
        while True:
            conn, addr = await server.accept()
            ...  # handle new connection

        # TODO: once we have async generators:
        # async for conn, addr in server:
        #     ...  # handle new connection

In the future something like this should be possible:

from multiprocessing import Pool
from forge_tools.socket import Socket, IPv4Addr


async fn handler(conn: Socket, addr: IPv4Addr):
    ...


async def main():
    with Socket.create_server(("0.0.0.0", 8000)) as server:
        with Pool() as pool:
            _ = await pool.starmap(handler, server)
Darkmatter
Darkmatter4mo ago
Do all platforms actually properly refcount that on the kernel side?
Martin Vuyk
Martin Vuyk4mo ago
I don't know the underlying implementations for sockets, but does that mean the OS opens another socket when you duplicate it? I got that text from the Rust TcpListener source you sent.
Darkmatter
Darkmatter4mo ago
It means that the OS reference counts for you.
Martin Vuyk
Martin Vuyk4mo ago
Ooohhhh, nice. Do all OSes offer that functionality?
Dmitry Salin
Dmitry SalinOP4mo ago
The main goal at the API level is to be generic and each implementation to be able to implement what it says it implements, the constraining is something that as Mojo evolves I think will get better tools
It is not Mojo specific. Just read about generic code bloat. In this particular case it doesn't add any value, it just increases compilation time and binary size.
Darkmatter
Darkmatter4mo ago
Rust uses dup(2) to do it, so that would work on anything POSIX-2008 compliant, and Windows also has it via sys::c::WSADuplicateSocketW. So it's probably fine.
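The point about duplication can be checked with a small POSIX-only Python sketch. dup(2) returns a second, independently owned descriptor for the same open socket, and the kernel keeps the socket alive until the last descriptor referring to it is closed. Each owner can therefore close its own copy in its destructor without an Arc. The OwnedFd class below is a hypothetical illustration loosely modeled on Rust's try_clone. It is not a type from any library discussed here.

```python
import os
import socket


class OwnedFd:
    """Hypothetical owned descriptor that closes its fd exactly once."""

    def __init__(self, fd: int):
        self.fd = fd

    def try_clone(self) -> "OwnedFd":
        # dup(2): new descriptor number, same underlying open socket.
        return OwnedFd(os.dup(self.fd))

    def close(self) -> None:
        if self.fd >= 0:
            os.close(self.fd)
            self.fd = -1

    def __del__(self):
        self.close()


a, b = socket.socketpair()
original = OwnedFd(a.detach())  # take ownership of a's fd without closing it
clone = original.try_clone()
original.close()                # socket stays open: clone still refers to it
os.write(clone.fd, b"ping")
print(b.recv(4))                # the peer still receives data written via the clone
```

Only when the clone is closed as well does the peer see end-of-file. That is the kernel-side reference counting described above.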
Martin Vuyk
Martin Vuyk4mo ago
Mojo has compile time parameters for a reason, and theoretically you get the compilation to include only what is used
Darkmatter
Darkmatter4mo ago
Be careful, that way lies madness without a much stronger type system than Mojo is likely going to have.
Martin Vuyk
Martin Vuyk4mo ago
BTW, the code I'm looking at (https://doc.rust-lang.org/src/std/net/tcp.rs.html) is quite literally the API I mean a socket implementation could have; we just build Socket one level above that.
tcp.rs - source
Source of the Rust file library/std/src/net/tcp.rs.
Dmitry Salin
Dmitry SalinOP4mo ago
And we could just use pseudo async API for something that is sync and I see no problem really.
I think the user creating the event loop shouldn't have to pay for asynchrony because they simply don't need it.
Martin Vuyk
Martin Vuyk4mo ago
I'm mostly thinking of building overloads with conditional conformance for self and anything to do with SockAddr, in a way that doesn't drive anyone crazy (hopefully lol). I'm still not sure how async in Mojo works (Monday is the presentation, yay), but I think that if you return a Coroutine and await it, it doesn't create an async loop (?) if the function where you await it is not async (??). You can see the valid address types I'm building here: https://github.com/martinvuyk/forge-tools/blob/main/src/forge_tools/socket/address.mojo
Dmitry Salin
Dmitry SalinOP4mo ago
It doesn't matter. There is no need to duplicate all the code for parameters that will not be used at runtime.
Darkmatter
Darkmatter4mo ago
What I mean is that recv makes no sense for UDP, meaning that large chunks of code make no sense for a UDP socket. If you add SCTP, that can do things neither TCP nor UDP can do, like moving the peer to a new IP and port without breaking the connection. Does Python do "fire and forget" async, or does it require you to actually await coroutines, so that they only make progress when awaited?
Martin Vuyk
Martin Vuyk4mo ago
no idea if they're eager or lazy
Darkmatter
Darkmatter4mo ago
"Fire and forget" means there's a bunch of hidden magic happening, which I'm not a fan of, if you can just await without setting up a loop.
Dmitry Salin
Dmitry SalinOP4mo ago
It will have some runtime overhead
Darkmatter
Darkmatter4mo ago
Lazy is my preference since it makes implementing an executor much easier and is more performant because you can take over the existing thread.
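On the question of whether coroutines are eager or lazy: Python's native coroutines are lazy. Calling an async def function only creates a coroutine object. Its body starts running when something drives it with send(), which is what await and an event loop do under the hood. The sketch below shows this without any event loop. It illustrates Python semantics only and says nothing definitive about how Mojo's coroutines behave. drive is a hypothetical minimal executor.

```python
log = []


async def fetch():
    log.append("body started")
    return 42


def drive(coro):
    """Tiny executor for coroutines that finish without suspending."""
    try:
        coro.send(None)  # the first send() is what starts the body
    except StopIteration as done:
        return done.value  # the coroutine's return value rides on StopIteration
    raise RuntimeError("coroutine suspended; a real executor would park it")


coro = fetch()
created_log = list(log)  # snapshot taken before driving: the body has not run
result = drive(coro)
```

Because nothing runs until the executor calls send(), the executor decides which thread does the work. That is the property Darkmatter points to when preferring lazy coroutines.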
Martin Vuyk
Martin Vuyk4mo ago
recv makes no sense for UDP,
why wouldn't an async recv that yields the byte buffers as they arrive be useful ?
Darkmatter
Darkmatter4mo ago
That's called recvmsg.
Martin Vuyk
Martin Vuyk4mo ago
ok, but as I said we can provide a "use this unified API for ez mode, get into customization to get maximum performance. Here are the APIs" I think I saw that function name in the Python code as well.
Darkmatter
Darkmatter4mo ago
You can technically recv from a UDP socket, but it's almost always a bad idea.
Martin Vuyk
Martin Vuyk4mo ago
if hasattr(_socket.socket, "recvmsg"):
    import array

    def recv_fds(sock, bufsize, maxfds, flags=0):
        """ recv_fds(sock, bufsize, maxfds[, flags]) -> (data, list of file
        descriptors, msg_flags, address)

        Receive up to maxfds file descriptors returning the message
        data and a list containing the descriptors.
        """
        # Array of ints
        fds = array.array("i")
        msg, ancdata, flags, addr = sock.recvmsg(bufsize,
            _socket.CMSG_LEN(maxfds * fds.itemsize))
        for cmsg_level, cmsg_type, cmsg_data in ancdata:
            if (cmsg_level == _socket.SOL_SOCKET and cmsg_type == _socket.SCM_RIGHTS):
                fds.frombytes(cmsg_data[:
                        len(cmsg_data) - (len(cmsg_data) % fds.itemsize)])

        return msg, list(fds), flags, addr

    __all__.append("recv_fds")
Darkmatter
Darkmatter4mo ago
I shudder to think of what that will look like in mojo generics.
Martin Vuyk
Martin Vuyk4mo ago
This is what I've currently got
async fn recv_fds(self, maxfds: UInt) -> Optional[List[FileDescriptor]]:
    """Receive up to maxfds file descriptors.

    Args:
        maxfds: The maximum amount of file descriptors.

    Returns:
        The file descriptors.
    """

    @parameter
    if sock_platform is SockPlatform.LINUX:
        return (
            await self._impl.unsafe_get[Self._linux_s]()[].recv_fds(maxfds)
        )^
    else:
        constrained[False, "Platform not supported yet."]()
        return None
or we could do it with a pointer idk
Darkmatter
Darkmatter4mo ago
I think that's for passing fds around using unix sockets. recvmsg is typically for "Please give me the whole datagram and who it came from". With UDP, recv doesn't tell you who the packet is from, which means it's only useful if you have a mapping from user id or similar back to IP address. This is why most higher level APIs actually disable recv with UDP sockets.
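The UDP point is easy to see with Python's standard socket module. recv hands back only the payload, while recvfrom also returns the sender's address, which a UDP server needs in order to reply. recvmsg returns the address as well, plus ancillary data and flags. Here is a minimal loopback sketch:

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))
client_addr = client.getsockname()

client.sendto(b"first", server.getsockname())
client.sendto(b"second", server.getsockname())

payload_only = server.recv(1024)      # just the bytes: no way to tell who sent them
data, sender = server.recvfrom(1024)  # the bytes plus the peer's (host, port)
server.sendto(b"ack", sender)         # replying is only possible with the address
reply = client.recv(1024)

server.close()
client.close()
```

This is why higher-level UDP APIs tend to expose only the address-returning receive calls.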
Dmitry Salin
Dmitry SalinOP4mo ago
Overall, I don't see the value in this abstraction. It has overhead, which means it's not suitable for something like a thread-per-core event loop. At the same time, it's not a fully asynchronous runtime.
Darkmatter
Darkmatter4mo ago
I think we do need to build something that isn't "manually write the state machine" for async. But it might make more sense to write the socket API as the POSIX sockets API and let it be synchronous, then come back with a clean sheet of paper once we have io_uring and kqueue. Ideally we want an async API that is capable of functioning on top of sockets, but we should make io_uring the highest-performance mode and bend everything else to work with it, since most people will deploy on Linux. If networking on Mac and Windows is 2x slower, most people won't notice because they aren't actually doing enough networking for it to matter. 2x slower would still be "1 Gbps is doable" for Windows; not sure about Mac.
Dmitry Salin
Dmitry SalinOP4mo ago
That's just my opinion, of course. I think it's good to have efficient building blocks. Async is different because it's hard to create a runtime that will satisfy all needs.
Darkmatter
Darkmatter4mo ago
I may be biased, but I think that building a runtime for high performance needs and then making performance compromises to have an "easy mode" API makes sense since a lot of the people using the "easy mode" API won't care about performance as much. I'm thinking of a database or something similar that cares a lot about perf but wants cross-platform compatibility. They would be willing to deal with a more complex API to get cross-platform in a vaguely performant manner. Then you have an API which knocks things down to async versions of the posix sockets API for people who don't really care as much.
ModularBot
ModularBot4mo ago
Congrats @Darkmatter, you just advanced to level 15!
Dmitry Salin
Dmitry SalinOP4mo ago
TigerBeetle's API is pretty good, but it doesn't use async or all of the io_uring features.
Darkmatter
Darkmatter4mo ago
That's because of how their DB is assembled.
Dmitry Salin
Dmitry SalinOP4mo ago
But that doesn't mean you can't create an extended version with async.
Darkmatter
Darkmatter4mo ago
You can avoid async with io_uring if you write the state machine yourself. Also, Zig doesn't have async right now. You can, and I think that's the goal, use the user_data field of the completion to store a pointer to a coroutine. Then you can just pass the completion back to the coroutine and resume it.
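Here is a toy Python model of that idea. Each submission carries a user_data value that identifies the suspended coroutine, and the completion loop uses it to resume exactly that coroutine with the result. Python has no raw pointers, so an integer key into a dict stands in for the pointer. The "ring" is a plain queue that completes every request on the next reap and pretends each write transfers all its bytes. FakeRing, Op, writer, and run are hypothetical names and are not io_uring or Mojo APIs.

```python
from collections import deque


class Op:
    """Awaitable submission: yields itself to the executor, receives the result."""

    def __init__(self, kind: str, payload: bytes):
        self.kind, self.payload = kind, payload

    def __await__(self):
        result = yield self  # suspend until the executor sends the completion result
        return result


class FakeRing:
    """Stands in for io_uring: every submission completes on the next reap."""

    def __init__(self):
        self.completions = deque()

    def submit(self, op: Op, user_data: int) -> None:
        # A real ring would copy user_data into the SQE and echo it back in the CQE.
        self.completions.append((user_data, len(op.payload)))

    def reap(self):
        while self.completions:
            yield self.completions.popleft()


def run(coros):
    ring = FakeRing()
    waiting = {}  # user_data -> suspended coroutine (the "pointer")
    results = {}

    def step(user_data, coro, value):
        try:
            op = coro.send(value)  # run until the next await Op(...)
        except StopIteration as done:
            results[user_data] = done.value
            return
        waiting[user_data] = coro
        ring.submit(op, user_data)

    for user_data, coro in enumerate(coros):
        step(user_data, coro, None)
    while waiting:
        for user_data, res in list(ring.reap()):
            step(user_data, waiting.pop(user_data), res)
    return results


async def writer(msg: bytes) -> int:
    n = await Op("write", msg)  # resumed with the completion's result
    return n * 2
```

For example, run([writer(b"abc"), writer(b"hello")]) resumes each coroutine from its own completion and returns {0: 6, 1: 10}.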
Dmitry Salin
Dmitry SalinOP4mo ago
This is mainly due to how their consensus works.
Darkmatter
Darkmatter4mo ago
Most distributed consensus algorithms can be driven this way, except for EPaxos, because it needs to be special.
Dmitry Salin
Dmitry SalinOP4mo ago
GitHub
Proposal: Event loop redesign · Issue #8224 · ziglang/zig
The current event loop is not ready yet (relatively slow, windows unfinished, bugs/races) and many wish for it to be. From the discord communities at least, there seems to be enough interest to war...
Darkmatter
Darkmatter4mo ago
I use Paxos for most of mine and also use io_uring like that. Oh boy, trying to make a one-size-fits-all event loop for Zig is going to be fun, since it's a language for perf-obsessed people. I imagine most people using Zig will write their own, since Zig seems to mostly be a "big projects only" language.
Dmitry Salin
Dmitry SalinOP4mo ago
https://github.com/ziglang/zig/issues/8224#issuecomment-847669956
How async should it be? Zig's async is awesome of course. However, here's something surprising from our experience. We tried out Zig's async plus io_uring for TigerBeetle (https://github.com/coilhq/tigerbeetle/blob/main/src/io_async.zig) and then actually went back to explicit callbacks plus io_uring in the end (https://github.com/coilhq/tigerbeetle/blob/main/src/io.zig). The reason being that we were doing this for a distributed consensus protocol where we wanted to make sure our message handlers run to completion with the same system state, whereas coroutines increase dimensionality while a function is still running. We wanted clear jumping off points to I/O just because getting the consensus protocol right was hard enough. This is specific to our use-case for TigerBeetle only, it might not be relevant here, but wanted to share the anecdote if it helps.
Darkmatter
Darkmatter4mo ago
I think for networking specifically we can get to ~95% of peak with a single-threaded design, and then let users handle the rest. I agree that coroutines are less helpful for consensus, my personal paxos impl is also single threaded and doesn't use async. But for most people they want coroutines. I even want coroutines for most stuff. It may be worth investigating whether we can enable tiger-style-like methods with this, where you can mock all of the IO when testing. That would be a feature that might get some people to move over on its own, since testing IO-intensive systems is a massive headache.
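One way to get that testability can be sketched in Python. It assumes that all I/O goes through an interface the application receives, rather than the application calling the OS directly. Production code would get a real backend, and tests get an in-memory fake with scripted input, so runs are deterministic. Io, FakeIo, and echo_once are hypothetical names for illustration. This does not reproduce TigerBeetle's actual design.

```python
from typing import Dict, List, Protocol


class Io(Protocol):
    def recv(self, conn: int, max_len: int) -> bytes: ...
    def send(self, conn: int, data: bytes) -> int: ...


class FakeIo:
    """Deterministic in-memory backend: inbound chunks are scripted per connection."""

    def __init__(self, inbound: Dict[int, List[bytes]]):
        self.inbound = {conn: list(chunks) for conn, chunks in inbound.items()}
        self.sent: Dict[int, List[bytes]] = {}

    def recv(self, conn: int, max_len: int) -> bytes:
        chunks = self.inbound.get(conn, [])
        if not chunks:
            return b""  # behave like EOF once the script is exhausted
        chunk = chunks.pop(0)
        if len(chunk) > max_len:
            chunks.insert(0, chunk[max_len:])  # keep the remainder for the next recv
        return chunk[:max_len]

    def send(self, conn: int, data: bytes) -> int:
        self.sent.setdefault(conn, []).append(data)
        return len(data)


def echo_once(io: Io, conn: int) -> bool:
    """Application logic under test: echo one message, return False on EOF."""
    data = io.recv(conn, 4096)
    if not data:
        return False
    io.send(conn, data)
    return True
```

The application code (echo_once) never learns whether it is talking to io_uring, plain sockets, or the fake. The same kind of seam would let an io_uring-backed runtime be swapped out in tests.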