Which Architecture to Choose?
I am working with two .NET-based architectures for processing HTTP POST requests, and I'm wondering which one to choose and how to measure the difference. Here's how the two architectures work:
Architecture 1: Decoupled Worker with Kafka
A POST request writes data to a database and sends a message to a Kafka topic.
A separate worker app consumes messages from Kafka, processes them, and updates the database.
Architecture 2: Background Service in the Same App
A POST request writes data to a database.
A background service running in the same application continuously polls the database for unprocessed tasks and processes them one by one.
Architecture 1 decouples the request handling and background processing using Kafka, allowing independent scaling of the worker.
Architecture 2 couples the request handling and background processing within the same app.
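In sketch form, Architecture 2 could look something like this (a minimal hedged example, assuming a hypothetical `ITaskRepository` abstraction over the database; not the actual implementation):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public sealed record WorkItem(long Id);

// Hypothetical data-access abstraction over the tasks table.
public interface ITaskRepository
{
    Task<IReadOnlyList<WorkItem>> GetUnprocessedAsync(CancellationToken ct);
    Task MarkProcessedAsync(WorkItem item, CancellationToken ct);
}

public sealed class TaskProcessor : BackgroundService
{
    private readonly ITaskRepository _repo;

    public TaskProcessor(ITaskRepository repo) => _repo = repo;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // Empty reads happen here whenever there is nothing to do.
            foreach (var item in await _repo.GetUnprocessedAsync(stoppingToken))
            {
                // ... process the item ...
                await _repo.MarkProcessedAsync(item, stoppingToken);
            }

            // Poll interval: the trade-off between latency and db load.
            await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
        }
    }
}
```

The poll interval and the repository shape are the two knobs the rest of this thread ends up debating.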
Under what conditions should I choose the first or the second one? And performance-wise, how should I go about testing both?
Thanks
33 Replies
my opinion: polling is best avoided, and to me polling a db is mostly wrong (I could understand having it as an emergency fallback)
Another opinion. What volume of incoming calls are you expecting? Just because you can, it doesn't mean you should necessarily decouple the consumer into its own external process right away.
If I didn't already have reasons to believe the system will get absolutely hammered by requests, I'd personally go for a BackgroundService.
As for polling the database, you can get rid of any problem with empty reads (and by that I mean querying and coming up empty handed because there's nothing to do) or ridiculously short polling intervals by using Redis or any of its forks as a message broker with its pub/sub pattern.
Check out:
- https://stackexchange.github.io/StackExchange.Redis/Basics#using-redis-pubsub
- https://redis.io/docs/latest/develop/interact/pubsub/
- https://stackexchange.github.io/StackExchange.Redis/PubSubOrder
Doing things this way, you know exactly when you have something to do and avoid bothering the db. It's also a matter of how fast you need the results and how long the processing takes. If the processing itself doesn't take very long and messages don't have to be processed the second they come in, a tiny SELECT every few seconds is probably not going to kill anyone.
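The Redis pub/sub idea could be sketched like this with StackExchange.Redis (channel name, connection string, and the `42` id are illustrative only):

```csharp
using StackExchange.Redis;

var mux = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
var pubsub = mux.GetSubscriber();

// Worker side: wake up only when there is actually something to do,
// instead of polling the db on a timer.
await pubsub.SubscribeAsync(RedisChannel.Literal("tasks:new"), (_, message) =>
{
    // fetch the row identified by `message` (or just drain all pending rows)
});

// API side, inside the POST handler, right after saving the row:
await pubsub.PublishAsync(RedisChannel.Literal("tasks:new"), 42); // 42 = new row's id
```

Note that Redis pub/sub is fire-and-forget: a message published while the worker is down is lost, which is exactly why a fallback poll still matters.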
I say, keep it simple unless you already know the traffic will be crazy, or if the processing has to carry on no matter what even if the main app goes down. If it does become crazy, it's then fairly trivial to move the BackgroundService class and make it its own app (without changing anything, because you're using Redis anyway: as long as the machine where you put the consumer can access the same Redis instance, you're golden. Things do change if you already know you're gonna need multiple consumers).
Do note that, if you really want to use Kafka, you can still go for an in-process consumer, and move it out-of-process should you need to.
@SteveTheMadLad I cannot use Redis or anything like it. The requests will be somewhere around 100 reqs/second, is that considered to be crazy?
any other help ?
@reacher or @ZZZZZZZZZZZZZZZZZZZZZZZZZ You don't know me, but I trust your opinions a lot. You've greatly influenced me when it comes to simplifications. What do you think about this case? 🙏
Keep it simple for sure, kafka has no place in this
By processing do you mean you're getting incoming POST calls that you are going to process in the background? So the caller doesn't expect a response right away?
It's hard to give good advice when it's very abstract, can you say a bit more about what it is?
How come you can't use redis? Would an in-memory local pubsub thing work instead?
@Pobiega It's a corporation :/ in-memory pub-sub like what?
zeromq?
but just to be clear, Valkey is the linux foundation fork of redis which is 100% OSS and fine to use. You could easily slap up a container with valkey and use that
yes but I can't. The architects would never approve this :/
I am limited to using libs here
Ok. Have a look at zeromq then, or just do something in-process
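One "in-process" option with no external dependency at all is `System.Threading.Channels` as an in-memory queue between the POST handler and the background consumer (a sketch; in a real app the channel would be a registered singleton and the writer would stay open):

```csharp
using System.Threading.Channels;

var channel = Channel.CreateUnbounded<long>(); // carries the ids of saved rows

// Producer (the POST handler), after the db write:
await channel.Writer.WriteAsync(42L);  // 42 = id of the row just saved
channel.Writer.Complete();             // illustration only; a real producer keeps it open

// Consumer (would live inside BackgroundService.ExecuteAsync):
await foreach (var id in channel.Reader.ReadAllAsync())
{
    // load row `id` from the db and process it
}
```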
this is what I did: I thought of first saving the requests in the db, and, in the same process, there is a background service constantly looking for unprocessed elements
it's polling the db and sometimes it comes up empty because there is nothing to be treated :/
I don't know how to "measure" those downsides
the messaging/pubsub would be to prevent the polling
Okay !
@reacher This API is designed to handle requests for authorizing parking access, making the process straightforward for the client. Here’s how it works now:
1. The client sends a POST request to request parking access.
2. Upon receiving this request, we save it in the database with a status of "pending authorization" and immediately respond with a 200 OK as long as all required data is provided.
3. A background service regularly scans for requests marked as "pending authorization." For each of these, it sends a POST request to an internal system responsible for validating and authorizing parking access.
4.1. If the internal system call succeeds: The request status is updated from "pending authorization" to "authorized."
4.2. If the internal system call fails, a retry counter (stored in the db document) is incremented. The background service will retry the process later.
5. If the retry counter reaches a predefined limit, the system stops retrying, and the request remains unresolved.
(we don't care about whether there are places or not, it's only a matter of authorization; we don't care about notifying clients, that's someone else's problem)
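Steps 4.1 through 5 boil down to a small state transition per request. A sketch (the `MaxRetries` value and the `Failed` terminal status are assumptions standing in for the "predefined limit" and the "remains unresolved" state):

```csharp
public enum AuthStatus { PendingAuthorization, Authorized, Failed }

public sealed class ParkingRequest
{
    public AuthStatus Status { get; set; } = AuthStatus.PendingAuthorization;
    public int RetryCount { get; set; }
}

public static class Authorizer
{
    public const int MaxRetries = 5; // assumption: the "predefined limit"

    // Pure transition: given the outcome of the internal-system call, update the request.
    public static void Apply(ParkingRequest request, bool callSucceeded)
    {
        if (callSucceeded)
        {
            request.Status = AuthStatus.Authorized; // step 4.1
            return;
        }

        request.RetryCount++;                       // step 4.2: retry later
        if (request.RetryCount >= MaxRetries)
            request.Status = AuthStatus.Failed;     // step 5: stop retrying
    }
}
```

Keeping this transition pure makes it trivial to unit-test independently of the db and the HTTP calls.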
how is the client notified of the authorization result?
it's in the internal system to which we send the requests (It's a third-party software).
so essentially, someone else's problem, right?
yes right
nice
okay, yeah then what we said above works. Prefer messaging over polling, especially if it's high throughput, but you will still need polling, in case the service is restarted etc
okay so it's introducing an in-memory pub/sub mechanism but keeping the polling to activate in case of emergencies, right?
yeah, at a lower rate
or you just schedule a poll to run after system startup
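The "notify + safety net" combination could be sketched as: drain once at startup (covers restarts), then block on an in-process signal with a long timeout as the low-rate fallback poll. `Notify` would be called by the POST handler; `DrainPendingAsync` is a hypothetical stand-in for the real db work:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public sealed class Worker : BackgroundService
{
    private readonly SemaphoreSlim _signal = new(0); // released by the POST handler

    public void Notify() => _signal.Release();

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        await DrainPendingAsync(ct); // startup pass: pick up anything left behind

        var safetyPoll = TimeSpan.FromMinutes(1); // low-rate fallback interval
        while (!ct.IsCancellationRequested)
        {
            // Wakes on a notification OR after the timeout, whichever comes first.
            await _signal.WaitAsync(safetyPoll, ct);
            await DrainPendingAsync(ct);
        }
    }

    private Task DrainPendingAsync(CancellationToken ct)
    {
        // query for unprocessed rows and handle them
        return Task.CompletedTask;
    }
}
```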
I see, I love the idea of scheduling after startup, I've never thought of it this way
@Pobiega in zero-mq it's in-memory but it uses sockets 🤔 ?
it has several modes
iirc it can run entirely in process, but can also run using named pipes etc
I see, thank you
Sounds like something you could just use Hangfire for
Hangfire, Coravel, Quartz could all do it
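With Quartz.NET, for example, the hand-rolled loop becomes a recurring job (a sketch; the job name and the 10-second interval are illustrative, not a recommendation):

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Quartz;

public sealed class AuthorizeParkingJob : IJob
{
    public Task Execute(IJobExecutionContext context)
    {
        // scan "pending authorization" rows, call the internal system, update statuses
        return Task.CompletedTask;
    }
}

public static class QuartzSetup
{
    public static void Configure(IServiceCollection services)
    {
        services.AddQuartz(q =>
        {
            var key = new JobKey("authorize-parking");
            q.AddJob<AuthorizeParkingJob>(o => o.WithIdentity(key));
            q.AddTrigger(t => t
                .ForJob(key)
                .StartNow() // the "poll after startup" idea, for free
                .WithSimpleSchedule(s => s.WithIntervalInSeconds(10).RepeatForever()));
        });
        services.AddQuartzHostedService();
    }
}
```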
Oh I didn't know that I can do it with Quartz !
@reacher 🫠 : : : : : : : 🔮🔮 just for my understanding, why did you say that kafka has no place here, what are the arguments behind ?
@Pobiega I am a little bit confused because as far as I know Quartz is not a pub-sub system
There's no need for that kind of complexity, this is really basic stuff
@reacher 🫠 : : : : : : : 🔮🔮 In what cases would kafka make more sense according to you ?
When you have a lot of separate services that need to communicate
Not for something this simple
@reacher will having kafka solve any performance issues here ? If the requests frequency becomes higher for example
Only if you move to a multiple worker setup
Yeah the only way that's going to make a difference is if you have multiple workers, but also before you go in that direction you need to know what the bottlenecks actually are. Having multiple workers to send more requests to a single other API isn't going to solve the fact that the bottleneck is that other API
None of this is magic, adding extra junk on top of something that has real limitations isn't going to do anything about those limitations unless you understand what those limitations are and solve them appropriately
@reacher I see, that's what I am saying. Here the main aim is to desynchronize the POST requests from the processing needed for each request. The desynchronization can be perfectly done using an in-process tool, no need for kafka whatsoever. The lead dev is not convinced :/
What are the lead dev's concerns or arguments for his preferred way?