Optimizing Performance vs. Domain Separation: How to Choose between Two Alternatives
We're designing a system with two microservices:
- Service A: When called, it returns a classification (e.g., some documents are marked for trash, others for forwarding to different administrations).
- Service B: Manages the documents' lifecycle after classification. For example, if a document is classified to be sent to another administration, the service handles the notification to that administration. It also has a trash resource, etc.
There is a front-end where a user provides the inputs that Service A needs to classify documents. There is a BFF (backend-for-frontend) as well. Once a document is classified, a tracing number is printed and attached to the document. I need to make a design decision:
Option 1 – BFF Generates the Tracing Number
1. BFF calls Service A.
2. Service A returns a classification.
3. BFF immediately generates a tracing number.
4. BFF calls the external printing service to print the classification with the generated tracing number.
5. BFF posts the classification and tracing number to Service B.
Option 2 – Delegate Tracing Number Generation to Service B
1. BFF calls Service A.
2. Service A returns a classification.
3. BFF forwards the classification to Service B.
4. Service B generates a tracing number and returns it to the BFF.
5. BFF calls the external printing service to print the classification with the returned tracing number.
Each workflow balances performance and domain separation differently. Option 1 minimizes latency by printing the tracing number immediately at the BFF level, while Option 2 maintains clearer separation by delegating the tracing number generation to Service B. How should I choose?
16 Replies
I'm a bit confused reading through this - Why does the BFF take over the notification to a printing service? Shouldn't this be part of service B's responsibility entirely if it is responsible for the lifecycle?
I think the order of operations is super important here though
Option 1 sounds risky - What happens if the printing service receives the printing command, but the workflow for some reason breaks and the classification does not reach service B?
I would definitely try to ensure a resource is present in my own system as a source of truth, and then handle failures in propagation to external systems, rather than the other way round
So in a vacuum, I'd pick option 2 any day
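To illustrate the ordering concern, here is a minimal sketch of the Option 2 order: the tracing number is stored in Service B before anything is sent to the printer. `ServiceB`, `register`, `PrintError`, `option_2_flow` and the `TN-` number format are all made up for illustration; the real services will look different.

```python
import itertools


class PrintError(Exception):
    """Raised when the (hypothetical) printing service fails."""


class ServiceB:
    """In-memory stand-in for Service B; all names here are hypothetical."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.records = {}  # tracing number -> classification

    def register(self, classification):
        # Persist first: the tracing number exists in our own system
        # before any external side effect happens.
        tracing_number = f"TN-{next(self._counter):06d}"
        self.records[tracing_number] = classification
        return tracing_number


def option_2_flow(service_b, classification, print_label):
    """Register in Service B first, then print.

    Returns (tracing_number, printed). If printing fails, the record
    still exists, so the print can be retried with the same number.
    """
    tracing_number = service_b.register(classification)
    try:
        print_label(tracing_number, classification)
    except PrintError:
        return tracing_number, False
    return tracing_number, True
```

With this order, a printer outage leaves a record without a label, which can be retried. With the Option 1 order, a failure after printing leaves a label that Service B has never heard of.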
Right now, this is how it’s being done, but assuming we move it, we’ll then have a tracing-number resource with a POST endpoint to generate one, and a separate endpoint to print it. Does that sound correct?
Or will it be a single endpoint responsible for both generating the tracing number and printing the label?
How can I optimize for performance with all those calls?
I think the decision of whether printing is a necessary part of the lifecycle is an important one - I could imagine if you have other services consuming a document, they might not need printing, so in that case a single service could very well take over merely maintaining raw data. But that's something that's difficult to estimate
When it comes to performance I guess the core question would be:
1) Are you rightfully concerned about performance? Do you have SLAs you need to hit, or users expecting certain response times? Or would it also be fine to run an asynchronous workflow that finishes at some point, where it's not a big deal even if it takes a few minutes?
2) If you're explicitly designing with multiple microservices, you just kind of willingly accept that network traffic increases latency and potential failures, so it's also a trade-off which you'll hopefully be able to pay back with organizational advantages you gain from the separation
For point 1: no, that's not possible. Users have dozens of documents to process every day, and a user cannot move on to the next document until the label with the tracing number is printed.
For point 2: unfortunately, it's an organizational decision. The architects decided to do microservices :/ So I am trying to find tricks to improve performance and reduce the failures that would block the users' whole workflow
I mean for option 1, the graph would look like
User -> BFF -> A, BFF -> Printing, BFF -> B
For option 2
User -> BFF -> A, BFF -> B, BFF -> Printing
So the exact same steps would be performed anyway, unless I'm missing some specific step?
That's right!
But for option 2, it depends.
It can be:
User -> BFF -> A, BFF -> B, BFF -> Printing (the printing is done implicitly when the BFF calls B: B generates and prints in a single call)
Or
User -> BFF -> A, BFF -> B, BFF (generating the tracing number) -> B, BFF (printing)
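To compare the variants, here is a rough sketch that only counts sequential hops, using the flows written out in this thread. It measures nothing real: the hop lists are my reading of the flows above, and `FLOWS`, `hops_until_printed` and `summarize` are illustrative names.

```python
# Ordered synchronous hops per variant, as described in this thread.
FLOWS = {
    "option 1": ["BFF->A", "BFF->Printing", "BFF->B"],
    "option 2, single call to B": ["BFF->A", "BFF->B", "B->Printing"],
    "option 2, separate calls": [
        "BFF->A",
        "BFF->B",
        "BFF->B (tracing number)",
        "BFF->Printing",
    ],
}


def hops_until_printed(hops):
    """Number of sequential hops completed once the label has been printed."""
    for position, hop in enumerate(hops, start=1):
        if hop.endswith("Printing"):
            return position
    raise ValueError("flow never reaches the printing service")


def summarize(flows):
    """Map each variant to (total hops, hops until the label is printed)."""
    return {
        name: (len(hops), hops_until_printed(hops))
        for name, hops in flows.items()
    }
```

Under these assumptions, option 1 and the single-call version of option 2 both make three hops in total. What differs is how many of them happen before the label comes out: two for option 1, three for the single call to B, four with separate calls.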
So I have a hard time making a pick here because I don't know if service B would benefit from offering separate endpoints (are there other consumers who might only need one of the two atomic operations?). I feel like as a rule of thumb I'd favor domain consistency over attempting to optimize a problem space prematurely. And if it turns out you have a potential optimization you can back up with tangible data, then it should be quick to introduce another aggregated endpoint I guess
I feel like an aggregated endpoint seems plausible here
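A minimal sketch of what that could look like, assuming Service B keeps the two atomic operations and adds an aggregated one on top. The class, the method names, the injected `printer` callable and the `TN-` format are all hypothetical, not an existing API.

```python
import itertools


class ServiceB:
    """In-memory stand-in for Service B; every name here is hypothetical."""

    def __init__(self, printer):
        self._printer = printer  # callable(tracing_number, classification)
        self._counter = itertools.count(1)
        self._documents = {}  # tracing number -> classification

    def create_tracing_number(self, classification):
        # Atomic operation 1: record the document and assign its number.
        tracing_number = f"TN-{next(self._counter):06d}"
        self._documents[tracing_number] = classification
        return tracing_number

    def print_label(self, tracing_number):
        # Atomic operation 2: print the label for an existing number.
        # Raises KeyError if the tracing number is unknown.
        classification = self._documents[tracing_number]
        self._printer(tracing_number, classification)

    def create_and_print(self, classification):
        # Aggregated operation: one call from the BFF, two steps inside.
        tracing_number = self.create_tracing_number(classification)
        self.print_label(tracing_number)
        return tracing_number
```

The BFF would call `create_and_print` on the normal path, which saves one round trip. Since `print_label` stays available on its own, printing a second label for an existing tracing number does not have to generate a new one.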
The point is: what if a user loses the first printed label? He or she may want to re-print it. If generating the tracing number and printing it are aggregated in a single endpoint, how will we manage a second print?
I'd assume you can still have separate steps within an aggregated service which can individually be implemented with retries in mind
E.g. hangfire jobs etc
I am not sure I understand this
I am not sure how I can use those jobs in general in my endpoint
Oh, right, you mentioned it can't be an asynchronous process
Scratch that suggestion then
So the flow will be like this:
BFF --> A: get the classification. Note that A also needs to call service C to get some information
BFF --> B, endpoint 1: generating the tracing-id
BFF --> B, endpoint 2: printing
I fear it's a lot
All those calls need to be done before the user can move on to another document
Agree, overall I do feel like the biggest handle you'd have to optimize the flow is to change the UX on the client side, e.g. being able to batch documents, give the user async feedback etc
You mean instead of printing document by document, we print for 10 documents at once or something like this?
Yeah, if I was a user of such a system I'd probably expect that I can just go through a few documents and sign them off, without having to wait for them to upload
Imagine something like Google Photos or whatever Apple provides for that
You also don't just swipe through photos sequentially, press upload, wait for them to upload to a cloud and only then move onto the next one
You can just press upload and carry on
There's a notification bar to notify you how the progress is going, but it doesn't block your flow
Yeah, but then there is a risk that you put the wrong label on the wrong document
The documents are physical documents