RunPod 3w ago
Bitman

best architecture opinion

Hello, I would like to build an app that takes one prompt from a user and creates 10 prompts from it. It then calls a model once for each of these 10 prompts, giving me 10 responses. Finally, it makes one more call to aggregate the 10 responses into a single final response that is returned to the user.

Do you have any advice on how to build this?

Option a) Send the user prompt to the serverless endpoint and, within the endpoint, create the 10 prompts, call the model sequentially, and then call it one last time to aggregate the results. All of that happens in one call from the user to the serverless endpoint.

Option b) Create the 10 prompts on the client, send them to the serverless endpoint (this could be done in parallel), wait for the 10 responses, and then send the final aggregation prompt, together with the 10 responses, to get the final response.

In terms of speed, I think option B is faster, as we can make the 10 calls in parallel.
In terms of cost, I don't think there is much of a difference, as we have to call the model 11 times in any case, but please correct me if I'm missing something.
In terms of complexity, option B makes the serverless endpoint very simple: plain model inference, no other logic.

I will go with option B, but I'm not experienced with serverless architecture, so please let me know if I'm missing anything, or if there is maybe an option C.
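For reference, option B's fan-out and fan-in flow could be sketched roughly as below. This is only an illustration under assumptions: `call_model` is a hypothetical stand-in for one request to the serverless endpoint (here it just sleeps and echoes), and `expand_prompt` is a placeholder for however the 10 prompts are actually derived.

```python
import asyncio


async def call_model(prompt: str) -> str:
    """Hypothetical stand-in for one request to the serverless endpoint."""
    await asyncio.sleep(0.01)  # simulates network and inference latency
    return f"response to: {prompt}"


def expand_prompt(user_prompt: str, n: int = 10) -> list[str]:
    """Placeholder expansion; the real app would derive n distinct prompts."""
    return [f"{user_prompt} (variant {i + 1})" for i in range(n)]


async def answer(user_prompt: str) -> str:
    prompts = expand_prompt(user_prompt)
    # Fan out: all 10 requests are in flight at the same time.
    responses = await asyncio.gather(*(call_model(p) for p in prompts))
    # Fan in: one final call that sees all 10 responses (the 11th call).
    aggregation_prompt = "Combine these answers:\n" + "\n".join(responses)
    return await call_model(aggregation_prompt)


if __name__ == "__main__":
    print(asyncio.run(answer("best architecture")))
```

With real network calls, the total latency of the parallel step is roughly that of the slowest single request, provided enough workers are available to serve the 10 requests concurrently.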
4 Replies
nerdylive 3w ago
Yes, B is faster. Just set the queue delay to 1 so the requests run at almost the same time. Oh, it also depends: if your GPU is fast enough to generate the responses, then option A can be faster.
yhlong00000 2w ago
I'd suggest creating a backend that takes the user input and calls a serverless endpoint to generate the 10 prompts. Then you can make parallel calls to the serverless endpoint to process these prompts simultaneously. Finally, you can call a serverless endpoint again to aggregate the responses. It’s important to handle all failure cases, such as when some calls fail and only some of the responses come back, or when the aggregation call itself fails.
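One way the partial-failure case might be handled in such a backend is sketched below. The names are hypothetical: `flaky_call` stands in for a request to the serverless endpoint and fails on purpose for some inputs, and `min_ok` is an assumed threshold for how many responses are needed before aggregating.

```python
import asyncio


async def flaky_call(prompt: str) -> str:
    """Hypothetical model call; fails for prompts containing 'bad'."""
    await asyncio.sleep(0.01)
    if "bad" in prompt:
        raise RuntimeError(f"call failed for {prompt!r}")
    return f"response to: {prompt}"


async def gather_successes(prompts: list[str], min_ok: int) -> list[str]:
    # return_exceptions=True keeps one failure from discarding the others.
    results = await asyncio.gather(
        *(flaky_call(p) for p in prompts), return_exceptions=True
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    if len(ok) < min_ok:
        # Too few responses to aggregate; let the caller retry or report.
        raise RuntimeError(f"only {len(ok)} of {len(prompts)} calls succeeded")
    return ok
```

The aggregation call can then run on whatever subset succeeded, and it can be wrapped in its own retry or error path so that a failure there is reported separately from failures in the fan-out step.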
nerdylive 2w ago
It seems redundant to use serverless endpoints to call endpoints.
yhlong00000 2w ago
I agree that it adds extra complexity. The advantage is that your backend only manages the workflow, while your serverless function has a single responsibility for performing the inference. By separating these tasks, the backend can scale or be modified independently as future needs arise.