Scalability Questions
Hi, I'm building an app which handles some reasonably intensive requests on the server (headless Puppeteer + LLMs + image editing, etc.). I'm not sure how Wasp (Wasp Open SaaS specifically) will handle multiple concurrent requests. Would it be better to host these processes in a separate Docker container, scale horizontally, and make requests to the containers? From the documentation, I'm struggling to understand exactly how the applications are deployed. I know how to do it with the CLI tools, but I don't know what the architecture looks like or what the limitations would be.
4 Replies
Hi @josh. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
----------------------
Wasp applications are full-stack apps that consist of a Node.js server, a static client, and a PostgreSQL database. You can deploy each part anywhere where you can usually deploy Node.js apps or static apps. For example, you can deploy your client on Netlify, the server on Fly.io, and the database on Neon. Wasp also offers a single-command deployment through the Wasp CLI [^1^].
However, when it comes to handling CPU-intensive tasks, Wasp starts pg-boss in the same Node.js process as your web server, so both run side by side. This means that jobs running via pg-boss and the rest of the server logic (like Operations) share the same CPU, therefore you should avoid running CPU-intensive tasks via jobs. Currently, Wasp does not support independent, horizontal scaling of pg-boss-only applications, nor starting them as separate workers/processes/threads [^5^].
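To make the "shared CPU" point concrete, here is a minimal, hypothetical Node.js sketch (not Wasp-specific code): a synchronous CPU-bound loop blocks the event loop for everyone, and one in-process mitigation is Node's built-in `worker_threads` module. The function names and the inlined worker are made up for illustration.

```typescript
import { Worker } from "node:worker_threads";

// A synchronous loop like this blocks the Node.js event loop: while it runs,
// every other request, Operation, and pg-boss job in the same process waits.
function cpuHeavy(iterations: number): number {
  let acc = 0;
  for (let i = 0; i < iterations; i++) {
    acc += Math.sqrt(i); // stand-in for Puppeteer / image-editing work
  }
  return acc;
}

// One in-process mitigation: push the work onto a worker thread so the event
// loop stays responsive. The worker source is inlined (eval: true) only to
// keep the sketch self-contained; normally it would live in its own file.
function cpuHeavyInWorker(iterations: number): Promise<number> {
  const workerSource = `
    const { parentPort, workerData } = require("node:worker_threads");
    let acc = 0;
    for (let i = 0; i < workerData; i++) acc += Math.sqrt(i);
    parentPort.postMessage(acc);
  `;
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: iterations });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}
```

Worker threads help with responsiveness, but they still share the same machine's CPU and RAM, which is why a separate, independently scaled service is the usual next step.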
As for Docker, there is an open issue on GitHub discussing the possibility of completely dockerizing Wasp apps. The idea is to have both the web app and the server running in their own Docker containers, ensuring that they run in exactly the same way for all developers. However, this approach might disrupt the typical development flow and possibly slow it down. It might also be tricky to get working with your IDE of choice and debugger, and possibly with live reloading in the browser [^4^].
In conclusion, while Wasp does not currently support independent, horizontal scaling of background workers, you could potentially use Docker to manage your CPU-intensive tasks. However, this might introduce additional complexity and potential issues. It might be worth considering hosting these processes in a separate Docker container, scaling horizontally, and making requests to the containers as you suggested.
[^1^]: https://wasp-lang.dev/docs/advanced/deployment/overview
[^4^]: https://github.com/wasp-lang/wasp/issues/172
[^5^]: https://wasp-lang.dev/docs/advanced/jobs#pg-boss-considerations
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
Sorry @josh, I seem to be having a bit of trouble. Don't worry though, I've pinged my support and they will try to fix me up! :technologist:
Perhaps try again in 15mins?
Can anybody help with this?
You will get some answers from the Wasp team tomorrow I imagine, but from my POV: you get the same "limitations" as with a regular Express Node backend. Wasp isn't a magical thing; it produces a Node.js server as its build output, and that server can be deployed anywhere. Take a look at what Wasp produces as a build output, and you should always run your own benchmarks and tests to verify how the server handles the load.
@josh as @IamIconLiving 🌶 said, it is currently quite simple: you get one client (SPA) and one Node.js server, so anything you execute runs on that single server. Meaning, if you are running CPU-intensive tasks on the server in order to serve those requests, then yeah, that might be something that needs extra thought. The simplest solution is scaling vertically: a server with more RAM / CPU. Alternatively, you can move some of the heavy tasks to another server (a "microservice" that you would write and deploy yourself, outside of Wasp, probably just a simple Node.js server) that is specialized for those heavy tasks and is used by the main server (the Wasp one), which would most likely call it via an HTTP API, or however you prefer. You will also likely want a queue on the side of that microservice, so if it gets overloaded with tasks, they get queued. We plan to support this directly in Wasp in the future (the ability to say "I want another server that will be used only for these operations, and it can have a queue"), but probably not super soon.
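The microservice-with-a-queue idea above could be sketched as a plain Node HTTP server. Everything here is hypothetical and simplified (the `/tasks` route, the in-memory array standing in for a real queue like pg-boss or BullMQ, the fake heavy task), just to show the shape: accept work fast, return 202, process sequentially so bursts don't overload the box.

```typescript
import http from "node:http";

type Task = { id: number; payload: string };

// In-memory stand-in for a real queue; a production service would use
// something durable (pg-boss, BullMQ, etc.) so tasks survive restarts.
const queue: Task[] = [];
let nextId = 1;
let busy = false;

async function runHeavyTask(task: Task): Promise<void> {
  // Placeholder for the actual Puppeteer / image-editing / local-LLM work.
  await new Promise((resolve) => setTimeout(resolve, 10));
  console.log(`finished task ${task.id}: ${task.payload}`);
}

// Work through the queue one task at a time.
async function drain(): Promise<void> {
  if (busy) return;
  busy = true;
  while (queue.length > 0) {
    await runHeavyTask(queue.shift()!);
  }
  busy = false;
}

const server = http.createServer((req, res) => {
  if (req.method === "POST" && req.url === "/tasks") {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      const task: Task = { id: nextId++, payload: body };
      queue.push(task);
      void drain(); // kick off processing in the background
      res.writeHead(202, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ queued: task.id }));
    });
  } else {
    res.writeHead(404);
    res.end();
  }
});

// Port 0 = let the OS pick; a real deployment would use a fixed port that
// the main Wasp server POSTs to, e.g. http://heavy-worker:4000/tasks.
server.listen(0);
```

The key design choice is that the endpoint only enqueues and immediately returns 202 Accepted, so the caller (your Wasp server) never waits on the heavy work itself.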
If you are doing calls to LLMs via API, e.g. the OpenAI API, then that is not intensive. If you are running a local LLM on your server, then that will be resource-intensive, I imagine. Image editing is also resource-intensive indeed.
What I would suggest is starting simple -> keep it all in one Wasp app. Monitor how that works, and when you see that these requests are indeed becoming a bottleneck, scale vertically (stronger machine) and consider starting to extract the demanding operations into the microservice. I am suggesting this in order to avoid over-engineering the whole thing in advance, when you don't yet know how things will go exactly. If you are very confident in how things will go, then you might want to do it from the start, though.
Sorry if I over-explained a bit, I am not sure what your background / context is. Feel free to ask me more questions and I will try to clarify the details!
Btw here is a bit on how one user used an external Flask Python server with Wasp, that might be helpful if you go the route of the microservice: https://github.com/wasp-lang/wasp/issues/1877
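Since Wasp Operations are ordinary async Node functions, the call from the Wasp side to such an external service can be a plain HTTP request. A minimal sketch, assuming a hypothetical `/tasks` endpoint and response shape on the worker service (the helper name, route, and env var are all made up):

```typescript
// Hypothetical helper: submit a heavy task to an external worker service.
// The /tasks route and { queued } response shape are illustrative only.
export async function submitHeavyTask(
  baseUrl: string,
  payload: unknown
): Promise<{ queued: number }> {
  const res = await fetch(`${baseUrl}/tasks`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`worker rejected task: ${res.status}`);
  return (await res.json()) as { queued: number };
}

// Inside a Wasp action you would then do something like:
//   await submitHeavyTask(process.env.HEAVY_WORKER_URL!, { imageUrl });
```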