Llama3 setup
Hi, everyone.
We are planning to deploy Llama3 for our app with millions of users.
How should we go about this? And which GPU series or cloud platforms are best for high speed and scalability?
At a high level, first decide which model you want to run (8B, 70B, or 405B), because each has very different GPU memory requirements: at FP16 the weights take roughly 2 bytes per parameter, so about 16 GB for 8B, 140 GB for 70B, and 810 GB for 405B, before counting the KV cache. The next important thing is understanding the request patterns from your millions of users, like average requests per second, peak times, and how long each request takes to process. That tells you how many GPUs you need; a rough back-of-envelope sketch is below. Beyond that, there are plenty of other factors to consider and experiment with, so it's best to start small and test things out.
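Here's a minimal Python sketch of that math. Everything in it is an assumption for illustration: the 80 GB GPU, the ~30% KV-cache headroom factor, and the traffic and per-replica concurrency numbers are placeholders you'd replace with your own benchmarks (e.g. from load-testing a serving stack like vLLM):

```python
import math

# All numbers below are illustrative placeholders; replace them with
# your own benchmarks before making any real capacity decision.

GPU_MEM_GB = 80            # e.g. one 80 GB A100/H100 card
BYTES_PER_PARAM = 2        # FP16/BF16 weights
KV_CACHE_HEADROOM = 1.3    # ~30% extra for KV cache and activations (rough guess)

def min_gpus_for_model(params_billions: float) -> int:
    """Smallest GPU count whose combined memory fits the weights plus headroom."""
    weights_gb = params_billions * BYTES_PER_PARAM   # 1B params * 2 bytes = 2 GB
    return math.ceil(weights_gb * KV_CACHE_HEADROOM / GPU_MEM_GB)

def replicas_for_traffic(peak_rps: float, seconds_per_request: float,
                         batch_size_per_replica: int) -> int:
    """Little's law: requests in flight = arrival rate * time per request."""
    in_flight = peak_rps * seconds_per_request
    return math.ceil(in_flight / batch_size_per_replica)

for name, params in {"8B": 8, "70B": 70, "405B": 405}.items():
    print(f"Llama 3 {name}: >= {min_gpus_for_model(params)}"
          f" x {GPU_MEM_GB} GB GPUs per replica")

# Example traffic: 200 requests/s at peak, ~2 s per request, and a serving
# stack that sustains ~16 concurrent requests per replica (made-up figures).
print("replicas needed:", replicas_for_traffic(peak_rps=200,
                                               seconds_per_request=2.0,
                                               batch_size_per_replica=16))
```

Note that the memory math only gives the minimum per-replica GPU count; real throughput depends on batching, quantization, and sequence lengths, which is why benchmarking a single replica first is the right move.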