RunPod · 4mo ago
Yasmin

Llama

Hello! For those who have tried: how much GPU memory is needed for inference only, and for fine-tuning, of Llama 70B? How about inference with the 400B version (for knowledge distillation)? Is the quality difference worth it? Thanks!
Solution
Kanoi · 4mo ago
Only for inference:
[image attachment]
Kanoi · 4mo ago
And then to fine-tune the model you need roughly 4x the memory. Hope that helps!
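A rough back-of-the-envelope sketch of that rule of thumb (assuming fp16 weights, so 2 bytes per parameter, and the ~4x multiplier above for full fine-tuning to cover gradients and optimizer states; actual usage also depends on activations, KV cache, and batch size):

```python
def estimate_vram_gb(num_params_billions: float,
                     bytes_per_param: int = 2,
                     fine_tune: bool = False) -> float:
    """Rough VRAM estimate in GB for loading a model.

    1B params at 2 bytes/param (fp16) ~= 2 GB just for weights.
    Full fine-tuning roughly quadruples this (weights + gradients
    + Adam optimizer states).
    """
    gb = num_params_billions * bytes_per_param
    if fine_tune:
        gb *= 4
    return gb

print(estimate_vram_gb(70))                    # 70B fp16 inference: ~140 GB
print(estimate_vram_gb(70, fine_tune=True))    # full fine-tune: ~560 GB
print(estimate_vram_gb(405))                   # Llama 3.1 405B fp16: ~810 GB
```

By this estimate, 70B inference in fp16 needs ~2x 80 GB GPUs (or one with 4-bit quantization), while the 405B model needs a multi-node or heavily quantized setup.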
Yasmin (OP) · 4mo ago
This is so helpful. Many thanks!