Hello! For those who have tried it: how much GPU memory is needed for inference only, and how much for fine-tuning, of Llama 70B? What about inference with the 400B version (for knowledge distillation)? Is the quality difference worth it? Thanks!
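Not an exact answer, but here is a rough back-of-envelope sketch of the VRAM needed just to hold the weights at common precisions (parameter count × bytes per parameter). This ignores the KV cache and activations, and fine-tuning needs considerably more on top (full fine-tuning with Adam adds gradients and optimizer state, several times the weight memory; LoRA/QLoRA needs far less):

```python
# Rough weight-only VRAM estimate: params * bits / 8, in GB.
# Ignores KV cache, activations, and optimizer state, which add more.

def weight_vram_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed just to store the weights, in GB."""
    return num_params * bits_per_param / 8 / 1e9

for params, name in [(70e9, "70B"), (400e9, "400B")]:
    for bits, fmt in [(16, "fp16"), (8, "int8"), (4, "int4")]:
        print(f"{name} {fmt}: ~{weight_vram_gb(params, bits):.0f} GB")
# 70B  -> ~140 GB fp16, ~70 GB int8, ~35 GB int4
# 400B -> ~800 GB fp16, ~400 GB int8, ~200 GB int4
```

So 70B inference at 4-bit fits on two 24 GB cards or one 48 GB card (with room for the KV cache), while the 400B model realistically needs a multi-GPU node even when quantized.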