RunPod•8mo ago

Llama-70B 3.1 execution time and queue delay are much larger than with 3.0. Why?

I deployed these two models, which appear to use the same techniques. Both run on the same machine (2x80GB), but the execution time and queue delay differ massively:

Queue delay:
Llama 70B 3.0: 0.02 secs
Llama 70B 3.1: 0.15 secs

Execution time:
Llama 70B 3.0: 0.65 secs
Llama 70B 3.1: 3 secs

Models:
Llama 70B 3.0: https://huggingface.co/failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
Llama 70B 3.1: https://huggingface.co/mlabonne/Llama-3.1-70B-Instruct-lorablated
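To put the gap in relative terms, here is a minimal sketch that turns the timings quoted above into slowdown ratios. The dictionary layout and the `slowdown` helper are illustrative names, not part of any RunPod API; the numbers are the ones reported in this post.

```python
# Timings as reported in this post, in seconds.
timings = {
    "queue_delay": {"3.0": 0.02, "3.1": 0.15},
    "execution_time": {"3.0": 0.65, "3.1": 3.0},
}


def slowdown(metric):
    """Return how many times larger the 3.1 value is than the 3.0 value."""
    return metric["3.1"] / metric["3.0"]


# Round to two decimals for readability.
ratios = {name: round(slowdown(values), 2) for name, values in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: 3.1 is {ratio}x slower than 3.0")
```

On these figures, 3.1 shows roughly 7.5 times the queue delay and roughly 4.6 times the execution time of 3.0.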

1 Reply

Charixfox•8mo ago

Sources online indicate that 3.0 is, on average, around three times as fast as 3.1. So while 3.1 is more accurate, 3.0 is faster.