RunPod, 4mo ago
octopus

Llama-70B 3.1 execution and queue delay time much larger than 3.0. Why?

I deployed these two models, which seem to use the same techniques. I'm running both on the same machine (2x80GB), but the execution time and queue delay time differ massively:

Queue delay:
Llama 70B 3.0: 0.02 secs
Llama 70B 3.1: 0.15 secs

Execution time:
Llama 70B 3.0: 0.65 secs
Llama 70B 3.1: 3 secs

Models:
Llama 70B 3.0: https://huggingface.co/failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
Llama 70B 3.1: https://huggingface.co/mlabonne/Llama-3.1-70B-Instruct-lorablated
1 Reply
Charixfox, 4mo ago
You can find sources online that indicate 3.0 averages around three times as fast as 3.1. So while 3.1 is more accurate, 3.0 is speedier.
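For comparison with that rough "three times" figure, here is a minimal sketch that works out the slowdown ratios from the timings posted in the question. The numbers are copied from the question. The dictionary keys and the `slowdown` helper are hypothetical names chosen for illustration and are not part of any RunPod API.

```python
# Timings copied from the question, in seconds.
# Key names are illustrative only.
timings = {
    "queue_delay": {"llama_3_0": 0.02, "llama_3_1": 0.15},
    "execution": {"llama_3_0": 0.65, "llama_3_1": 3.0},
}


def slowdown(metric):
    """Return how many times slower 3.1 is than 3.0 for one metric."""
    t = timings[metric]
    return t["llama_3_1"] / t["llama_3_0"]


# Compute the ratio for every metric and print a one-line summary each.
ratios = {name: slowdown(name) for name in timings}
for name, ratio in ratios.items():
    print(f"{name}: 3.1 is {ratio:.1f}x slower than 3.0")
```

On the posted numbers this gives roughly 7.5x for queue delay and 4.6x for execution time, so the gap reported here is somewhat larger than a 3x average would suggest.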