RunPod•4mo ago

What are ttft times we should be able to reach?

Of course this depends on input token count, hardware selection, etc., but for the life of me I cannot get a TTFT under 2000 ms on serverless. I'm using Llama 3.1 7B / Gemma / Mistral on 48 GB GPU workers. For performance evaluation I use GuideLLM, which tests different throughput scenarios (continuous, small, large). Even with 50 input tokens and 100 output tokens I see a TTFT of 2000-2500 ms. I should add that I'm running GuideLLM from a local Python script against the serverless endpoint. Has anyone observed quicker times?
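
For reference, here is a minimal sketch of what "TTFT" means when it is measured from a local client: the time from sending the request until the first non-empty streamed chunk arrives. Measured this way, the number includes the network round trip and any queueing or worker start-up on the serverless side, not just the model's prefill time. The names `measure_ttft`, `start_stream`, and `fake_stream` are hypothetical and only for illustration; `fake_stream` stands in for a real streaming endpoint so the snippet runs without network access.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_ttft(
    start_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Return (ttft_ms, total_ms, n_chunks) for one streamed request.

    start_stream is a hypothetical callable that sends the request and
    returns an iterable of text chunks as they arrive.
    """
    t0 = clock()
    ttft_ms = None
    n_chunks = 0
    for chunk in start_stream():
        if not chunk:
            continue  # skip empty chunks that carry no generated text
        if ttft_ms is None:
            # First non-empty chunk: this is the time to first token.
            ttft_ms = (clock() - t0) * 1000.0
        n_chunks += 1
    total_ms = (clock() - t0) * 1000.0
    return ttft_ms, total_ms, n_chunks


def fake_stream(
    first_delay_s: float = 0.05, gap_s: float = 0.01, n: int = 5
) -> Iterator[str]:
    """Stand-in for a real endpoint: waits, then yields n text chunks."""
    time.sleep(first_delay_s)
    for i in range(n):
        yield f"tok{i} "
        time.sleep(gap_s)


if __name__ == "__main__":
    ttft, total, n = measure_ttft(fake_stream)
    print(f"TTFT {ttft:.0f} ms, total {total:.0f} ms, {n} chunks")
```

To use it against a real endpoint, `start_stream` would wrap whatever streaming client is in use and yield each chunk's text. Running the same measurement from a machine closer to the endpoint, or against an already warm worker, would help show how much of the 2000 ms is network and start-up rather than inference.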

1 Reply

yhlong00000•4mo ago

Maybe try different GPU types? For example 48 GB Pro, 80 GB, or 80 GB Pro.
