RunPod
esho · 13mo ago

Faster Whisper Latency is High

I tested a 10-second audio clip and get about 1 second of latency on an RTX 4090 after cold start. I am using the default base model; on my own RTX 3090, the latency is about 0.2 s.
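For comparison, here is a minimal sketch of how a local faster-whisper latency figure like the 0.2 s number might be measured; the model size, device settings, and audio filename are assumptions rather than details taken from the thread:

```python
import time

from faster_whisper import WhisperModel

# Assumed setup: "base" model on a local GPU with float16 compute.
model = WhisperModel("base", device="cuda", compute_type="float16")

# Warm-up run so the timed run below excludes one-time initialization
# (a rough analogue of measuring "after cold start").
list(model.transcribe("sample_10s.wav")[0])

start = time.time()
segments, _info = model.transcribe("sample_10s.wav")
text = " ".join(segment.text for segment in segments)  # the generator must be consumed to finish transcription
print(f"Transcription latency: {time.time() - start:.3f} s")
```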
11 Replies
Jason · 13mo ago
Hi there, just wondering: how did you benchmark those numbers?
esho (OP) · 13mo ago
```python
import time

import requests

start = time.time()
response = requests.post(url, json=payload, headers=headers)
print("Time taken: ", time.time() - start)
```
A very simple script, and there is an "executionTime" field in the response. "executionTime" is about 800 ms.
Jason · 13mo ago
Oh, is this measured from your PC? Or is it measured in your handler code?
esho (OP) · 13mo ago
It's from my PC. I also tested it on GCP.
Jason · 13mo ago
It may be the network latency plus the execution time.
esho (OP) · 13mo ago
The executionTime in the response is about 800 ms. I think this is also high.
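A small sketch of how the round-trip time could be split into network overhead and worker execution time, reusing the url, payload, and headers from the script above and assuming the response body is JSON with executionTime reported in milliseconds (as the ~800 ms figure suggests):

```python
import time

import requests

start = time.time()
response = requests.post(url, json=payload, headers=headers)
total_ms = (time.time() - start) * 1000

# "executionTime" is the field mentioned in the thread; its exact units are an assumption.
execution_ms = response.json().get("executionTime", 0)
print(f"Round trip:         {total_ms:.0f} ms")
print(f"Worker execution:   {execution_ms:.0f} ms")
print(f"Network + overhead: {total_ms - execution_ms:.0f} ms")
```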
Jason · 13mo ago
Oh, what config (inputs) do you use? It's pretty average, I think. And what GPU are you using?
digigoblin · 13mo ago
800 ms is pretty quick, actually.
Jason · 13mo ago
Yeah, pretty average, right? It depends on what config he's using too.
esho (OP) · 13mo ago
I am using the default config. I think it should run as fast as my local machine: although it is called serverless, only I am using the server after the cold start, so it should be really fast. I am using an RTX 3090 and an RTX 4090.
Jason · 13mo ago
Hmm, yeah, that makes sense. It will be, if your requests keep coming, I think. I don't know yet, but maybe try a longer audio clip; maybe it will be faster.
