RunPod8mo ago
esho

Faster Whisper Latency is High

I tested a 10-second audio clip and got a latency of about 1 second on an RTX 4090 after cold start. The default is the base model, and on my own RTX 3090 the latency is about 0.2 s.
11 Replies
nerdylive8mo ago
Hi there, just wondering how you benchmarked those?
eshoOP8mo ago
A very simple script: `import time; import requests; start = time.time(); response = requests.post(url, json=payload, headers=headers); print("Time taken: ", time.time() - start)`. There is also an "executionTime" field in the response, and it is about 800 ms.
nerdylive8mo ago
Oh, is this measured from your PC? Or is it in your handler code?
eshoOP8mo ago
It's from my PC. I also tested it on GCP.
nerdylive8mo ago
It may be the network latency plus the execution time.
eshoOP8mo ago
The executionTime is in the response, about 800 ms. I think this is also high.
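To see how much of the roughly 1 second is network versus worker time, one option is to compare the client-side wall clock with the `executionTime` reported in the response. A minimal sketch, assuming `executionTime` is a top-level field in milliseconds (as the ~800 ms figure suggests); `split_latency` and the `send` callable are hypothetical names for illustration, not part of any RunPod API:

```python
import time
from typing import Any, Callable, Dict


def split_latency(send: Callable[[], Dict[str, Any]]) -> Dict[str, float]:
    """Time one request and split it into worker time and everything else.

    `send` is any zero-argument callable that performs the request and
    returns the parsed JSON body as a dict.
    """
    start = time.perf_counter()
    body = send()
    wall_ms = (time.perf_counter() - start) * 1000.0

    # Assumption: the body carries "executionTime" in milliseconds.
    execution_ms = float(body["executionTime"])

    return {
        "wall_ms": wall_ms,                      # what the client observed
        "execution_ms": execution_ms,            # what the worker reported
        "overhead_ms": wall_ms - execution_ms,   # network, queueing, etc.
    }
```

With the script above, this could be called as `split_latency(lambda: requests.post(url, json=payload, headers=headers).json())`. A large `overhead_ms` would point at the network or queueing rather than the model itself.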
nerdylive8mo ago
Oh, what config (inputs) do you use? It's pretty average, I think. And what GPU are you using too?
digigoblin8mo ago
800ms is pretty quick actually
nerdylive8mo ago
Yeah, pretty average, right? It depends on what config he's using too.
eshoOP8mo ago
I am using the default config. I think it should run as fast as on the local machine. Although it is called serverless, I am the only one using the server after cold start, so this should be really fast. I am using an RTX 3090 and an RTX 4090.
nerdylive8mo ago
Hmm yeah, makes sense. It will be if your requests keep coming, I think. I don't know yet, but maybe try another, longer audio; maybe it will be faster.
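To check whether latency drops once requests keep coming, a single timed call is not enough; it helps to send several back to back and discard the first (possibly cold) one. A rough sketch, where `time_calls` is a hypothetical helper and `call` stands in for whatever sends the request:

```python
import statistics
import time
from typing import Callable, Dict


def time_calls(call: Callable[[], object], runs: int = 10, warmup: int = 1) -> Dict[str, float]:
    """Call `call` warmup + runs times and summarize the non-warmup timings."""
    samples = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        call()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        # Skip the first `warmup` calls so a cold start does not skew the stats.
        if i >= warmup:
            samples.append(elapsed_ms)

    return {
        "runs": len(samples),
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

For example, `time_calls(lambda: requests.post(url, json=payload, headers=headers), runs=20)` would give a median over 20 warm requests, which is a fairer number to compare against the 0.2 s local figure than one request right after cold start.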