Faster Whisper Latency is High
I tested a 10-second audio clip and got a latency of about 1 second on an RTX4090 after cold start. The default is the base model, and on my own RTX3090 the latency is about 0.2s.
11 Replies
Hi there, just wondering how you benchmarked those?
"import time
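import requests

# url, payload, and headers are set up elsewhere for the endpoint under test
start = time.perf_counter()
response = requests.post(url, json=payload, headers=headers)
print("Time taken:", time.perf_counter() - start)"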
A very simple script, and there is an "executionTime" field in the response. "executionTime" is about 800ms.
Oh is this from your pc?
Or is that on your handler code?
it's from my PC.
I also tested it on GCP
it may be network latency plus execution time
the executionTime is in the response, about 800ms. I think this is also high.
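One way to separate the two components discussed here is to subtract the reported executionTime from the measured round trip. This is only a minimal sketch, assuming the response JSON carries "executionTime" in milliseconds (as the ~800ms figure suggests); `split_latency` is a hypothetical helper name, not part of any API.

```python
def split_latency(round_trip_s, response_json):
    """Split a measured round trip into execution time and everything else.

    Assumes response_json["executionTime"] is reported in milliseconds.
    """
    execution_s = response_json["executionTime"] / 1000.0
    # Whatever is left over is network transfer, queueing, and other overhead.
    overhead_s = max(round_trip_s - execution_s, 0.0)
    return execution_s, overhead_s


# Illustrative numbers from this thread: ~1 s round trip, ~800 ms executionTime.
execution_s, overhead_s = split_latency(1.0, {"executionTime": 800})
print(f"execution: {execution_s:.2f}s, overhead: {overhead_s:.2f}s")
```

With the real script, `round_trip_s` would be the measured elapsed time and `response_json` would be `response.json()`.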
oh
what config (inputs) do you use
it's pretty average i think yeah
and what gpu are you using too
800ms is pretty quick actually
yeah pretty avg right
depends on what config he's using too
I am using the default config. I think it should run as fast as the local machine.
Although it is called serverless, I am the only one using the server after cold start, so it should be really fast.
I am using RTX3090 and RTX4090.
Hmm yeah makes sense
It will be if your requests keep coming I think
I don't know yet, but maybe try a longer audio file; relative to its length it might come out faster
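To check whether latency drops when requests keep coming, one could time several back-to-back calls and compare the first against the rest. This is a rough sketch, not a definitive benchmark: `time_requests` and `fake_request` are hypothetical names, and `fake_request` is a stand-in that just sleeps so the example runs without a real endpoint.

```python
import time


def time_requests(send_request, n=5):
    """Call send_request() n times back to back; return per-call seconds."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        send_request()
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in for the real call, e.g.
# lambda: requests.post(url, json=payload, headers=headers)
def fake_request():
    time.sleep(0.01)


timings = time_requests(fake_request, n=3)
print("first:", timings[0], "rest:", timings[1:])
```

Swapping `fake_request` for the real request, with a short clip and then a longer one, would show whether the fixed per-request overhead dominates the 10-second test.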