RunPod
esho · 13mo ago

Faster Whisper Latency is High

I tested a 10-second audio clip and get about 1 second of latency on an RTX 4090 after cold start. I am using the default base model; on my own RTX 3090, the latency is about 0.2 s.
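For comparison, here is a minimal sketch of how a local faster-whisper latency figure like the 0.2 s number might be measured; the model size, device settings, and audio filename are assumptions rather than details taken from the thread:

```python
import time

from faster_whisper import WhisperModel

# Assumed setup: "base" model on a local GPU with float16 compute.
model = WhisperModel("base", device="cuda", compute_type="float16")

# Warm-up run so the timed run below excludes one-time initialization
# (a rough analogue of measuring "after cold start").
list(model.transcribe("sample_10s.wav")[0])

start = time.time()
segments, _info = model.transcribe("sample_10s.wav")
text = " ".join(segment.text for segment in segments)  # the generator must be consumed to finish transcription
print(f"Transcription latency: {time.time() - start:.3f} s")
```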
11 Replies
Jason · 13mo ago
Hi there, just wondering: how did you benchmark those numbers?
esho (OP) · 13mo ago
```python
import time

import requests

start = time.time()
response = requests.post(url, json=payload, headers=headers)
print("Time taken: ", time.time() - start)
```
A very simple script, and there is an "executionTime" field in the response. "executionTime" is about 800 ms.
Jason · 13mo ago
Oh, is this measured from your PC? Or is it measured in your handler code?
esho (OP) · 13mo ago
It's from my PC. I also tested it on GCP.
Jason · 13mo ago
It may be the network latency plus the execution time.
esho (OP) · 13mo ago
The executionTime in the response is about 800 ms. I think this is also high.
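A small sketch of how the round-trip time could be split into network overhead and worker execution time, reusing the url, payload, and headers from the script above and assuming the response body is JSON with executionTime reported in milliseconds (as the ~800 ms figure suggests):

```python
import time

import requests

start = time.time()
response = requests.post(url, json=payload, headers=headers)
total_ms = (time.time() - start) * 1000

# "executionTime" is the field mentioned in the thread; its exact units are an assumption.
execution_ms = response.json().get("executionTime", 0)
print(f"Round trip:         {total_ms:.0f} ms")
print(f"Worker execution:   {execution_ms:.0f} ms")
print(f"Network + overhead: {total_ms - execution_ms:.0f} ms")
```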
Jason · 13mo ago
Oh, what config (inputs) do you use? It's pretty average, I think. And what GPU are you using?
digigoblin · 13mo ago
800 ms is pretty quick, actually.
Jason · 13mo ago
Yeah, pretty average, right? It depends on what config he's using too.
esho (OP) · 13mo ago
I am using the default config. I think it should run as fast as my local machine: although it is called serverless, only I am using the server after the cold start, so it should be really fast. I am using an RTX 3090 and an RTX 4090.
Jason · 13mo ago
Hmm, yeah, that makes sense. It will be, if your requests keep coming, I think. I don't know yet, but maybe try a longer audio clip; maybe it will be faster.
