Is anyone experiencing massive delay times when sending jobs to GPUs on serverless?
We are sending jobs to our Whisper serverless functions. Sometimes the delay time is massive and sometimes the job goes through quickly. At the moment we are only testing, so we are using a single 16GB GPU. Does anyone have any advice on this?
4 Replies
Don't use a single GPU, it's probably throttled. Leave max workers alone; don't change it from the default of 3.
Is FlashBoot on? Are you using the default template?
The problem is always going to be a single worker. People keep doing that and then complaining about delay times, when they MUST NOT set max workers to 1 🤦‍♂️
It's set to 3 by default FOR A REASON, DON'T CHANGE IT.
I think RunPod should just disallow setting it below 3, because people keep making this mistake time and again.
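To show why a single worker produces the "sometimes fast, sometimes slow" pattern, here is a toy queue simulation. It is only a sketch, not RunPod's actual scheduler: the function name `queue_delays`, the 10-second job length, and the arrival times are all made up for illustration, and it ignores cold starts and throttling entirely.

```python
import heapq


def queue_delays(arrival_times, job_seconds, max_workers):
    """Return how long each job waits before a worker picks it up.

    Toy model (assumption): every job takes the same time and workers
    are always warm, so the only delay is waiting for a free worker.
    """
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0] * max_workers
    heapq.heapify(free_at)

    delays = []
    for arrival in sorted(arrival_times):
        worker_free = heapq.heappop(free_at)   # earliest available worker
        start = max(arrival, worker_free)      # wait if that worker is busy
        delays.append(start - arrival)
        heapq.heappush(free_at, start + job_seconds)
    return delays


if __name__ == "__main__":
    arrivals = [0, 1, 2, 3, 4, 5]  # one job per second (hypothetical load)
    for workers in (1, 3):
        delays = queue_delays(arrivals, job_seconds=10, max_workers=workers)
        print(f"max_workers={workers}: delays per job = {delays}")
```

With one worker, the first job starts immediately and every later job queues behind it, so the delay keeps growing. With three workers the same burst is spread out and the waits stay short. A job that happens to arrive while the lone worker is idle goes straight through, which would explain why some requests look fast.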
Thanks everyone, it seems to work better with more max workers!