delay time
I have a serverless endpoint configured with 15 max workers. However, I notice that only about three of them are actually usable. My workload is configured to time out if it takes longer than a minute to process.
The other workers randomly have issues, such as timing out when attempting to return job data or failing to run entirely and having to be retried on a different worker, which leads to delay/execution times of over 2-3 minutes.
When I execute 6 different jobs, they all have very different delay times. Some worker IDs consistently have low delay times, but some randomly take forever. Is there anything I can do to reduce this randomness? Also, can I delete/blacklist the workers that perform poorly?
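One way to see which workers are the slow ones is to submit a few test jobs and record what the endpoint's status API reports for each. Below is a rough Python sketch, not an official recipe: it assumes a placeholder ENDPOINT_ID, an API key in the RUNPOD_API_KEY environment variable, a made-up {"prompt": ...} input shape, and that the /status response includes workerId, delayTime and executionTime fields.

```python
import os
import time
import requests

# Placeholder values -- swap in your own endpoint ID and API key.
ENDPOINT_ID = "your_endpoint_id"
API_KEY = os.environ["RUNPOD_API_KEY"]
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def submit_job(payload):
    # Queue a job asynchronously and return its job id.
    resp = requests.post(f"{BASE}/run", json={"input": payload}, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["id"]

def wait_for_job(job_id, poll_seconds=2):
    # Poll /status until the job reaches a terminal state, then return the status blob.
    while True:
        resp = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS)
        resp.raise_for_status()
        status = resp.json()
        if status.get("status") in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
            return status
        time.sleep(poll_seconds)

# Submit a handful of jobs and log which worker served each one,
# along with its queue delay and execution time (reported in ms),
# so consistently slow worker IDs stand out.
job_ids = [submit_job({"prompt": f"test {i}"}) for i in range(6)]
for job_id in job_ids:
    status = wait_for_job(job_id)
    print(
        job_id,
        status.get("workerId"),
        status.get("delayTime"),
        status.get("executionTime"),
        status.get("status"),
    )
```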
7 Replies
You can terminate the ones that are behaving badly, but unfortunately there's no way to blacklist them. I have also experienced similar behavior 😦
My execution time is usually around 40s and my timeout is 5 mins, and that 5 min timeout was hit pretty recently.
I suggest logging a ticket for this on the website. I logged a ticket and I'm still waiting for RunPod to get back to me.
lmk how it went
Maybe it's just the cold start time, if those are the same workers.
How does the cold start time differ so much between workers, though?
It's already been like 3 days and just crickets from RunPod as usual 😢
😦
This error happens randomly and causes the job to fail:
{"requestId": "e617c5c9-b14c-42c6-886e-ec35f1b05bc9-u1", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/rtqb8oacytm879/job-done/oh8mcc8cdhv1cx/e617c5c9-b14c-42c6-886e-ec35f1b05bc9-u1?gpu=NVIDIA+GeForce+RTX+4090&isStream=false", "level": "ERROR"}
This is a different issue, log a support ticket on the website for it.