RunPod•4w ago
1AndOnlyPika

delay time

I have a serverless endpoint configured with 15 max workers, but I notice that only about three of them are actually usable. My workload is configured to time out if it takes longer than a minute to process. The other workers randomly have issues, such as timing out when attempting to return job data or failing to run entirely and having to be retried on a different worker, which leads to a delay/execution time of over 2-3 minutes. Executing 6 different jobs gives very different delay times: some worker IDs consistently have a low delay time, but some randomly take forever. Is there anything I can do to reduce this randomness? Additionally, can I delete/blacklist the workers that perform poorly?
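As a rough way to quantify the randomness, here is a minimal client-side sketch. It assumes the standard RunPod serverless HTTP endpoints (/run and /status) and that the status payload exposes delayTime, executionTime, workerId, and status fields; the endpoint ID, API key, and input payload are placeholders, and the terminal state names are assumptions to verify against the current docs. It runs a few identical jobs and prints the timings per worker ID so the consistently slow machines stand out:
```python
import os
import time

import requests

# Placeholders - substitute your own endpoint ID and API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = os.environ["RUNPOD_API_KEY"]
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Terminal job states (names assumed from observed status responses).
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def run_and_time(payload: dict) -> dict:
    """Submit one job, poll /status until it reaches a terminal state,
    and return the final status payload."""
    job = requests.post(f"{BASE}/run", json={"input": payload},
                        headers=HEADERS, timeout=30).json()
    job_id = job["id"]
    while True:
        status = requests.get(f"{BASE}/status/{job_id}",
                              headers=HEADERS, timeout=30).json()
        if status.get("status") in TERMINAL:
            return status
        time.sleep(2)


# Run a handful of identical jobs and print per-worker timings so the
# consistently slow worker IDs stand out.
for i in range(6):
    result = run_and_time({"prompt": f"probe-{i}"})
    print(result.get("workerId"),
          "delay(ms):", result.get("delayTime"),
          "exec(ms):", result.get("executionTime"),
          result.get("status"))
```
Grouping a few runs of this by workerId should make it clear which worker IDs are the ones worth terminating manually, as discussed in the replies below.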
7 Replies
digigoblin
digigoblin•4w ago
You can terminate the ones that are behaving badly, but unfortunately there is no way to blacklist them. I have also experienced similar behavior 😦 My execution time is usually around 40s and my timeout is 5 mins, and the 5-minute timeout was hit pretty recently. I suggest logging a ticket for this on the website. I logged a ticket and am still waiting for RunPod to get back to me.
nerdylive
nerdylive•4w ago
Let me know how it went. Maybe it's just the cold start time, if those are the same workers.
1AndOnlyPika
1AndOnlyPika•4w ago
How does the cold start time differ so much between workers, though?
digigoblin
digigoblin•4w ago
Already been like 3 days and just crickets from RunPod as usual 😢
nerdylive
nerdylive•4w ago
😦
1AndOnlyPika
1AndOnlyPika•4w ago
This error happens randomly, causing the job to fail:
{"requestId": "e617c5c9-b14c-42c6-886e-ec35f1b05bc9-u1", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/rtqb8oacytm879/job-done/oh8mcc8cdhv1cx/e617c5c9-b14c-42c6-886e-ec35f1b05bc9-u1?gpu=NVIDIA+GeForce+RTX+4090&isStream=false", "level": "ERROR"}
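Since that error is the worker failing to reach the job-done endpoint rather than the handler itself failing, one client-side workaround is to resubmit jobs that end up FAILED so another worker can pick them up. A minimal sketch, again assuming the standard /run and /status endpoints, assumed terminal state names, and placeholder endpoint ID/API key (submit_with_retry is a hypothetical helper, not part of the RunPod SDK):
```python
import os
import time

import requests

# Placeholders - substitute your own endpoint ID and API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = os.environ["RUNPOD_API_KEY"]
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def submit_with_retry(payload: dict, attempts: int = 2) -> dict:
    """Submit a job and, if it ends FAILED (e.g. the worker could not
    return its results), resubmit it so another worker can pick it up."""
    last = {}
    for attempt in range(1, attempts + 1):
        job_id = requests.post(f"{BASE}/run", json={"input": payload},
                               headers=HEADERS, timeout=30).json()["id"]
        while True:
            last = requests.get(f"{BASE}/status/{job_id}",
                                headers=HEADERS, timeout=30).json()
            if last.get("status") in TERMINAL:
                break
            time.sleep(2)
        if last.get("status") == "COMPLETED":
            return last
        print(f"attempt {attempt} ended with {last.get('status')}, retrying")
    return last


result = submit_with_retry({"prompt": "example"})
print(result.get("status"), result.get("output"))
```
This doesn't fix the underlying platform issue, but it keeps a single bad worker from failing the whole request while the support ticket is open.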
digigoblin
digigoblin•4w ago
This is a different issue; log a support ticket on the website for this.