Runpod workers getting staggered when I call more than 1 at a time.
So I'm currently running an app connected to the endpoint, and I've noticed that the workers tend to be deployed in a staggered way. I have a function that splits a workload into 50 Runpod jobs, but for some reason my endpoint does not actually use all 50 workers that I have ready. Instead, the workers seem to be picked up in stages: I'll see that 36 of the jobs went through and are running, and I still have 14 jobs in the queue while 14 workers sit untouched. I've set my scaling type to request count with the count set to 1 (which should be the most aggressive setting). I'm stuck trying to resolve this because for the life of me I can't figure it out. It's causing what should be a 30-second task to take over 1 minute (as I have to wait for the staggered deployment and the results). Does anyone else have this issue, or a recommendation? Thanks, I appreciate it!
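For anyone trying to reproduce this, here is a rough sketch of the kind of fan-out described above. It is not the poster's actual code. It assumes jobs are posted to the endpoint's asynchronous `/run` route with a bearer API key, that the response carries an `id` field, and that job statuses use strings such as `IN_QUEUE` and `IN_PROGRESS`; the helper names `submit_job`, `fan_out`, and `tally_statuses` are made up for illustration. Submitting all jobs from a thread pool at least rules out the client sending them one after another as the cause of the staggering.

```python
import json
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

API_BASE = "https://api.runpod.ai/v2"  # assumed base URL for serverless endpoints


def submit_job(endpoint_id, api_key, payload):
    """POST one job to the (assumed) async /run route and return its job id."""
    req = urllib.request.Request(
        f"{API_BASE}/{endpoint_id}/run",
        data=json.dumps({"input": payload}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["id"]  # assumes the response has an "id" field


def fan_out(payloads, submit, max_workers=50):
    """Call submit(payload) for every payload concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit, payloads))


def tally_statuses(statuses):
    """Count how many jobs are in each state, e.g. IN_QUEUE vs IN_PROGRESS."""
    return Counter(statuses)
```

With real credentials this could be called as `fan_out(chunks, lambda p: submit_job(ENDPOINT_ID, API_KEY, p))`, where `chunks`, `ENDPOINT_ID`, and `API_KEY` are placeholders. Tallying the statuses a few seconds after submission would show whether the queue really holds jobs while workers are idle.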
1 Reply
Can I get your endpoint ID? Let me see if I can locate the issue.
I just so happened to have support open when you emailed in. I'll take a look, but I'll leave it to support either way, as you will probably need them for a resolution.