jg
RunPod
Created by Hara Kang on 3/3/2024 in #⚡|serverless
Huge P98 execution time in EU-RO region endpoint
It would really help if we could get any updates on this. The increase in execution time is forcing us to spawn more than 3 times the number of workers we normally need to handle our usual traffic.
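For anyone comparing numbers, a P98 figure like the one in the title can be reproduced from raw per-request execution times with a nearest-rank percentile. This is only a minimal sketch: the function name `percentile` and the sample values are made up for illustration, and the method RunPod's dashboard actually uses is not stated in this thread.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the samples are less than or equal to it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    # Nearest-rank method: rank = ceil(pct/100 * n), clamped to at least 1.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical execution times in seconds, not taken from the endpoint.
sample_times = [1.2, 1.3, 1.1, 1.4, 9.8]
p98 = percentile(sample_times, 98)
```

With few samples the P98 is simply the slowest request, so a single stalled job can dominate the metric.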
4 replies
RunPod
Created by Hara Kang on 3/3/2024 in #⚡|serverless
Huge P98 execution time in EU-RO region endpoint
Adding on to this issue, we've noticed that there might be messages in the queue that have not been properly handled. Based on the logs from one of our endpoints, we see KeyError: 'input' even when there are no requests being sent to this specific endpoint (1wfnup871iklus)
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] KeyError: 'input'
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] if job["input"] is None:
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] File "/usr/local/envs/venv/lib/python3.9/site-packages/runpod/serverless/work_loop.py", line 43, in start_worker
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] return future.result()
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] KeyError: 'input'
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] if job["input"] is None:
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] File "/usr/local/envs/venv/lib/python3.9/site-packages/runpod/serverless/work_loop.py", line 43, in start_worker
2024-03-04 16:09:49.500 [co12bhjtlvxjpx] [info] return future.result()
We suspect that when this error occurs, the worker refreshes, which further delays the processing of requests. This is speculation on our part, though, and any help in addressing the issue would be much appreciated.
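To illustrate the failure mode in the traceback above: `job["input"]` raises `KeyError` when the job dictionary has no `input` key at all, whereas `dict.get` returns `None` instead. The sketch below is only an illustration of that difference, not the runpod library's code or a confirmed fix, and the helper names `extract_input` and `is_processable` are hypothetical.

```python
def extract_input(job):
    """Return the job's input payload, or None if the job is malformed."""
    if not isinstance(job, dict):
        return None
    # dict.get returns None for a missing key, where job["input"]
    # would raise KeyError: 'input'.
    return job.get("input")


def is_processable(job):
    """True only when the job carries a non-None input payload."""
    return extract_input(job) is not None


# A job message without an "input" key, like the one we suspect is in the queue.
malformed_job = {"id": "example-job"}
well_formed_job = {"id": "example-job", "input": {"prompt": "hello"}}
```

A check like this would let a malformed queue message be skipped or logged rather than raising inside the work loop.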
4 replies
RunPod
Created by jg on 2/22/2024 in #⚡|serverless
[URGENT] EU-RO region endpoint currently only processing one request at a time
it seems like our endpoint has recovered
23 replies
RunPod
Created by jg on 2/22/2024 in #⚡|serverless
[URGENT] EU-RO region endpoint currently only processing one request at a time
not the pro
23 replies
RunPod
Created by jg on 2/22/2024 in #⚡|serverless
[URGENT] EU-RO region endpoint currently only processing one request at a time
oh wait 24GB GPU
23 replies
RunPod
Created by jg on 2/22/2024 in #⚡|serverless
[URGENT] EU-RO region endpoint currently only processing one request at a time
yup
23 replies
RunPod
Created by jg on 2/22/2024 in #⚡|serverless
[URGENT] EU-RO region endpoint currently only processing one request at a time
it does seem like one instance is processing the requests
23 replies
RunPod
Created by jg on 2/22/2024 in #⚡|serverless
[URGENT] EU-RO region endpoint currently only processing one request at a time
no worries thanks for the quick response
23 replies
RunPod
Created by jg on 2/22/2024 in #⚡|serverless
[URGENT] EU-RO region endpoint currently only processing one request at a time
also shared in the screen shot
23 replies
RunPod
Created by jg on 2/22/2024 in #⚡|serverless
[URGENT] EU-RO region endpoint currently only processing one request at a time
50 at the moment
23 replies
RunPod
Created by jg on 2/22/2024 in #⚡|serverless
[URGENT] EU-RO region endpoint currently only processing one request at a time
1wfnup871iklus
23 replies
RunPod
Created by jg on 2/22/2024 in #⚡|serverless
[URGENT] EU-RO region endpoint currently only processing one request at a time
61 Throttled, 16 Running, 0 Idle
23 replies
RunPod
Created by jg on 2/17/2024 in #⚡|serverless
ECC errors on serverless workers using L4
Awesome! Thank you very much for the help. We're seeing no failures so far from our endpoint in production 👍
13 replies
RunPod
Created by jg on 2/17/2024 in #⚡|serverless
ECC errors on serverless workers using L4
@flash-singh I know you guys might be on holiday but do you have any updates for us?
13 replies
RunPod
Created by jg on 2/17/2024 in #⚡|serverless
ECC errors on serverless workers using L4
Refreshing the worker may let the machine recover, but it fails again after some time.
13 replies
RunPod
Created by jg on 2/17/2024 in #⚡|serverless
ECC errors on serverless workers using L4
We've tried terminating the affected workers, but at some later point some of our workers get spawned again on the same machine that has been throwing ECC errors.
13 replies
RunPod
Created by jg on 2/17/2024 in #⚡|serverless
ECC errors on serverless workers using L4
thanks! we keep seeing this particular machine (x4udv5lkhl7d) with ECC errors
13 replies