nevermind
RunPod
Created by pseudoterminalx on 9/4/2024 in #⛅|pods
GPU errored, machine dead
Maybe I should submit this as feedback
11 replies
RunPod
Created by pseudoterminalx on 9/4/2024 in #⛅|pods
GPU errored, machine dead
Our practice is to run a short CUDA test (like querying device statistics or something). I think it would improve DX if they did this on their side.
11 replies
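A minimal sketch of the kind of startup check described above. Note the swap: instead of a real CUDA kernel it only asks `nvidia-smi` for device statistics, which catches a GPU the driver cannot see but not every failure mode. The function name `gpu_looks_healthy` and the timeout value are made up for illustration.

```python
import subprocess

# Query basic device statistics; a dead GPU typically makes this fail or hang.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader",
]


def gpu_looks_healthy(cmd=NVIDIA_SMI_CMD, timeout_s=30):
    """Return True if the probe command exits cleanly and prints something.

    `cmd` is a parameter so the probe can be swapped for a heavier CUDA test.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, not executable, or the probe hung.
        return False
    return result.returncode == 0 and bool(result.stdout.strip())


if __name__ == "__main__":
    print("GPU OK" if gpu_looks_healthy() else "GPU check failed")
```

Running this as the first step of the container entrypoint lets a pod fail fast instead of discovering the broken GPU mid-job.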
RunPod
Created by pseudoterminalx on 9/4/2024 in #⛅|pods
GPU errored, machine dead
Why are these pods exposed to users? 🤯 Detecting a broken GPU is such an easy task for RunPod, but they have just ignored this issue for like 3 months
11 replies
RunPod
Created by nevermind on 9/3/2024 in #⛅|pods
My pod had been stuck during initialization
Normally this image pulls in 1-2 mins, but these pods had been pulling it for 5 mins when I killed them
11 replies
RunPod
Created by nevermind on 9/3/2024 in #⛅|pods
My pod had been stuck during initialization
Endless image fetching. Like there was no progress bar, just "still fetching XXX"
11 replies
RunPod
Created by nevermind on 9/3/2024 in #⛅|pods
My pod had been stuck during initialization
This happened again right now - 9ff2sxw9irvb5s
11 replies
RunPod
Created by nevermind on 8/21/2024 in #⛅|pods
How does runpod handle pod terminating
I appreciate your advice, I'll send it as feedback tho
26 replies
RunPod
Created by nevermind on 8/21/2024 in #⛅|pods
How does runpod handle pod terminating
Because Kubernetes does it that way, and it lets the pod handle graceful termination instead of instant annihilation (which is what RunPod does)
26 replies
RunPod
Created by nevermind on 8/21/2024 in #⛅|pods
How does runpod handle pod terminating
pod receives termination -> SIGTERM -> 1 min alive (grace period) -> pod gets SIGKILL and dies
26 replies
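A rough sketch of what the application side of that flow could look like, assuming the platform sends SIGTERM and then waits out a grace period. The names `install_sigterm_handler`, `serve` and `shutdown_requested` are made up for illustration, and nothing here describes how RunPod actually terminates pods.

```python
import os
import signal
import threading

# Set once SIGTERM arrives; the serving loop checks it between streams.
shutdown_requested = threading.Event()


def _handle_sigterm(signum, frame):
    # Only record the request; the loop below decides when to stop.
    shutdown_requested.set()


def install_sigterm_handler():
    """Register the handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, _handle_sigterm)


def serve(streams):
    """Process (name, chunks) pairs one at a time.

    A stream that is already in progress runs to completion; after SIGTERM
    no new stream is started.
    """
    finished = []
    for name, chunks in streams:
        if shutdown_requested.is_set():
            break  # grace period: refuse new work
        finished.append((name, "".join(chunks)))
    return finished
```

With a grace period, the in-flight LLM stream finishes and only queued work is dropped. With an instant SIGKILL the handler never runs at all.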
RunPod
Created by nevermind on 8/21/2024 in #⛅|pods
How does runpod handle pod terminating
for a graceful minute
26 replies
RunPod
Created by nevermind on 8/21/2024 in #⛅|pods
How does runpod handle pod terminating
yeah, you're right
26 replies
RunPod
Created by nevermind on 8/21/2024 in #⛅|pods
How does runpod handle pod terminating
on demand
26 replies
RunPod
Created by nevermind on 8/21/2024 in #⛅|pods
How does runpod handle pod terminating
Yes, it is, but while a stream is in progress it would be dropped without any grace period. My point is that the pod should get a grace period instead of an instant SIGKILL
26 replies
RunPod
Created by nevermind on 8/21/2024 in #⛅|pods
How does runpod handle pod terminating
It is crucial for us because of LLM streaming. We wanna utilize some big GPUs, but it is not really possible, because any in-progress stream will be terminated no matter what 😦
26 replies
RunPod
Created by nevermind on 8/21/2024 in #⛅|pods
How does runpod handle pod terminating
Just to mention: kubernetes sends SIGTERM
26 replies
RunPod
Created by Ercan on 8/17/2024 in #⛅|pods
URGENT! Network Connection issues
It likely happened at [2024-08-16 02:58:40,236] ± 10 mins
157 replies
RunPod
Created by Ercan on 8/17/2024 in #⛅|pods
URGENT! Network Connection issues
Also encountered other network timeouts, but we've erased these logs
157 replies
RunPod
Created by Ercan on 8/17/2024 in #⛅|pods
URGENT! Network Connection issues
k5svp2kqw0rh7s - I hit ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443) with this one
157 replies
RunPod
Created by Ercan on 8/17/2024 in #⛅|pods
URGENT! Network Connection issues
We also experience temporary network issues such as read timeouts, dropped connections, and slow network.
157 replies
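For transient read timeouts and dropped connections like these, a client-side workaround is to retry with exponential backoff. This is only a generic sketch: `retry_with_backoff` and its defaults are hypothetical, and it does not fix the underlying network problem.

```python
import time


def retry_with_backoff(
    fn,
    attempts=4,
    base_delay=1.0,
    retry_on=(TimeoutError, ConnectionError),
    sleep=time.sleep,
):
    """Call fn(); on a retryable error wait base_delay * 2**attempt and retry.

    The last failure is re-raised so the caller still sees the real error.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

When the download goes through the `requests` library, its own exception types (for example `requests.exceptions.ReadTimeout`) would need to be passed in `retry_on`, since they are not subclasses of the built-in `TimeoutError`.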
RunPod
Created by nevermind on 5/21/2024 in #⛅|pods
graphql Unauthorized
The solution was given on Slack. I had to use a query not like this:
"query myPods {\n myself { pods {\n desiredStatus \n dockerId\n id\n imageName\n lastStatusChange\n locked\n machineId\n name\n machineType\n templateId\n uptimeSeconds\n }\n machines { id } }\n}"
But like this, with machine { id } nested inside pods:
"query myPods {\n myself { pods {\n desiredStatus \n dockerId\n id\n imageName\n lastStatusChange\n locked\n machineId\n name\n machineType\n templateId\n uptimeSeconds\n machine { id } }\n }\n}"
Note that the GraphQL spec is outdated
11 replies
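For reference, a sketch of how the working query could be wrapped into an HTTP request from Python. The endpoint URL and the `api_key` query parameter are assumptions here, not something confirmed in this thread, and `build_request` is a made-up helper name. The request is only built, not sent.

```python
import json
import urllib.request

# The corrected query: `machine { id }` is nested inside `pods`.
MY_PODS_QUERY = """
query myPods {
  myself {
    pods {
      desiredStatus
      dockerId
      id
      imageName
      lastStatusChange
      locked
      machineId
      name
      machineType
      templateId
      uptimeSeconds
      machine { id }
    }
  }
}
"""


def build_request(api_key, url="https://api.runpod.io/graphql"):
    """Build (but do not send) a POST request carrying the query.

    Assumption: the API accepts the key as an `api_key` query parameter.
    """
    body = json.dumps({"query": MY_PODS_QUERY}).encode("utf-8")
    return urllib.request.Request(
        f"{url}?api_key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. An Unauthorized response with a valid key is a hint to re-check the field layout, as happened with the first query.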