Runpodctl in container receiving 401
Over the past few days, I have sometimes been getting a 401 response when attempting to stop pods with
runpodctl stop pod $RUNPOD_POD_ID
at the end of my jobs. This is causing the container to restart on exit rather than stop. Do the credentials passed to the container expire?
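For context, the end of my job script looks roughly like this (a sketch; the job command itself is just a placeholder):
#!/bin/bash
# run the queued job, then stop (not terminate) this pod so it can be reused later
python process_job.py
runpodctl stop pod "$RUNPOD_POD_ID"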
The pod api key only gets deleted when the pod is terminated as far as I am aware. Sounds like some kind of bug, because this shouldn't happen. @Justin Merrell any idea why this would happen?
Will look at this in a moment, also tagging @nathaniel to assist me
@blakeblackshear are you also creating the pod through runpodctl?
I'm not
Or through some other means? Check to see if your api key is populated in runpod config
I create them in the UI, then use the API to start them on demand to process a job queue
it doesn't always happen
Should be at ~/.runpod/config.toml
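For example, something like this checks what's there and rewrites it (the exact flag spelling can differ between runpodctl versions):
cat ~/.runpod/config.toml                   # shows whether an api key is currently set
runpodctl config --apiKey <your-api-key>    # writes/updates the key in that file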
Oh hmm
Ok I can look in more detail in a minute
it might only be happening for older pods
I create the pods in the UI and they stay in the exited state until a job is ready for processing
so some could be fairly old
or at least been a while since they had a GPU available
We recently introduced an api key cleanup script which removes api keys that exclusively give permissions for pods that no longer exist in the db, but it sounds like your case is not that, because your pods do exist
yeah, they do
I have one running now that should finish its job in 45 minutes, and I can see if it has the same issue
how old? a few hours or a few days?
weeks
ok
some have probably been around for more than a month
I reuse the same pool of pods so that the image is already cached
do you still have any of the pod ids that runpodctl has failed to stop?
I deleted them already. I'm watching 26ajo9dnjtx7vt to see if it fails
just happened on that pod
here is the message in the logs
Error: statuscode 401
I have a hunch what's happening, will run some tests to confirm
is there not a way to stop a pod in the UI? this is hanging up my job queue
from the web terminal:
That’d be a bit of a problem and was my initial guess
I get the same message even if I generate an api key in settings
I already deleted that api key from my acct
Did you make a new one? It obviously won’t work if you have an empty string there
yeah, I generated a new one with read & write
ran the commands in that screenshot
you can see it did update the config file
but I still got a 401
Ok never mind
are some of these pods created on community cloud and some on secure cloud?
all secure cloud
ok, because the fact that you’re not seeing a way to do this on the ui makes me think maybe the 401 is deliberate. And you’re stopping but not terminating the pods
right
I can terminate in the UI
but not stop
nor can I in the web console
ok, this would have been a very recent change. It's been running for months just fine
that's not ~/.runpod/config.toml, that's ~/.runpod.yaml
right
are you running this from inside a pod, so using an older version before the migration?
~/.runpod/config.toml doesn't exist
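(I checked both the new and old locations with something like:)
ls -la ~/.runpod/config.toml ~/.runpod.yaml 2>/dev/null   # shows which config file, if any, exists in this pod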
yeah, I'm inside a pod
maybe these pods were created prior to the migration
yep, the version of runpodctl inside there is from before the migration. Sorry for the confusion there
Solution
ok. so any pods created before the migration will fail when stopping via runpodctl
so if I delete my pods and recreate, I should be ok
the migration just affected where the api key config file is written to
I have a job finishing in a few minutes on a new pod
this pod doesn't have either config file
are you trying to stop the pod from the web terminal inside the same pod?
I was just seeing if the config was there
but yes
my CMD script runs
runpodctl stop pod $RUNPOD_POD_ID
at the end
I see. All runpodctl does is issue commands to the runpod api; it does not need to be run from inside the same pod, if that helps. But it sounds like it's deliberately done from inside the pod
yeah, it is
I don't know when the job finishes from outside
so maybe pods no longer have permission to stop themselves, if this doesn't work on a new pod
when I tried before, the pod just restarts when exiting the CMD script regardless of the exit code
is that still the case?
new pod seemed to exit fine
so I think if I delete all my old pods it will resolve itself
@Justin Merrell thinks the api key used inside a pod is injected from the environment variables, so the config yaml or toml is not necessary inside a pod
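A quick way to confirm what gets injected into the container:
env | grep '^RUNPOD_'   # should list RUNPOD_API_KEY, RUNPOD_POD_ID, and the other injected variables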
makes sense
either way, I just terminated all my pods and am spinning up new ones
seems like 4090s are super scarce all of a sudden
In a pod check if
echo $RUNPOD_API_KEY
returns an API key.
Running runpodctl stop pod $RUNPOD_POD_ID
should work without needing to set up the API key, as long as you are running this inside a pod.
that env var is set
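Putting those two together, a guarded version of the end-of-job stop might look like this (a sketch; it assumes both variables are injected as described above):
# only try to stop the pod if the injected credentials are actually present
if [ -n "$RUNPOD_API_KEY" ] && [ -n "$RUNPOD_POD_ID" ]; then
    runpodctl stop pod "$RUNPOD_POD_ID"
else
    echo "RUNPOD_API_KEY or RUNPOD_POD_ID missing; not stopping pod" >&2
fi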
looks like the jobs on new pods are completing
so I think this is resolved
cool, I'm glad you got it, sorry for the wild goose chase earlier
@Justin Merrell looks like @blakeblackshear exposed their API key above, mind deleting those messages please, @blakeblackshear I suggest deleting that API key and creating a new one.
@ashleyk I deleted the api key before posting the screenshot
This is not related to this thread, it's a completely different issue; you need to authenticate with CivitAI.
yay