Runpodctl in container receiving 401

Over the past few days, I have sometimes been getting a 401 response when attempting to stop pods with runpodctl stop pod $RUNPOD_POD_ID at the end of my jobs. This is causing the container to restart on exit rather than stop. Do the credentials passed to the container expire?
Solution:
ok. so any pods created before the migration will fail when stopping via runpodctl
55 Replies
ashleyk
ashleyk10mo ago
The pod api key only gets deleted when the pod is terminated as far as I am aware. Sounds like some kind of bug, because this shouldn't happen. @Justin Merrell any idea why this would happen?
Justin Merrell
Justin Merrell10mo ago
Will look at this in a moment, also tagging @nathaniel to assist me
nathaniel
nathaniel10mo ago
@blakeblackshear are you also creating the pod through runpodctl?
blakeblackshear
blakeblackshearOP10mo ago
I'm not
nathaniel
nathaniel10mo ago
Or through some other means? Check to see if your api key is populated in runpod config
blakeblackshear
blakeblackshearOP10mo ago
I create them in the UI, then use the API to start them on demand to process a job queue. It doesn't always happen.
nathaniel
nathaniel10mo ago
Should be at ~/.runpod/config.toml. Oh, hmm. OK, I can look in more detail in a minute.
blakeblackshear
blakeblackshearOP10mo ago
It might only be happening for older pods. I create the pods in the UI and they stay in the exited state until a job is ready for processing, so some could be fairly old, or at least it has been a while since they had a GPU available.
nathaniel
nathaniel10mo ago
We recently introduced an api key cleanup script which removes api keys which exclusively give permissions for pods that no longer exist in the db, but it sounds like your case is not that because your pods do exist
blakeblackshear
blakeblackshearOP10mo ago
Yeah, they do. I have one running now that should finish its job in 45 minutes, and I can see if it has the same issue.
nathaniel
nathaniel10mo ago
how old? a few hours or a few days?
blakeblackshear
blakeblackshearOP10mo ago
weeks
nathaniel
nathaniel10mo ago
ok
blakeblackshear
blakeblackshearOP10mo ago
Some have probably been around for more than a month. I reuse the same pool of pods so that the image is already cached.
nathaniel
nathaniel10mo ago
do you still have any of the pod ids that runpodctl has failed to stop?
blakeblackshear
blakeblackshearOP10mo ago
I deleted them already. I'm watching 26ajo9dnjtx7vt to see if it fails. It just happened on that pod. Here is the message in the logs: Error: statuscode 401
nathaniel
nathaniel10mo ago
I have a hunch what's happening, will run some tests to confirm
blakeblackshear
blakeblackshearOP10mo ago
is there not a way to stop a pod in the UI? this is hanging up my job queue
blakeblackshear
blakeblackshearOP10mo ago
from the web terminal:
[image attachment, no description]
nathaniel
nathaniel10mo ago
That’d be a bit of a problem and was my initial guess
blakeblackshear
blakeblackshearOP10mo ago
i get the same message even if i generate an api key in settings
blakeblackshear
blakeblackshearOP10mo ago
[image attachment, no description]
blakeblackshear
blakeblackshearOP10mo ago
i already deleted that api key from my acct
nathaniel
nathaniel10mo ago
Did you make a new one? It obviously won’t work if you have an empty string there
blakeblackshear
blakeblackshearOP10mo ago
Yeah, I generated a new one with read & write and ran the commands in that screenshot. You can see it did update the config file, but I still got a 401.
nathaniel
nathaniel10mo ago
OK, never mind. Are some of these pods created on Community Cloud and some on Secure Cloud?
blakeblackshear
blakeblackshearOP10mo ago
all secure cloud
nathaniel
nathaniel10mo ago
ok, because the fact that you’re not seeing a way to do this on the ui makes me think maybe the 401 is deliberate. And you’re stopping but not terminating the pods
blakeblackshear
blakeblackshearOP10mo ago
Right, I can terminate in the UI but not stop, nor can I in the web console.
nathaniel
nathaniel10mo ago
Runpodctl is a pretty old tool, and our offerings have changed since we last updated this part of it, so it’s possible this is a deprecated piece of functionality that slipped through. I know that’s probably not what you want to hear, but I think it’s true. Will check the permissions on the stop pod action to see why it wouldn’t always deny. I was mistaken. Stopping but not terminating, I mean.
blakeblackshear
blakeblackshearOP10mo ago
OK. This would have been a very recent change. It's been running for months just fine.
nathaniel
nathaniel10mo ago
that's not ~/.runpod/config.toml, that's ~/.runpod.yaml
blakeblackshear
blakeblackshearOP10mo ago
right
nathaniel
nathaniel10mo ago
are you running this from inside a pod, so using an older version before the migration?
blakeblackshear
blakeblackshearOP10mo ago
~/.runpod/config.toml doesn't exist. Yeah, I'm inside a pod. Maybe these pods were created prior to the migration.
nathaniel
nathaniel10mo ago
yep the version of runpodctl inside there is before the migration. sorry for confusion there
Solution
blakeblackshear
blakeblackshear10mo ago
ok. so any pods created before the migration will fail when stopping via runpodctl
blakeblackshear
blakeblackshearOP10mo ago
so if i delete my pods and recreate, i should be ok
nathaniel
nathaniel10mo ago
the migration just affected where the api key config file is written to
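A minimal sketch for checking which of the two config file locations named in this thread exists on a pod. The helper name find_runpod_config is hypothetical, and which file a given runpodctl version actually reads is an assumption drawn from the discussion above (~/.runpod.yaml on the pre-migration pod, ~/.runpod/config.toml expected afterwards).

```shell
#!/usr/bin/env bash
# Report which runpodctl config file locations exist under a home directory.
# The two paths are the ones mentioned in this thread; find_runpod_config is
# a hypothetical helper, not part of runpodctl.
find_runpod_config() {
  local home_dir="${1:-$HOME}"
  local found=1
  local path
  for path in "$home_dir/.runpod/config.toml" "$home_dir/.runpod.yaml"; do
    if [ -f "$path" ]; then
      echo "found: $path"
      found=0
    fi
  done
  if [ "$found" -ne 0 ]; then
    echo "no runpodctl config file found under $home_dir"
  fi
  # Returns 0 if at least one file exists, 1 otherwise.
  return "$found"
}
```

Calling find_runpod_config with no argument checks the current user's home directory. Neither file being present is not necessarily a problem inside a pod, so this is a diagnostic only.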
blakeblackshear
blakeblackshearOP10mo ago
I have a job finishing in a few minutes on a new pod. This pod doesn't have either config file.
blakeblackshear
blakeblackshearOP10mo ago
[image attachment, no description]
nathaniel
nathaniel10mo ago
are you trying to stop the pod from the web terminal inside the same pod?
blakeblackshear
blakeblackshearOP10mo ago
I was just seeing if the config was there, but yes, my CMD script runs runpodctl stop pod at the end.
nathaniel
nathaniel10mo ago
I see. All runpodctl does is issue commands to runpod api, it does not need to be run from inside the same pod if that helps. but it sounds like it's deliberately done from inside the pod
blakeblackshear
blakeblackshearOP10mo ago
Yeah, it is. I don't know when the job finishes from outside. So maybe pods no longer have permission to stop themselves, if this doesn't work on a new pod. When I tried before, the pod just restarted when exiting the CMD script regardless of the exit code. Is that still the case? The new pod seemed to exit fine, so I think if I delete all my old pods it will resolve itself.
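A rough sketch of the pattern described here: a CMD script that runs the job and then asks RunPod to stop the pod from inside it. The function name run_job_then_stop and the HOLD_CMD override are hypothetical. Idling when the stop request fails is one possible design choice, not RunPod's recommendation; it is based only on the restart-on-exit behavior reported in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical tail of a container CMD script: run the job, then ask RunPod to
# stop this pod. If the stop request fails (for example with a 401), log it and
# idle rather than exit, since exiting was reported in this thread to restart
# the container.
run_job_then_stop() {
  local job_status=0
  # Run the job command passed as arguments and remember its exit status.
  "$@" || job_status=$?
  echo "job exited with status $job_status"

  if runpodctl stop pod "$RUNPOD_POD_ID"; then
    return "$job_status"
  fi

  echo "runpodctl stop pod failed; idling so the job is not re-run" >&2
  # HOLD_CMD is an assumed override hook so the idle step can be swapped out.
  ${HOLD_CMD:-sleep infinity}
  return "$job_status"
}
```

A CMD script might end with something like run_job_then_stop python process_queue.py, where process_queue.py is a placeholder name. Note the trade-off: an idling pod keeps running, and presumably keeps being billed, until it is stopped or terminated some other way.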
nathaniel
nathaniel10mo ago
@Justin Merrell thinks the API key used inside a pod is injected from the environment variables, so the config yaml or toml is not necessary inside a pod.
blakeblackshear
blakeblackshearOP10mo ago
Makes sense either way. I just terminated all my pods and am spinning up new ones. Seems like 4090s are super scarce all of a sudden.
Justin Merrell
Justin Merrell10mo ago
In a pod, check whether echo $RUNPOD_API_KEY returns an API key. Running runpodctl stop pod $RUNPOD_POD_ID should work without needing to set up the API key, as long as you are running this inside a pod.
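The check above can be sketched as a small preflight wrapper, assuming the two environment variable names used in this thread (RUNPOD_API_KEY and RUNPOD_POD_ID) are present inside the pod. The function name stop_self is hypothetical, and the sketch deliberately never prints the key.

```shell
#!/usr/bin/env bash
# Stop the current pod from inside it, first checking that the environment
# variables named in this thread are set. stop_self is a hypothetical helper.
stop_self() {
  if [ -z "${RUNPOD_API_KEY:-}" ]; then
    echo "RUNPOD_API_KEY is not set; the stop request will likely be unauthorized" >&2
    return 2
  fi
  if [ -z "${RUNPOD_POD_ID:-}" ]; then
    echo "RUNPOD_POD_ID is not set; cannot tell which pod to stop" >&2
    return 2
  fi
  # Do not echo the key itself: logs and screenshots tend to get shared.
  runpodctl stop pod "$RUNPOD_POD_ID"
}
```

A return status of 2 separates a missing variable from a failure reported by runpodctl itself, which can help when reading job logs afterwards.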
blakeblackshear
blakeblackshearOP10mo ago
That env var is set. Looks like the jobs on new pods are completing, so I think this is resolved.
nathaniel
nathaniel9mo ago
cool, I'm glad you got it, sorry for the wild goose chase earlier
ashleyk
ashleyk9mo ago
@Justin Merrell looks like @blakeblackshear exposed their API key above. Mind deleting those messages, please? @blakeblackshear, I suggest deleting that API key and creating a new one.
blakeblackshear
blakeblackshearOP9mo ago
@ashleyk I deleted the api key before posting the screenshot
frunzzealt
frunzzealt7mo ago
[image attachment, no description]
digigoblin
digigoblin7mo ago
This is not related to this thread; it's a completely different issue. You need to authenticate with CivitAI.
nerdylive
nerdylive7mo ago
yay