Active worker keeps downloading images and I'm being charged for it
Why is it that a worker will finish downloading, extracting, and initializing, then get into a 'worker is ready' state, only to go back to downloading when it receives a job? It's just wasting credits at this point... and fairly frustrating.
can u share ur template screenshot? if ur okay with that?
its a standard template as far as im aware. The image itself is ~30GB
Ah ok nvm guess is private then. does ur container image have a tag? just to confirm?
And is the platform on dockerhub indicating amd64?
like should be username/image:1.0 or something
ur container disk can be like 50gb then too
yeah its on dockerhub. Tbh Im not too concerned about it being public:
teamclashofficial/dot-diff-comfy:latest
hm
just wondering why it has to redownload every other moment. For example, one that was just idle for a minute had to do a complete redownload. Would a network drive be more suitable despite the fewer available GPUs?
It doesnt, this might be a runpod issue let me check my stuff
Refreshing ur page doesnt show they are idle?
let me try to deploy ur stuff so i can see too
yeah when I refresh at times, it'll show like 2 are idle--then when I push a job to it, I'll check and they'll be downloading in an active state
k
Maybe try to delete the endpoint and remake it? :/ ive seen the active downloading thing before myself
I'll try now
yea im deploying my own endpoint again from a template to see if it's ur template vs a wider issue
will lyk
Seems like a bug, only max workers should redownload the docker image, not active workers.
Do u mind to share
ur dockerfile?
are u calling the python handler?
i can replicate it
just deleted and deployed a new endpoint. Here's the id so you can monitor:
07o7r7hqcd22z1
it's set to 1 active and 3 workers
nah no need, set it to 0,0. this might be a thing to ask flash
yeah
testing my own templates now
to see if its the same
k
is ur start.sh calling python handler? or what is it calling
its calling a custom server.py file that has the runpod handler
# Start the handler only if this script is run directly
if __name__ == "__main__":
    runpod.serverless.start({"handler": handler})
try to get rid of this if check
i think this is causing a bug
k
where runpod isnt catching ur handler
urgh runpod error checking rlly sucks
i wish there better error debugging
lol true. This may take a while as I'll have to rebuild
no worries
why dont u try
a fake one for now
Its not, this is normal in Python, I do it all the time and never had an issue like this.
hm
RunPod Blog
Serverless | Create a Custom Basic API
RunPod's Serverless platform allows for the creation of API endpoints that automatically scale to meet demand. The tutorial guides you through creating a basic worker and turning it into an API endpoint on the RunPod serverless platform. For this tutorial, we will create an API endpoint that helps us accomplish
Got it just my guess
also maybe use ur built image as the base
and just copy ur handler.py over it
Maybe flash can help then
Its most likely just a bug with the serverless handling of active workers and treating them like max workers, there is nothing wrong with the code, image etc.
Best for @flash-singh to advise, he already asked for endpoint id in #🎤|general
Got it, will leave to @flash-singh , and i guess share ur current endpoint @black_zero6641
forwarded the id to him in general
I think keep to 0-0 so ur not burning cash
thank you for the help. Its much appreciated
i do find it weird that it's replicable with ur image tho / not my other ones, which is why i thought maybe something inherent to the image
yeah I thought I was going crazy for a sec lmao
Also avoid using 'latest' as tag, its best practice to use a version tag, but thats most likely not the cause of the issue.
it may end up being related to the image if its not happening to others.
will do
By the way this is also not the correct way of handling errors:
justinwlin/runpodwhisperx:1.4
https://github.com/justinwlin/runpodWhisperx
an example, my template
Correct way of handling errors and causing the job to fail:
If the error is a string:
If its a list or dict:
Its important to note that the 'error' key can only handle string and not list or dict.
another one
GitHub - justinwlin/Runpod-OpenLLM-Pod-and-Serverless: A repo for OpenLLM to run pod.
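The code snippets from the error-handling message above did not survive in this log. A minimal sketch of the idea, assuming (as that message states) that a handler fails a job by returning a dict with an 'error' key and that the value has to be a string. The names to_error_string and run_model are hypothetical, and serializing a list or dict with json.dumps is just one way to get a string, not necessarily what the original snippet did.

```python
import json


def to_error_string(err):
    """Turn an error payload into a plain string for the 'error' key."""
    if isinstance(err, str):
        return err
    if isinstance(err, (list, dict)):
        # Assumption: lists and dicts are serialized to JSON text.
        return json.dumps(err)
    return str(err)


def run_model(prompt):
    """Hypothetical stand-in for the real work done by the worker."""
    if prompt == "fail":
        raise ValueError("model could not process the prompt")
    return prompt


def handler(job):
    """Return a result dict on success, or {'error': <string>} to fail the job."""
    job_input = job.get("input", {})
    prompt = job_input.get("prompt")
    if not prompt:
        return {"error": "missing required field: prompt"}
    try:
        result = run_model(prompt)
    except Exception as exc:
        details = {"type": type(exc).__name__, "detail": str(exc)}
        return {"error": to_error_string(details)}
    return {"echo": result}
```

In a real worker this handler would be passed to runpod.serverless.start({"handler": handler}), as in the snippet earlier in the thread. It is left out here so the sketch runs without the runpod package.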
got it. I'll do some quick reformatting to the file. Im guessing this is for clearer error logging on the runpod side?
(in case u wanted reference)
By the way for version numbers, I recommend semantic versioning, not arb version numbers:
https://semver.org/
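A rough sketch of how the tagging advice could be checked before deploying: it treats an image reference as pinned only when the tag is a plain MAJOR.MINOR.PATCH version. The helper names (split_image_ref, is_pinned, bump_patch) are made up for illustration, the regex ignores semver pre-release and build metadata, and digest references (image@sha256:...) are not handled.

```python
import re

# MAJOR.MINOR.PATCH only; pre-release and build metadata are left out for brevity.
SEMVER_TAG = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def split_image_ref(image_ref):
    """Split 'user/image:tag' into (name, tag); a missing tag means 'latest'."""
    # Only look for ':' after the last '/', so a registry port is not read as a tag.
    last_segment = image_ref.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, tag = image_ref.rsplit(":", 1)
        return name, tag
    return image_ref, "latest"


def is_pinned(image_ref):
    """True when the tag is a MAJOR.MINOR.PATCH version rather than 'latest'."""
    _, tag = split_image_ref(image_ref)
    return SEMVER_TAG.match(tag) is not None


def bump_patch(tag):
    """Return the next patch version, e.g. for a rebuild with a small fix."""
    match = SEMVER_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return f"{major}.{minor}.{patch + 1}"
```

With this check, teamclashofficial/dot-diff-comfy:latest is reported as not pinned, while the same image tagged 1.0.0 is.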
thanks! One question I have though is if it would be better to attempt to split the image up into specific smaller domains for faster startup time (I think Im limited to 5)? Im not sure if runpod caches the images to avoid the downloading issue.
got it. Honestly I should have just asked questions here sooner. Would have caused less headaches 😂
They are supposed to cache
my workers dont refresh
i recommend always have 2 max workers minimum, preferably three, and runpod will spin up 5 idles for u (maybe 1-2 throttled)
but gives u more workers to get to download ur image and work, they still honor the max workers at any given time tho
But i do think if u wanna sanity check urself
this tutorial is a good sanity check
if ur getting diverging behavior, especially cause it so consistent on ur image
i feel something is wrong but i honestly cannot fathom a guess
anyways hopefully Flash can help out
Sounds like the issue is due to pushing a new release to the 'latest' tag.
thats what Im going through now. Also just specified a tag and removed latest
the pod is downloading the new tag now, so I should be able to confirm in a few minutes
I think you called it. Did more tests and now Idle/Initializing pods go right to startup instead of downloading. You're the 🐐
Guess that answers my never answered question before too xD
https://discord.com/channels/912829806415085598/1208257003131113502
👁️
Not me, thank @flash-singh , he nailed it.
Wonder how come i was getting an infinite download too tho interesting
weird weird
but as long it working now