Delay Time
Hello,
I'm wondering if these delay times are normal?
If not, what should I do?
36 Replies
Seems pretty low to me. Depends on what your worker does.
It takes a bunch of image URLs and does ML inference on them
One more question: does the delay time increase if the "active and supposed to be warm" worker hasn't actually been working for a while?
it kinda feels unreliable atm
@Papa Madiator
?
I'm experiencing extreme delay times, how can I get them back to normal?
I'm not sure what you are doing and running
atm it just takes a bunch of image URLs and calculates their embeddings using a specific ML model
did you bake the models into the docker image?
my dockerfile looks like this:
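Roughly this pattern (a sketch of the setup under discussion; the model name and paths are placeholders, not the exact file):

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# the "python line": downloads the model (and loads it) at build time
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('clip-ViT-B-32')"

COPY . .
CMD ["python", "-u", "handler.py"]
```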
Whoa, that python line... I'm not sure it's caching it for runtime
run it from CMD or an ENTRYPOINT file
not at build time but at runtime
This python line does two things:
- it downloads the model and saves it somewhere
- it loads it into RAM
If I remove the line and only do that at runtime, I'd have to download it each time, no?
(or maybe I misunderstood something, sorry for that)
Do it at runtime too
it's fine
Yes, but it loads on your build machine, not on the runtime machine
then at runtime it's going to be a different machine, so it's not cached yet
Usually, what's the proper way to bake models into docker images?
Any examples?
Like in the docs
It loads, but at runtime, using a python file / .sh file
I would check the logs when the worker is starting
🚧 If your handler requires external files such as model weights, be sure to cache them into your docker image. You are striving for a completely self-contained worker that doesn't need to download or fetch external files to run.
Oh, in the docs it's only described like that
I guess running the python line that loads the model or creates the pipeline in handler.py works
exactly, so it's not very clear to me how to cache models in the docker image; I thought my implementation would work
Is that what you meant?
Yep
So everything put outside of the handler function will be cached in the docker image??
before the start, I think
serverless.start
Yes but the documentation says "be sure to cache them into your docker image."
How to do that correctly? (the doc doesn't provide enough information I think)
Just before the start() line
so it has to be loaded before that, I think
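i.e. a minimal sketch like this, with the model loaded at module level so it runs once when the worker starts (the model name and job format are placeholder assumptions):

```python
# handler.py -- minimal sketch; model name and job format are placeholders
import runpod
from sentence_transformers import SentenceTransformer

# Module level: runs once per worker, before serverless.start() is reached.
# The weights are read from the image (or network volume) and loaded into VRAM here,
# not once per request.
model = SentenceTransformer("clip-ViT-B-32", device="cuda")

def handler(job):
    image_urls = job["input"]["image_urls"]
    # fetch the images here, then embed them with the already-loaded model, e.g.
    # embeddings = model.encode(images)
    return {"received": len(image_urls)}  # placeholder return

runpod.serverless.start({"handler": handler})
```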
If I have no active worker, then each time I spawn a new one it'll download the model, and that's not what I want. I'd like the model to be cached in the docker image so that when I spawn a new worker the model is already almost ready to use
cached means it has to be stored somewhere
also, a good idea is to load the model into VRAM in the handler
Either you store it in the image or network storage first
then load it into VRAM at runtime, that's how you cache it
so if you want to download the model, download it into a network volume
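e.g. something like this sketch, assuming the volume is mounted at /runpod-volume (the usual serverless mount point) and a placeholder model:

```python
# sketch: keep the weights on the network volume so only the first worker downloads them
import os
from sentence_transformers import SentenceTransformer

CACHE_DIR = "/runpod-volume/models"  # network volume mount point (assumption)
os.makedirs(CACHE_DIR, exist_ok=True)

# first run downloads into the volume; later workers reuse the cached copy
model = SentenceTransformer("clip-ViT-B-32", cache_folder=CACHE_DIR, device="cuda")
```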
@Papa Madiator so like this is fine right?
yup
GitHub: worker-sdxl/src/rp_handler.py at main · runpod-workers/worker-sdxl (RunPod worker for Stable Diffusion XL)
but this is only to load the model in VRAM, first I'd need to download it from somewhere
after reading the documentation, I understand it has to come from the image's cache, so it has to be linked to the dockerfile somehow
ok update, I just checked the dockerfile of the project you shared
-------------
I think this is what I was looking for
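i.e. download the weights in a build step so they end up baked into the image layers, roughly like this sketch (base image, script name and model are illustrative, not the exact repo contents):

```dockerfile
FROM python:3.11-slim

COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt

# download_weights.py (placeholder name) only fetches the model files to a path inside
# the image; it does NOT load them into VRAM -- that happens in handler.py at runtime
COPY download_weights.py /download_weights.py
RUN python /download_weights.py

COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]
```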
No no, don't use RUN... use CMD or ENTRYPOINT
well I'm even more confused now... that's on the official repo
what's wrong with this dockerfile?
I don't understand this part, that's the opposite of what shown on the repo
It's loading models at build time
Into VRAM
What you're looking for is to download the models at build time, then cache the models in VRAM at runtime
No, don't use RUN, use a script, because RUN isn't going to run at runtime
Or on your runpod serverless
@Minozar
1. Enable snapshot mode to reduce cold start time
2. Consider speeding up model loading, or replacing the model with one that loads faster
For a cold start the model will be reloaded every time, so it's normal for it to be slow.
Enable FlashBoot if you haven't, and if you're using queue delay scaling, set a lower delay to scale faster