Custom Template Taking Hours To Initialize
I made a custom template with the Docker image from https://huggingface.co/spaces/rwitz/go-bruins-v2, and it is taking hours to initialize on serverless.
Wait, I am confused, what are you trying to do?
Are you trying to initialize this on serverless, or is this a Docker build?
I'm just confused, because this by itself already has issues
@Ryan Witzman (rwitz) What are your min/max workers?
There should be workers available
do you have:
1) the RunPod template that you used to start your serverless endpoint?
2) what is this? (the link you posted)
3) your Dockerfile?
min workers is 0 and max is 1
yeah, it's on Hugging Face
yes
Can I see your Dockerfile?
And also... I'm not sure if Hugging Face hosts Docker images?
I could be wrong
For #1, there is something on RunPod you used to create it
what I mean is this
oh
Ah interesting
I learnt something new
Okay, so just as an FYI:
I recommend trying to build a template using:
FROM runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel
RUN pip install runpod transformers
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Load the model once at build time so the weights are baked into the image.
# No device=0 here: there is no GPU available during `docker build`, so load on CPU.
RUN python -c 'from transformers import pipeline; import torch; pipe = pipeline("text-generation", model="rwitz/go-bruins-v2", torch_dtype=torch.bfloat16)'
ADD handler.py .
And don't do the run command; test it using a GPU Pod 🙂 it's always a great place to test.
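For reference, a minimal handler.py for RunPod serverless would look roughly like this (just a sketch; the "prompt" input key is an assumption about your payload):
import runpod
import torch
from transformers import pipeline

# The weights are already cached in the image from the build step, so this load is fast.
# device=0 is fine here: the serverless worker actually has a GPU at runtime.
pipe = pipeline("text-generation", model="rwitz/go-bruins-v2", device=0, torch_dtype=torch.bfloat16)

def handler(job):
    # "prompt" is a hypothetical input key; rename it to match your actual payload
    prompt = job["input"]["prompt"]
    output = pipe(prompt, max_new_tokens=200)
    return output[0]["generated_text"]

runpod.serverless.start({"handler": handler})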
2) https://huggingface.co/spaces/rwitz/go-bruins-v2
Your build failed
So I'm guessing that your serverless endpoint is trying to find an image that doesn't exist
yeah, because the Hugging Face build environment doesn't have CUDA
the serverless worker should have CUDA
Serverless does have CUDA, but if your Docker build failed, it won't produce an image for serverless to use
that means in your Docker build step, you've got to default to loading the model on CPU instead of with torch CUDA, so that the image builds successfully
do you see the :latest tag somewhere, though?
just curious, never used Hugging Face myself to host an image registry
What is happening is, when you run the build command:
1. It sets up an isolated, repeatable environment.
2. If it fails to set up that isolated environment, it won't produce an "image" for other environments to start from.
So #2 is what is happening here.
3. You've got to download the models without using torch.cuda, etc. (see the sketch below)
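For #3, a rough sketch using huggingface_hub (which transformers already depends on), fetching the weights without ever constructing the model:
# Downloads the repo files into the Hugging Face cache without instantiating anything,
# so nothing touches torch.cuda during the build
RUN python -c 'from huggingface_hub import snapshot_download; snapshot_download("rwitz/go-bruins-v2")'
transformers will then pick the files up from the local cache at runtime instead of re-downloading them.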
Ok, so I will have the Docker build download the model instead of instantiating it
yes
can probably just use a curl request or something to fetch the model files directly
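e.g. something like this in the Dockerfile (just illustrative; the actual weight shard filenames are repo-specific, so check the repo's Files tab):
# Hugging Face serves raw files at /resolve/<revision>/<filename>
RUN curl -L -O https://huggingface.co/rwitz/go-bruins-v2/resolve/main/config.json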
Also, I've never used Hugging Face as a Docker registry before, but just make it public, I guess, haha, if it isn't already. I'm guessing there might be issues if it's a private repository, but I've never used it before
I usually just use Docker Hub
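for reference, the usual Docker Hub flow is just this ("yourname" being a placeholder for your Docker Hub username):
# Build the image locally, then push it to the registry
docker build -t yourname/go-bruins-worker:latest .
docker push yourname/go-bruins-worker:latest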
gl gl! 🙂
yes
I also recommend again:
if you end up wanting to test it on a GPU Pod vs. serverless, I always find that an easier validation step
then in your serverless image, you can do as you did, where you overrode the CMD command
or you can have a second Dockerfile that does:
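something along these lines (a sketch; yourname/go-bruins-base:latest is a hypothetical tag for the first image with the model already baked in):
# Start from the base image that already contains the downloaded model
FROM yourname/go-bruins-base:latest
# Only the handler changes between iterations, so rebuilds are nearly instant
ADD handler.py .
CMD ["python", "-u", "handler.py"]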
Which would make your future iterations of handler.py super fast
Since all you have now is a base image with all the models (no need to re-download them every time), and all you have to do is add your new handler.py and such
ah i see
Yup yup~ it's a new thing I learnt last week, haha. I have an audio sound-effects project where I was playing with some big models,
and it was getting painful to keep re-downloading the model
on every iteration haha
so, small fun tip 😄
@justin it says it's running, but when I look at the logs it is still downloading the image, and at a terribly slow speed
Just as a side note, try setting your max workers to three 🙂 you won't pay for additional workers unless they are actively being utilized
Second thing is, I can't say anything about download speed, to be honest
Sometimes I find that it is slow too, but I am not sure what causes it. Other than one instance, it's been pretty fast for me, but I usually use Docker Hub
It also depends how big your model is
this doesn't seem like too big a model though, so it should maybe take about 5-10 minutes
it's 14 GB
in my experience, anyway
yea
I will set it to three though
yeah
sometimes I find that setting it to three, you get a better worker that downloads faster
I find it works a bit wonky with just one
ah, that made it so much quicker, thanks!