Serverless Docker image deployment
Hi,
I fine-tuned a LoRA from Llama 3.2 3B using Unsloth and want to deploy it on serverless.
Using vLLM with the merged model degrades performance too much to be usable. So I followed the instructions from this link https://github.com/runpod-workers/worker-template/tree/main and created a serverless endpoint using the Docker image, but it keeps initializing and never completes a single job; the job stays in the queue.
I might be missing something. I also don't have much experience with Docker, so I might be making a mistake there, but I did test the image locally before deploying. I would appreciate any help with this.
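One common cause of jobs sitting in the queue is the worker container never reaching `runpod.serverless.start`, so the handler wiring is worth checking. A minimal sketch, assuming the `runpod` Python SDK that the worker-template is built on; the `prompt` payload key and the `RUNPOD_ENDPOINT_ID` environment-variable guard are my assumptions, not taken from the repo:

```python
import os

def handler(job):
    # RunPod passes the request payload under job["input"]
    prompt = job["input"].get("prompt", "")
    # ... run LoRA inference here; this sketch just echoes ...
    return {"output": f"echo: {prompt}"}

# Assumption: RUNPOD_ENDPOINT_ID is set inside a serverless worker,
# so this guard lets the module also be imported and tested locally.
if os.environ.get("RUNPOD_ENDPOINT_ID"):
    import runpod  # pip install runpod
    # start() blocks and polls for jobs; if this line is never reached
    # (e.g. the model load above crashes), jobs remain IN_QUEUE
    runpod.serverless.start({"handler": handler})
```

If the container's startup (model download, CUDA init) throws before `start()` runs, the endpoint will show "initializing" forever, so checking the worker logs for an early exception is a good first step.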
3 Replies
Did you upload it to Hugging Face?
And how did you deploy it (the way that didn't work last time)?
How do you want to deploy it?
Yes, I uploaded the LoRA to Hugging Face, and the base model by Unsloth is already on Hugging Face.
Last time I deployed it by merging the LoRA weights into the base model, uploading that to Hugging Face, and then deploying with the vLLM UI on serverless, but the merged model's performance is not good.
I want to deploy it any way I can. I thought using a Docker image was the only option for using the LoRA as-is, without merging it into the base model.
Oh hmm, I'm not sure how to configure it on the vLLM worker on RunPod, but I'm guessing you can download both vLLM and the LoRA into your image for serverless vLLM.
Then use vLLM args to configure your LoRA.
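For serving the LoRA without merging, vLLM's OpenAI-compatible server can load adapters at runtime via `--enable-lora` and `--lora-modules`. A sketch of the launch command, where the model name, adapter path, and rank are placeholders to adapt (and flag support depends on your vLLM version):

```shell
# Serve the base model and attach the LoRA adapter at runtime.
# Model repo, adapter path, and max rank below are placeholders.
python -m vllm.entrypoints.openai.api_server \
  --model unsloth/Llama-3.2-3B-Instruct \
  --enable-lora \
  --lora-modules my-lora=/path/to/lora-adapter \
  --max-lora-rank 64
```

Requests then select the adapter by setting `"model": "my-lora"` in the OpenAI-style request body instead of the base model name.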