LocalAI Deployment
Hello RunPod Team, I'm considering your platform for deploying an AI model and have some questions.
My project involves using LocalAI (https://localai.io/, https://github.com/mudler/LocalAI), and it's crucial for the deployed model to support JSON-formatted responses; this is the main reason I chose LocalAI.
Could you guide me on how to set up this functionality on your platform?
Is there a feature on RunPod that allows the server or the LLM model to automatically shut down or enter a low-resource state if it doesn't receive requests for a certain period, say 15 minutes? This is to optimize costs when the model is not in use.
Thank you!
Solution:
What you're looking for is RunPod serverless. You can read their documentation, but the TL;DR is: use an official RunPod template as a base, then build on it with your own handler.py.
You must be able to build a Docker image. Bake whatever model you want into the Docker image so it isn't constantly downloaded at runtime.
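To make the handler.py idea concrete, here is a minimal sketch of a RunPod serverless worker. The echo logic is a placeholder (the thread doesn't show the real handler); a real worker would call the model baked into the image:

```python
# handler.py -- minimal RunPod serverless worker sketch.
# The "inference" below is a placeholder: it just echoes the prompt back.

def handler(job):
    # RunPod delivers the request payload under the "input" key.
    prompt = job["input"].get("prompt", "")
    # Placeholder inference step; a real handler would run the baked-in model.
    return {"output": f"echo: {prompt}"}

if __name__ == "__main__":
    # The RunPod SDK is available inside the worker image; guarded here so
    # this sketch can also be imported and tested where it isn't installed.
    try:
        import runpod
        runpod.serverless.start({"handler": handler})
    except ImportError:
        print("runpod SDK not installed; handler defined but not started")
```

With `runpod.serverless.start` wired up like this, the platform invokes `handler` once per request and scales the worker to zero when idle, which covers the cost question from the original post.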
https://github.com/justinwlin/runpodWhisperx/blob/master/Dockerfile
This one isn't using a RunPod image as a base, but you can get the idea.
This is me doing another one, divided in two: one is for the persistent GPU pod service RunPod has, so I can debug with the baked-in model; the other is for serverless.
GPU pod:

# Use the updated base CUDA image
FROM runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu22.04

WORKDIR /app

# Best practices for minimizing layer size and avoiding cache issues
RUN apt-get update && \
    apt-get install -y --no-install-recommends ffmpeg && \
    rm -rf /var/lib/apt/lists/* && \
    pip install --no-cache-dir torch==2.1.2 torchvision torchaudio xformers audiocraft firebase-rest-api==1.11.0 noisereduce==3.0.0 runpod

COPY preloadModel.py /app/preloadModel.py
COPY handler.py /app/handler.py
COPY firebase_credentials.json /app/firebase_credentials.json
COPY suprepo /app/suprepo

RUN python /app/preloadModel.py
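The preloadModel.py invoked by that last RUN line isn't shown in the thread. As a sketch, its job is to trigger the model download at build time so the weights land in an image layer instead of being fetched on every cold start. The audiocraft call and model name below are assumptions inferred from the image name; the loader is injectable only so the sketch is testable:

```python
# preloadModel.py -- sketch of a build-time preload script run via
# `RUN python /app/preloadModel.py` in the Dockerfile, so the model
# weights are cached inside the image rather than downloaded at runtime.

def preload(get_pretrained=None):
    # Deferred import: audiocraft is only present inside the image.
    if get_pretrained is None:
        from audiocraft.models import MusicGen
        get_pretrained = MusicGen.get_pretrained
    # The model name is a placeholder assumption, not from the thread.
    return get_pretrained("facebook/musicgen-small")

if __name__ == "__main__":
    preload()
```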
Then this is the serverless one:
# Use the GPU pod image above as the base
FROM justinwlin/audiocraft_runpod_gpu:1.0

WORKDIR /app

COPY handler.py /app/handler.py

# Set stop signal and CMD
STOPSIGNAL SIGINT
CMD ["python", "-u", "handler.py"]
If you want to, you can build and test your Docker image locally, before ever purchasing RunPod credit, to make sure your template works as expected.
RunPod's docs have a "test locally" section.
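As a concrete sketch of that local-test flow: when you run handler.py directly, the RunPod Python SDK can pick up a test payload from a test_input.json file placed next to it (or from a --test_input CLI argument); check the "test locally" section of the docs for the exact behavior. The field names below are just an example:

```json
{
    "input": {
        "prompt": "hello world"
    }
}
```

Running `python handler.py` with this file present exercises the worker end to end without deploying anything.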
Oh, I think I get it: I need to build a Docker image which will run the API, built with a model I choose, and then the handler will simply make calls to the API.
I was thinking of using LocalAI for that, because it has built-in support for enforcing a grammar (JSON format). Maybe you can advise me: should I use LocalAI, or a different tool you know?
Thanks 🙂
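On the JSON-output question: LocalAI exposes an OpenAI-compatible chat endpoint, so one approach is to request JSON output via the OpenAI-style `response_format` field. This is only a sketch; the model name and URL are placeholders, and whether `response_format` is honored depends on the model and backend, so check the LocalAI docs for your setup:

```python
# Sketch: build an OpenAI-style chat request asking a LocalAI server
# for a JSON-formatted response. Model name and URL are placeholders.
import json

def build_json_request(prompt, model="gpt-3.5-turbo"):
    # LocalAI mirrors the OpenAI chat/completions schema; "response_format"
    # with type "json_object" is the OpenAI-style way to request JSON output
    # (support varies by backend -- verify against the LocalAI docs).
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},
    }

payload = build_json_request("List three colors as JSON.")
body = json.dumps(payload)  # POST this to http://<host>:8080/v1/chat/completions
```

Because the endpoint is OpenAI-compatible, this same payload shape works from inside a RunPod handler that forwards requests to a LocalAI process running in the container.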
It really depends what you want to do.
If you have a specific model, there are usually instructions for how to run it.
@eldoo7100 So my recommendation is:
1) Deposit 10 bucks on RunPod if you want to risk using it (or test locally if you can).
2) Use a GPU pod and start up a PyTorch template, or again use your own machine locally.
3) Record the steps you need to get your code running, and then build your image from that.
That is how I came up with this audiocraft one:
by using a RunPod base image from their website, going into the web terminal / Jupyter Lab, and playing around with it.
(Make sure to terminate the pod when done, or else you'll be charged for running pods.)
Again, all of this can be done locally as long as your computer / model / code supports it. I can't say for sure, though, because I don't know what you're doing and probably don't have the specific knowledge,
as I just use RunPod for my own personal projects.