How to host 20 GB models + FastAPI code on serverless
I have 20 GB of model files and a FastAPI pipeline that performs preprocessing, inference, and training.
How can I use RunPod serverless?
Create a serverless handler, build it into a Docker image, then make a template from that image containing the handler and the model.
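Roughly like this (a minimal sketch using the runpod Python SDK; the actual preprocessing/inference calls are placeholders for your own pipeline):
```python
# handler.py - minimal serverless handler sketch; replace the body with your pipeline calls
import runpod

def handler(job):
    job_input = job["input"]  # the JSON payload sent to the endpoint
    # call your preprocessing / inference code here; this echo is just a placeholder
    return {"echo": job_input}

# start the serverless worker loop
runpod.serverless.start({"handler": handler})
```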
Or put the model in RunPod's network storage, attach it to your endpoint, and have your template load the model from there.
But FastAPI seems to cover only the API part. What about the model runner, i.e. the software that actually runs the model for inference and training?
Maybe HF Transformers, TensorFlow, or PyTorch?
It's PyTorch + TensorFlow
Alright, then you can create a Docker image template with a serverless handler that runs that code on the GPU.
Workers | RunPod Documentation
Build your LLM with serverless workers.
Endpoints | RunPod Documentation
Learn how to customize the serverless functions used in your applications.
Do I have to dockerize the models together with the code?
The Docker image is around 50 GB.
Yes that works
But a bulkier image will slow things down (longer cold starts), I think.
Yeah, that's why I thought of keeping the model out of the Docker image.
There's an alternative to that:
put the model in network storage and access it through the endpoint
it will be mounted at /runpod-volume, like in the docs
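Inside the handler that looks something like this (a sketch; the file name is a placeholder, and I'm assuming a plain PyTorch checkpoint):
```python
import torch

# /runpod-volume is where the attached network volume is mounted;
# "model.pt" is a placeholder file name
MODEL_PATH = "/runpod-volume/model.pt"

device = "cuda" if torch.cuda.is_available() else "cpu"
# assumes the checkpoint was saved with torch.save(model); adjust if you saved a state_dict
model = torch.load(MODEL_PATH, map_location=device)
model.eval()
```
If you load it once at module import time (outside the handler), warm workers won't re-read the 20 GB from the volume on every request.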
Perfect, that sounds like a plan.
Thanks
Can you share the links for network storage access and deployment too, please?
Alright, but try using the search or the AI in the bottom right next time; it's cool.
Manage Endpoints | RunPod Documentation
Learn to create, edit, and manage Serverless Endpoints, including adding network volumes and setting GPU prioritization, with step-by-step guides and tutorials.
Endpoint configurations | RunPod Documentation
Configure your Endpoint settings to optimize performance and cost, including GPU selection, worker count, idle timeout, and advanced options like data centers, network volumes, and scaling strategies.
Good luck with the build, bro.
Don't use FastAPI on serverless; the serverless endpoint is already an API layer.
And most likely don't put your models in your image; definitely use network drives.
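On the first point: the handler basically replaces your FastAPI route, and the request body just arrives as job["input"] instead. Rough sketch (run_inference is a placeholder for your pipeline):
```python
import runpod

def run_inference(text):
    # placeholder for your PyTorch/TensorFlow pipeline
    return text.upper()

# FastAPI version, for comparison:
# @app.post("/predict")
# def predict(req: PredictRequest):
#     return {"prediction": run_inference(req.text)}

# Serverless version: the handler is the endpoint
def handler(job):
    text = job["input"]["text"]  # same payload, delivered as job["input"]
    return {"prediction": run_inference(text)}

runpod.serverless.start({"handler": handler})
```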
Why not?
There are several reasons to prefer smaller images; it will work, but with a lot of overhead.
Network drives are about a million percent slower than baking things into the image, so I don't know why you're saying it's better, because you're wrong.
It's always better to bake the model into the image whenever possible; you should only fall back to network storage if you absolutely have to.