RunPod · 2mo ago
RK

How to host 20 GB models + FastAPI code on serverless

I have 20 GB of model files and a FastAPI pipeline that performs preprocessing, inference, and training. How can I use RunPod's serverless?
18 Replies
nerdylive · 2mo ago
Create a serverless handler, build it into a Docker image, and make a template from that image containing the handler and the model. Or put the model in RunPod's network storage, attach it to your endpoint, and use your template to access the model. But FastAPI only covers the API part; what about the model runner, i.e. the software that actually runs the model for inference and training? Maybe HF Transformers, TensorFlow, or PyTorch.
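A minimal sketch of such a handler, assuming the standard RunPod Python SDK (the `runpod` pip package); `run_pipeline` is an illustrative placeholder for your own code:

```python
import runpod

def run_pipeline(job_input):
    # Placeholder for your own preprocessing + inference code
    return {"echo": job_input}

def handler(job):
    # RunPod calls this once per queued job; the client's JSON
    # payload arrives under the "input" key
    return run_pipeline(job["input"])

# Hand control to the RunPod serverless runtime, which polls the job queue
runpod.serverless.start({"handler": handler})
```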
RK · 2mo ago
It's PyTorch + TensorFlow
nerdylive · 2mo ago
Alright, then you can create a Docker image template with a serverless handler that executes that code on the GPU.
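Inside that handler, the GPU part is the usual PyTorch device dance; a rough sketch with a stand-in model (nothing here is RunPod-specific):

```python
import torch

# Serverless workers expose the GPU as usual; fall back to CPU for local tests
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(16, 4).to(device)  # stand-in for your real model
batch = torch.randn(8, 16, device=device)  # stand-in for preprocessed input

with torch.no_grad():
    out = model(batch)
```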
nerdylive · 2mo ago
Workers | RunPod Documentation
Build your LLM with serverless workers.
Endpoints | RunPod Documentation
Learn how to customize the serverless functions used in your applications.
RK · 2mo ago
Do I have to dockerize the models along with the code? The Docker image would be around 50 GB.
nerdylive · 2mo ago
Yes, that works, but a bulkier image will slow down worker start-up, I think.
RK · 2mo ago
Yeah, that's why I thought to keep the model out of the Docker image.
nerdylive · 2mo ago
There's an alternative: put the model in network storage and access it through the endpoint. It will be mounted at /runpod-volume, as described in the docs.
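Loading from the volume might look like this, assuming PyTorch, a whole pickled module, and an illustrative file name; loading at module import (outside the handler) means warm workers skip the reload:

```python
import torch

# Network volumes are mounted at /runpod-volume on serverless workers;
# the subdirectory and file name here are illustrative
MODEL_PATH = "/runpod-volume/models/my_model.pt"

device = "cuda" if torch.cuda.is_available() else "cpu"

# Runs once at import time, so warm workers reuse the loaded model;
# adjust accordingly if you saved a state_dict instead of a whole module
model = torch.load(MODEL_PATH, map_location=device)
model.eval()
```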
RK · 2mo ago
Perfect, that sounds like a plan, thanks. Can you share links for network storage access and deployment too, please?
nerdylive · 2mo ago
Alright, but try using the search or the AI in the bottom right next time; it's cool.
nerdylive · 2mo ago
Manage Endpoints | RunPod Documentation
Learn to create, edit, and manage Serverless Endpoints, including adding network volumes and setting GPU prioritization, with step-by-step guides and tutorials.
nerdylive · 2mo ago
Endpoint configurations | RunPod Documentation
Configure your Endpoint settings to optimize performance and cost, including GPU selection, worker count, idle timeout, and advanced options like data centers, network volumes, and scaling strategies.
nerdylive · 2mo ago
Good luck on building, bro.
digigoblin · 2mo ago
Don't use FastAPI on serverless; it's already an API layer.
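In other words, the queue's JSON payload goes straight to your functions, so a FastAPI route body becomes the handler body. A rough sketch (`preprocess`/`infer` are placeholders for whatever code sits behind your route):

```python
import runpod

def preprocess(data):
    # Placeholder: whatever your FastAPI route did before the model call
    return data

def infer(features):
    # Placeholder: your PyTorch/TensorFlow forward pass
    return {"prediction": features}

# What was `@app.post("/predict")` in FastAPI becomes the handler itself;
# no uvicorn server runs inside the worker
def handler(job):
    return infer(preprocess(job["input"]))

runpod.serverless.start({"handler": handler})
```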
Neuraldivergent · 2mo ago
And most likely don't put your models in your image; definitely use network drives.
nerdylive · 2mo ago
Why not?
Neuraldivergent · 2mo ago
There are several reasons to prefer smaller images; it will work, but with a lot of overhead.
digigoblin · 2mo ago
Network drives are about a million percent slower than baking things into the image, so I don't know why you are saying it's better, because you are wrong. It's always better to bake a model into the image wherever possible, every single time; you should only use the garbage network storage if you absolutely have to.