How to host 20 GB models + FastAPI code on serverless
I have 20 GB of model files and a FastAPI pipeline that performs preprocessing, inference, and training.
How can I use RunPod serverless?
Create a serverless handler, build it into a Docker image, then make a template from that image containing the handler and the model.
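Roughly like this (a minimal sketch using the runpod Python SDK; the actual preprocessing/inference calls are placeholders for your own pipeline):
```python
# handler.py - minimal serverless handler sketch; replace the body with your pipeline calls
import runpod

def handler(job):
    job_input = job["input"]  # the JSON payload sent to the endpoint
    # call your preprocessing / inference code here; this echo is just a placeholder
    return {"echo": job_input}

# start the serverless worker loop
runpod.serverless.start({"handler": handler})
```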
Or put the model in RunPod's network storage, attach it to your endpoint, and have your template load the model from there.
But FastAPI seems to cover only the API part. What about the model runner, i.e. the software that actually runs the model for inference and training?
Maybe HF Transformers, TensorFlow, or PyTorch?
It's PyTorch + TensorFlow
Alright, then you can create a Docker image template with a serverless handler that runs that code on the GPU.
Workers | RunPod Documentation
Build your LLM with serverless workers.
Endpoints | RunPod Documentation
Learn how to customize the serverless functions used in your applications.
Do I have to dockerize the models together with the code?
The Docker image is around 50 GB.
Yes that works
But a bulkier image will slow things down (longer cold starts), I think.
Yeah, that's why I thought of keeping the model out of the Docker image.
There's an alternative to that:
put the model in network storage and access it through the endpoint
it will be mounted at /runpod-volume, like in the docs
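Inside the handler that looks something like this (a sketch; the file name is a placeholder, and I'm assuming a plain PyTorch checkpoint):
```python
import torch

# /runpod-volume is where the attached network volume is mounted;
# "model.pt" is a placeholder file name
MODEL_PATH = "/runpod-volume/model.pt"

device = "cuda" if torch.cuda.is_available() else "cpu"
# assumes the checkpoint was saved with torch.save(model); adjust if you saved a state_dict
model = torch.load(MODEL_PATH, map_location=device)
model.eval()
```
If you load it once at module import time (outside the handler), warm workers won't re-read the 20 GB from the volume on every request.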
Perfect, that sounds like a plan.
Thanks
Can you share the links for network storage access and deployment too, please?
Alright, but try using the search or the AI in the bottom right next time; it's cool.
Manage Endpoints | RunPod Documentation
Learn to create, edit, and manage Serverless Endpoints, including adding network volumes and setting GPU prioritization, with step-by-step guides and tutorials.
Endpoint configurations | RunPod Documentation
Configure your Endpoint settings to optimize performance and cost, including GPU selection, worker count, idle timeout, and advanced options like data centers, network volumes, and scaling strategies.
Good luck with the build, bro.
Don't use FastAPI on serverless; the serverless endpoint is already an API layer.
And most likely don't put your models in your image; definitely use network drives.
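On the first point: the handler basically replaces your FastAPI route, and the request body just arrives as job["input"] instead. Rough sketch (run_inference is a placeholder for your pipeline):
```python
import runpod

def run_inference(text):
    # placeholder for your PyTorch/TensorFlow pipeline
    return text.upper()

# FastAPI version, for comparison:
# @app.post("/predict")
# def predict(req: PredictRequest):
#     return {"prediction": run_inference(req.text)}

# Serverless version: the handler is the endpoint
def handler(job):
    text = job["input"]["text"]  # same payload, delivered as job["input"]
    return {"prediction": run_inference(text)}

runpod.serverless.start({"handler": handler})
```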
Why not?
There are several reasons to prefer smaller images; it will work, but with a lot of overhead.
Network drives are about a million percent slower than baking things into the image, so I don't know why you're saying it's better, because you're wrong.
It's always better to bake the model into the image whenever possible; you should only fall back to network storage if you absolutely have to.