Created by vaventt on 1/9/2024 in #⛅|pods
Is GPU Cloud suitable for deploying LLMs or only for training?
I'm pretty new to RunPod. I've already built 4 endpoints on Serverless and it's pretty straightforward for me, but I don't understand whether GPU Cloud is also suitable for pure LLM inference via API for chatbot purposes, or whether it's only for training models and saving weights. The main question: can I also deploy my LLM for inference on GPU Cloud for production? And where do I find the API I should make calls to?

I ask because I find Serverless very unstable for production, or maybe it's my fault: whenever a worker starts, it downloads the model weights again, which are sometimes 100 GB+ and take 5-15 minutes. So after a user sends a query, they may wait up to 15 minutes for a response while the worker first downloads the weights from HuggingFace and only then runs inference.
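For context, here's a minimal sketch of the cold-start workaround I've been considering: cache the weights on a persistent network volume so only the very first worker pays the download cost. The mount path /runpod-volume and the model ID are placeholder assumptions, not my actual setup:

```python
# Sketch: persist model weights on a network volume across serverless
# cold starts. Assumptions (not from my real deployment): the volume is
# mounted at /runpod-volume and the model is a placeholder hub ID.
import os
from huggingface_hub import snapshot_download

VOLUME_DIR = "/runpod-volume/models"        # hypothetical mount point
MODEL_ID = "meta-llama/Llama-2-7b-hf"       # placeholder model ID

def ensure_weights() -> str:
    """Download weights once; later cold starts reuse the cached copy."""
    local_dir = os.path.join(VOLUME_DIR, MODEL_ID.replace("/", "--"))
    if not os.path.isdir(local_dir):        # only hit HuggingFace on first run
        snapshot_download(repo_id=MODEL_ID, local_dir=local_dir)
    return local_dir

weights_path = ensure_weights()
# then point transformers/vLLM at weights_path instead of the hub ID
```

Even with this, I'd like to know whether GPU Cloud with an always-on pod is the more appropriate option for production inference.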
26 replies