Are Pods good for batch inference of encoders?
Hello all,
I want to deploy an encoder. Think of a BERT model (like "bert-base-uncased" on Hugging Face) with an aggregation head on top, e.g. one that predicts class probabilities. However, I do not want to serve that model in real time; I want to use it for batch inference.
Typical scenario: I need predictions for 1 million records within 10 minutes. I need my GPU nodes to scale up from 0 to n, process those million records stored on cloud storage, write the predictions back to cloud storage, and scale down from n to 0.
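For concreteness, the per-node work is roughly the following (a minimal sketch; the checkpoint name, batch size, and the assumption that records are plain text are placeholders, not part of the actual pipeline):

```python
# Minimal batch-inference sketch for a BERT classifier over text records.
# "bert-base-uncased" and the batch size are placeholders; in practice the
# fine-tuned checkpoint would be loaded instead.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).to(device).eval()

def predict(texts, batch_size=256):
    """Yield class-probability tensors for a list of input texts."""
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(
            texts[i:i + batch_size],
            padding=True, truncation=True, max_length=128,
            return_tensors="pt",
        ).to(device)
        with torch.no_grad():
            logits = model(**batch).logits
        yield torch.softmax(logits, dim=-1).cpu()
```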
I accomplished this with Azure ML using their batch inference endpoints. It was a horrible experience for many reasons, including the time it took. My question: would RunPod be a good fit for such a use case?
Thanks,
Paul
Solution:
I'd suggest reading about SkyPilot, since RunPod doesn't provide a platform for batch inference.
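A rough sketch of how the scale-up / process / scale-down cycle could look with SkyPilot's Python API; the accelerator spec, bucket URIs, and the predict.py script are illustrative assumptions, not something from this thread:

```python
# Sketch using SkyPilot's Python API (sky.Task, sky.Resources, sky.launch).
# The accelerator type, bucket URIs, and predict.py are placeholders.
import sky

task = sky.Task(
    name="bert-batch-inference",
    setup="pip install torch transformers",
    run="python predict.py --input s3://my-bucket/records --output s3://my-bucket/preds",
)
task.set_resources(sky.Resources(accelerators="A100:1"))

# Bring up a fresh cluster, run the job, and tear the cluster down when it finishes.
sky.launch(task, cluster_name="batch-infer", down=True)
```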