Created by kinsvater on 5/27/2024 in #⛅|pods
Are Pods good for batch inference of encoders?
Hello all, I want to deploy an encoder. Think of a BERT model (like "bert-base-uncased" on Hugging Face) with some aggregation head, e.g. one predicting class probabilities. However, I do not want to use that model in real time but for batch inference. A typical scenario: I need predictions for 1 million records within 10 minutes. I need my GPU nodes to scale up from 0 to n, process those million records stored on cloud storage, write the predictions back to cloud storage, and scale down from n to 0. I accomplished this with Azure ML using their batch inference endpoints. It was a horrible experience for many reasons, including time. My question: would RunPod be a great fit for such a use case? Thanks, Paul
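For reference, a minimal sketch of the batch-inference step itself, assuming Hugging Face transformers and a standard classification head. The cloud-storage I/O and the 0-to-n scaling are left out, and `num_labels` plus the batching parameters are illustrative, not from the post:

```python
# Minimal batch-inference sketch for the scenario above.
# Assumptions: texts are already loaded into memory (cloud-storage I/O
# omitted), and AutoModelForSequenceClassification stands in for the
# "BERT + aggregation head" described in the question.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = (
    AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2  # hypothetical head; load your own weights
    )
    .to(device)
    .eval()
)

def predict_batches(texts, batch_size=256):
    """Yield class-probability tensors, one per batch of input texts."""
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        inputs = tokenizer(
            batch,
            padding=True,
            truncation=True,
            max_length=128,
            return_tensors="pt",
        ).to(device)
        with torch.inference_mode():
            logits = model(**inputs).logits
        yield torch.softmax(logits, dim=-1).cpu()
```

Each worker pod would run something like this over its shard of the records; the scale-up/scale-down and sharding logic is the part the question is really about.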