RunPod•5mo ago
BadNoise

Deploy BART on serverless

Hi! Does anyone know how to deploy bart-large-mnli on serverless? I've been trying with the Hugging Face template (ghcr.io/huggingface/text-generation-inference), but I always get "Error: ShardCannotStart". I already tried setting NUM_SHARD=1 in the env, but it still fails. Repo for reference: https://huggingface.co/facebook/bart-large-mnli Let me know if you need further details! Thank you 🙂
6 Replies
digigoblin•5mo ago
Why do you want to use that template? Just create your own handler and copy and paste the code from Hugging Face into the handler function. That Docker image is probably meant for a pod; it won't work in serverless without a RunPod serverless handler.
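A minimal sketch of what such a handler could look like, assuming the model is used through the transformers zero-shot-classification pipeline (the usual way to run facebook/bart-large-mnli). The input field names (`text`, `candidate_labels`, `multi_label`), the optional `classifier` argument, and the `main()` helper are assumptions made for illustration, not something from this thread:

```python
from typing import Any, Callable, Dict, Optional

MODEL_ID = "facebook/bart-large-mnli"

# Cached pipeline, created on the first request and reused by later ones.
_classifier: Optional[Callable[..., Dict[str, Any]]] = None


def load_classifier() -> Callable[..., Dict[str, Any]]:
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("zero-shot-classification", model=MODEL_ID)


def handler(job: Dict[str, Any],
            classifier: Optional[Callable[..., Dict[str, Any]]] = None) -> Dict[str, Any]:
    """RunPod-style handler: the request payload is read from job["input"].

    The `classifier` argument is a hypothetical hook for local testing;
    RunPod itself calls handler(job) with a single argument.
    """
    global _classifier
    job_input = job.get("input") or {}
    text = job_input.get("text")
    labels = job_input.get("candidate_labels")
    if not text or not labels:
        return {"error": "input must contain 'text' and 'candidate_labels'"}

    if classifier is None:
        if _classifier is None:
            _classifier = load_classifier()
        classifier = _classifier

    result = classifier(
        text,
        candidate_labels=labels,
        multi_label=bool(job_input.get("multi_label", False)),
    )
    return {"labels": result["labels"], "scores": result["scores"]}


def main() -> None:
    # Left uncalled here so the module can be imported and tested without
    # RunPod installed; in the worker image, call main() at the bottom of the file.
    import runpod

    runpod.serverless.start({"handler": handler})
```

With something like this, a request body of the form `{"input": {"text": "...", "candidate_labels": ["a", "b"]}}` would return the labels sorted by score, as the pipeline does.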
BadNoiseOP•5mo ago
OK, thank you! Do you think it can still handle concurrency? That's my main concern. For example, if I write my own handler and send 10 requests simultaneously, do I have to wait for the previous one to complete? (Regardless of which GPU I'm using, of course.)
digigoblin•5mo ago
I don't know much about how the transformers library works, but serverless can handle multiple concurrent requests depending on your max worker count. You don't have to wait for the previous request to complete if you have multiple workers and a decent scaling policy configured on your endpoint.
BadNoiseOP•5mo ago
OK, I'll try it. I ask because, for example, with the vLLM template I can handle more than one request at a time even with 1 worker, without scaling. But I will try deploying it with a custom handler. Thank you again!
digigoblin•5mo ago
Yeah, the vLLM engine can handle that, which is why the vLLM worker can do it too.
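For a custom handler, one possible way to let a single worker accept several requests at once is an async handler combined with the RunPod SDK's `concurrency_modifier` option. The sketch below is an assumption-heavy illustration: `MAX_CONCURRENCY`, `make_handler`, and the input field names are hypothetical, and whether running the pipeline from several threads actually improves throughput on one GPU is not something this thread confirms.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

MAX_CONCURRENCY = 4  # assumed value; tune for your GPU and model


def make_handler(
    classifier: Callable[..., Dict[str, Any]],
) -> Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]:
    """Build an async handler around a blocking classifier callable."""

    async def handler(job: Dict[str, Any]) -> Dict[str, Any]:
        job_input = job.get("input") or {}
        # Run the blocking inference call in a worker thread so the event
        # loop can keep accepting other jobs in the meantime.
        result = await asyncio.to_thread(
            classifier,
            job_input["text"],
            candidate_labels=job_input["candidate_labels"],
        )
        return {"labels": result["labels"], "scores": result["scores"]}

    return handler


def concurrency_modifier(current_concurrency: int) -> int:
    # Tells the worker how many jobs it may process at the same time.
    return MAX_CONCURRENCY


def main() -> None:
    # Left uncalled here; in the worker image, call main() at the bottom of the file.
    import runpod
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    runpod.serverless.start({
        "handler": make_handler(classifier),
        "concurrency_modifier": concurrency_modifier,
    })
```

This does not batch requests the way the vLLM engine does; it only overlaps them within one worker, so scaling out with more workers remains the simpler option.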
digigoblin•5mo ago
I see someone actually logged an issue asking vLLM to add support for it: https://github.com/vllm-project/vllm/issues/5985
GitHub
[New Model]: support for BartForSequenceClassification · Issue #598...
The model to consider. Hi! Is there any plan on supporting https://huggingface.co/facebook/bart-large-mnli on vllm? When I try to run it it says "BartForSequenceClassification" is not sup...