Deploy BART on serverless
Hi!
Does anyone know how to deploy bart-large-mnli on serverless? I've been trying with the Hugging Face template (ghcr.io/huggingface/text-generation-inference) but I always get "Error: ShardCannotStart".
I already tried setting NUM_SHARD=1 in the env, but it's still failing.
Repo for reference: https://huggingface.co/facebook/bart-large-mnli
Let me know if you need further details!
Thank you 🙂
Why do you want to use that template? Just create your own handler and copy the code from Hugging Face into the handler function.
That Docker image is probably meant for a pod; it won't work on serverless without a RunPod serverless handler.
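Something like this, roughly (untested sketch, assuming the standard transformers zero-shot pipeline and the RunPod Python SDK in your image):

```python
# handler.py - minimal sketch of a RunPod serverless handler for bart-large-mnli
# (assumes runpod and transformers are installed in the worker image)
import runpod
from transformers import pipeline

# Load the model once at worker startup, not per request
classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    device=0,  # run on the GPU
)

def handler(job):
    job_input = job["input"]
    # Expecting e.g. {"text": "...", "labels": ["sports", "politics"]}
    return classifier(
        job_input["text"],
        candidate_labels=job_input["labels"],
    )

runpod.serverless.start({"handler": handler})
```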
Ok, thank you! And do you think it can still handle concurrency? That's my main concern.
For example, if I write my own handler and send 10 requests simultaneously, do I have to wait for the previous one to complete? (Independently of the GPU I'm using, of course.)
Don't know much about how the transformers library works, but serverless can handle multiple concurrent requests depending on your max worker count.
You don't have to wait for the previous request to complete if you have multiple workers and a decent scaling policy configured on your endpoint
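If you ever need a single worker to take several jobs at once, I believe the Python SDK also has a concurrency_modifier option for async handlers, something along these lines (untested sketch, not specific to BART):

```python
# untested sketch: one worker handling multiple jobs concurrently,
# assuming the runpod SDK's concurrency_modifier option for async handlers
import runpod

async def handler(job):
    # the model call would go here; it needs to be async-friendly
    # for requests to actually overlap on one worker
    return {"echo": job["input"]}

def concurrency_modifier(current_concurrency):
    # allow up to 4 jobs in flight on this worker
    return 4

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```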
Ok, I'll try it. I ask because, for example, with the vLLM template I can handle more than one request at a time even with 1 worker, without scaling it.
But I will try deploying it with the custom handler.
thank you again!
Yeah, the vLLM engine can handle that, which is why the vLLM worker can do it too.
I see someone actually logged an issue for vllm to add support for it:
https://github.com/vllm-project/vllm/issues/5985