Created by ichabodcole on 3/17/2024 in #⚡|serverless
Can multiple models be queried using the vLLM serverless worker?
Just getting started with the vLLM serverless worker, and my first question is: can I query multiple models via a single vLLM serverless endpoint, or is it only possible to query one model per endpoint? If multiple models are possible, are there any special steps to get that to work?
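For context, this is roughly how I'm sending single-model requests now; a minimal sketch assuming the standard `runsync` route and the default vLLM worker input format (the endpoint ID and API key are placeholders):

```python
import os

import requests

# Placeholders -- substitute your own endpoint ID and RunPod API key.
ENDPOINT_ID = "your_endpoint_id"
API_KEY = os.environ["RUNPOD_API_KEY"]

# Synchronous request to the serverless endpoint; the worker runs
# whichever model it was configured with (via MODEL_NAME), so there is
# no obvious place here to choose between multiple models.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "prompt": "Hello, world",
            "sampling_params": {"max_tokens": 64},
        }
    },
    timeout=120,
)
print(resp.json())
```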
If the answer to the above is no (it's one endpoint per model), is it recommended to use a 1-to-1 mapping of serverless endpoints to network volumes?
FYI, I tried pre-loading some models onto my volume, but my serverless endpoint could not find any of them other than the one explicitly loaded via the vLLM MODEL_NAME env var, so I'm not sure if I'm just missing something or if that's a limitation.
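In case it helps diagnose, this is how I checked what the worker actually serves; a sketch assuming the worker exposes an OpenAI-compatible `/openai/v1/models` route (same placeholder endpoint ID and key as above):

```python
import os

import requests

ENDPOINT_ID = "your_endpoint_id"  # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]

# List the models the running worker knows about. For me this only ever
# returns the single model set via MODEL_NAME, regardless of what else
# is sitting on the attached network volume.
resp = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
print(resp.json())
```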