3WaD
RunPod
Created by 3WaD on 11/22/2024 in #⚡|serverless
How long does it normally take to get a response from your VLLM endpoints on RunPod?
Then it would work the same, since FlashBoot doesn't take any effect with Ray. That's how I meant it. vLLM has two possible distributed executor backends, Ray or multiprocessing, which are needed if you want to use vLLM's continuous batching and the RunPod worker concurrently.
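For anyone picking between the two, here is a minimal sketch, assuming vLLM's offline LLM API, where the backend is selected via the distributed_executor_backend engine argument ("mp" for multiprocessing, "ray" for Ray); the model name is just an example:
from vllm import LLM, SamplingParams

# "mp" (multiprocessing) is enough for single-node inference;
# Ray is only worth it for multi-node setups and, per the above, skips FlashBoot.
llm = LLM(model="facebook/opt-125m", distributed_executor_backend="mp")
params = SamplingParams(max_tokens=32)
print(llm.generate(["Hello"], params)[0].outputs[0].text)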
13 replies
RunPod
Created by 3WaD on 11/22/2024 in #⚡|serverless
How long does it normally take to get a response from your VLLM endpoints on RunPod?
FlashBoot just doesn't seem to work with the Ray distributed executor backend, as I now see. This makes sense, I guess. Ray is overkill for single-node inference anyway, so I'll stick with MP, which works. But good to know. I'll try to discourage everyone from using Ray with my custom image.
13 replies
RunPod
Created by 3WaD on 11/22/2024 in #⚡|serverless
How long does it normally take to get a response from your VLLM endpoints on RunPod?
Nope
13 replies
RunPod
Created by 3WaD on 11/22/2024 in #⚡|serverless
How long does it normally take to get a response from your VLLM endpoints on RunPod?
It's the official vLLM worker selected in the RunPod dashboard. I added only the model name and used Ray. Otherwise, everything should be the default.
13 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
I am finally back home, so I can test this myself. And for anyone coming here in the future and wondering what the answer to this simple question is: yes, you can send non-nested payloads to the api.runpod.ai/v2/<ENDPOINT ID>/openai/* path of an endpoint that internally uses any custom async handler or software, and the payload will be available in the handler params. That means when you send {"foo":"bar"} to .../openai/abc:
import runpod

async def handler(job):
    # job["input"] will include {'openai_input': {'foo': 'bar'}, 'openai_route': '/abc'}
    ...

runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})
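For illustration, a client-side sketch of that call (the endpoint ID and API key are placeholders, and the requests library is an assumption, not part of the worker):
import requests

# Placeholders - substitute your real endpoint ID and API key.
url = "https://api.runpod.ai/v2/<ENDPOINT ID>/openai/abc"
headers = {"Authorization": "Bearer <API KEY>"}
resp = requests.post(url, json={"foo": "bar"}, headers=headers)
print(resp.status_code, resp.text)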
I really wish this were mentioned somewhere in the official docs, in the ask-ai knowledge base, or at least widely known to the team when asked. But thank you anyway.
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Yes. We're asking whether it's possible to send the JSON without wrapping it in "input", because that's what the OpenAI standard requires.
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Passed to vLLM? What if there's no vLLM? I'll put it simply: when I send {"foo":"bar"} to https://api.runpod.ai/v2/<ENDPOINT ID>/openai/abc, will ANY handler function receive the payload (input and path) so we can work with it further, or is that not possible on RunPod?
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Ahhh. So it's not
{"input": {"openai_input": {}, "openai_route": {}}}
{"input": {"openai_input": {}, "openai_route": {}}}
but just
{"openai_input": {}, "openai_route": {}}
{"openai_input": {}, "openai_route": {}}
Thank you very much for the confirmation!
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Thank you
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Just to make sure - is this the correct place to reach someone from RunPod who knows about their endpoints and might have a short, definite answer to this question? I appreciate the input and your time, guys, but to summarize: so far I've got a link to the official repo that uses the thing I'm asking about, and I've been told to "hack around and find out" 😆
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Uhmm, no. I am asking because I want to build a custom image, not a fork of the vLLM image. Do you know exactly which part of that source code makes the path work?
47 replies
RunPod
Created by DEOGEE on 10/22/2024 in #⚡|serverless
Thinking of using RunPod
2 replies
RunPod
Created by Orca234 on 10/21/2024 in #⚡|serverless
Batch processing of chats
Perhaps you're looking for vLLM continuous batching and the RunPod concurrent handler?
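A minimal sketch of the concurrent handler side, assuming the runpod SDK's concurrency_modifier hook (the cap of 10 is an arbitrary example, and the handler body is hypothetical):
import runpod

async def handler(job):
    # With an async handler, many jobs can run at once and share one
    # vLLM engine, which batches their requests continuously.
    return {"echo": job["input"]}

def concurrency_modifier(current_concurrency):
    # Hypothetical fixed cap; a real modifier could scale with load.
    return 10

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})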
4 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
I've only tested that calling the /openai/ path does indeed produce different responses, even on my existing non-OpenAI endpoints. Response with the /openai/... path:
{"error": "Error processing the request"}
{"error": "Error processing the request"}
Response with any other path (e.g. /testpath/):
404 page not found
Based on this behaviour, I started writing my worker code. I hope I'll be able to test it soon and that it will work.
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Yes. I've read through that source code before asking. Based on that, it should work the way I wrote it. But someone also mentioned it might be allowed just for that official worker, so I wanted to make sure the /openai path also works for custom images before I write the whole code around it. Does it also mean we can send any raw "non-wrapped" payload to that endpoint even when it's not OpenAI-related? It should pass any content to "openai_input" and any route after it to "openai_route", right? Having a setting or another documented endpoint on serverless that lets us send raw payloads would solve such problems with predefined APIs.
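If that holds, a custom handler could dispatch on those two keys. A minimal sketch, assuming the payload shape described earlier in the thread (the route names here are made up for illustration):
import runpod

async def handler(job):
    payload = job["input"].get("openai_input", {})
    route = job["input"].get("openai_route", "")
    # Dispatch on the sub-path that followed /openai/ - these routes are hypothetical.
    if route == "/v1/chat/completions":
        return {"handled": "chat", "payload": payload}
    return {"handled": "raw", "route": route, "payload": payload}

runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})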
47 replies