3WaD
•Created by jackson hole on 1/8/2025 in #⚡|serverless
Some basic confusion about the `handlers`
Do you mean this Handler?
That's what the serverless containers (including your vLLM) are running on. When you want to develop a container image for RunPod serverless, you use their SDK and put the execution code inside the handler function. So you are already using one; you just didn't have to write it yourself because you're using someone else's image (the vLLM template).
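If it helps, a bare-bones handler looks roughly like this (a minimal sketch using the runpod Python SDK; the vLLM template ships a much more elaborate one):
```python
import runpod

def handler(job):
    # Whatever you POST as {"input": {...}} to the endpoint arrives here.
    prompt = job["input"].get("prompt", "")
    # ...your model/inference code would run here...
    return {"echo": prompt}

# Starts the worker loop that pulls jobs from your endpoint's queue.
runpod.serverless.start({"handler": handler})
```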
5 replies
•Created by testymctestface on 12/27/2024 in #⚡|serverless
Running worker automatically once docker image has been pulled
Well, the Flashboot 'goes away' only with the worker availability, correct? Or does it expire after some time even if the worker stays ready as idle for your endpoint? I am not sure, because I keep losing the whole worker and different ones take its place, and I don't think one has ever stayed long enough to test this. But I know the cached workers are prioritised for a new job even when they are idle in the "Extra workers" group and not the "latest".
46 replies
•Created by testymctestface on 12/27/2024 in #⚡|serverless
Running worker automatically once docker image has been pulled
A high-level overview of how this would work: add a warm-up routine to your handler that would not generate anything (or just blank/small data if the AI framework doesn't have manually callable init and generate methods), but would only initialize the framework and load the models. It could be set to react to something like a dedicated prewarm flag in the request input. The magic here is that since Flashboot-cached workers are already initialized, if you send such a prewarm request to one of them, it executes in milliseconds and just returns something like a short acknowledgement, while the uninitialized workers first load and then return it. You would then send the prewarm request as many times as needed to occupy all workers, since you unfortunately can't target just the new specific one. The app code or PC utility would then periodically fetch either the /health endpoint or would web-scrape the GUI (if it's not against the TOS and depending on how complex we want to make it) for changes in workers (again, it would be ideal to have some endpoint we can subscribe to that would push data about newly assigned workers to our app, but we have to work with what we have). As changes in workers are detected, you send the prewarm requests. Just a draft. What do you think?
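Something like this on the handler side (a rough sketch with the runpod Python SDK; the `prewarm` flag name and the sleep stand-in for model loading are just examples):
```python
import time
import runpod

MODEL = None  # kept in memory, so a Flashboot-cached worker skips the reload

def init():
    """Expensive one-time setup: load weights, build the engine, etc."""
    global MODEL
    if MODEL is None:
        time.sleep(30)    # stand-in for the real framework init / model loading
        MODEL = object()  # stand-in for the loaded model/engine
    return MODEL

def handler(job):
    inp = job.get("input", {}) or {}
    if inp.get("prewarm"):   # arbitrary flag your app agrees on
        init()               # cold worker: loads now; warm worker: returns instantly
        return {"warmed": True}
    init()
    # ...normal generation with MODEL would go here...
    return {"output": "..."}

runpod.serverless.start({"handler": handler})
```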
46 replies
•Created by testymctestface on 12/27/2024 in #⚡|serverless
Running worker automatically once docker image has been pulled
It would ensure that all idle workers are always pre-warmed with Flashboot. And that's what we need.
46 replies
•Created by testymctestface on 12/27/2024 in #⚡|serverless
Running worker automatically once docker image has been pulled
That being said, I'm starting to wonder whether automatic worker warming isn't currently the only possible solution to our cold-start problem. I am already thinking about the code in my head. It would require just adding a bit to the handler and your app server, OR making a small utility service for your PC that would periodically fetch the workers to check for changes and send the prewarm requests to them. Wanna cooperate on this @testymctestface ?
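Roughly what I have in mind for the app-server/PC side (a sketch only; it assumes the /health response keeps reporting idle/running worker counts and that the handler understands a `prewarm` input like above; the env var names are just examples):
```python
import os
import time
import requests

ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

last_idle = 0
while True:
    health = requests.get(f"{BASE}/health", headers=HEADERS, timeout=10).json()
    idle = health.get("workers", {}).get("idle", 0)

    # More idle workers than before -> some of them are likely fresh and cold.
    # We can't target a specific worker, so send enough prewarm jobs to occupy
    # all idle ones and let the scheduler spread them around.
    if idle > last_idle:
        for _ in range(idle):
            requests.post(f"{BASE}/run", headers=HEADERS,
                          json={"input": {"prewarm": True}}, timeout=10)

    last_idle = idle
    time.sleep(30)
```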
46 replies
•Created by testymctestface on 12/27/2024 in #⚡|serverless
Running worker automatically once docker image has been pulled
The serverless idea is built around on-demand computation that scales from zero. If I had a constant and predictable flow of traffic, I could use a dedicated pod/server. Sure, I have been baking all models into my images for as long as I can remember. But as you can see in the vLLM log, model loading is not the problem.
46 replies
•Created by testymctestface on 12/27/2024 in #⚡|serverless
Running worker automatically once docker image has been pulled
I am also talking about this here: https://discord.com/channels/912829806415085598/1326321926469189754
Any ideas would be much appreciated.
46 replies
•Created by testymctestface on 12/27/2024 in #⚡|serverless
Running worker automatically once docker image has been pulled
The ideal would be something that doesn't require additional execution cost and is more time-effective, like sharing loaded states between workers via a network volume or something similar. Model loading can be optimized with special formats, but things like engine initialization are still a problem.
46 replies
•Created by testymctestface on 12/27/2024 in #⚡|serverless
Running worker automatically once docker image has been pulled
I am fighting with cold starts too now, and automatically pre-warming the workers with a sample job as soon as they spawn is not ideal, but it's a good idea. However, I guess there's no tool to programmatically get the new workers or even their count, nor can you send a job to a specific one.
46 replies
•Created by 3WaD on 11/22/2024 in #⚡|serverless
How long does it normally take to get a response from your VLLM endpoints on RunPod?
Then it would work the same, since Flashboot does not take any effect with Ray. That's how I meant it.
vLLM has two possible distributed executor backends, Ray or MultiProcessing, which you need if you want to use vLLM's continuous batching and the RunPod worker concurrently.
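For reference, the backend is just an engine argument in vLLM (a sketch of vLLM's Python API; the model name is only an example, and the RunPod vLLM template sets this through its own configuration instead):
```python
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="facebook/opt-125m",            # example model, not from this thread
    distributed_executor_backend="mp",    # "mp" = multiprocessing, or "ray"
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```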
13 replies
•Created by 3WaD on 11/22/2024 in #⚡|serverless
How long does it normally take to get a response from your VLLM endpoints on RunPod?
The Flashboot just doesn't seem to work with the Ray distributed executor backend, as I now see. This makes sense, I guess. It's overkill for single-node inference anyway, so I'll stick to MP, which works. But good to know. I'll try to discourage everyone from using it with my custom image.
13 replies
•Created by 3WaD on 11/22/2024 in #⚡|serverless
How long does it normally take to get a response from your VLLM endpoints on RunPod?
Nope
13 replies
•Created by 3WaD on 11/22/2024 in #⚡|serverless
How long does it normally take to get a response from your VLLM endpoints on RunPod?
It's the official vLLM template selected in the RunPod dashboard. I added only the model name and used Ray. Otherwise, everything should be the default.
13 replies
•Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
I am finally back home, so I can test this myself. And for anyone coming here in the future wondering what the answer to this simple question is:
Yes, you can send non-nested payloads to the api.runpod.ai/v2/<ENDPOINT ID>/openai/* path, whatever custom async handler or software it runs internally, and the payload will be available in the handler params.
That means when you send `{"foo":"bar"}` to `.../openai/abc`, the payload arrives in the handler's params.
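The quickest way to check what exactly you receive is an async handler that just echoes the raw job back (a minimal sketch with the runpod Python SDK):
```python
import runpod

async def handler(job):
    # Log and return everything the platform delivered for this request,
    # so the response shows exactly what arrived via the /openai/* path.
    print(job)
    return job["input"]

runpod.serverless.start({"handler": handler})
```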
I really wish this would be mentioned somewhere in the official docs, in the ask-ai knowledge base, or at least widely known to the team when asked. But thank you anyways.
47 replies
•Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Yes. We're asking whether it's possible to send the JSON without wrapping it in "input" or not, because that's how the OpenAI standard requires it.
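In other words, the difference between the two request bodies (just an illustration; the chat payload is an example):
```python
# Standard RunPod queue endpoint: everything must be nested under "input".
wrapped = {"input": {"prompt": "Hello"}}  # POST .../v2/<ENDPOINT ID>/run

# OpenAI standard: the body is the payload itself, with no "input" wrapper.
openai_style = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Hello"}],
}  # POST .../v2/<ENDPOINT ID>/openai/v1/chat/completions
```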
47 replies
•Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Passed to vLLM? What if there's no vLLM? I'll put it simply: when I send `{"foo":"bar"}` to https://api.runpod.ai/v2/<ENDPOINT ID>/openai/abc, will ANY handler function receive the payload (input and path) so we can work with it further, or is that not possible on RunPod?
47 replies
•Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Ahhh. So it's not but just
Thank you very much for the confirmation!
47 replies
•Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Thank you
47 replies
•Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Just to make sure - is this the correct place to reach someone from RunPod who knows about their endpoints and might have a short, definite answer to this question? I appreciate the input and your time so far, guys, but to summarize: I've got a link to the official repo that uses the thing I'm asking about, and I've been told to "hack around and find out" 😆
47 replies