deanQ
RunPod
Created by spooky on 10/30/2024 in #⚡|serverless
jobs queued for minutes despite lots of available idle workers
Rather than just the endpoint ID, can you also tell us the request ID? That would give us a better way to trace. Thanks
21 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Yes. When you’re testing these with longer-running jobs, do they block as if they were sync requests?
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Despite that, are you able to send requests to the /openai/v1 path?
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
We refer you to that page because that’s where the answers are. Sending requests to that specific path on your endpoint is the answer to your question. Have you tried it? Are you having issues with vLLM blocking on sync requests? It doesn’t do that. Our requests to vLLM are always async; you’ll notice that when you make requests to a vLLM endpoint on our serverless. We aren't doing anything that changes that; we’re essentially proxying the requests. When you send sync requests, they come back immediately. It’s non-blocking thanks to vLLM’s concurrent processing. If you're running into issues with this, please file a support ticket so we can help. CS will ask for additional information that shouldn’t be divulged here on a public forum.
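If you want a quick way to see that for yourself, here's a minimal sketch using the synchronous client; the endpoint ID and API key are placeholders you'd fill in:
import openai

client = openai.OpenAI(
    base_url="https://api.runpod.ai/v2/<endpoint_id>/openai/v1",
    api_key="<runpod_api_key>",
)

# vLLM processes requests concurrently, so a sync call like this returns as
# soon as its own completion is ready instead of blocking behind other jobs.
resp = client.chat.completions.create(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)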
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
I see. Yes, the endpoint can be treated like an OpenAI server. Your endpoint should have a value for OPENAI_BASE_URL. You just have to make sure you send requests to that path, like so...
import asyncio
import openai

async def run():
    runpod_endpoint_id = "vllm-1234567890"
    runpod_api_key = "xxxxxxxxxxxx"
    runpod_base_url = f"https://api.runpod.ai/v2/{runpod_endpoint_id}/openai/v1"

    # The endpoint speaks the OpenAI API, so the standard async client works as-is.
    openai_client = openai.AsyncOpenAI(
        base_url=runpod_base_url,
        api_key=runpod_api_key,
    )

    # Chat-style messages go through chat.completions, not completions.
    completion = await openai_client.chat.completions.create(
        model="NousResearch/Meta-Llama-3-8B-Instruct",
        messages=[
            {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
        ],
        extra_body={
            "guided_choice": ["positive", "negative"]
        },
    )
    return completion

asyncio.run(run())
For more information, please go through https://github.com/runpod-workers/worker-vllm/
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
I’m not sure if I’m missing something here. “input” is what you provide us. We take anything inside that and put it inside “openai_input”.
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
"input" is key here
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Just {"input":{"model": "...", "prompt": "..."}} to pass to /openai/* and that essentially gets passed to vllm as {"openai_input": {"model": "...", "prompt": "..."}, "openai_route": {}}
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Can someone please confirm that this works in custom images too?
Yes, our serverless API transforms input to openai_route + openai_input as long as you send the request to /openai/*
Is there any way I can develop this locally?
This happens on our platform only. As of now, there is nothing in the SDK to simulate this during local development.
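For anyone wiring this into a custom image, here's a minimal sketch of a handler consuming the transformed payload. The field names follow the transform described above; the local-testing note and the stubbed return value are assumptions, since the platform normally builds the wrapper for you:
import runpod

def handler(job):
    # On the platform, a request sent to /openai/* arrives already wrapped:
    # job["input"] == {"openai_route": ..., "openai_input": {...}}
    job_input = job["input"]
    route = job_input.get("openai_route")          # which /openai/* path was hit
    openai_input = job_input.get("openai_input")   # the original OpenAI-style body

    # Dispatch on the route and run your own inference here (stubbed out).
    return {"route": route, "model": openai_input.get("model")}

# There is no SDK switch to simulate the transform locally, but you can
# hand-craft the wrapped payload and call handler() directly to test.
runpod.serverless.start({"handler": handler})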
47 replies
RunPod
Created by vitalik on 10/10/2024 in #⚡|serverless
Job retry after successful run
SDK 1.7.4 has been released. Thank you for your patience.
27 replies
RunPod
Created by zfmoodydub on 10/24/2024 in #⚡|serverless
Worker frozen during long running process
SDK 1.7.4 has been released. Thank you for your patience.
38 replies
RunPod
Created by Keffisor21 on 10/3/2024 in #⚡|serverless
Job timeout constantly (bug?)
Hi. Please file a support ticket and mention this thread so that you can share more info that would help us determine what's going on and how to fix it. Feel free to mention me on your tickets. Thank you.
23 replies
RunPod
Created by rougsig on 9/27/2024 in #⚡|serverless
Stuck IN_PROGRESS but job completed and worker exited
I was referring to this: "payload_size_bytes": 0 <-- seems sus? It will always be zero for GET requests; a payload only exists for POST or PUT requests.
17 replies
RunPod
Created by rougsig on 9/27/2024 in #⚡|serverless
Stuck IN_PROGRESS but job completed and worker exited
Not the best of logs. That field actually refers to the size of the body payload on POST or PUT requests; GET requests have none.
17 replies
RunPod
Created by rougsig on 9/27/2024 in #⚡|serverless
Stuck IN_PROGRESS but job completed and worker exited
I have looked at the logs of your endpoint 74jm2u3liu0pcy. It still shows 1.6.2 in use all week. Could it be a different endpoint ID?
17 replies
RunPod
Created by rougsig on 9/27/2024 in #⚡|serverless
Stuck IN_PROGRESS but job completed and worker exited
We recently fixed a bug and released the fix in 1.7.2. The bug caused our platform to disregard workers that were currently working a job. So if a job took longer than an endpoint's idle timeout (for example), the platform would put that worker to sleep, and by the time the job finished, it had no worker to report back to.
17 replies
RunPod
Created by 1AndOnlyPika on 10/5/2024 in #⚡|serverless
Flashboot not working
This is exactly where flash-boot should help. I’ll investigate what I can about this.
56 replies
RunPod
Created by Keffisor21 on 10/3/2024 in #⚡|serverless
Job timeout constantly (bug?)
1.7.2 is officially the latest release as of today
23 replies
RunPod
Created by 1AndOnlyPika on 10/5/2024 in #⚡|serverless
Flashboot not working
With a setup like this, you will face cold-start issues. For example, if bursts of consecutive jobs come in, workers stay alive and take those jobs. The moment there is a gap of a second or two without a job, your workers go to sleep, and any job that comes in after that has to wait in the queue until a worker is ready; by ready I mean flash-booted, or fully booted as a new worker. A few extra idle seconds will not cost you more, and will guarantee quick job pickup between the gaps. Incurring cold-start and boot times will end up costing you more time in total.
56 replies
RunPod
Created by Keffisor21 on 10/3/2024 in #⚡|serverless
Job timeout constantly (bug?)
FYI: v1.7.2 is in pre-release while I do some final tests: https://github.com/runpod/runpod-python/releases/tag/1.7.2
23 replies