deanQ
RunPod
Created by spooky on 10/30/2024 in #⚡|serverless
jobs queued for minutes despite lots of available idle workers
Rather than just the endpoint ID, can you also tell us the request ID? That would give us a better way to trace. Thanks
21 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Yes. When you’re testing these with longer-running jobs, do they block as if they were sync requests?
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Despite that, are you able to send requests to the /openai/v1 path?
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
We refer you to that page because that’s where the answers are. Sending requests to that specific path on your endpoint is the answer to your question. Have you tried it? Are you having issues with vLLM blocking on sync requests? It doesn’t do that. Our requests to vLLM are always async; you’ll notice that when you make requests to a vLLM endpoint on our serverless. We aren't doing anything that changes that; we’re essentially proxying the requests. When you send sync requests, they come back immediately. It’s non-blocking thanks to vLLM’s concurrent processing. If you're running into issues with this, please file a support ticket so we can help. CS will ask for additional information that shouldn’t be divulged here on a public forum.
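If you want a quick way to see that for yourself, here's a minimal sketch using the synchronous client; the endpoint ID and API key are placeholders you'd fill in:
import openai

client = openai.OpenAI(
    base_url="https://api.runpod.ai/v2/<endpoint_id>/openai/v1",
    api_key="<runpod_api_key>",
)

# vLLM processes requests concurrently, so a sync call like this returns as
# soon as its own completion is ready instead of blocking behind other jobs.
resp = client.chat.completions.create(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)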
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
I see. Yes, the endpoint can be treated like an OpenAI server. Your endpoint should have a value for OPENAI_BASE_URL. You just have to make sure you send requests to that path, like so...
import asyncio
import openai

async def run():
    runpod_endpoint_id = "vllm-1234567890"
    runpod_api_key = "xxxxxxxxxxxx"
    runpod_base_url = f"https://api.runpod.ai/v2/{runpod_endpoint_id}/openai/v1"

    # The endpoint speaks the OpenAI API, so the standard async client works as-is.
    openai_client = openai.AsyncOpenAI(
        base_url=runpod_base_url,
        api_key=runpod_api_key,
    )

    # Chat-style messages go through chat.completions, not completions.
    completion = await openai_client.chat.completions.create(
        model="NousResearch/Meta-Llama-3-8B-Instruct",
        messages=[
            {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
        ],
        extra_body={
            "guided_choice": ["positive", "negative"]
        },
    )
    return completion

asyncio.run(run())
For more information, please go through https://github.com/runpod-workers/worker-vllm/
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
I’m not sure if I’m missing something here. “input” is what you provide us. We take anything inside that and put it inside “openai_input”.
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
"input" is key here
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Just {"input":{"model": "...", "prompt": "..."}} to pass to /openai/* and that essentially gets passed to vllm as {"openai_input": {"model": "...", "prompt": "..."}, "openai_route": {}}
47 replies
RunPod
Created by 3WaD on 10/11/2024 in #⚡|serverless
OpenAI Serverless Endpoint Docs
Can someone please confirm that this works in custom images too?
Yes, our serverless API transforms input to openai_route + openai_input as long as you send the request to /openai/*
Is there any way I can develop this locally?
This happens on our platform only. As of now, there is nothing in the SDK to simulate this during local development.
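For anyone wiring this into a custom image, here's a minimal sketch of a handler consuming the transformed payload. The field names follow the transform described above; the local-testing note and the stubbed return value are assumptions, since the platform normally builds the wrapper for you:
import runpod

def handler(job):
    # On the platform, a request sent to /openai/* arrives already wrapped:
    # job["input"] == {"openai_route": ..., "openai_input": {...}}
    job_input = job["input"]
    route = job_input.get("openai_route")          # which /openai/* path was hit
    openai_input = job_input.get("openai_input")   # the original OpenAI-style body

    # Dispatch on the route and run your own inference here (stubbed out).
    return {"route": route, "model": openai_input.get("model")}

# There is no SDK switch to simulate the transform locally, but you can
# hand-craft the wrapped payload and call handler() directly to test.
runpod.serverless.start({"handler": handler})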
47 replies
RunPod
Created by vitalik on 10/10/2024 in #⚡|serverless
Job retry after successful run
SDK 1.7.4 has been released. Thank you for your patience.
27 replies
RunPod
Created by zfmoodydub on 10/24/2024 in #⚡|serverless
Worker frozen during long running process
SDK 1.7.4 has been released. Thank you for your patience.
38 replies
RunPod
Created by Keffisor21 on 10/3/2024 in #⚡|serverless
Job timeout constantly (bug?)
Hi. Please file a support ticket and mention this thread so that you can share more info that would help us determine what's going on and how to fix it. Feel free to mention me on your tickets. Thank you.
23 replies
RunPod
Created by rougsig on 9/27/2024 in #⚡|serverless
Stuck IN_PROGRESS but job completed and worker exited
I was referring to this: "payload_size_bytes": 0 <-- seems sus? It will always be zero for GET requests; a payload only exists for POST or PUT requests.
17 replies
RunPod
Created by rougsig on 9/27/2024 in #⚡|serverless
Stuck IN_PROGRESS but job completed and worker exited
Not the best of logs. That field actually refers to the size of the body payload on POST or PUT requests; GET requests have none.
17 replies
RunPod
Created by rougsig on 9/27/2024 in #⚡|serverless
Stuck IN_PROGRESS but job completed and worker exited
I have looked at the logs of your endpoint 74jm2u3liu0pcy. It still shows 1.6.2 in use all week. Could it be a different endpoint ID?
17 replies
RunPod
Created by rougsig on 9/27/2024 in #⚡|serverless
Stuck IN_PROGRESS but job completed and worker exited
We recently fixed a bug and released the fix in 1.7.2. The bug caused our platform to disregard workers that were currently working a job. So if a job took longer than an endpoint's idle timeout (for example), the platform would put that worker to sleep, and by the time the job finished, it had no worker to report back to.
17 replies
RunPod
Created by 1AndOnlyPika on 10/5/2024 in #⚡|serverless
Flashboot not working
This is exactly where flash-boot should help. I’ll investigate what I can about this.
56 replies
RunPod
Created by Keffisor21 on 10/3/2024 in #⚡|serverless
Job timeout constantly (bug?)
1.7.2 is officially the latest release as of today
23 replies
RunPod
Created by 1AndOnlyPika on 10/5/2024 in #⚡|serverless
Flashboot not working
With a setup like this, you will face cold-start issues. For example, if bursts of consecutive jobs come in, workers stay alive and take those jobs. The moment there is a gap of a second or two without a job, your workers go to sleep, and any job that comes in after that has to wait in the queue until a worker is ready; by ready I mean flash-booted, or fully booted as a new worker. A few extra idle seconds will not cost you more, and will guarantee quick job pickup between the gaps. Incurring cold-start and boot times will end up costing you more time in total.
56 replies
RunPod
Created by Keffisor21 on 10/3/2024 in #⚡|serverless
Job timeout constantly (bug?)
FYI: v1.7.2 is in pre-release while I do some final tests: https://github.com/runpod/runpod-python/releases/tag/1.7.2
23 replies