RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

Job stays in IN_PROGRESS forever

Sometimes I never get a response when I make a request. It stays in progress and doesn't even show an execution time.

How to get the progress of a processing job in serverless?

When I use status/{id}, it only returns something like {delayTime: 873, id: 3e9eb0e4-c11d-4778-8c94-4d045baa99c1-e1, status: IN_PROGRESS, workerId: eluw70apx442ph}, with no progress data. I want progress data like the screenshot from the serverless console log. Please tell me how to get it in my app client....
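A minimal sketch of how the worker side can publish progress, assuming the runpod Python SDK's progress_update helper; strings sent with it surface in the /status/{id} response while the job is IN_PROGRESS, which is what a client would poll for:

```python
# Minimal sketch, assuming the runpod Python SDK's progress_update helper.
# Progress strings sent from the worker appear in the /status/{id} response
# while the job is IN_PROGRESS, so a client can poll for them.
import runpod

def handler(job):
    total_steps = 3  # hypothetical step count for illustration
    for step in range(1, total_steps + 1):
        # ... do one unit of work here ...
        runpod.serverless.progress_update(job, f"step {step}/{total_steps}")
    return {"status": "done"}

runpod.serverless.start({"handler": handler})
```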

RunPod serverless ComfyUI template

I couldn't find any ComfyUI template for RunPod serverless.

Why is runsync returning a status response instead of just waiting for the image response?

My runsync requests are getting messed up by RunPod returning an async-style response (with an 'IN_PROGRESS' status and the id showing). I need runsync to just return the image, or a failure, not the status. If I wanted the status I would just use 'run'. Any idea why this is happening and how to prevent it? For reference, these are requests that generally run for 5-18 seconds to completion. delayTime: 196...
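A likely explanation: /runsync only holds the connection open for a limited window, and jobs that outlive it come back as IN_PROGRESS and must be polled. A sketch of a client-side fallback, assuming the standard /runsync and /status REST routes (ENDPOINT_ID and API_KEY are placeholders):

```python
# Sketch of a client-side fallback: if /runsync comes back IN_PROGRESS
# (e.g. the job outlived the sync window), poll /status/{id} until it
# completes. ENDPOINT_ID and API_KEY are placeholders.
import time
import requests

ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-api-key"
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def run_sync(payload: dict) -> dict:
    job = requests.post(f"{BASE}/runsync", json={"input": payload},
                        headers=HEADERS, timeout=120).json()
    while job.get("status") in ("IN_QUEUE", "IN_PROGRESS"):
        time.sleep(1)
        job = requests.get(f"{BASE}/status/{job['id']}",
                           headers=HEADERS, timeout=30).json()
    return job  # COMPLETED jobs carry the result in job["output"]
```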

Worker keeps running after idle timeout

Hi! I have observed that my worker keeps running even when there are no requests and the idle timeout (60s) has been reached. Also, when I make a new request at such a moment, the request fails....

Can I deploy the 'ComfyUI with Flux.1 dev one-click' template to serverless?

When I click deploy, I only see 'Deploy GPU Pod', no serverless option.

What is the real Serverless price?

In Serverless I have 2 GPUs/worker and 1 active worker. The price shown on the main page is $0.00046/s, but the endpoint edit page shows $0.00152/s. What is the actual price?
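One way the two figures can both be right (an inference from the numbers, not confirmed pricing logic): the endpoint edit page may multiply the per-GPU rate by the GPUs per worker, while the main page may show a discounted per-GPU active-worker rate. A worked sketch:

```python
# Worked arithmetic (illustrative assumptions, not official pricing logic):
# the endpoint edit page may show per-GPU rate x GPU count per worker,
# while the main page may show a discounted per-GPU active-worker rate.
gpus_per_worker = 2
flex_per_gpu = 0.00076                 # $/s, inferred from the two figures
per_worker = flex_per_gpu * gpus_per_worker
print(per_worker)                      # 0.00152 -> matches the edit page
active_per_gpu = 0.00046               # $/s, the main-page figure
print(active_per_gpu / flex_per_gpu)   # ~0.61, i.e. ~40% discount (inferred)
```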

Can't find Juggernaut in the list of models to download in ComfyUI Manager

My workflow is deployed on RunPod, but I can't find my ckpt in the ComfyUI Manager to download. Error: Prompt outputs failed validation, Efficient Loader:...

comfy

Getting the message 'throttled waiting for GPU to become available' even though I have 4 endpoints selected with high and medium availability.

Incredibly long startup time when running 70B models via vLLM

I have been trying to deploy 70B models as a serverless endpoint and observe startup times of almost an hour, if the endpoint becomes available at all. The attached screenshot shows an example of an endpoint that deploys cognitivecomputations/dolphin-2.9.1-llama-3-70b. I find it even weirder that the request ultimately succeeds. Logs and a screenshot of the endpoint and template config are attached - if anyone can spot an issue or knows how to deploy 70B models so that they work reliably, I would greatly appreciate it. Some other observations:
- In support, someone told me that I need to manually set the env var BASE_PATH=/workspace, which I am now always doing.
- I sometimes, but not always, see this in the logs: AsyncEngineArgs(model='facebook/opt-125m', served_model_name=None, tokenizer='facebook/opt-125m'..., even though I am deploying a completely different model...
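The facebook/opt-125m line is a strong clue: that is vLLM's built-in default model, so seeing it means the worker sometimes starts without picking up the configured model. A small sanity check one could bake into the container, assuming the image reads the model from a MODEL_NAME env var (as RunPod's worker-vllm images commonly do):

```python
# Sanity-check sketch: facebook/opt-125m is vLLM's built-in default model,
# so if it shows up in AsyncEngineArgs, the configured model was not applied.
# Assumes the worker image reads the model from the MODEL_NAME env var
# (as RunPod's worker-vllm images commonly do).
import os
import sys

model = os.environ.get("MODEL_NAME", "")
if not model:
    sys.exit("MODEL_NAME is unset - vLLM would fall back to facebook/opt-125m")
print(f"Deploying model: {model}")
```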

Mounting network storage at runtime - serverless

I am running my own Docker container, and at the moment I'm using the RunPod interface to select network storage, which then presents at /runpod-volume. This is OK; however, what I am hoping to do instead is mount the volume at runtime programmatically. Is this in any way possible through libraries or the API? Basically, I would want to list the available volumes and, where a volume exists within the same region as the container/worker, mount it....
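As far as I know, the volume is attached when the worker is provisioned, not from inside the container, so runtime mounting likely isn't possible; listing volumes, though, should be doable via the GraphQL API. A sketch, with the caveat that the networkVolumes field under myself is an assumption from memory and should be verified against the current schema:

```python
# Sketch: listing network volumes via RunPod's GraphQL API. The
# `myself { networkVolumes ... }` field is an assumption from memory -
# verify against the current schema. Mounting itself appears to happen
# only at worker provisioning time, not at runtime.
import requests

API_KEY = "your-api-key"  # placeholder

QUERY = """
query {
  myself {
    networkVolumes { id name size dataCenterId }
  }
}
"""

resp = requests.post(
    "https://api.runpod.io/graphql",
    params={"api_key": API_KEY},
    json={"query": QUERY},
    timeout=30,
)
print(resp.json())
```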

Serverless fails when workers aren't manually set to active

As the title says, my requests to my serverless endpoint are retrying/failing at a much higher frequency when my workers aren't set to active. Has anyone experienced something like this before?

Chat completion (template) not working with vLLM 0.6.3 + Serverless

I deployed the https://huggingface.co/xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k model through the Serverless UI, setting the max model context window to 129024 and quantization to awq. I deployed it using the latest version of vLLM (0.6.3) provided by RunPod. I ran into the following errors client-side...

qwen2.5 vllm openwebui

I have deployed qwen2.5-7b-instruct using the vLLM quick-deploy template (0.6.2). But when using Open WebUI connected via the OpenAI API, the RunPod workers log these errors: "code": 400, "message": "1 validation error for ChatCompletionRequest\nmax_completion_tokens\n Extra inputs are not permitted [type=extra_forbidden, input_value=50, input_type=int]\n For further information visit https://errors.pydantic.dev/2.9/v/extra_forbidden", "object": "error", "param": null,...
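The error says vLLM 0.6.2's ChatCompletionRequest rejects the max_completion_tokens field that Open WebUI sends; older vLLM builds only accept max_tokens. A sketch of a direct call that avoids the rejected field, assuming RunPod's OpenAI-compatible route for vLLM endpoints (ENDPOINT_ID and API_KEY are placeholders):

```python
# Sketch: call the endpoint with max_tokens, which vLLM 0.6.2 accepts,
# instead of max_completion_tokens, which its ChatCompletionRequest rejects.
# Assumes RunPod's OpenAI-compatible route for vLLM endpoints;
# ENDPOINT_ID and API_KEY are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
    api_key="API_KEY",
)
resp = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # must match the served model name
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=50,                # not max_completion_tokens
)
print(resp.choices[0].message.content)
```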

RoPE scaling JSON not working

When I try to use RoPE scaling with the JSON that works fine in my own vLLM, it errors out on serverless. I tried setting it to just 'type' as well, but this produces the same error. {"factor":4,"original_max_position_embeddings":32768,"rope_type":"yarn"} Here is the log:...
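One thing worth ruling out: the key name changed across versions ("type" in older vLLM/transformers configs, "rope_type" in newer ones), so the serverless image may expect the other spelling than your local vLLM. A sketch of both variants (which one applies depends on the vLLM version baked into the image):

```python
# Sketch: the RoPE-scaling key name differs across versions - older
# vLLM/transformers configs use "type", newer ones use "rope_type" -
# so the serverless image may expect a different spelling than your local vLLM.
import json

newer_style = {"rope_type": "yarn", "factor": 4.0,
               "original_max_position_embeddings": 32768}
older_style = {"type": "yarn", "factor": 4.0,
               "original_max_position_embeddings": 32768}

# Whichever the image accepts, pass it as a compact JSON string:
print(json.dumps(newer_style))
print(json.dumps(older_style))
```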

First attempt at serverless endpoint - "Initializing" for a long time

Hi. New to RunPod, trying to run a serverless endpoint with a worker based on https://github.com/blib-la/runpod-worker-comfy and not able to get it past the "Initializing" status. There are NO logs anywhere in the console. Here's what I did:...

(Flux) Serverless inference crashes without logs.

Hi all! I've built a FLUX inference container on RunPod serverless. It works (sometimes), but I get a lot of random failures, and RunPod does not return the error logs. E.g., this is the response: ...
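When the platform surfaces no logs, it can help to catch everything inside the handler and ship the traceback back in the response. A minimal sketch using the runpod SDK (generate_image is a hypothetical stand-in for the actual inference call):

```python
# Minimal sketch: catch all exceptions inside the handler and return the
# traceback in the job output, so failures are visible even when the
# platform surfaces no logs.
import traceback
import runpod

def generate_image(job_input):  # hypothetical inference call for illustration
    raise NotImplementedError

def handler(job):
    try:
        return generate_image(job["input"])
    except Exception:
        # The full traceback comes back in the job's output field.
        return {"error": traceback.format_exc()}

runpod.serverless.start({"handler": handler})
```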

Same request running twice

Hi, my request finished a successful run, and then the same worker received the same request again and ran it. How can I fix this issue?...
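Queue deliveries are effectively at-least-once, so the usual fix is to make the handler idempotent, e.g. by de-duplicating on the job id. A sketch (do_work is a hypothetical stand-in; the in-memory cache only covers retries landing on the same warm worker, and cross-worker dedupe would need shared storage such as the network volume):

```python
# Sketch: de-duplicate on job id so a redelivered request replays the
# original result instead of re-running. An in-memory cache only covers
# retries that land on the same warm worker; cross-worker dedupe would
# need shared storage (e.g. the network volume).
import runpod

seen_jobs: dict[str, dict] = {}  # job id -> cached result

def do_work(job_input):  # hypothetical work function for illustration
    return {"ok": True}

def handler(job):
    job_id = job["id"]
    if job_id in seen_jobs:
        return seen_jobs[job_id]  # replay the original result
    result = do_work(job["input"])
    seen_jobs[job_id] = result
    return result

runpod.serverless.start({"handler": handler})
```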