RunPod

worker exited with exit code 0

Hello team, I'm trying to host my Remotion video rendering on RunPod serverless, built with Node.js via Docker. The build completes, but when I send a request it never moves out of the job queue. The worker starts, logs the error "worker exited with exit code 0", and never shuts down, and the video never gets rendered. Every time I have to terminate the worker and purge the queue....
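One common cause of a clean "exit code 0", though it may not apply to this Node.js setup, is that the container's entrypoint finishes instead of staying in the serverless worker loop, so no jobs are ever picked up. For reference, a minimal Python handler that keeps the worker alive looks like the sketch below; a Node.js worker needs an equivalent long-running job-polling process.

```python
# Minimal RunPod serverless handler sketch (Python SDK).
# runpod.serverless.start() blocks and keeps the worker polling for jobs;
# without such a loop the container exits cleanly with code 0.
import runpod


def handler(job):
    # job["input"] carries the payload sent to /run or /runsync.
    video_params = job["input"]
    # ... render the video here and return a result or an output URL ...
    return {"status": "rendered", "params": video_params}


runpod.serverless.start({"handler": handler})
```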

Help! Why do some of my workers report insufficient space when pulling images?

Hello everyone, I have a question for you. As shown in the picture, some of my workers will report insufficient space, but most of them will not.

DelayTime being really high

I am running a serverless worker with CPU only and have a really high delayTime. First boot takes ~8 seconds; after that I have around 1 second of delay time for each request. My executionTime is only 0.1 seconds, so my delayTime is 10x my executionTime. When I had a serverless GPU worker my delayTime was way lower than this. Is there a fix for that? Thanks in advance...

Does "/runsync" return IN_PROGRESS if it doesn't complete within 2 minutes?

```
async with aiohttp.ClientSession() as http_session:
    async with http_session.post(url, headers=headers, json=data) as response:
        data = await response.json()
```
...
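For reference, a defensive pattern when /runsync might hand back an unfinished job is to check the returned status and fall back to polling /status/<job_id> until the job settles. A sketch (endpoint ID and API key are placeholders):

```python
import asyncio
import aiohttp

BASE_URL = "https://api.runpod.ai/v2/<ENDPOINT_ID>"
HEADERS = {"Authorization": "Bearer <RUNPOD_API_KEY>"}


async def run_and_wait(payload: dict) -> dict:
    async with aiohttp.ClientSession(headers=HEADERS) as session:
        # /runsync waits for the result, but may return before the job finishes.
        async with session.post(f"{BASE_URL}/runsync", json=payload) as resp:
            job = await resp.json()

        # Poll /status/<job_id> until the job is no longer queued or running.
        while job.get("status") in ("IN_QUEUE", "IN_PROGRESS"):
            await asyncio.sleep(2)
            async with session.get(f"{BASE_URL}/status/{job['id']}") as resp:
                job = await resp.json()

        return job
```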

How to deploy Multi-Modal Model on Serverless

I am trying to deploy meta-llama/Llama-3.2-11B-Vision (which is just an 11B model) on vLLM serverless. Using the formula M = (P * 4B) / (32/Q) * 1.2, I estimated that 26GB of VRAM should be enough to deploy it. But I tried hosting it with 48GB, 80GB, and also tried allocating two 48GB GPUs per node, and it never works. I keep getting torch.OutOfMemoryError.
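As a sanity check on the estimate quoted above, the formula works out as in the sketch below. Note that it covers model weights only; vLLM also pre-allocates KV-cache memory according to its gpu_memory_utilization setting, and the vision encoder adds parameters on top of the 11B language model, so the real footprint is noticeably higher.

```python
def estimate_vram_gb(params_billion: float, quant_bits: int) -> float:
    """VRAM estimate from the question: M = (P * 4B) / (32 / Q) * 1.2.
    4 bytes per fp32 parameter, scaled by the quantized bit width,
    with a 20% overhead factor. Model weights only."""
    return (params_billion * 4) / (32 / quant_bits) * 1.2


# ~26.4 GB for an 11B-parameter model loaded in 16-bit precision.
print(estimate_vram_gb(11, 16))
```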

Error when building serverless endpoint

What happened? I tried to build and deploy a Docker image from my GitHub repository, but it failed with these errors:
- First, a 401 Unauthorized error when checking some files (blobs).
- Then, a BLOB_UNKNOWN error saying it couldn't find a required file (sha256:6e909acdb...)....

Can RunPod bring up nodes faster than AWS/GKE?

I might try to use RunPod via Virtual Kubelet. My requirement is fast autoscaling...

Build Docker image with environment variables

Hi, I'm trying to build a Docker image from the GitHub repo https://github.com/weaviate/multi2vec-clip-inference. I set up env variables, but I get this error: 2025-03-31 18:51:46 [INFO] > [8/9] RUN ./download.py:...

Unable to deploy my LLM serverless with the vLLM template

I am trying to deploy a serverless LLM with the vLLM template. But I cannot get it to work. Is there something wrong with the configurations?
Ideally, I want to deploy the model I trained, but even deploying the "meta-llama/Llama-3.1-8B-Instruct" as shown in the tutorials didn't work....

Fastest cloud storage access from serverless?

Hi, I am trying to transcribe large files (100MB+) and of course cannot use the request payload for this (10MB/20MB limit). Any recommendations on which cloud storage would provide the best speed/cost ratio?...
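Whichever storage is chosen, a common pattern is to send only a (presigned) download URL in the request and fetch the file inside the worker. A minimal sketch, where audio_url is a hypothetical input field name:

```python
# Sketch of a handler that streams a large input file from object storage
# instead of receiving it in the request payload.
import tempfile

import requests
import runpod


def handler(job):
    audio_url = job["input"]["audio_url"]

    # Stream the download to disk so large files don't sit in memory.
    with tempfile.NamedTemporaryFile(suffix=".audio", delete=False) as tmp:
        with requests.get(audio_url, stream=True, timeout=600) as resp:
            resp.raise_for_status()
            for chunk in resp.iter_content(chunk_size=1 << 20):
                tmp.write(chunk)
        local_path = tmp.name

    # ... run transcription on local_path and return the text ...
    return {"file": local_path}


runpod.serverless.start({"handler": handler})
```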

Hi, I'm new to RunPod and am trying to debug this error

Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/ttb9ho6dap8plv/job-done/qlj0hcjbm08kew/5824255c-1cfe-4f3c-8a5f-300026d3c4f5-e1?gpu=NVIDIA+RTX+A4500&isStream=false'
Is there any way to fetch more log details than this? I learned that the /logs endpoint is only for pods. ...

Length of output of serverless meta-llama/Llama-3.1-8B-Instruct

When I submit a request I get a response that is always 100 tokens. "max_tokens" or "max_new_tokens" have no effect. How do I control the number of output tokens? ...
Solution:
```
{
  "input": {
    "messages": [
      {...
```
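For context on the truncated solution above: with the standard RunPod vLLM worker input schema (an assumption here), generation length is controlled through sampling_params rather than a top-level max_tokens field. A minimal request sketch (endpoint ID and API key are placeholders):

```python
# Hedged sketch of a /runsync request that limits output length via
# sampling_params (vLLM's SamplingParams).
import requests

url = "https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync"
headers = {"Authorization": "Bearer <RUNPOD_API_KEY>"}

payload = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize the history of GPUs."}
        ],
        # max_tokens lives inside sampling_params, not at the top level.
        "sampling_params": {"max_tokens": 512, "temperature": 0.7},
    }
}

resp = requests.post(url, headers=headers, json=payload, timeout=120)
print(resp.json())
```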

I am trying to deploy a "meta-llama/Llama-3.1-8B-Instruct" model on Serverless vLLM

I do this with the maximum possible memory. After setup, I try to run the "hello world" sample, but the request is stuck in the queue and I get "[error]worker exited with exit code 1" with no other error or message in the log. Is it even possible to run this model? What is the problem? Can this be resolved? (For the record, I did manage to run a much smaller model using the same procedure as above.)...

RAG on serverless LLM

I am running a serverless LLM. I want to feed the model a series of PDF files to augment it. I can do this in a web UI on a dedicated GPU pod by adding knowledge.

Unexpected Infinite Retries Causing Unintended Charges

I recently ran my serverless workload using my custom Docker image on RunPod, and I encountered an issue that resulted in significant unexpected charges. My application experienced failures, and instead of stopping or handling errors appropriately, it kept retrying indefinitely. This resulted in:
- $166.69 charged by OpenAI due to repeated API calls.
- $14.27 charged on RunPod for compute usage....
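Independent of what triggered the retries in this case, a defensive sketch (assuming the external call happens inside the handler; names below are illustrative) is to cap attempts and report failures in the job result rather than letting the worker crash:

```python
# Defensive wrapper sketch: cap retries on the external API call and
# report the failure in the job result instead of crashing the worker.
import time

import runpod


def call_external_api(prompt: str) -> str:
    raise NotImplementedError  # e.g. an OpenAI request in the real app


def handler(job):
    prompt = job["input"]["prompt"]
    max_attempts = 3

    for attempt in range(1, max_attempts + 1):
        try:
            return {"result": call_external_api(prompt)}
        except Exception as exc:
            if attempt == max_attempts:
                # Surface the failure as the job's result.
                return {"error": f"failed after {max_attempts} attempts: {exc}"}
            time.sleep(2 ** attempt)  # simple exponential backoff


runpod.serverless.start({"handler": handler})
```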

Serverless vLLM workers crash

Whenever I create a serverless vLLM (doesn't matter what model I use), the workers all end up crashing and having the status "unhealthy". I went on the vLLM supported models website and I use only models that are supported. The last time I ran a serverless vLLM, I used meta-llama/Llama-3.1-70B, and used a proper huggingface token that allows access to the model. The result of trying to run the default "Hello World" prompt on this serverless vLLM is in the attached images. A worker has the status...

Meaning of -u1/-u2 at the end of a request ID?

I would like to know what those mean. I saw -u1 and -u2 on both sync and async requests, and couldn't figure out what they stand for.

Ambiguity of handling runsync cancel from python handler side

Hi. What's the best way to handle the "cancel" signal on the serverless handler side? Does the default cancel logic just stop the container altogether?

Enabling CLI_ARGS=--trust-remote-code

I am trying to run some of the SOTA models and the error logs tell me that I need to enable this CLI flag. How can I do that?