RunPod


⚡|serverless

⛅|pods

Can RunPod bring up nodes faster than AWS/GKE?

I might try to use RunPod via Virtual Kubelet. My requirement is fast autoscaling...

Build Docker image with environment variables

Hi, I'm trying to build a Docker image from the GitHub repo https://github.com/weaviate/multi2vec-clip-inference. I set up the environment variables, but I get this error: 2025-03-31 18:51:46 [INFO] > [8/9] RUN ./download.py:...

Unable to deploy my LLM serverless with the vLLM template

I am trying to deploy a serverless LLM with the vLLM template. But I cannot get it to work. Is there something wrong with the configurations?
Ideally, I want to deploy the model I trained, but even deploying the "meta-llama/Llama-3.1-8B-Instruct" as shown in the tutorials didn't work....

Fastest cloud storage access from serverless?

Hi, I am trying to transcribe large files (100 MB+) and of course cannot use the request payload for this (10 MB/20 MB limit). Any recommendations on which cloud storage would provide the best speed/cost ratio?...
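For the pattern of pulling large inputs from object storage instead of the request payload, here is a minimal sketch assuming an S3-compatible bucket; the endpoint URL, credentials, bucket, and key are placeholders, and boto3 works with most S3-compatible providers (AWS S3, Cloudflare R2, Backblaze B2) by pointing endpoint_url at them.

```python
# Sketch: pull a large audio file from S3-compatible storage inside a handler,
# instead of sending it through the request payload (10/20 MB limit).
# The endpoint, credentials, bucket, and key below are hypothetical placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://<provider-endpoint>",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

def fetch_input(job_input):
    # Download the object named in the job input to local scratch space.
    local_path = "/tmp/input-audio.bin"
    s3.download_file(job_input["bucket"], job_input["key"], local_path)
    return local_path
```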

Hi, I'm new to RunPod and trying to debug this error

Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/ttb9ho6dap8plv/job-done/qlj0hcjbm08kew/5824255c-1cfe-4f3c-8a5f-300026d3c4f5-e1?gpu=NVIDIA+RTX+A4500&isStream=false'
Is there any way to fetch more log details than this? I learned that the /logs endpoint is only for pods. ...

Length of output of serverless meta-llama/Llama-3.1-8B-Instruct

When I submit a request I get a response that is always 100 tokens. Neither "max_tokens" nor "max_new_tokens" has any effect. How do I control the number of output tokens? ...
Solution:
``` { "input": { "messages": [ {...

I am trying to deploy a "meta-llama/Llama-3.1-8B-Instruct" model on Serverless vLLM

I do this with the maximum possible memory. After setup, I try to run the "hello world" sample, but the request is stuck in the queue and I get "[error]worker exited with exit code 1" with no other error or message in the log. Is it even possible to run this model? What is the problem? Can this be resolved? (For the record, I did manage to run a much smaller model using the same procedure as above)...

RAG on serverless LLM

I am running a serverless LLM. I want to augment the model with a series of PDF files. I can do this in the WebUI on a dedicated GPU pod by adding knowledge

Unexpected Infinite Retries Causing Unintended Charges

I recently ran my serverless workload using my custom Docker image on RunPod, and I encountered an issue that resulted in significant unexpected charges. My application experienced failures, and instead of stopping or handling errors appropriately, it kept retrying indefinitely. This resulted in:
- $166.69 charged by OpenAI due to repeated API calls.
- $14.27 charged on RunPod for compute usage....
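One pattern that can limit this kind of runaway behaviour is catching failures inside the handler and returning an error result rather than letting the worker crash, since a crashed worker's job may be re-queued. A minimal sketch, where call_openai is a hypothetical helper for the external API call:

```python
import runpod

def call_openai(prompt):
    # Hypothetical helper that wraps the external API call.
    raise NotImplementedError

def handler(job):
    try:
        result = call_openai(job["input"]["prompt"])
        return {"output": result}
    except Exception as exc:
        # Return an error payload so the job finishes with a failure result
        # instead of the worker dying mid-job.
        return {"error": str(exc)}

runpod.serverless.start({"handler": handler})
```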

Serverless vLLM workers crash

Whenever I create a serverless vLLM (doesn't matter what model I use), the workers all end up crashing and having the status "unhealthy". I went on the vLLM supported models website and I use only models that are supported. The last time I ran a serverless vLLM, I used meta-llama/Llama-3.1-70B, and used a proper huggingface token that allows access to the model. The result of trying to run the default "Hello World" prompt on this serverless vLLM is in the attached images. A worker has the status...

Meaning of -u1/-u2 at the end of the request ID?

I'd like to know what those mean. I've seen both -u1 and -u2 on sync and non-sync requests and couldn't figure out what they are.

Ambiguity of handling runsync cancel from the Python handler side

Hi. What's the best way to handle the "cancel" signal on the serverless server/handler side? Does the default cancel logic just stop the container altogether?

Enabling CLI_ARGS=--trust-remote-code

I am trying to run some of the SOTA models and the error logs tell me that I need to enable this CLI flag. How can I do that?

CUDA profiling

Hey guys, how can I profile kernels on serverless GPUs? Say I have a CUDA kernel; how can I measure its performance on serverless GPUs like RunPod's?...
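Attaching a full profiler such as Nsight to a serverless worker is awkward, but coarse kernel timings can be collected from inside the handler with CUDA events; a minimal sketch using PyTorch, with a matmul standing in for the actual kernel:

```python
import torch

def time_kernel(fn, *args, warmup=3, iters=10):
    """Time a GPU operation with CUDA events; returns mean milliseconds per call."""
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Example: time a matmul standing in for your own kernel.
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
print(f"{time_kernel(torch.matmul, a, b):.3f} ms per call")
```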

Serverless handler on Nodejs

Hi. I see there is an official SDK for serverless handlers, but only for Python. I don't see any handler API in the js-sdk.
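Until a JavaScript handler API exists, one workaround is a thin Python handler that forwards each job to Node.js as a subprocess; a minimal sketch, where worker.js is a hypothetical script that reads JSON on stdin and writes JSON to stdout:

```python
import json
import subprocess

import runpod

def handler(job):
    # Forward the job input to a Node.js script over stdin/stdout.
    proc = subprocess.run(
        ["node", "worker.js"],
        input=json.dumps(job["input"]),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)

runpod.serverless.start({"handler": handler})
```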

RunPod Serverless Inter-Service Communication: Gateway Authentication Issues

I'm developing an application with two RunPod serverless endpoints that need to communicate with each other: Service A: A Node.js/Express API that receives requests and dispatches processing tasks Service B: A Python processor that handles data and needs to notify Service A when complete ...
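Calls into a RunPod serverless endpoint go through api.runpod.ai and must carry the account's API key as a Bearer token, whichever service makes the call; a minimal sketch of Service B notifying Service A's endpoint, with the endpoint ID and payload shape as placeholders:

```python
import os
import requests

RUNPOD_API_KEY = os.environ["RUNPOD_API_KEY"]
SERVICE_A_ENDPOINT = "https://api.runpod.ai/v2/<service-a-endpoint-id>/run"

def notify_service_a(task_id, result):
    # Dispatch a job to Service A's endpoint; the Bearer token is required
    # even for service-to-service calls.
    resp = requests.post(
        SERVICE_A_ENDPOINT,
        headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
        json={"input": {"task_id": task_id, "result": result}},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```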

RunPod ComfyUI Serverless Hugging Face Models setting does nothing

When deploying a ComfyUI serverless endpoint, the attached screen appears asking for Hugging Face models. However, when I checked the repo, this setting is not used at all: https://github.com/search?q=repo%3Arunpod-workers%2Frunpod-worker-comfy%20MODEL_NAME&type=code How do I download the required models (.safetensors) and ComfyUI nodes when deploying an endpoint?...
Solution:
When you press Next until the environment variables step, you can check what gets added there. Then you can add the same environment variables to your own template with the same Docker image.
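If the template's environment variables don't cover the models you need, another option is baking them into a custom image (or downloading at container start) with huggingface_hub; a sketch in which the repo, filename, and the ComfyUI models path are assumptions to adjust:

```python
import shutil

from huggingface_hub import hf_hub_download

# Hypothetical example: fetch a checkpoint and place it where ComfyUI looks
# for it (adjust repo_id, filename, and the target path to your setup).
path = hf_hub_download(
    repo_id="stabilityai/stable-diffusion-xl-base-1.0",
    filename="sd_xl_base_1.0.safetensors",
)
shutil.copy(path, "/comfyui/models/checkpoints/sd_xl_base_1.0.safetensors")
```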

Serverless ComfyUI -> "error": "Error queuing workflow: HTTP Error 400: Bad Request",

I am running Serverless ComfyUI with RunPod and it is not working. Can someone please help? I keep getting this job response: { "delayTime": 1009, "error": "Error queuing workflow: HTTP Error 400: Bad Request",...

Error 404 on payload download.

Hi guys! I'm trying to download a file to my endpoint for processing using the RunPod download utility, and sometimes, but not always, I get the message: ```...
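For intermittent 404s on expiring or eventually consistent URLs, retrying the download with backoff is a common stopgap; a sketch using plain requests, with the retry counts chosen arbitrarily (swap in the RunPod download utility call if you prefer):

```python
import time

import requests

def download_with_retries(url, dest, attempts=4, backoff=2.0):
    """Fetch a file, retrying transient 404/5xx responses with linear backoff."""
    for attempt in range(1, attempts + 1):
        resp = requests.get(url, timeout=60)
        if resp.status_code == 200:
            with open(dest, "wb") as f:
                f.write(resp.content)
            return dest
        if attempt == attempts:
            # Out of retries: surface the HTTP error to the caller.
            resp.raise_for_status()
        time.sleep(backoff * attempt)
```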