RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⚡|serverless


How to get `/stream` serverless endpoint to "stream"?

Example from the official documentation: https://docs.runpod.io/sdks/javascript/endpoints#stream ``` from time import sleep import runpod...
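A minimal sketch of the generator-handler pattern that backs `/stream`, assuming the Python SDK; the chunk contents below are made up. The key details are that the handler must be a generator, and that the client then polls `/stream/{job_id}` to pick up whatever has been yielded since the last poll (with `return_aggregate_stream` controlling whether the chunks are also aggregated into the final `/run`/`/runsync` output):

```python
from time import sleep

import runpod


def handler(job):
    # Generator handler: each yielded value becomes a chunk that the
    # /stream/{job_id} endpoint returns while the job is still running.
    for i in range(5):
        sleep(1)
        yield f"chunk {i}"  # placeholder payload


runpod.serverless.start({
    "handler": handler,
    # Also aggregate the yielded chunks into the final job output.
    "return_aggregate_stream": True,
})
```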

Jobs queued for minutes despite lots of available idle workers

For the past couple of days my jobs keep getting queued for a long time despite lots of available "idle" workers - nowhere near my max workers. Sometimes there are 9 available workers but concurrent jobs still get queued... anyone have any insight on this?

Request stuck because of exponential backoff, what does it mean?

Relevant log line: "message":"b2http.py :600 2024-10-29 15:56:21,141 Pausing thread for 64 seconds because that is what the default exponential backoff is\n" This keeps repeating in the serverless GPU logs; the pod is on and the request is stuck...
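For what it's worth, that log line comes from the Backblaze B2 client (`b2http.py`) rather than from RunPod itself: the upload keeps failing and the client waits longer and longer between retries. A generic illustration of the idea (the function below is illustrative, not the b2sdk API):

```python
import random
import time


def call_with_backoff(request, max_retries=7):
    """Illustrative exponential backoff: the wait roughly doubles after each
    failed attempt (1 s, 2 s, 4 s, ... 64 s), which is why a stuck upload can
    sit idle for over a minute between retries while the worker keeps billing."""
    for attempt in range(max_retries):
        try:
            return request()
        except Exception:
            delay = 2 ** attempt + random.random()  # small jitter
            time.sleep(delay)
    raise RuntimeError("request kept failing after all retries")
```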

In serverless CPU, after upgrading to RunPod SDK 1.7.4, getting lots of "kill worker" errors

These are serverless CPU workers, not GPU. Initially serverless CPU was on 1.7.3 and was timing out if execution time was longer than a minute. So I downgraded to 1.6.2 and it worked fine. Yesterday I upgraded to 1.7.4 and am now getting the "kill worker" error.

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

Hey everyone 👋 Looking for tips from anyone who's worked with bitsandbytes-quantized models on RunPod's serverless setup. It's not available out of the box with vLLM, and I was wondering if anyone's got it working? Saw a post in the serverless forum about maybe using a custom Docker image for this. For context: I've fine-tuned LLaMA-3.1 70B-instruct using the unsloth library (which utilizes bitsandbytes for quantization) and am looking to deploy it. Any insights would be greatly appreciated! 🙏...
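One route people describe (since the stock vLLM worker doesn't expose bitsandbytes) is a custom image with a plain transformers + bitsandbytes handler. A minimal sketch, assuming a 4-bit NF4 load; `MODEL_ID` is a placeholder for your fine-tuned repo:

```python
import runpod
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "your-hf-user/llama-3.1-70b-instruct-finetune"  # placeholder

# 4-bit NF4 quantization via bitsandbytes, loaded once at worker start-up.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)


def handler(job):
    prompt = job["input"]["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=job["input"].get("max_tokens", 256)
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)


runpod.serverless.start({"handler": handler})
```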

Delay times on requests

It is starting to become a habit for requests to have very high delay times, which really hurts the user experience. RunPod serverless used to work really well... it's great that cold boot is like 0 sec, but if there's 2 min of delay before a request starts... it's pointless.

Just got hit with a huge serverless bill

Didn't look at the billing section for a few days and got hit with a huge serverless bill; turns out the workers wouldn't time out for hours. Can I get a credit refund for this?

Can you run a FastAPI GPU project on RunPod serverless?

I have a FastAPI project that was hosted on SageMaker. Now I plan to move it to RunPod. Can someone explain how to do it?...
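Rough idea of the usual migration: serverless endpoints don't run a long-lived HTTP server, so the FastAPI routes are replaced by a handler that reuses the same business logic. A minimal sketch; `my_app.inference.run_inference` is a hypothetical stand-in for whatever your FastAPI route calls:

```python
import runpod

# Hypothetical module from the existing FastAPI project; the point is to
# reuse the same inference function the FastAPI route used to call.
from my_app.inference import run_inference


def handler(job):
    payload = job["input"]            # what used to arrive as the request body
    return run_inference(payload)     # same business logic, no web server needed


runpod.serverless.start({"handler": handler})
```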

Execution Time Greater Than 30000s

Why was the execution time so long, even greater than 30000 s? I had to cancel manually because the task queue is completely unable to run....
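If I'm reading the docs right, individual requests can carry an execution-timeout policy so a runaway job gets cancelled automatically instead of needing a manual cancel. A hedged sketch (endpoint ID, prompt, and the exact policy field are assumptions on my part; `executionTimeout` is in milliseconds):

```python
import os

import requests

url = "https://api.runpod.ai/v2/<endpoint_id>/run"  # placeholder endpoint ID
headers = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

body = {
    "input": {"prompt": "hello"},                  # placeholder payload
    "policy": {"executionTimeout": 600_000},       # cancel the job after 10 minutes
}

print(requests.post(url, json=body, headers=headers).json())
```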

Serverless tasks get stopped without a reason

Hey everyone! I'm running a serverless function which starts a Docker container that runs a Python program, and for some reason, sometimes while the Python program is running, the container gets stopped: 2024-10-27T19:50:00Z start container for xxxxx begin 2024-10-27T19:50:39Z stop container 5f797326f14b2a255f5363623485299b4f911fbba8b8b60e3daf44908c43980f 2024-10-27T19:50:39Z remove container...

Serverless Real-World Billing (Cold Start, Execution, Idle)

I understand that RunPod Serverless compute is billed as: Cold Start Time + Execution Time + Idle Timeout. Can you help clarify how this applies in real-world settings with sporadic usage? For example:...
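A toy worked example of that formula for one sporadic request (the per-second price below is a made-up placeholder, not a real RunPod rate). With sporadic traffic each request tends to pay the cold-start and idle-timeout tail again, because the worker has scaled to zero in between:

```python
# Illustrative arithmetic only; price_per_second is hypothetical.
price_per_second = 0.0004   # $/s for one worker (placeholder)
cold_start = 8              # s to pull weights and warm the worker
execution = 30              # s the handler actually runs
idle_timeout = 5            # s the worker stays up after finishing

billed_seconds = cold_start + execution + idle_timeout
print(billed_seconds, "s billed ->", round(billed_seconds * price_per_second, 4), "$ per request")
# 43 s billed -> 0.0172 $ per request
```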

Cannot load symbol cudnnCreateTensorDescriptor

I encountered this error when I deployed my Whisper code on a serverless environment. What is the recommended image to use for running the Whisper models 'base' and 'large-v3'?

How to send an image as a prompt to vLLM?

Hi there, I am new to RunPod and facing an issue sending an image to the RunPod serverless endpoint. The docs mention how an image can be received, but I want to send one. I am using a Qwen2-VL model which accepts an image and a text prompt. I am able to send text but not the image. Please help me with this; I am doing it for an assignment that is due soon. Thank you, any help would be appreciated....
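A minimal sketch of one way to do it, assuming the endpoint is running the vLLM worker with its OpenAI-compatible route: encode the image as a base64 data URI inside an OpenAI-style multimodal message. The file name, model ID, and endpoint ID are placeholders:

```python
import base64

from openai import OpenAI  # assumes the endpoint exposes the OpenAI-compatible route

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<endpoint_id>/openai/v1",  # placeholder
    api_key="<runpod_api_key>",                                   # placeholder
)

# Read the local image and embed it as a base64 data URI.
with open("example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",  # must match the model the worker serves
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```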

Any good tutorials out there on setting up an SD model from Civitai on RunPod serverless?

I've been trying to set up an SD model from Civitai on RunPod serverless for days now, but I keep running into errors. Each time I fix one, there's a new one, classic. Are there any good tutorials out there on setting up an SD model from Civitai on RunPod serverless?...
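Not a tutorial, but a minimal sketch of the handler side, assuming the Civitai `.safetensors` checkpoint is baked into the image (or mounted from a network volume) and loaded with diffusers' single-file loader; the path and parameters are placeholders:

```python
import runpod
import torch
from diffusers import StableDiffusionPipeline

# Placeholder path to the checkpoint downloaded from Civitai and copied into
# the image (or onto a network volume) at build/deploy time.
CHECKPOINT = "/models/my_civitai_model.safetensors"

pipe = StableDiffusionPipeline.from_single_file(
    CHECKPOINT, torch_dtype=torch.float16
).to("cuda")


def handler(job):
    prompt = job["input"]["prompt"]
    image = pipe(prompt, num_inference_steps=25).images[0]
    out_path = "/tmp/out.png"
    image.save(out_path)
    return {"image_path": out_path}


runpod.serverless.start({"handler": handler})
```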

Does vLLM support quantized models?

Trying to figure out how to deploy this, but I didn't see an option for selecting which quantization I wanted to run. https://huggingface.co/bartowski/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF Thanks!
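For what it's worth, a GGUF file is a llama.cpp-style format, so it may need either a non-GGUF (e.g. AWQ or GPTQ) build of the same model or a runtime that speaks GGUF. If you go the vLLM route, the quantization scheme is selected explicitly; a hedged sketch with a hypothetical AWQ repo:

```python
from vllm import LLM

# Illustrative only: vLLM's `quantization` argument picks a supported scheme
# for a checkpoint quantized in that format. The repo name is a placeholder,
# not the GGUF model linked above.
llm = LLM(
    model="someone/Some-Llama-3.1-8B-AWQ",  # hypothetical AWQ build
    quantization="awq",
)

print(llm.generate("Hello")[0].outputs[0].text)
```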

vLLM error: flash-attn

I get this error; how do I fix it and use vllm-flash-attn, which is faster? "Current Qwen2-VL implementation has a bug with vllm-flash-attn inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend."

Frequent "[error] worker exited with exit code 0" logs

Hi! I’m working on a project where I'm using RunPod serverless to run my ComfyUI workflow within a Docker image. I attempted to update my Dockerfile and the ComfyUI workflow JSON to save the generated images to my RunPod network volume, but I keep receiving the following logs at the bottom. Any insights or suggestions on how to resolve this would be greatly appreciated! I’ve attached my Dockerfile for reference, and here’s the relevant part of the ComfyUI output path configuration: ```json "332": {...

Worker frozen during long running process

request ID: sync-f144b2f4-f9cd-4789-8651-491203e84175-u1 worker id: g9y8icaexnzrlr I have a process that should in theory take no longer than 90 seconds ...

RunPod GPU use when using a Docker image built on a Mac

I am building serverless applications that are supposed to use the GPU. While testing locally, the pieces that kick off GPU functions use the common pattern: `device: str = "cuda" if th.cuda.is_available() else "cpu"`. This is required so that when running locally on a Mac, the CPU device is used. I would think that with a Docker image built on a Mac, but with an amd64 platform specified in the build command, CUDA would be used once it's deployed on a server with a CUDA base image, but that does not seem to be the case....
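A quick diagnostic worth running inside the deployed container: if the image was built with a CPU-only torch wheel (easy to end up with when building on a Mac), `torch.version.cuda` will be `None` regardless of the host GPU and driver. Sketch:

```python
import torch

# Run this inside the deployed worker/container to see whether the torch
# build itself has CUDA support, independent of the host GPU.
print("torch build:", torch.__version__)
print("compiled CUDA version:", torch.version.cuda)   # None => CPU-only wheel
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```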