RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Cancelling a job resets FlashBoot

For some reason, whenever we cancel a job, the next time the serverless worker cold boots it doesn't use FlashBoot and instead reloads the LLM model weights into the GPU from scratch. Any idea why cancelling jobs might be causing this problem? Is there maybe a more graceful solution for stopping jobs early than the /cancel/{job_id} endpoint?
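
For reference, a minimal sketch of the cancel call being described, assuming the documented serverless REST route; the endpoint and job IDs are placeholders:

```python
import os
import requests

# Placeholders for illustration; substitute your own endpoint and job IDs.
ENDPOINT_ID = "your-endpoint-id"
JOB_ID = "your-job-id"

# POST /v2/{endpoint_id}/cancel/{job_id} asks the queue to stop the job early.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/cancel/{JOB_ID}",
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    timeout=30,
)
print(resp.status_code, resp.json())
```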

RUNPOD_API_KEY and MAX_CONTEXT_LEN_TO_CAPTURE

We are also starting a vLLM project and I have two questions: 1) In the environment variables, do I have to define the RUNPOD_API_KEY with my own secret key to access the final vLLM OpenAI endpoint? 2) Isn't MAX_CONTEXT_LEN_TO_CAPTURE now deprecated? Do we still need to provide it, if MAX_MODEL_LEN is already set? ...
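
For the first question, a minimal sketch of how the worker-vllm OpenAI-compatible route is typically called, assuming the openai v1 Python client; the endpoint ID and model name below are placeholders:

```python
import os
from openai import OpenAI

ENDPOINT_ID = "your-endpoint-id"  # placeholder

client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],  # your RunPod API key, not an OpenAI key
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
)

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # whatever MODEL_NAME the endpoint serves
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```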

Do I need to allocate extra container space for FlashBoot?

I'm planning to use a Llama 3 model that takes about 40 GB of space. I believe FlashBoot takes a snapshot of the worker and keeps it on disk so it can be loaded within seconds when the worker becomes active. Do I need to allocate enough space on the container for this? In that case, since I'm planning to select a 48 GB VRAM GPU, do I need to allocate 40 GB (model) + 48 GB (snapshot) + 5 GB (extra) = 93 GB of container space?
Thanks...

When serverless is used, does the machine reboot if it is executed consecutively? Currently seeing issues

When serverless is used, does the machine reboot if it is executed consecutively? We're currently seeing issues with the last execution affecting the next.

unusual usage

Hello! We got billed weirdly this past weekend...

Slow I/O

Hey, I am trying to download a 7 GB file and run an ffmpeg process to extract the audio from that file (it's a video). Locally it takes around 5 minutes on average, but when I try it in the cloud (I chose a general-purpose CPU instance, since a GPU doesn't seem to give any advantage here) the I/O looks SUPER SLOW. Is there anything I can do to speed up the disk I/O?...
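
As an illustration of the workload, a minimal sketch under two assumptions: the file is staged on the local container disk (e.g. /tmp) rather than a network volume, and the audio stream is copied instead of re-encoded, which keeps ffmpeg limited by read speed only. Paths are hypothetical.

```python
import subprocess

# Hypothetical paths; local container disk is usually faster than a network volume.
video_path = "/tmp/input.mp4"
audio_path = "/tmp/audio.aac"

# -vn drops the video stream, -acodec copy writes the audio as-is,
# so ffmpeg only shuffles bytes instead of re-encoding.
subprocess.run(
    ["ffmpeg", "-y", "-i", video_path, "-vn", "-acodec", "copy", audio_path],
    check=True,
)
```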

Problem with RunPod CUDA base image. Jobs stuck in queue forever

Hello, I'm trying to make a request to a serverless endpoint whose Dockerfile uses this base image: FROM runpod/base:0.4.0-cuda11.8.0. I want the server side to run the input_fn function when I make the request. This is part of the server-side code: ```model = model_fn('/app/src/tapnet/checkpoints/')...
Solution:
Hmm, yeah, I guess Python 3.11 is missing from that RunPod base image...
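
Separately from the Python version issue, a minimal sketch of how the posted model_fn/input_fn could be wired into the runpod SDK so the endpoint runs them per request; the import path and the input_fn signature are the poster's own and are assumed here:

```python
import runpod

# model_fn / input_fn come from the poster's code, not the runpod SDK;
# the import path below is hypothetical.
from tapnet.inference import model_fn, input_fn

# Load the model once at startup so every request reuses it.
model = model_fn('/app/src/tapnet/checkpoints/')

def handler(job):
    # job["input"] is the "input" field of the request payload.
    return input_fn(job["input"], model)

# Hand the handler to the serverless worker loop.
runpod.serverless.start({"handler": handler})
```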

runpod-worker-a1111 and LoRAs

I don't think my LoRAs are working with this worker? But it seems to be able to list LoRAs with /sdapi/v1/loras (https://github.com/ashleykleynhans/runpod-worker-a1111/blob/main/docs/api/a1111/get-loras.md), so am I able to use LoRAs with this worker or not?...

Intermittent connection timeouts to api.runpod.ai

```json
{
  "endpointId": "oic105cyzlovnk",
  "workerId": "3cwou4m0x6hxl0",
  "level": "error",
  ...

vLLM streaming ends prematurely

I'm having issues with my vLLM worker ending a generation early. When I send the same prompt to my API without "stream": true, the response returns fully. When "stream": true is added, it stops early, sometimes right after {"user":"assistant"} gets sent. It was working earlier this AM. I see this in the system logs around the time it stopped working:
2024-06-13T15:37:10Z create pod network
2024-06-13T15:37:10Z create container runpod/worker-vllm:stable-cuda12.1.0
2024-06-13T15:37:11Z start container...
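
For comparison, a rough sketch of reading the output through the /run + /stream route with plain requests; the endpoint ID is a placeholder and the exact input shape depends on the worker version:

```python
import os
import time
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

# Submit a job with streaming enabled in the input.
job = requests.post(
    f"{BASE}/run",
    headers=HEADERS,
    json={"input": {"prompt": "Hello!", "stream": True}},
    timeout=30,
).json()

# Poll /stream/{job_id} until the job reaches a terminal status.
while True:
    chunk = requests.get(f"{BASE}/stream/{job['id']}", headers=HEADERS, timeout=60).json()
    for part in chunk.get("stream", []):
        print(part.get("output"), flush=True)
    if chunk.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
        break
    time.sleep(0.5)
```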

Why are there no GPUs in the Canada data center today?

My network volume is in ca-mtl-1, and there are no GPUs available now.
Solution:
Hey y'all, we disable the creation of new pods four days before maintenance to prevent further issues (this was not something I was personally aware of until now, otherwise it would have been posted in #🚨|incidents). However, I talked with the team and you should be able to create new pods again; let me know if you're running into any issues.

Is there example code to access the runpod-worker-comfy serverless endpoint?

Hi, I have managed to run the runpod-worker-comfy serverless endpoint, and I know it supports five operations: RUN, RUNSYNC, STATUS, CANCEL, HEALTH. But I don't know exactly how to access the service from my Python code: how to prepare the API key and the worker ID, how to build the request for RUN, how to check the status until it is finished, and how to download the generated image. Does any example code exist for these basic operations from Python?

Previously I had Python code that communicated directly with the ComfyUI server: it would create a websocket, send the workflow with an HTTP POST, keep checking the history, and once the work was done, read the image from the output passed through the websocket connection. When wrapped with runpod-worker-comfy the interface is indeed easier, and the input validation is great, but I don't know how to use it from my code and didn't find any example code for it. Sorry for my ignorance....
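
I haven't seen an official snippet either, but a rough sketch of the run/status flow with plain requests could look like this; the endpoint ID is a placeholder and the exact input/output field names depend on the runpod-worker-comfy version:

```python
import base64
import json
import os
import time
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

# 1) Submit the job; the workflow JSON is exported from ComfyUI ("Save (API Format)").
with open("workflow_api.json") as f:
    workflow = json.load(f)

job = requests.post(
    f"{BASE}/run",
    headers=HEADERS,
    json={"input": {"workflow": workflow}},
    timeout=30,
).json()

# 2) Poll /status/{job_id} until the job reaches a terminal state.
while True:
    status = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS, timeout=30).json()
    if status["status"] in ("COMPLETED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

# 3) The worker typically returns the image base64-encoded in the output;
#    the field name ("message" here) may differ between worker versions.
if status["status"] == "COMPLETED":
    img_b64 = status["output"]["message"]
    with open("result.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```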

Backup plan for serverless network outage

Is this network outage affecting both serverless and on-demand pods? If outages of these two services don't occur simultaneously, can we use pods to mitigate a serverless network outage? What we need is a stable and reliable service....

delay time

I have a serverless endpoint configured with 15 max workers. However, I notice that only about three of them are actually usable. My workload is configured to time out if it takes longer than a minute to process. The other workers randomly have issues, such as timing out when attempting to return job data, or completely failing to run and having to be retried on a different worker, leading to a delay/execution time of over 2-3 minutes. Executing 6 different jobs gives very different delay times: some worker IDs consistently have low delay times, but some randomly take forever. Is there anything I can do to lower this randomness? Additionally, can I delete/blacklist the workers that perform poorly...

Update worker-vllm to vLLM 0.5.0

vLLM just got bumped to 0.5.0, with significant features now ready for production. @Alpay Ariyak FP8 is very significant, but so are speculative decoding and prefix caching.
- FP8 support is ready for testing. By quantizing a portion of the model weights to 8-bit floating point, inference speed gets a 1.5x boost....
Solution:
For sure, already in progress!
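
Until the worker update lands, a rough sketch of what FP8 looks like when calling vLLM 0.5.0 directly; the model name is only an example, and on-the-fly FP8 needs recent GPU hardware:

```python
from vllm import LLM, SamplingParams

# quantization="fp8" asks vLLM to quantize the weights to 8-bit floating point
# at load time; the model below is just an example.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", quantization="fp8")

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```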

SDXL Quick Deploy through RunPod doesn't work

I sent in a request to test it, such as the one below, and it threw an error. There are other alternatives, so this is not the end of the world for me, but I wanted to give feedback that I don't believe it works.
```
{
  input: {
    prompt: "A cute cat"...

Video processing

Hey, what are your approaches and/or recommendations for processing videos in serverless workers?...

Can 3 different serverless workers run from the same network volume?

Hi @digigoblin, I have checked your answer about symlinking the network volume dir to the serverless dir and running the worker from the network volume as if it were a separate pod instance. https://github.com/ashleykleynhans/runpod-worker-comfyui/blob/main/start.sh#L5-L7 ...

Can serverless endpoints make outbound TCP connections?

I know endpoints can make HTTP/HTTPS requests, but is there any limit on outbound connections? Is there a firewall, or are all ports open? What about bandwidth limitations, etc.? Thanks!
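
One quick way to answer this empirically from inside a worker is to open a raw socket to a host/port of your choosing; the target below is a placeholder:

```python
import socket

# Placeholder target; swap in whatever host/port you need to reach.
HOST, PORT = "example.com", 443

try:
    with socket.create_connection((HOST, PORT), timeout=5) as sock:
        print("outbound TCP to", sock.getpeername(), "succeeded")
except OSError as exc:
    print("outbound TCP failed:", exc)
```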