RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Async workers not running

When using the /run endpoint I will receive the usual response: ``` { "id": "d0e6d88c-8274-4554-bb6a-0a469361ae20-e1", "status": "IN_QUEUE"...
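A job that never leaves `IN_QUEUE` is easier to spot with a bounded polling loop. The following is only a minimal sketch: `wait_for_job` and the injected `get_status` callable are hypothetical names, and the set of terminal status strings is an assumption rather than something confirmed in this thread. In real use `get_status` would wrap whatever call fetches the job's current status for the returned `id`.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal statuses; adjust to whatever the endpoint actually reports.
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(
    get_status: Callable[[], Dict[str, Any]],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll get_status() until the job reaches a terminal status or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        job = get_status()
        if job.get("status") in TERMINAL:
            return job
        if clock() >= deadline:
            # Surfaces the "stuck in queue" case instead of waiting forever.
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

Passing `sleep` and `clock` in as parameters keeps the loop testable without real waiting.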

Docker login to a specific registry

Hi, I'd like to log in to nvcr.io using my API token. However, RunPod only lets me set a username and password; it does not let me specify a registry. How can I set this? I'm following the instructions here: https://build.nvidia.com/nvidia/audio2face-2d/docker...

suggestion to create templates for repositories

Make it possible to use GitHub repos, not only Docker images, for templates.

Large delay time even with multiple available workers

We are seeing large delay times while workers sit idle. Endpoint ID: 9mzllv74e08hu4

CPU pod network volume

Will the issue with network volumes and CPU pods be fixed anytime soon? This missing feature is really slowing down my work 😦...

Unexpected Charges on serverless h100 80gb

For the past few hours we have been charged for serverless H100 even though there are no logs in our system and we have not been using the chip at all. Please advise!

Creating serverless instance

Hi! I have a very large Docker image (50 GB). How should I set it up so that I get the lowest cold-start time? Thanks!...

fail: timeout ,exporting to oci image format. This takes a little bit of time. Please be patient.

#31 exporting to oci image format. This takes a little bit of time. Please be patient.

Getting executiontimeout exceeded

The logs aren't showing anything either.

Image build from GitHub works fine, but when I test with a request I get an error

ImportError: cannot import name 'TypeIs' from 'typing_extensions' (/usr/local/lib/python3.10/dist-packages/typing_extensions.py) I attached the error logs...
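`TypeIs` only exists in newer releases of `typing_extensions`, so an error like this usually means an older copy is the one being imported inside the image. A small diagnostic along these lines (a sketch only; `describe_module` is a hypothetical helper, not part of any library) can be run inside the container to see which file and version are actually picked up:

```python
import importlib
from importlib import metadata
from typing import Any, Dict


def describe_module(name: str, attr: str) -> Dict[str, Any]:
    """Report where `name` is imported from, its installed version, and whether it exposes `attr`."""
    info: Dict[str, Any] = {"module": name, "path": None, "version": None, "has_attr": False}
    try:
        mod = importlib.import_module(name)
    except ImportError:
        return info  # module is not importable at all
    info["path"] = getattr(mod, "__file__", None)
    info["has_attr"] = hasattr(mod, attr)
    try:
        info["version"] = metadata.version(name)
    except metadata.PackageNotFoundError:
        pass  # no installed distribution metadata under this name
    return info


if __name__ == "__main__":
    print(describe_module("typing_extensions", "TypeIs"))
```

If `has_attr` comes back `False`, pinning a newer `typing_extensions` in the image's requirements is the usual next step to try.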

Updated workers to 10, now stuck in a loop

Hey, I upgraded the workers from 5 to 10, but now when I want to see how much money I need for even more workers, I am stuck in this 10 -> 10 loop.

Webhooks Stopped Working?

Anyone else noticing that their webhooks aren't being invoked after their workloads run? I'm not getting any callbacks tonight.

30 minutes pending in serverless

It wasn't like this yesterday.

Is there a maximum Runtime?

Hi, when I run a job on my handler locally, everything works fine and the job runs for about 12 minutes. However, when I test the same job with a serverless worker, it fails after around 10 minutes, in the middle of processing, without throwing any error, and the worker gets killed. Is there a maximum time a worker can run a job? I could not find anything related to this in the docs.

EUR-IS datacenter blacklisted by Elevenlabs?

I have had a strange issue since yesterday: my serverless instance cannot establish a WSS connection with the ElevenLabs API. It throws a 403 error with the following link: https://help.elevenlabs.io/hc/en-us/articles/22497891312401-Do-you-restrict-access-to-the-service-and-platform-for-any-specific-countries I'm not sure if this is specific to RunPod or ElevenLabs, but when I changed the datacenter to EUR-RO, the issue disappeared....

queue delay times

Hi, I'm seeing really long delay times even though there's nothing in the queue, and this is a really small CPU serverless endpoint. Any idea what causes this?

serverless qwen-audio model deployment, can't see any error, getting workers exited with exit code 1

I have set up the worker-template to process some audio files with qwen-2-audio-7B instruct. The image build was successful, but when I make a request with my inputs, the status of my input in the queue does not change, and the logs show "worker exited with exit code 1". I can't find what I am doing wrong. Please help!!!

How to Speed Up S3 Upload or Make it Async in RunPod Serverless Deployments

I am currently exploring using RunPod as our primary in-house model deployment platform instead of Replicate (our current preferred platform). Our in-house models mostly are txt2img/img2img custom models. One of the issues I'm facing while testing RunPod is long S3 upload times. For example, for one of our processes, the prediction time is ~1 second, but the S3 upload is taking up to 4-5 seconds (depending on image size), significantly increasing the overall prediction time. This causes two main problems:...
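When a job produces several images, one option is to upload them concurrently instead of one after another. The sketch below assumes that case; `upload_all` and the injected `upload_one` callable are hypothetical names, and in real use `upload_one` would wrap the actual S3 client call (for example a `put_object` call in boto3) and return the object's URL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def upload_all(
    images: Dict[str, bytes],
    upload_one: Callable[[str, bytes], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Upload every image in parallel threads and return {key: url}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(upload_one, key, data) for key, data in images.items()}
        # .result() blocks until that upload finishes and re-raises its exception, if any.
        return {key: fut.result() for key, fut in futures.items()}
```

Threads suit this because uploads are I/O-bound. Note that this does not help a single-image job: there the upload stays on the critical path unless the result can be returned some other way.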

output is undefined on response

Hello, I am running a serverless endpoint and I get a result in the console for a request made on the site, but when I use the SDK with the runSync function it does not give me an output. Instead it just says that it succeeded and is completed, but no output object is present on the response. Here is the response printed as a table:

Locally testing a worker where the consuming code relies on the job ID

Hi there, I'm working on a codebase where a RunPod worker is used to execute a workload that takes ~40 seconds, and then the result is sent to the webhook with the job ID and state. I have been attempting to test this worker locally for faster iteration, but I've been discovering that RunPod's development workers seem to have discrepancies with the production workers, and that these aren't really documented....
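One way to make local runs behave more predictably is to build the job object yourself, so the job ID is always present and under your control. This is a minimal sketch, assuming the handler receives a dict with `id` and `input` keys; `make_local_job`, the `local-` prefix, and the shape of the returned payload are all hypothetical and not taken from the post.

```python
import uuid
from typing import Any, Dict, Optional


def make_local_job(payload: Dict[str, Any], job_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a job dict for local testing, generating an ID when none is given."""
    return {"id": job_id or f"local-{uuid.uuid4()}", "input": payload}


def handler(job: Dict[str, Any]) -> Dict[str, Any]:
    """Toy handler: echoes the input and carries the job ID through to the result."""
    result = {"echo": job["input"]}  # stand-in for the real ~40 second workload
    return {"job_id": job["id"], "status": "COMPLETED", "output": result}


if __name__ == "__main__":
    print(handler(make_local_job({"prompt": "hello"}, job_id="test-123")))
```

Fixing the ID in tests (as with `test-123` above) lets the consuming code that receives the webhook be exercised with a known value.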