RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

US-NC-1 Failing to pull images

Just an FYI: we constantly have to kill these workers, as they get stuck in Initializing with an image-pull error: error pulling image: Error response from daemon: Head "https://registry-1.docker.io/v2/runpod/worker-v1-vllm/manifests/v2.4.0stable-cuda12.1.0": Get "https://auth.docker.io/token?scope=repository%3Arunpod%2Fworker-v1-vllm%3Apull&service=registry.docker.io": read tcp 172.19.7.13:37010->98.85.153.80:443: read: connection reset by peer Worker ID - 45hzf7q7kf58sy...

Billing question

Hey, the math doesn't add up here... please check the images! How can I get the same amount through the API?

stuck at waiting for build

Because the clone-endpoint option is not working on my end, I have to recreate the endpoint.

Serverless instances are not assigned GPUs, resulting in job errors in production. Assistance required

Error Message 1 with Stack Trace: Task Failed [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char, const char, ERRTYPE, const char, const char, int) [with ERRTYPE = cudnnStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char, const char, ERRTYPE, const char, const char, int) [with ERRTYPE = cudnnStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU=0 ; hostname=0220236a79a1 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=177 ; expr=cudnnCreate(&cudnnhandle); \n Error Message 2: Failed to get job. | Error Type: ClientOSError | Error Message: [Errno 104] Connection reset by peer...
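A quick diagnostic for this class of failure: check which execution providers onnxruntime actually sees before building the session, so the worker fails fast with a clear message when it was scheduled without a usable GPU. A minimal sketch, assuming onnxruntime-gpu is installed; "model.onnx" is a stand-in path:

```python
import onnxruntime as ort

# CUDNN_STATUS_INTERNAL_ERROR at session init is often a missing or
# broken GPU; surface that explicitly instead of crashing inside cuDNN.
providers = ort.get_available_providers()
if "CUDAExecutionProvider" not in providers:
    raise RuntimeError(f"No CUDAExecutionProvider; available: {providers}")

# "model.onnx" is a placeholder path for illustration.
session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
```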

How to know which graphics card a worker ran on?

Hello! How can I tell which graphics card a job ran on? There's no info about the video card in either the callback or the /status endpoint. This is all I've got ``` {...
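One workaround while the platform doesn't expose this: query the card from inside the handler and return it with the job output. A minimal sketch, assuming nvidia-smi is available in the container (it normally is under the NVIDIA runtime); do_work is a hypothetical stand-in for the real inference:

```python
import subprocess

import runpod

def gpu_name() -> str:
    # Ask the driver directly which card this worker landed on.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

def do_work(payload):
    return payload  # stand-in for the real inference

def handler(job):
    # Echo the hardware back alongside the result so the client can see it.
    return {"gpu": gpu_name(), "output": do_work(job["input"])}

runpod.serverless.start({"handler": handler})
```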

Has anyone successfully deployed a serverless instance using wan2.1 to generate i2v?

I tried most of the ComfyUI + Wan templates, but they are all for RunPod Pods. Resources for creating a serverless instance for this purpose seem quite scarce too. Help, please?

serverless instances having issues caching the container image?

We have been seeing increased egress costs on a GCP Artifact Registry repo since a few days ago. Two serverless instances use the container image from that repo. The repo is in the US and the serverless instances are in Europe. Access to the repo is done with Registry Credentials configured in the serverless Docker Configuration....

Question about Serverless V2 API Payload for Automatic1111 Inpainting

Hi, I'm trying to perform inpainting using the RunPod V2 API (/runsync) with my Serverless endpoint ID (which runs an Automatic1111-compatible image). I'm sending a JSON payload in the input object that includes prompt, init_images (as a list with one base64 string), mask (as a base64 string), denoising_strength, inpainting_fill, inpaint_full_res, and mask_blur. However, the generation ignores the init image and mask. The response info from the backend shows "is_using_inpainting_conditioning": false....
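For reference, a sketch of how those fields are usually laid out; the field names are standard A1111 /sdapi/v1/img2img parameters, and the file paths are placeholders. The "is_using_inpainting_conditioning": false in the response suggests the request may not be reaching img2img as an inpaint call at all, so it's worth confirming how the worker image routes the input object:

```python
import base64

def b64(path: str) -> str:
    # Images travel over the wire as base64 strings.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "input": {
        "prompt": "a red leather sofa",
        "init_images": [b64("init.png")],   # one base64 image, in a list
        "mask": b64("mask.png"),            # white = repaint, black = keep
        "denoising_strength": 0.75,
        "inpainting_fill": 1,               # 1 = original content under the mask
        "inpaint_full_res": True,
        "mask_blur": 4,
    }
}
```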

Issue with WebSocket latency over the serverless HTTP proxy since the RunPod outage

We have a RunPod serverless endpoint which we have been using to stream frames over a direct one-to-one WebSocket. We have a lightweight version of this endpoint that streams simple diagnostic images, and a production version that streams AI-generated frames. Frames are configured to stream at 18 fps in both cases to create an animation. We now see that both versions of this endpoint fail to stream frames at a reasonable rate, hovering around 1 fps. The lightweight diagnostic frames take virtually no time to generate, and we have confirmed with logging that the AI-generated frames in the production version are not generating any slower, so they should still be able to meet the 18 fps demand. But we see that the time to send frames over the WebSocket is on the order of 1 s per frame, and is very unstable. See below a snippet from our logs showing fast image-generation times, but slow times for sending images over the WebSocket ```...

RunPod down?

Getting error 400 for all our routes

On-Demand vs. Spot Pod

Hi! I read the FAQ about this, but I have one more question: is Spot Pod billing also based on actual usage, like On-Demand? With power-off as described, of course....

Unable to access Custom Container & Community Templates

Hi team, I have credits on my RunPod account and have already deployed a pod, but I cannot find the “Custom Container” option or any community templates in my UI. I’ve tried different browsers, cleared the cache, and checked all the menus (Templates, Explore, Pods), but the options simply don’t appear. Could you please check my account settings and unlock these features? Thank you!...

AI malware detection

Hi, I am a student majoring in CS. I would like to know your policies regarding malware-detection AI/ML training on the pods. Am I allowed to upload binaries and DLLs? If not, are malware features and hashes fine?

Facing a Serverless run error that is not encountered in ComfyUI web UI.

Hi all, my Serverless Endpoint has the same custom nodes and models as my persistent pod. However, it hit the following error when I tried to run my custom workflow on the Serverless Endpoint. Please see the logs attached 😢 and help me fix it.
Solution:
I've set COMFY_POLLING_MAX_RETRIES to 1000 and it works

Bad requests

I have just the normal 3.6.0 SDXL Docker image running on my serverless endpoint, but I just cannot get it to accept a request with a workflow. According to GitHub, which is literally the only place with any documentation, a workflow can be passed, but nothing I've tried works; it's just endless 400 Bad Request... ``` { "input": { "prompt": prompt,...
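When every attempt comes back 400, the usual first checks are that the body is valid JSON with everything nested under a top-level "input" key, and printing the response body, which often names the rejected field. A minimal sketch with placeholder endpoint ID and API key:

```python
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

payload = {"input": {"prompt": "an astronaut riding a horse"}}

r = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,  # let requests serialize; hand-built JSON strings are a common 400 cause
    timeout=120,
)
print(r.status_code, r.text)  # a 400 body often says exactly what was rejected
```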

Run a function once when a worker starts

I have some code that I only want to run once when my serverless worker starts (i.e. not every time a request is made). What is the best way of doing this? Just executing the code outside of the handler? ``` import runpod import time...
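Yes, executing the code outside the handler is the usual pattern: module-level code runs once per worker cold start, since the worker process imports the file a single time and then calls the handler per request. A minimal sketch, with load_model as a hypothetical expensive one-time init:

```python
import runpod

def load_model():
    # Stand-in for expensive one-time setup (model load, warm-up, etc.).
    return object()

# Runs once when the worker process starts, not on every request.
MODEL = load_model()

def handler(job):
    # MODEL is already warm here for every request this worker serves.
    return {"echo": job["input"], "model_ready": MODEL is not None}

runpod.serverless.start({"handler": handler})
```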

ComfyUI serverless via FastAPI/Python to generate an image

Does anyone have good Python demo code to generate an image against a serverless ComfyUI endpoint? I am running a hackathon for a Microsoft AI project and am desperately looking for a code base that works. Anyone willing to share some code?...
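A minimal Python sketch against /runsync, assuming the worker image accepts a ComfyUI API-format workflow under input.workflow (as the common community ComfyUI workers do); the endpoint ID and API key are placeholders, and workflow_api.json is a graph exported from the ComfyUI web UI with "Save (API Format)":

```python
import json

import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

# Export your graph from ComfyUI in API format first.
with open("workflow_api.json") as f:
    workflow = json.load(f)

r = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"workflow": workflow}},
    timeout=600,  # runsync blocks until the job completes or times out
)
r.raise_for_status()
print(r.json())  # the output schema depends on the worker image
```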

Serverless VLLM concurrency issue

Hello everyone, I deployed a serverless vLLM endpoint (Gemma 12B model) through the RunPod UI, with 2 workers of A100 80GB VRAM. If I send two requests at the same time, they both become IN_PROGRESS, but I receive the output stream of one first; the second always waits for the first to finish before I start receiving its token stream. Why is it behaving like this?...
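One way to narrow down where the serialization happens: fire both requests from separate threads and log per-request wall time. If the two workers really run in parallel, the two non-streaming probes below should finish in roughly the same time; if the second takes about twice as long, something upstream is queueing them. Endpoint ID and key are placeholders:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder
URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

def call(i: int):
    # Time one blocking request end to end.
    t0 = time.perf_counter()
    r = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": {"prompt": f"concurrency probe {i}"}},
        timeout=300,
    )
    return i, r.status_code, round(time.perf_counter() - t0, 2)

# Launch both requests at (nearly) the same instant.
with ThreadPoolExecutor(max_workers=2) as ex:
    for result in ex.map(call, range(2)):
        print(result)  # (request id, HTTP status, seconds)
```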