RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⚡|serverless


How to get `/stream` serverless endpoint to "stream"?

Example from the official documentation: https://docs.runpod.io/sdks/javascript/endpoints#stream ``` from time import sleep import runpod...
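A minimal sketch of the generator-handler pattern that backs `/stream`, assuming the Python SDK; the chunk contents below are made up. The key details are that the handler must be a generator, and that the client then polls `/stream/{job_id}` to pick up whatever has been yielded since the last poll (with `return_aggregate_stream` controlling whether the chunks are also aggregated into the final `/run`/`/runsync` output):

```python
from time import sleep

import runpod


def handler(job):
    # Generator handler: each yielded value becomes a chunk that the
    # /stream/{job_id} endpoint returns while the job is still running.
    for i in range(5):
        sleep(1)
        yield f"chunk {i}"  # placeholder payload


runpod.serverless.start({
    "handler": handler,
    # Also aggregate the yielded chunks into the final job output.
    "return_aggregate_stream": True,
})
```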

Jobs queued for minutes despite lots of available idle workers

For the past couple of days my jobs keep getting queued for a long time despite lots of available "idle" workers - nowhere near my max workers. Sometimes there are 9 available workers but concurrent jobs still get queued... anyone have any insight on this?

Request stuck because of exponential backoff, what does it mean?

Relevant log line: "message":"b2http.py :600 2024-10-29 15:56:21,141 Pausing thread for 64 seconds because that is what the default exponential backoff is\n" This keeps repeating in the serverless GPU logs; the pod is on and the request is stuck...
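For what it's worth, that log line comes from the Backblaze B2 client (`b2http.py`) rather than from RunPod itself: the upload keeps failing and the client waits longer and longer between retries. A generic illustration of the idea (the function below is illustrative, not the b2sdk API):

```python
import random
import time


def call_with_backoff(request, max_retries=7):
    """Illustrative exponential backoff: the wait roughly doubles after each
    failed attempt (1 s, 2 s, 4 s, ... 64 s), which is why a stuck upload can
    sit idle for over a minute between retries while the worker keeps billing."""
    for attempt in range(max_retries):
        try:
            return request()
        except Exception:
            delay = 2 ** attempt + random.random()  # small jitter
            time.sleep(delay)
    raise RuntimeError("request kept failing after all retries")
```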

In serverless CPU, after upgrading to RunPod SDK 1.7.4, getting lots of "kill worker" errors

These are serverless CPU workers, not GPU. Initially serverless CPU was on 1.7.3 and was timing out if execution time was longer than a minute. So I downgraded to 1.6.2 and it worked fine. Yesterday I upgraded to 1.7.4 and am now getting the "kill worker" error.

Deploying bitsandbytes-quantized Models on RunPod Serverless using Custom Docker Image

Hey everyone 👋 Looking for tips from anyone who's worked with bitsandbytes-quantized models on RunPod's serverless setup. It's not available out of the box with vLLM, and I was wondering if anyone's got it working? Saw a post in the serverless forum about maybe using a custom Docker image for this. For context: I've fine-tuned LLaMA-3.1 70B-instruct using the unsloth library (which utilizes bitsandbytes for quantization) and am looking to deploy it. Any insights would be greatly appreciated! 🙏...
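One route people describe (since the stock vLLM worker doesn't expose bitsandbytes) is a custom image with a plain transformers + bitsandbytes handler. A minimal sketch, assuming a 4-bit NF4 load; `MODEL_ID` is a placeholder for your fine-tuned repo:

```python
import runpod
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "your-hf-user/llama-3.1-70b-instruct-finetune"  # placeholder

# 4-bit NF4 quantization via bitsandbytes, loaded once at worker start-up.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)


def handler(job):
    prompt = job["input"]["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=job["input"].get("max_tokens", 256)
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)


runpod.serverless.start({"handler": handler})
```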

Delay times on requests

It is starting to become a habit for requests to have very high delay times, which really hurts the user experience. RunPod serverless used to work really well... it's great that cold boot is like 0 sec, but if there's 2 min of delay before a request starts... it's pointless.

Just got hit with a huge serverless bill

Didn't look at the billing section for a few days and got hit with a huge serverless bill; turns out the workers wouldn't time out for hours. Can I get a credit refund for this?

Can you run a FastAPI GPU project on RunPod serverless?

I have a FastAPI project that was hosted on SageMaker. Now I plan to move it to RunPod. Can someone explain how to do it?...
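Rough idea of the usual migration: serverless endpoints don't run a long-lived HTTP server, so the FastAPI routes are replaced by a handler that reuses the same business logic. A minimal sketch; `my_app.inference.run_inference` is a hypothetical stand-in for whatever your FastAPI route calls:

```python
import runpod

# Hypothetical module from the existing FastAPI project; the point is to
# reuse the same inference function the FastAPI route used to call.
from my_app.inference import run_inference


def handler(job):
    payload = job["input"]            # what used to arrive as the request body
    return run_inference(payload)     # same business logic, no web server needed


runpod.serverless.start({"handler": handler})
```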

Execution Time Greater Than 30000s

Why was the execution time so long, even greater than 30000 s? I had to cancel manually because the task queue is completely unable to run....
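If I'm reading the docs right, individual requests can carry an execution-timeout policy so a runaway job gets cancelled automatically instead of needing a manual cancel. A hedged sketch (endpoint ID, prompt, and the exact policy field are assumptions on my part; `executionTimeout` is in milliseconds):

```python
import os

import requests

url = "https://api.runpod.ai/v2/<endpoint_id>/run"  # placeholder endpoint ID
headers = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

body = {
    "input": {"prompt": "hello"},                  # placeholder payload
    "policy": {"executionTimeout": 600_000},       # cancel the job after 10 minutes
}

print(requests.post(url, json=body, headers=headers).json())
```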

Serverless tasks get stopped without a reason

Hey everyone! I'm running a serverless function which starts a Docker container that runs a Python program, and for some reason, sometimes while the Python program is running, the container gets stopped: 2024-10-27T19:50:00Z start container for xxxxx begin 2024-10-27T19:50:39Z stop container 5f797326f14b2a255f5363623485299b4f911fbba8b8b60e3daf44908c43980f 2024-10-27T19:50:39Z remove container...

Serverless Real-World Billing (Cold Start, Execution, Idle)

I understand that RunPod Serverless compute is billed as: Cold Start Time + Execution Time + Idle Timeout. Can you help clarify how this applies in real-world settings with sporadic usage? For example:...
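A toy worked example of that formula for one sporadic request (the per-second price below is a made-up placeholder, not a real RunPod rate). With sporadic traffic each request tends to pay the cold-start and idle-timeout tail again, because the worker has scaled to zero in between:

```python
# Illustrative arithmetic only; price_per_second is hypothetical.
price_per_second = 0.0004   # $/s for one worker (placeholder)
cold_start = 8              # s to pull weights and warm the worker
execution = 30              # s the handler actually runs
idle_timeout = 5            # s the worker stays up after finishing

billed_seconds = cold_start + execution + idle_timeout
print(billed_seconds, "s billed ->", round(billed_seconds * price_per_second, 4), "$ per request")
# 43 s billed -> 0.0172 $ per request
```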

Cannot load symbol cudnnCreateTensorDescriptor

I encountered this error when I deployed my Whisper code on a serverless environment. What is the recommended image to use for running the Whisper models 'base' and 'large-v3'?

How to send an image as a prompt to vLLM?

Hi there, I am new to RunPod and facing an issue sending an image to the RunPod serverless endpoint. The docs mention how an image can be received, but I want to send one. I am using a Qwen2-VL model which accepts an image and a text prompt. I am able to send text but not the image. Please help me with this; I am doing it for an assignment that is due soon. Thank you, any help would be appreciated....
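A minimal sketch of one way to do it, assuming the endpoint is running the vLLM worker with its OpenAI-compatible route: encode the image as a base64 data URI inside an OpenAI-style multimodal message. The file name, model ID, and endpoint ID are placeholders:

```python
import base64

from openai import OpenAI  # assumes the endpoint exposes the OpenAI-compatible route

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<endpoint_id>/openai/v1",  # placeholder
    api_key="<runpod_api_key>",                                   # placeholder
)

# Read the local image and embed it as a base64 data URI.
with open("example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",  # must match the model the worker serves
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```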

Any good tutorials out there on setting up an SD model from Civitai on RunPod serverless?

I've been trying to set up an SD model from Civitai on RunPod serverless for days now, but I keep running into errors. Each time I fix one, there's a new one, classic. Are there any good tutorials out there on setting up an SD model from Civitai on RunPod serverless?...
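Not a tutorial, but a minimal sketch of the handler side, assuming the Civitai `.safetensors` checkpoint is baked into the image (or mounted from a network volume) and loaded with diffusers' single-file loader; the path and parameters are placeholders:

```python
import runpod
import torch
from diffusers import StableDiffusionPipeline

# Placeholder path to the checkpoint downloaded from Civitai and copied into
# the image (or onto a network volume) at build/deploy time.
CHECKPOINT = "/models/my_civitai_model.safetensors"

pipe = StableDiffusionPipeline.from_single_file(
    CHECKPOINT, torch_dtype=torch.float16
).to("cuda")


def handler(job):
    prompt = job["input"]["prompt"]
    image = pipe(prompt, num_inference_steps=25).images[0]
    out_path = "/tmp/out.png"
    image.save(out_path)
    return {"image_path": out_path}


runpod.serverless.start({"handler": handler})
```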

Does vLLM support quantized models?

Trying to figure out how to deploy this, but I didn't see an option for selecting which quantization I wanted to run. https://huggingface.co/bartowski/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF Thanks!
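For what it's worth, a GGUF file is a llama.cpp-style format, so it may need either a non-GGUF (e.g. AWQ or GPTQ) build of the same model or a runtime that speaks GGUF. If you go the vLLM route, the quantization scheme is selected explicitly; a hedged sketch with a hypothetical AWQ repo:

```python
from vllm import LLM

# Illustrative only: vLLM's `quantization` argument picks a supported scheme
# for a checkpoint quantized in that format. The repo name is a placeholder,
# not the GGUF model linked above.
llm = LLM(
    model="someone/Some-Llama-3.1-8B-AWQ",  # hypothetical AWQ build
    quantization="awq",
)

print(llm.generate("Hello")[0].outputs[0].text)
```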

vLLM error: flash-attn

I get this error; how do I fix it and use vllm-flash-attn, which is faster? "Current Qwen2-VL implementation has a bug with vllm-flash-attn inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend."

Frequent "[error] worker exited with exit code 0" logs

Hi! I’m working on a project where I'm using RunPod serverless to run my ComfyUI workflow within a Docker image. I attempted to update my Dockerfile and the ComfyUI workflow JSON to save the generated images to my RunPod network volume, but I keep receiving the following logs at the bottom. Any insights or suggestions on how to resolve this would be greatly appreciated! I’ve attached my Dockerfile for reference, and here’s the relevant part of the ComfyUI output path configuration: ```json "332": {...

Worker frozen during long running process

request ID: sync-f144b2f4-f9cd-4789-8651-491203e84175-u1 worker id: g9y8icaexnzrlr I have a process that should in theory take no longer than 90 seconds ...

RunPod GPU use when using a Docker image built on a Mac

I am building serverless applications that are supposed to use the GPU. While testing locally, the pieces that kick off GPU functions use the common pattern: `device: str = "cuda" if th.cuda.is_available() else "cpu"`. This is required so that when running locally on a Mac, the CPU device is used. I would think that with a Docker image built on a Mac, but with an amd64 platform specified in the build command, CUDA would be used once it's deployed on a server with a CUDA base image, but that does not seem to be the case....
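A quick diagnostic worth running inside the deployed container: if the image was built with a CPU-only torch wheel (easy to end up with when building on a Mac), `torch.version.cuda` will be `None` regardless of the host GPU and driver. Sketch:

```python
import torch

# Run this inside the deployed worker/container to see whether the torch
# build itself has CUDA support, independent of the host GPU.
print("torch build:", torch.__version__)
print("compiled CUDA version:", torch.version.cuda)   # None => CPU-only wheel
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```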