RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda

Hey everyone 👋 I'm trying to use RunPod serverless to run the https://github.com/nerfstudio-project/gsplat/ Gaussian splatting implementation. However, when building the project from source (the `pip install .` step in my Dockerfile below), I get the error: No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda...
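A minimal diagnostic sketch of what the build stage sees, assuming the message comes from torch's CUDA extension builder because no GPU is visible during `docker build` (the arch list in the comment is only an example for your target GPUs):

```python
# Run this inside the build stage to see what the gsplat extension build will find.
import os
import torch
from torch.utils.cpp_extension import CUDA_HOME

print("CUDA available at build time:", torch.cuda.is_available())   # typically False in `docker build`
print("CUDA_HOME:", CUDA_HOME)                                       # needs a CUDA *devel* base image so nvcc exists
print("TORCH_CUDA_ARCH_LIST:", os.environ.get("TORCH_CUDA_ARCH_LIST"))

# Setting TORCH_CUDA_ARCH_LIST (e.g. "8.6;8.9;9.0") before `pip install .` lets the CUDA
# extension compile for fixed architectures instead of probing a GPU that isn't there.
```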

Can't get Warm/Cold status

I have tried the /health endpoint to retrieve the cold/warm status of an endpoint, but even having a ready worker didn't mean the endpoint was warm, and it still cold started. I need an indicator of whether the endpoint will cold start or is still warm. Is that information currently available somewhere and I'm just missing it? If not, could you suggest a workaround if possible?...
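A minimal sketch of polling the health route, assuming the usual serverless URL layout; the endpoint ID and API key are placeholders, and the counters are not an official warm/cold flag:

```python
import os
import requests

ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]   # hypothetical env var holding the endpoint ID
API_KEY = os.environ["RUNPOD_API_KEY"]

resp = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
print(resp.json())  # e.g. {"jobs": {...}, "workers": {"idle": 1, "ready": 1, ...}}

# Caveat: "ready"/"idle" only says a worker container is up; it does not guarantee the model
# is already loaded in memory, which is why a request can still hit a cold start.
```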

Serverless deployment RunPod request issue

I'm working on deploying the qwen_2.5_instruct model through RunPod using the vLLM direct deployment method. The model is designed to accept more than one image at a time along with the prompt, but with the vLLM method RunPod only allows one image per request. I need to pass multiple images in the following format: messages = [...
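For reference, a hedged sketch of a multi-image chat request against the vLLM worker's OpenAI-compatible route; the `/openai/v1` path, the placeholder model name, and the assumption that the per-request image cap is governed by vLLM's `limit_mm_per_prompt` engine argument are assumptions, not confirmed by the post:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # <ENDPOINT_ID> is a placeholder
    api_key="<RUNPOD_API_KEY>",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these two images."},
            {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/b.jpg"}},
        ],
    }
]

completion = client.chat.completions.create(model="<MODEL_NAME>", messages=messages)
print(completion.choices[0].message.content)
```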

How can I check the logs to see if my request uses the LoRA model

I deployed the qwen2-7B model using serverless and want to load the adapter checkpoint. My environment variable configuration is shown in the figure below, where LORA_MODULES={"name": "cn_writer", "path": "sinmu/cn-writer-qwen-7B-25w", "base_model_name": "Qwen/Qwen2-7B"}...
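A minimal way to check whether the adapter is actually being applied, assuming the vLLM worker exposes the standard OpenAI-compatible routes (the URL layout and the behavior described in the comments are assumptions):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # placeholder endpoint ID
    api_key="<RUNPOD_API_KEY>",
)

# 1) The adapter name from LORA_MODULES ("cn_writer") should be listed next to the base model.
print([m.id for m in client.models.list().data])

# 2) Requesting model="cn_writer" asks vLLM to apply that adapter; the worker logs should then
#    show a LoRA request for the job, which is one way to confirm it was used.
out = client.chat.completions.create(
    model="cn_writer",
    messages=[{"role": "user", "content": "test"}],
)
print(out.choices[0].message.content)
```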

Troubles with answers

I use the Mistral 7B Instruct model with the standard settings in serverless, and I'm getting strange answers. I tried different temperature values, but it didn't help. Please tell me how to fix it...

Adding parameters to Docker when running Serverless

Hi. I need to add limit_mm_per_prompt to my Serverless Endpoint. How can I do it?
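For context, `limit_mm_per_prompt` is a vLLM engine argument; in plain vLLM it is passed as sketched below. Whether the RunPod vLLM worker exposes it as an environment variable or other endpoint setting is an assumption worth checking against the worker's documented engine args; the model name here is only an example.

```python
from vllm import LLM

# Example only; substitute your own model. Allows up to 4 images per prompt.
llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    limit_mm_per_prompt={"image": 4},
)
```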

Serverless git integration rollback

Hi team! I've recently switched over to RunPod serverless for production with the git integration. I have a concern about reliability in a failure scenario. Firstly, I notice that it takes roughly 1.5 hours for a build to complete and then for a worker to download and extract the image. Consider a bad release that starts getting rolled out - I can't see a button to stop it. Now, say it rolled out because I missed the process for whatever reason - I can't see a button to roll back to the previous good version....

Async workers not running

When using the /run endpoint I receive the usual response: `{ "id": "d0e6d88c-8274-4554-bb6a-0a469361ae20-e1", "status": "IN_QUEUE"...`
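A minimal polling sketch, assuming the standard serverless routes /run (submit) and /status/<job_id> (poll); the endpoint ID, API key, and input payload are placeholders:

```python
import time
import requests

BASE = "https://api.runpod.ai/v2/<ENDPOINT_ID>"
HEADERS = {"Authorization": "Bearer <RUNPOD_API_KEY>"}

# Submit an async job; the response echoes an id and an initial IN_QUEUE status.
job = requests.post(f"{BASE}/run", json={"input": {"prompt": "hello"}}, headers=HEADERS).json()
job_id = job["id"]

while True:
    status = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS).json()
    if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        break
    time.sleep(2)  # a job that never leaves IN_QUEUE usually means no worker picked it up

print(status)
```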

Docker login to a specific registry

Hi, I'd like to log in to nvcr.io using my API token. However, RunPod only allows me to set a username and password, but doesn't let me specify a registry. How can I set this? I'm following the instructions here: https://build.nvidia.com/nvidia/audio2face-2d/docker...

Suggestion: create templates from repositories

Make it so that we can use GitHub repos instead of only Docker images for templates.

Large delay time even with multiple available workers

We are seeing large delay times while workers sit idle. Endpoint ID: 9mzllv74e08hu4

CPU pod network volume

Will the issue with network volumes and CPU pods be fixed anytime soon? This missing feature is really slowing down my work 😦...

Unexpected charges on serverless H100 80GB

For the past few hours we have been charged on serverless H100 even though there are no logs in our system, nor have we been using the GPU at all. Please advise!

Creating serverless instance

Hi! I have a very large Docker image (50 GB). How should I create it so that I get the lowest cold-start time? Thanks!...
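One common pattern, sketched below under stated assumptions: keep the image itself small, put the large weights on an attached network volume (mounted at /runpod-volume on serverless, as far as I know), and load them once at import time so warm requests skip the load entirely. The path and loader here are hypothetical placeholders, not RunPod requirements.

```python
import runpod

MODEL_DIR = "/runpod-volume/models/my-model"  # hypothetical path on an attached network volume

def load_model(path):
    # Placeholder for your real loading code (torch/transformers/etc.). The point is that it
    # runs once per worker at import time, not on every request.
    return {"path": path}

MODEL = load_model(MODEL_DIR)

def handler(job):
    prompt = job["input"].get("prompt", "")
    # Replace with real inference against MODEL.
    return {"echo": prompt, "model_path": MODEL["path"]}

runpod.serverless.start({"handler": handler})
```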

fail: timeout, exporting to oci image format. This takes a little bit of time. Please be patient.

#31 exporting to oci image format. This takes a little bit of time. Please be patient.

Getting executionTimeout exceeded

The logs aren't showing anything either.

Image build from GitHub works fine, but when I test with a request I get an error

ImportError: cannot import name 'TypeIs' from 'typing_extensions' (/usr/local/lib/python3.10/dist-packages/typing_extensions.py) I attached the error logs...

Updated workers to 10, now stuck in a loop

Hey, I upgraded the workers from 5 to 10, but now when I try to see how much money I would need for even more workers, I am stuck in this 10 -> 10 loop.