RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

Reusing containers from Github integration registry

I was wondering if we could reuse the containers pushed to the RunPod registry (e.g. registry.runpod.net/rp-github-build-blabla:7e1ab3844). Use case: I wanted to create another serverless endpoint and use a running endpoint's container as the base...

embeddings endpoints

Hi, I have tried following the sparse documentation, but so far I haven't been able to get a non-error response or a helpful error message out of the embeddings endpoints. Has anyone had any success actually using these, and if so, could you share a setup and exact request format that is known to work?
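For anyone hitting the same wall, here is a minimal sketch of one request format to try, assuming the endpoint exposes RunPod's OpenAI-compatible route under /openai/v1 (this is how the vLLM worker does it; the exact path, endpoint ID, and model name below are assumptions, not a confirmed-working setup):

```python
# Hedged sketch, not a verified configuration: endpoint ID, API key, and model
# name are placeholders, and the /openai/v1 route is assumed to exist on the
# embeddings worker the same way it does on the vLLM worker.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # hypothetical endpoint ID
    api_key="RUNPOD_API_KEY",                                   # your RunPod API key
)

resp = client.embeddings.create(
    model="intfloat/e5-base-v2",  # assumption: whichever model the worker was deployed with
    input=["first sentence to embed", "second sentence to embed"],
)
print(len(resp.data), len(resp.data[0].embedding))
```

If the worker only speaks the native /runsync interface instead, the same payload would go inside the job's "input" object, but the exact schema depends on the specific worker image.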

Delay time even when there are many workers available

Hi team, we have a serverless flow that takes less than 15 seconds and over 20 workers assigned to it, yet we routinely get delay times of 10 to 15 seconds for jobs even though workers are sitting idle. That almost doubles our total execution time. Is there something we can do to mitigate this?...

Runpod serverless for Comfyui with custom nodes

I want to use two custom nodes with ComfyUI on RunPod serverless: ComfyUI_CatVTON_Wrapper. It requires the following dependencies:...

How to deploy ModelsLab/Uncensored-llama3.1-nemotron?

I have tried to deploy this model: https://huggingface.co/ModelsLab/Uncensored-llama3.1-nemotron. I am running into a CUDA out-of-memory issue (I have tried 24 GB and 48 GB GPUs) and it does not work. How can I fix this?...

Almost no 48GB Workers available in the EU

It looks like you're getting rid of A40s. There's no EU region that offers both the A40 and the A6000, which is terrible if one stores data on network volumes. Is there more capacity coming soon?...

GitHub integration: "exporting to oci image format" takes forever.

It's been running for over 30 minutes on this step. The same image builds in less than 5 minutes in GitHub Actions. Why does it take so long? This is the first build. Would it be faster for subsequent builds (assuming there's some caching involved)? As it stands, this is unusable, and I'd much rather do the build and push myself and just change the endpoint image version....

vllm worker OpenAI stream

Hi everyone, I followed the RunPod documentation to write a simple OpenAI client against a serverless endpoint for the LLaVA model (llava-hf/llava-1.5-7b-hf). However, I encountered the following error:
ChatCompletion(id=None, choices=None, created=None, model=None, object='error', service_tier=None, system_fingerprint=None, usage=None, code=400, message='As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.', param=None, type='BadRequestError')
...
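A hedged workaround sketch for this class of error: rather than fixing the server-side chat template, render the prompt yourself and call the plain completions route, which never needs a template. The endpoint route, the model name passed to the server, and the "USER: ... ASSISTANT:" prompt format for LLaVA 1.5 are assumptions here, so check the model card before relying on it:

```python
# Hedged sketch: bypass chat completions (which requires a chat template on the
# server) and send a client-rendered prompt to the completions route instead.
# Endpoint ID, API key, and prompt format are assumptions, not verified values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # hypothetical endpoint ID
    api_key="RUNPOD_API_KEY",
)

# LLaVA 1.5 is commonly prompted in a "USER: ... ASSISTANT:" format.
prompt = "USER: Describe what a serverless GPU worker does in one sentence. ASSISTANT:"

out = client.completions.create(
    model="llava-hf/llava-1.5-7b-hf",
    prompt=prompt,
    max_tokens=128,
)
print(out.choices[0].text)
```

The other direction, if the worker supports it, is to supply a chat template at deploy time so chat completions keep working; how that option is exposed depends on the worker version.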

Trying to work with: llama3-70b-8192 and I get out of memory

Hi, I am trying to work with the model llama3-70b-8192, but I can't deploy my serverless endpoint because it runs out of memory. I have attached a screenshot of my config. Please recommend other settings to make it work: [rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 896.00 MiB. GPU Thanks...
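Not a verified fix, but some rough arithmetic explains the error: a 70B-parameter model in 16-bit weights needs about 70e9 × 2 bytes ≈ 140 GB for the weights alone, so it can never fit on a single 24 GB or 48 GB worker. The usual options are sharding across several GPUs, using a quantized checkpoint, and capping the context length. Below is an illustrative sketch of the corresponding vLLM engine arguments; how the RunPod vLLM worker exposes these (environment variables vs. template fields) may differ, so treat the names and values as assumptions:

```python
# Illustrative vLLM settings for fitting a large model; model name and values
# are assumptions, not a tested RunPod configuration.
from vllm import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder for the 70B checkpoint you deploy
    tensor_parallel_size=4,        # shard weights across 4 GPUs (e.g. 4 x 48 GB)
    # quantization="awq",          # uncomment only when pointing at an AWQ-quantized checkpoint
    max_model_len=8192,            # cap context so the KV cache stays bounded
    gpu_memory_utilization=0.90,   # fraction of VRAM vLLM is allowed to claim
)
print(llm.generate(["Hello"])[0].outputs[0].text)
```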

Increase serverless worker count beyond 30.

Dear RunPod team, we are in the process of transitioning our inference operations for aisuitup.com (an AI image generation service) from AWS to RunPod. To support our growing needs, we will require an increase in our serverless capacity beyond the current limit of 30 workers. Please let us know the steps needed to facilitate this increase and any additional information or configuration required on our end....

Consistently timing out after 90 seconds

I'm not exactly sure why this is happening, and I don't think it happened earlier, but currently I'm consistently seeing requests time out after 90 seconds. Max execution time is set to 300 seconds, so that shouldn't be the issue. Is this a known problem?...

Upload files to network storage

I use network storage to store LoRA files. Can I automate the process of uploading the assets I need to the network storage? It is used with serverless. I can't use my S3 storage because the speed would be much slower....

Serverless problems since 10.12

I have been using serverless for a few months quite stably, but since 10.12 all my requests abort after 25 seconds. I have already tried all the different settings, but in the end the process stops after 25 seconds and I get an error. I changed nothing in Docker or in my files; same settings as for weeks. { "delayTime": 4967, "error": "Error queuing workflow: <urlopen error [Errno 111] Connection refused>",...

Git LFS on Github integration

When using the new Github integration workflow, I noticed corrupted large files, so I wanted to make sure that you had Git LFS installed in the environment that pulls the Git repositories. Correct?

Using runpod serverless for HF 72b Qwen model --> seeking help

Hey all, I'm new to this and tried loading an HF Qwen 2.5 72B variant on RunPod serverless, and I'm having issues. Requesting help from RunPod veterans please! Here's what I did:...

Docker Image EXTREMELY Slow to load on endpoint but blazing locally

This is the first time I'm encountering this issue with a serverless endpoint. I've got a Docker image that loads the model (Flux Schnell) very fast and runs a job fairly quickly on my local machine with a 4090. When I use a 4090 on RunPod, though, the image gets stuck loading the model:
```
self.pipeline = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
```
...
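A hedged guess, since the post is truncated: on a serverless worker that call will pull tens of gigabytes of Flux weights from Hugging Face on a cold start unless they are already on local disk, which can easily look like the load is "stuck". One mitigation sketch is to bake the weights into the image or put them on a network volume and load from that path; the /runpod-volume mount point and directory name below are assumptions:

```python
# Hedged sketch: load Flux from a local directory (baked into the image or on a
# network volume) so cold starts don't re-download weights. Path is hypothetical.
import torch
from diffusers import FluxPipeline

LOCAL_WEIGHTS = "/runpod-volume/models/FLUX.1-schnell"  # populated once, e.g. via snapshot_download

pipeline = FluxPipeline.from_pretrained(
    LOCAL_WEIGHTS,
    torch_dtype=torch.bfloat16,
    local_files_only=True,   # fail fast instead of silently hitting the Hub
).to("cuda")
```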

Constantly getting "Failed to return job results."

```
{
  "endpointId": "mbx86r5bhruapo",
  "workerId": "r23nc1mgj01m13",
  "level": "error",
  ...
}
```

Why are my serverless endpoint requests waiting in the queue when there are free workers?

This has been happening: when two people make a request at the same time, the second user's request waits in the queue until the first request is completed instead of going to another worker. I have 4 workers available on my endpoint, so that's not the issue. I set the queue delay to 1 second, because that's the lowest possible, but it doesn't do anything. Is the serverless endpoint supposed to work in production?

Github integration

@haris Trying the new GitHub integration. It says it grants "Read and write access to code" permissions. Why does the GitHub integration require WRITE access to code?

Is vLLM Automatic Prefix Caching enabled by default?

Hello! I set up a serverless quick deployment for text generation, and I was wondering whether vLLM Automatic Prefix Caching is enabled by default. Also see: https://docs.vllm.ai/en/latest/automatic_prefix_caching/apc.html ...
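Whether it is on by default depends on the vLLM version and engine, so the safe move is to enable it explicitly. Below is a sketch of the underlying engine argument; how the quick-deploy template passes it through (environment variable or otherwise) is an assumption on my part:

```python
# Minimal sketch: enable Automatic Prefix Caching explicitly instead of relying
# on version-dependent defaults. Model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    enable_prefix_caching=True,                # APC engine flag
)

shared_prefix = "You are a helpful assistant. Shared context goes here. "
params = SamplingParams(max_tokens=64)
# Requests that share the same prefix should reuse cached KV blocks.
outputs = llm.generate([shared_prefix + "Question 1?", shared_prefix + "Question 2?"], params)
print([o.outputs[0].text for o in outputs])
```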