RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

Pod crashing due to 100 percent CPU usage

Hey, I need help with RunPod serverless. My serverless pod is using 100 percent CPU and then it crashes. Is there a way to limit the pod's CPU usage so it doesn't exceed a certain point?...
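
For anyone hitting this: the endpoint settings don't appear to expose a hard CPU cap, so one workaround worth trying is to cap the thread pools your own code spawns inside the worker. The sketch below is exactly that, an assumption-laden workaround for a PyTorch-style handler rather than an official RunPod feature; the thread count of 4 is arbitrary.

```python
# Hypothetical workaround: cap thread pools before heavy imports so
# OpenMP/MKL-backed libraries don't fan out across every host core.
import os

os.environ.setdefault("OMP_NUM_THREADS", "4")   # OpenMP worker threads
os.environ.setdefault("MKL_NUM_THREADS", "4")   # Intel MKL threads

import torch
import runpod

torch.set_num_threads(4)  # cap PyTorch intra-op CPU parallelism


def handler(job):
    # ... CPU-bound preprocessing is now limited to ~4 threads ...
    return {"ok": True}


runpod.serverless.start({"handler": handler})
```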

Service not ready yet. Retrying...

I always have many stuck requests, and the request delay grows indefinitely. Can anyone help me? Thanks.

Asynchronous serverless endpoint failing with 400 Bad Request

I'm getting the following error when my serverless endpoint tries to return its output object: "Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/ne9y7bgqrpzcu6/job-done/asvftiq7ad2xzj/30238db1-1d48-4a80-8c5e-86f69acf3642-e1?gpu=$RUNPOD_GPU_TYPE_ID&isStream=false'" The payload is small, only a KiB or so. What could be the other causes of this "Bad Request", presumably raised by RunPod's Python library?...
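
One generic thing worth ruling out before blaming the platform is whether the returned object is actually plain JSON. The sketch below is a hypothetical sanity check, not a confirmed cause of this particular 400:

```python
# Hedged sanity check: make sure the handler's return value is plain,
# JSON-serializable data before the SDK posts it to the job-done URL.
import json

import runpod


def handler(job):
    result = {"status": "done", "seed": 1234}  # placeholder output
    try:
        json.dumps(result)  # raises TypeError on numpy arrays, bytes, tensors, etc.
    except TypeError as exc:
        return {"error": f"non-serializable output: {exc}"}
    return result


runpod.serverless.start({"handler": handler})
```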

What environment variables are available in a serverless worker?

To be clear, this is distinct from the custom environment variables that are set on the template. We want to get the worker ID from a given serverless worker. This is primarily to improve our logging capability, but it would also be useful to know of any other relevant environment variables exposed in a worker. Note: I will be away for the next week, so I might take a while to respond....
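
A quick way to answer this empirically, rather than from a documented list, is to log every environment variable with the RUNPOD prefix from inside the handler. Minimal sketch (whatever names come back are what your worker actually receives; none are guaranteed here):

```python
# Sketch: log every RUNPOD_* environment variable the worker sees, so
# worker/endpoint identifiers can be picked up for logging.
import os

import runpod

RUNPOD_ENV = {k: v for k, v in os.environ.items() if k.startswith("RUNPOD_")}


def handler(job):
    print(f"RUNPOD_* variables visible to this worker: {sorted(RUNPOD_ENV)}")
    return {"runpod_env_keys": sorted(RUNPOD_ENV)}


runpod.serverless.start({"handler": handler})
```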

Worker keeps running after finishing job, burning money?

Hi! Since around a week ago, I've started seeing 3-4 examples of a worker showing 10 hours, or even 3 days, of running time while there are no active jobs. Is it spending credits as well? I've caught an example:...

Who to message to increase 80GB serverless endpoint to 3 GPUs/worker instead of 2?

Whenever I try to increase my serverless endpoint to more than 2x GPU/worker for the 80GB cards, the option is grayed out. 8x 48GB cards do not fit 405B models, but during testing 4x 80GB cards do.

RunPod Serverless - Testing Endpoint Locally with Docker and GPU

I'm creating a custom container to run FLUX and LoRA on RunPod, using this Stable Diffusion example as a starting point. I successfully deployed my first pod on RunPod, and everything worked fine. However, my issue arises when I make code changes and want to test my endpoints locally before redeploying. Constantly redeploying to RunPod for every small test is quite time-consuming. I found a guide for local testing in the RunPod documentation here. Unfortunately, it only provides a simple example that suggests running the handler function directly, like this:...
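
For reference, the flow the docs describe boils down to keeping the production handler file unchanged and driving it locally; a minimal sketch, assuming the usual `runpod` SDK entry point:

```python
# rp_handler.py -- the same file the container runs in production.
import runpod


def handler(job):
    prompt = job["input"].get("prompt", "")
    # ... FLUX / LoRA inference would go here ...
    return {"echo": prompt}


runpod.serverless.start({"handler": handler})
```

Locally this can then be exercised with a `test_input.json` next to the file (`python rp_handler.py`) or as a throwaway HTTP server (`python rp_handler.py --rp_serve_api`), which is what makes iterating inside a `docker run --gpus all` container practical; exact flags may differ between runpod SDK versions.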

Is RunPod serverless experiencing issues?

Seeing a lot of these errors today: 2024-10-15T22:06:11.508889850Z connectionpool.py :870 2024-10-15 22:06:10,637 Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='api.runpod.ai', port=443): Read timed out. (read timeout=8)")': /v2/080ddk82a04i8f/ping/365s5mr8swww1x?gpu=NVIDIA+GeForce+RTX+4090&runpod_version=1.7.0...

How to go about applying for RunPod's Creator Program?

Hi, I'm KingTut, founder of Ainime, an AI-powered platform dedicated to creating high-quality, original anime. Our mission is to empower studios of all sizes to produce diverse and inclusive content, particularly focusing on underrepresented characters like Black characters. RunPod's robust GPU infrastructure and scalable solutions are perfectly aligned with our goal to make anime production easier, more affordable, and faster. By leveraging your technology, we can enhance our platform's capabilities and deliver exceptional results to our users. As our user base grows, we are facing financial challenges in maintaining the necessary infrastructure. Joining RunPod's Creator Program would give us crucial access to your resources, allowing us to build a robust solution while promoting RunPod's services to a dedicated community of anime creators....

Initializing...

I've started a new serverless instance. It's been initializing for the last few hours. How long before the server actually gets created?

Connection timeout to host

I am facing this error: 2024-10-14T14:17:51.194569601Z 0|app | {"requestId": "sync-dccee400-e082-4633-8c95-238d11a57c51-e1", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/i03hdwuhbsfyo8/job-done/9lja6f6wu32dyr/sync-dccee400-e082-4633-8c95-238d11a57c51-e1?gpu=NVIDIA+A100+80GB+PCIe&isStream=false", "level": "ERROR"}...

No container logs, container stopped, worker unhealthy.

Hello everyone. We run custom images on RunPod to serve our inference. We have been having a hard time getting RunPod to behave consistently. Our serverless workers go "unhealthy" with no indication or logs whatsoever as to why that happens. Some images can't be run on most GPUs, while running just fine on 3090s....

Streaming LLM output via a Google Cloud Function

Has anyone been able to figure this out? User inputs go through a Google Cloud Function that then calls the RunPod model's inference. This pipeline works, but I now want the output to be streamed through instead of waiting ages for the complete answer. So far I have tried unsuccessfully to implement it, and Google's docs only have examples for streaming LLM outputs using their Vertex AI service, not the specific case I am dealing with.
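
One pattern that may work here, sketched under assumptions (the model's handler is a generator, and the `/stream/{job_id}` response shape matches current RunPod docs), is to have the Cloud Function submit the job via `/run` and relay chunks as it polls `/stream`:

```python
# Sketch of the relay side: submit a job, then poll the /stream endpoint
# and forward each new chunk to the caller as it arrives.
import os
import time

import requests

ENDPOINT_ID = os.environ["ENDPOINT_ID"]   # assumed to be configured on the function
API_KEY = os.environ["RUNPOD_API_KEY"]
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def stream_runpod(prompt):
    job = requests.post(f"{BASE}/run", json={"input": {"prompt": prompt}}, headers=HEADERS).json()
    job_id = job["id"]
    while True:
        chunk = requests.get(f"{BASE}/stream/{job_id}", headers=HEADERS).json()
        for item in chunk.get("stream", []):   # response shape is an assumption
            yield item.get("output", "")
        if chunk.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
```

The Cloud Function would then write each yielded piece to its response stream rather than buffering the whole answer.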

Serverless and Azure.

So I'm new to RunPod and the docs are not very helpful. I'm creating a ComfyUI workflow that I want to run on RunPod eventually, but I wanted to try RunPod first, so I started a preconfigured serverless option. ...

Testing Endpoint Locally with Docker and GPU

I'm working on creating a custom container to run FLUX and LoRA on RunPod, using this Stable Diffusion example as a starting point. I successfully deployed my first pod on RunPod, and everything worked fine. However, my issue arises when I make code changes and want to test my endpoints locally before redeploying. Constantly redeploying to RunPod for every small test is quite time-consuming. I found a guide for local testing in the RunPod documentation here (https://docs.runpod.io/serverless/workers/development/local-testing). Unfortunately, it only provides a simple example that suggests running the handler function directly, like this:...
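
Once the handler is serving locally (for example started with `--rp_serve_api` inside a `docker run --gpus all ...` container), it can be called the same way the hosted endpoint would be. The port and route below are assumptions; check what the SDK prints on startup:

```python
# Sketch: exercise the locally served handler.  Port and route are
# assumptions -- the --rp_serve_api banner shows the actual values.
import requests

LOCAL_URL = "http://localhost:8000/runsync"   # assumed default port/route

payload = {"input": {"prompt": "a watercolor fox", "lora": "my-flux-lora"}}
resp = requests.post(LOCAL_URL, json=payload, timeout=300)
print(resp.status_code, resp.json())
```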

Chat template error for mistral-7b

```
2024-10-14T10:19:42.283509829Z --- Starting Serverless Worker | Version 1.7.0 ---
2024-10-14T10:19:42.283511520Z ERROR 10-14 10:19:42 serving_chat.py:155] Error in applying chat template from request: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.
2024-10-14T10:19:42.283814574Z /src/engine.py:183: RuntimeWarning: coroutine 'AsyncMultiModalItemTracker.all_mm_data' was never awaited
2024-10-14T10:19:42.283849707Z response_generator = await generator_function(request, raw_request=dummy_request)
...
```
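
The error itself comes from transformers v4.44 dropping implicit default chat templates. A generic sketch of the fix, assuming direct transformers usage rather than the vLLM worker's own configuration, is to attach an explicit template before `apply_chat_template` runs:

```python
# Sketch: attach an explicit chat template so transformers >= 4.44 does not
# refuse the request.  The Jinja template here is illustrative only -- use
# the template from the model card instead.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
tok.chat_template = (
    "{% for m in messages %}"
    "{{ '[INST] ' + m['content'] + ' [/INST]' if m['role'] == 'user' else m['content'] }}"
    "{% endfor %}"
)

messages = [{"role": "user", "content": "Hello!"}]
print(tok.apply_chat_template(messages, tokenize=False))
```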

When are multiple H100s on a single node available?

10x 48GB GPUs cannot host all the model weights. Is RunPod planning to upgrade its platform?

H100 NVL

If I've understood the docs correctly, H100 NVL is not available on serverless. Are there any plans to bring it to serverless? The extra 14GB of VRAM over the other GPUs is pretty useful for 70(ish)B parameter LLMs.

RunPod Header Timing is Off

These responses from the stream endpoint are coming in every second, yet the Date header says, at points, that responses are 10 seconds apart. When this happens, nothing is shown in the stream despite the fact that I yield something every second. Is there a way to force the timing, or another way I can get around this?
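
For context, the worker side of streaming is just a generator handler; each yield is supposed to surface as one item on the `/stream` endpoint, so a sketch like the one below can help separate handler-side delays from relay-side buffering. The `return_aggregate_stream` flag is assumed to behave as described in the runpod SDK docs.

```python
# Sketch of a streaming (generator) handler: each yield should become one
# chunk on the /stream endpoint.
import time

import runpod


def handler(job):
    for i in range(10):
        time.sleep(1)        # simulate one second of work per chunk
        yield {"chunk": i}   # expected to appear as an individual stream item


runpod.serverless.start({
    "handler": handler,
    "return_aggregate_stream": True,  # assumption: also aggregates chunks for /status
})
```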

Jobs randomly dropping - {'error': 'request does not exist'}

RunPod worker errors:
```
2024-10-12T18:25:21.522075786Z {"requestId": "51124010-27f8-4cfa-b737-a50e6d436623-u1", "message": "Started.", "level": "INFO"}
2024-10-12T18:25:22.723756821Z {"requestId": "51124010-27f8-4cfa-b737-a50e6d436623-u1", "message": "Finished.", "level": "INFO"}
2024-10-12T18:27:09.433322101Z {"requestId": null, "message": "Failed to get job, status code: 404", "level": "ERROR"}
...
```