RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

Huge sudden delay times in serverless

I'm using a WebUI Forge serverless template for my endpoint with a network volume attached, and sometimes the results are very inconsistent. For example, in the last two results you can see I used the same worker, but one has a delay time of 3s and the other 80.39s; the second request was submitted 4-5 seconds after the first, so there was no long gap between them either. I know the forge/automatic1111 templates usually take time to load, but up until this Monday/Tuesday or so the delay was only about 10-20 seconds, and now I'm seeing 80-90 second delays. I didn't make any changes to my code either. Does anyone know the reason for this?...

Testing Async Handler Locally

Hi, I am trying to test an async handler locally. I am following the documentation very closely (attached is an example of something I tried). However, I cannot seem to get the /status/{job_id} endpoint to return anything while the job is in progress (I expected it to return IN_PROGRESS and perhaps the values that have been yielded so far); that is, I receive no response. I am testing locally by running the handler with the --rp_serve_api flag. Is there something I am doing wrong? Why can't I see the status of a job in progress? ...
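For reference, a minimal async generator handler of the kind being tested might look like the sketch below; the step loop and input fields are illustrative stand-ins, not the attached example:

```python
# handler.py -- minimal async generator handler for local testing.
# Run with:  python handler.py --rp_serve_api
# then POST to /run and poll /status/{job_id}.
import asyncio

import runpod


async def handler(job):
    """Yield intermediate results so partial output can be observed while the job runs."""
    job_input = job["input"]
    steps = job_input.get("steps", 3)  # illustrative parameter, not a real template field

    for i in range(steps):
        await asyncio.sleep(1)  # stand-in for real async work
        yield {"step": i, "status": "working"}

    yield {"status": "done"}


runpod.serverless.start({
    "handler": handler,
    "return_aggregate_stream": True,  # expose yielded results in the aggregated job output
})
```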

OpenAI Serverless Endpoint Docs

Hello. From what I could find in the support threads here, you should be able to make a standard OpenAI request that is not wrapped in the "input" param if you hit your endpoint at https://api.runpod.ai/v2/<ENDPOINT ID>/openai/... The handler should then receive two new params, "openai_route" and "openai_input", but it's been a couple of months since those threads and I can't find any official docs about this, or about the ability to test it locally with the RunPod lib. Can someone please confirm that this works in custom images too? If so, what is the structure of the parameters received? Does "input" in handler(input) contain the "openai_input" and "openai_route" params directly? Is there any way I can develop this locally?...
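Going by what those support threads describe (which is exactly what's unconfirmed here), a probing handler along these lines could help reveal the actual structure; the key names and their nesting are assumptions, not documented behavior:

```python
import json

import runpod


def handler(job):
    """Log whatever arrives so the real shape of OpenAI-route jobs can be inspected.

    Assumption (from the support threads, not official docs): requests sent to
    /openai/... show up as extra "openai_route" / "openai_input" keys, either on
    the job itself or inside job["input"].
    """
    job_input = job.get("input", {}) or {}

    # Probe both possible placements, since the exact nesting is unclear.
    openai_route = job.get("openai_route") or job_input.get("openai_route")
    openai_input = job.get("openai_input") or job_input.get("openai_input")

    print("raw job:", json.dumps(job, default=str))

    if openai_route is not None:
        return {"route": openai_route, "payload": openai_input}

    return {"echo": job_input}  # plain /run or /runsync request


runpod.serverless.start({"handler": handler})
```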

Will there be a charge for delay time?

What is the charging model for RunPod's serverless? Do I only need to pay for execution time + idle timeout, or do I need to pay for delay time + execution time + idle timeout?...

Some serverless requests are hanging forever

I'm not sure why, but I "often" (often enough) have jobs that just hang there even when multiple GPUs are available on my serverless endpoint. New jobs might come in and go through while the old job just "stalls" there. Any idea why?...

Job retry after successful run

My endpoint has started retrying every request even though the first run completes successfully without any errors. I don't understand why that is happening. This is what I see in the logs when the first run finishes and the retry starts: 2024-10-10T11:51:52.937738320Z {"requestId": null, "message": "Jobs in queue: 1", "level": "INFO"}...

Why is the delay time so long even though I have an active worker?

I have set active workers to 1 and am manually testing the response delay. I submit the next task only after the previous task has completed, so there is no queueing time. However, the delay time is often still very long, sometimes even exceeding 4 seconds. Why is this? In my code, the model is loaded before runpod.serverless.start({"handler": run})...
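For context, the loading pattern described (model loaded once, before the worker starts) corresponds roughly to the sketch below; load_model, my_project, and the input field are hypothetical placeholders, not the poster's actual code:

```python
import runpod

from my_project.model import load_model  # hypothetical module and loader

# Load once at import time, before runpod.serverless.start(), so every job
# served by this worker reuses the same in-memory model.
MODEL = load_model()


def run(job):
    prompt = job["input"]["prompt"]  # illustrative input field
    return {"output": MODEL.generate(prompt)}


runpod.serverless.start({"handler": run})
```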

Keeping Flashboot active?

It is my understanding that Flashboot only stays active for "a while" after each request, and is then disabled as the instance goes into a deeper sleep. Sadly for me it takes a whopping 70-90 seconds of pure delay to cold start after a long idle period (running llama-2-13b-chat-hf on the 48GB GPUs, e.g. A40). I don't know if I am doing something wrong, as I see others on this forum getting much, much faster start times. However, on consecutive jobs the delay drops down to 1-3 seconds. What is t...

Hugging Face token not working

Hello! Has anyone had issues getting their Hugging Face token to work on a serverless vLLM instance? I have used Hugging Face before and their tokens work for me locally, but I keep getting access-denied entries in the console logs when I send a request, even though I provide the token key...

Pod stuck when starting container

Yesterday I updated my serverless endpoint with the "New release" button. However, when the next request came in, the worker got stuck trying to start the container and drained the remaining funds from my account. In the logs I see multiple "worker exited with exit code 0" errors. Probably something is wrong with my container, but it would be nice if, after multiple failed attempts to start the container, the worker stopped automatically and didn't keep draining money....

Local Testing: 405 Error When Fetching From Frontend

Hi, I am trying to test my handler function by fetching data from my frontend (running on localhost:3000). I am running the local RunPod test server (FastAPI) and trying to make requests to it. However, I keep running into a 405 error. My curl requests work great, but I need to test my backend from the frontend. I can't find documentation that shows how to allow requests from localhost:3000; normally I would just add a relaxed CORS policy, but I am not sure how to do that with RunPod. I have tried quite a few different fetch requests, including with my API key, but nothing works. For reference, here is what I am currently doing in my Next.js frontend: const header_data = { input: { subjob: "root",...

Automatic1111 upscaling through API

I have an Automatic1111 endpoint, and I am trying to run the following request using the SD upscale script: `{ "input": { "sd_model_checkpoint": "", "sd_vae": "",...

Can we run Node.js on a Serverless Worker?

According to the Serverless Overview doc page (https://docs.runpod.io/serverless/workers/overview), we can write functions in the language we're most comfortable with. There's a RunPod SDK on NPM (https://www.npmjs.com/package/runpod-sdk), but that looks like it's meant for calling existing endpoints, not for creating handler functions. Is this possible? If so, are there any templates available for creating the handler function in Node.js?...

Microsoft Florence-2 model in serverless container doesn't work

I'm trying to use the Florence-2 models in a ComfyUI workflow with a serverless container, and it returns an error: raise RuntimeError(f'{node_type}: {exception_message}')\nRuntimeError: DownloadAndLoadFlorence2Model: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`. The Accelerate library is already installed in the venv on the network storage where ComfyUI runs, and I also installed it in the Docker container. Does anyone know how to solve this problem? Thanks in advance...

Terrible performance - vLLM serverless for Mistral 7B

Hello, when I serve Mistral-7B quantized with AWQ (using a model such as "TheBloke/Mistral-7B-v0.1-AWQ") on a RunPod vLLM serverless instance, I get terrible performance (accuracy) compared to running Mistral 7B on my CPU using ollama (which uses GGUF quantization, Q4_0). Could this be due to a misconfiguration of the parameters on my side, even though I kept the defaults, or is AWQ quantization known to drop performance that much? Thank you...

New release on frontend changes ALL endpoints

When I publish a new release to one endpoint through the RunPod website, I've noticed that it pushes the release to all my other endpoints as well. This has messed up a few of my workflows.

Endpoints vs. Docker Images vs. Repos

Hi, I am new to both Docker and RunPod, so my apologies if this question is overly obvious. I am trying to convert a FastAPI app into a RunPod serverless endpoint. Given that my FastAPI app has many endpoints, how can I access all of those endpoints from just one RunPod serverless endpoint? Does it make more sense to create a serverless endpoint for every RESTful endpoint in my FastAPI app? Would I then need to create a different Docker image for each one? I've spent a good amount of time looking through the docs, and most of the examples seem to use only one endpoint. Any resources you could point me to would be greatly appreciated. Thanks for your help!...
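One pattern that fits this situation is a single handler that dispatches on a routing field in the job input and calls the plain Python functions behind the FastAPI routes directly. This is only a sketch: the route names, the app.routes module, and the "route"/"payload" input fields are made up for illustration.

```python
import runpod

# Hypothetical imports: the plain Python functions behind the FastAPI routes.
from app.routes import create_user, get_report, run_inference

# Map a "route" string in the job input to the function that handles it.
ROUTES = {
    "users/create": create_user,
    "reports/get": get_report,
    "inference/run": run_inference,
}


def handler(job):
    job_input = job["input"]        # e.g. {"route": "inference/run", "payload": {...}}
    route = job_input.get("route")
    payload = job_input.get("payload", {})

    func = ROUTES.get(route)
    if func is None:
        return {"error": f"unknown route: {route}"}

    return func(**payload)          # call the underlying function directly


runpod.serverless.start({"handler": handler})
```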

Serverless Streaming Documentation

I'm using the RunPod GitHub template for my model and it's working, but how would I set it up so that my model streams its output through RunPod?
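For context, streaming from a serverless worker is typically done with a generator handler, roughly like the sketch below, where the token loop is a stand-in for whatever the model actually produces; partial results can then be read from the endpoint's /stream/{job_id} route while the job runs.

```python
import runpod


def handler(job):
    """Generator handler: each yielded chunk becomes a streamable partial result."""
    prompt = job["input"]["prompt"]  # illustrative input field

    # Stand-in for real token-by-token generation from the model.
    for token in ["Hello", ", ", "world", "!"]:
        yield {"token": token}


runpod.serverless.start({
    "handler": handler,
    "return_aggregate_stream": True,  # also aggregate the yields into the final output
})
```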