Hello
RunPod
•Created by Hello on 9/16/2024 in #⚡|serverless
worker exited with exit code 137
My serverless worker seems to get the error,
worker exited with exit code 137
after multiple consecutive requests (around 10 or so). It seems like the container is running out of memory. Does anyone know what the issue could be? The script already runs gc.collect() to free up resources, but the problem persists.
4 replies
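For reference, a minimal cleanup sketch worth trying between requests, assuming a PyTorch-based pipeline (the torch calls are guarded so the helper also runs without a GPU):

```python
import gc

def free_memory():
    """Release Python garbage and, when available, cached CUDA memory."""
    collected = gc.collect()  # reclaim unreferenced Python objects
    try:
        import torch
        if torch.cuda.is_available():
            # gc.collect() alone does not return GPU memory to the driver;
            # PyTorch's caching allocator holds it until explicitly emptied.
            torch.cuda.empty_cache()
    except ImportError:
        pass  # torch not installed; nothing GPU-side to release
    return collected
```

Note that 137 is 128 + SIGKILL, which usually means the kernel OOM killer stopped the container — so it may be system RAM rather than VRAM that is running out.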
RunPod
•Created by Hello on 9/15/2024 in #⚡|serverless
Speeding up loading of model weights
Hi guys, I have set up my serverless Docker image to contain all my required model weights. My handler script also loads the weights using the diffusers library's
.from_pretrained
with local_files_only=True
so we are loading everything locally. I notice that during cold starts, loading those weights still takes around 25 seconds before the logs display --- Starting Serverless Worker | Version 1.6.2 ---.
Does anyone have experience optimising the time needed to load weights? Could we pre-load them into RAM or something (I may be totally off)?
7 replies
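One pattern that helps here is making sure the load happens exactly once per worker, at module import, rather than inside the handler. A sketch of that load-once pattern — load_pipeline is a hypothetical stand-in for the diffusers .from_pretrained(..., local_files_only=True) call:

```python
# Load weights once at module import, not inside the handler, so the cost
# is paid a single time per worker rather than on every request.
LOAD_CALLS = 0

def load_pipeline():
    """Stand-in for the real .from_pretrained(..., local_files_only=True)."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return object()  # placeholder for the real pipeline object

PIPELINE = load_pipeline()  # runs during cold start, before any request

def handler(job):
    # Every request reuses the already-loaded pipeline.
    return {"ok": PIPELINE is not None}
```

If the weights live on a network volume rather than local disk, the 25 seconds may mostly be I/O rather than deserialisation; safetensors files generally load faster than pickle-based checkpoints.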
RunPod
•Created by Hello on 9/5/2024 in #⚡|serverless
Offloading multiple models
Hi guys, does anyone have experience with an inference pipeline that uses multiple models? I am wondering how best to manage loading models whose combined size exceeds a worker's VRAM if everything is kept on the GPU. Any best practices / examples on how to keep model load time as minimal as possible?
Thanks!
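One common approach: keep every model resident in system RAM and swap only the one you need onto the GPU before each stage, so each swap costs a host-to-device copy instead of a full reload from disk. A tiny sketch — to_gpu/to_cpu are hypothetical stand-ins for .to("cuda")/.to("cpu") in a real pipeline:

```python
class GpuSlot:
    """Keep at most one model resident on the GPU at a time."""

    def __init__(self, to_gpu, to_cpu):
        self._to_gpu = to_gpu    # e.g. lambda m: m.to("cuda")
        self._to_cpu = to_cpu    # e.g. lambda m: m.to("cpu")
        self._resident = None

    def use(self, model):
        # Evict the previously resident model back to system RAM first.
        if self._resident is not None and self._resident is not model:
            self._to_cpu(self._resident)
        if self._resident is not model:
            self._to_gpu(model)
            self._resident = model
        return model
```

For diffusers pipelines specifically, enable_model_cpu_offload() automates a similar scheme per sub-model, at some latency cost per stage.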
3 replies
RunPod
•Created by Hello on 9/4/2024 in #⚡|serverless
Stuck on "loading container image from cache"
Hi, I have updated my serverless endpoint release version, but some of my workers are stuck on "loading container image from cache" even though it's a new version that shouldn't exist in the cache to begin with.
Any advice on how to solve this issue?
24 replies
RunPod
•Created by Hello on 9/1/2024 in #⚡|serverless
How to deal with multiple models?
Does anyone have a good deployment flow for serverless endpoints with multiple large models? Asking because building and pushing a Docker image with the model weights takes forever.
2 replies
RunPod
•Created by Hello on 7/12/2024 in #⚡|serverless
Failed to return job results
I keep getting "Failed to return job results" errors on 16GB serverless endpoints. After terminating one of the workers it worked, but now my other workers keep getting the same errors as well.
4 replies
RunPod
•Created by Hello on 3/19/2024 in #⚡|serverless
No module "runpod" found
Hi, I am trying to run a serverless RunPod instance with a Docker image.
This is my Dockerfile:
When the handler runs,
import runpod
errors out with ModuleNotFoundError: No module named 'runpod'.
Has anyone experienced this before?
4 replies
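The usual cause of this error is that the image never installs the SDK. A minimal Dockerfile sketch — the base image, handler path, and CMD are assumptions, since the original Dockerfile isn't shown above:

```dockerfile
# Hypothetical example; base image and paths are placeholders.
FROM python:3.11-slim

# The handler does `import runpod`, so the SDK must be installed in the image.
RUN pip install --no-cache-dir runpod

COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]
```

If the Dockerfile uses a requirements.txt instead, check that `runpod` is listed there and that the `pip install` step actually runs for the final image stage.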
RunPod
•Created by Hello on 2/16/2024 in #⚡|serverless
Safetensor safeopen OS Error device not found
Running inference on a serverless endpoint, and this line of code:
Throws
OSError: No such device (os error 19)
Running on an RTX 5000 with a network volume attached. The path used in safe_open
leads to a safetensor file at /runpod-volume/example.safetensor.
Has anyone got this error before?
10 replies
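Errno 19 is ENODEV; with safetensors it often points at the network volume not actually being mounted on that worker, or at mmap failing on the underlying filesystem. A sketch that rules out the missing mount before opening the file — the mount point matches the path from the post, and the safe_open usage mirrors the safetensors API:

```python
import os

VOLUME = "/runpod-volume"  # network volume mount point from the post

def load_tensors(path):
    # OSError 19 (ENODEV) frequently means the volume isn't mounted on
    # this worker, so fail with a clearer message before touching the file.
    if not os.path.ismount(VOLUME):
        raise RuntimeError(f"{VOLUME} is not mounted on this worker")
    from safetensors import safe_open  # imported here so the check runs first
    tensors = {}
    with safe_open(path, framework="pt", device="cpu") as f:
        for key in f.keys():
            tensors[key] = f.get_tensor(key)
    return tensors
```

If the mount is present and the error persists, copying the file to local disk before opening it is a quick way to test whether mmap on the network filesystem is the culprit.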