RunPod
Created by Hello on 9/16/2024 in #⚡|serverless
worker exited with exit code 137
My serverless worker keeps getting the error "worker exited with exit code 137" after multiple consecutive requests (around 10 or so). It seems like the container is running out of memory. Does anyone know what the issue could be? The script already runs gc.collect() to free up resources, but the problem persists.
4 replies
Created by Hello on 9/15/2024 in #⚡|serverless
Speeding up loading of model weights
Hi guys, I have set up my serverless Docker image to contain all my required model weights. My handler script also loads the weights using the diffusers library's .from_pretrained with local_files_only=True, so everything is loaded locally. I notice that during cold starts, loading those weights still takes around 25 seconds before the logs display --- Starting Serverless Worker | Version 1.6.2 ---. Does anyone have experience optimising the time needed to load weights? Could we pre-load them into RAM or something (I may be totally off)?
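One common pattern is to load the weights once at module import time, outside the handler, so warm requests reuse the in-RAM model and only the cold start pays the disk read. A sketch of the caching shape, with the heavy from_pretrained call replaced by a stub so the pattern stands alone:

```python
import functools

# Load-once-per-worker sketch. `load_weights` stands in for a real call such
# as diffusers' from_pretrained(..., local_files_only=True); the stub just
# records that the expensive load ran, so the caching behaviour is visible.

@functools.lru_cache(maxsize=1)
def load_weights(path: str):
    load_weights.calls = getattr(load_weights, "calls", 0) + 1
    return {"path": path, "weights": "..."}  # placeholder for the real model

MODEL = load_weights("/app/models/sdxl")  # runs once at import, before any request

def handler(job):
    model = load_weights("/app/models/sdxl")  # cache hit on warm requests
    return {"model_path": model["path"]}
```

This doesn't shrink the 25-second cold-start read itself, but it guarantees the read happens at most once per worker rather than once per request.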
7 replies
Created by Hello on 9/5/2024 in #⚡|serverless
Offloading multiple models
Hi guys, does anyone have experience with an inference pipeline that uses multiple models? I'm wondering how best to manage loading models whose combined size exceeds a worker's VRAM if everything is kept on the GPU. Any best practices / examples on how to keep model load time as low as possible? Thanks!
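One approach is to keep all models resident in CPU RAM and move only the ones needed for the current stage onto the GPU, evicting the least-recently-used models when a VRAM budget would be exceeded (accelerate's enable_model_cpu_offload does something in this spirit for diffusers pipelines). A framework-neutral sketch, where to_gpu/to_cpu are hypothetical hooks for model.to("cuda")/model.to("cpu"):

```python
from collections import OrderedDict

class VramBudgetCache:
    """Keep at most `budget_bytes` of models 'on GPU', evicting LRU first.

    Sketch only: `to_gpu` / `to_cpu` are placeholder hooks where real code
    would call model.to("cuda") / model.to("cpu").
    """

    def __init__(self, budget_bytes, to_gpu=lambda m: m, to_cpu=lambda m: m):
        self.budget = budget_bytes
        self.to_gpu, self.to_cpu = to_gpu, to_cpu
        self.resident = OrderedDict()  # name -> (size_bytes, model), LRU order

    def _used(self):
        return sum(size for size, _ in self.resident.values())

    def acquire(self, name, size_bytes, model):
        if name in self.resident:
            self.resident.move_to_end(name)        # already resident: mark recent
            return self.resident[name][1]
        while self.resident and self._used() + size_bytes > self.budget:
            _, (_, evicted) = self.resident.popitem(last=False)
            self.to_cpu(evicted)                   # move the LRU model off the GPU
        self.resident[name] = (size_bytes, self.to_gpu(model))
        return self.resident[name][1]
```

CPU-to-GPU transfers are much faster than disk reads, so this keeps per-request load time low as long as all models together fit in host RAM.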
3 replies
Created by Hello on 9/4/2024 in #⚡|serverless
Stuck on "loading container image from cache"
Hi, I have updated my serverless endpoint's release version, but some of my workers are stuck on "loading container image from cache", even though it's a new version that shouldn't exist in the cache to begin with. Any advice on how to solve this issue?
24 replies
Created by Hello on 9/1/2024 in #⚡|serverless
How to deal with multiple models?
Does anyone have a good deployment flow for serverless endpoints with multiple large models? Asking because building and pushing a Docker image with the model weights baked in takes forever.
2 replies
Created by Hello on 7/12/2024 in #⚡|serverless
Failed to return job results
I keep getting "Failed to return job results" errors on 16GB serverless endpoints. After terminating one of the workers it worked, but now my other workers keep getting the same errors as well.
4 replies
Created by Hello on 3/19/2024 in #⚡|serverless
No module "runpod" found
Hi, I am trying to run a serverless RunPod instance with a Docker image. This is my Dockerfile:
# Base image -> https://github.com/runpod/containers/blob/main/official-templates/base/Dockerfile
# DockerHub -> https://hub.docker.com/r/runpod/base/tags
FROM runpod/base:0.6.2-cuda12.2.0

# The base image comes with many system dependencies pre-installed to help you get started quickly.
# Please refer to the base image's Dockerfile for more information before adding additional dependencies.
# IMPORTANT: The base image overrides the default huggingface cache location.


# --- Optional: System dependencies ---
# COPY builder/setup.sh /setup.sh
# RUN /bin/bash /setup.sh && \
# rm /setup.sh


# Python dependencies
COPY builder/requirements.txt /requirements.txt
RUN python3.11 -m pip install --upgrade pip && \
    python3.11 -m pip install --upgrade -r /requirements.txt --no-cache-dir && \
    rm /requirements.txt

# NOTE: The base image comes with multiple Python versions pre-installed.
# It is recommended to specify the version of Python when running your code.

# Add src files (Worker Template)
ADD src .

RUN python3.11 -m pip install runpod

CMD python3.11 -u /handler.py
When the handler runs, import runpod fails with ModuleNotFoundError: No module named 'runpod'. Has anyone experienced this before?
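Since the base image ships several Python versions, this error usually means pip installed runpod into a different interpreter than the one executing handler.py (e.g. CMD running a bare python while pip ran as python3.11 -m pip). A quick in-container diagnostic sketch:

```python
import importlib.util
import sys

# Diagnostic: print which interpreter is actually running and whether it can
# see a package. If find_spec returns None, the pip that installed the
# package belonged to a different Python than the one running this script.

def module_visible(name):
    return importlib.util.find_spec(name) is not None

print("interpreter:", sys.executable)
print("runpod visible:", module_visible("runpod"))
```

Running this with the exact interpreter from the CMD line (python3.11 in the Dockerfile above) shows immediately whether the install and runtime interpreters match.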
4 replies
Created by Hello on 2/16/2024 in #⚡|serverless
Safetensor safeopen OS Error device not found
Running inference on a serverless endpoint, and this line of code:
with safetensors.safe_open(path, framework="pt", device="cpu") as f:
throws OSError: No such device (os error 19). Running on an RTX5000 with a network volume attached. The path used in safe_open points to a safetensors file at /runpod-volume/example.safetensor. Has anyone got this error before?
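OSError 19 (ENODEV) from safe_open on a network volume is consistent with safetensors memory-mapping the file and the network filesystem not supporting mmap. A common workaround is to copy the file to container-local disk before opening it; a sketch with illustrative paths:

```python
import shutil
import tempfile
from pathlib import Path

# Workaround sketch: copy the safetensors file off the network volume to
# local disk so safe_open's mmap happens on a filesystem that supports it.

def localize(path, cache_dir=None):
    cache_dir = Path(cache_dir or tempfile.gettempdir()) / "st_cache"
    cache_dir.mkdir(parents=True, exist_ok=True)
    local = cache_dir / Path(path).name
    if not local.exists():
        shutil.copy2(path, local)   # one-time copy off the network volume
    return str(local)

# Usage (assuming safetensors is installed):
# with safetensors.safe_open(localize("/runpod-volume/example.safetensor"),
#                            framework="pt", device="cpu") as f:
#     ...
```

The copy costs time and local disk space once per worker, after which repeated opens hit the cached local file.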
10 replies