Delay Time
Hello,
I'm wondering if these delay times are normal?
If not, what should I do?
36 Replies
Seems pretty low to me. Depends on what your worker does.
It takes a bunch of image URLs and does ML inference on them
One more question: does the delay time increase if the "active and supposed to be warm" worker hasn't actually been working for a while?
it kinda feels unreliable atm
@Papa Madiator
?
I'm experiencing extreme delay times, how can I get them back to normal?
I'm not sure what you are doing and running
atm it just takes a bunch of image URLs and calculates their embeddings using a specific ML model
did you bake the models into the docker image?
my dockerfile looks like this:
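Roughly this pattern (a sketch of the setup under discussion; the model name and paths are placeholders, not the exact file):

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# the "python line": downloads the model (and loads it) at build time
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('clip-ViT-B-32')"

COPY . .
CMD ["python", "-u", "handler.py"]
```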
Whoa, that python line... I'm not sure it's caching it for runtime
run it from CMD or an ENTRYPOINT file
not at build time but at runtime
This python line does two things:
- it downloads the model and saves it somewhere
- it loads it into RAM
If I remove the line and only do that at runtime, I'd have to download it each time, no?
(or maybe I misunderstood something, sorry for that)
Do it at runtime too
it's fine
Yes, but it loads on your build machine, not on the runtime machine
then at runtime it's going to be a different machine, so it's not cached yet
Usually, what's the proper way to bake models into docker images?
Any examples?
Like in the docs
It loads, but at runtime, using a python file / .sh file
I would check the logs when the worker is starting
🚧 If your handler requires external files such as model weights, be sure to cache them into your docker image. You are striving for a completely self-contained worker that doesn't need to download or fetch external files to run.
Oh, in the docs it's only described like that
I guess running the python line that loads the model or creates the pipeline in handler.py works
exactly, so it's not very clear to me how to cache models in the docker image; I thought my implementation would work
Is that what you meant?
Yep
So everything put outside of the handler function will be cached in the docker image??
before the start, I think
serverless.start
Yes but the documentation says "be sure to cache them into your docker image."
How to do that correctly? (the doc doesn't provide enough information I think)
Just before the start() line
so it has to be loaded before that, I think
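i.e. a minimal sketch like this, with the model loaded at module level so it runs once when the worker starts (the model name and job format are placeholder assumptions):

```python
# handler.py -- minimal sketch; model name and job format are placeholders
import runpod
from sentence_transformers import SentenceTransformer

# Module level: runs once per worker, before serverless.start() is reached.
# The weights are read from the image (or network volume) and loaded into VRAM here,
# not once per request.
model = SentenceTransformer("clip-ViT-B-32", device="cuda")

def handler(job):
    image_urls = job["input"]["image_urls"]
    # fetch the images here, then embed them with the already-loaded model, e.g.
    # embeddings = model.encode(images)
    return {"received": len(image_urls)}  # placeholder return

runpod.serverless.start({"handler": handler})
```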
If I have no active worker, then each time I spawn a new one it'll download the model, and that's not what I want. I'd like the model to be cached in the docker image so that when I spawn a new worker the model is already almost ready to use
cached means it has to be stored somewhere
also, a good idea is to load the model into VRAM in the handler
Either you store it in the image or network storage first
then load it into VRAM at runtime, that's how you cache it
so if you want to download the model, download it into a network volume
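e.g. something like this sketch, assuming the volume is mounted at /runpod-volume (the usual serverless mount point) and a placeholder model:

```python
# sketch: keep the weights on the network volume so only the first worker downloads them
import os
from sentence_transformers import SentenceTransformer

CACHE_DIR = "/runpod-volume/models"  # network volume mount point (assumption)
os.makedirs(CACHE_DIR, exist_ok=True)

# first run downloads into the volume; later workers reuse the cached copy
model = SentenceTransformer("clip-ViT-B-32", cache_folder=CACHE_DIR, device="cuda")
```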
@Papa Madiator so like this is fine right?
yup
GitHub: worker-sdxl/src/rp_handler.py at main · runpod-workers/worker-sdxl (RunPod worker for Stable Diffusion XL)
but this is only to load the model in VRAM, first I'd need to download it from somewhere
after reading the documentation, I understand it has to come from the image's cache, so it has to be linked to the dockerfile somehow
ok update, I just checked the dockerfile of the project you shared
-------------
I think this is what I was looking for
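i.e. download the weights in a build step so they end up baked into the image layers, roughly like this sketch (base image, script name and model are illustrative, not the exact repo contents):

```dockerfile
FROM python:3.11-slim

COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt

# download_weights.py (placeholder name) only fetches the model files to a path inside
# the image; it does NOT load them into VRAM -- that happens in handler.py at runtime
COPY download_weights.py /download_weights.py
RUN python /download_weights.py

COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]
```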
No no, don't use RUN... use CMD or ENTRYPOINT
well I'm even more confused now... that's on the official repo
what's wrong with this dockerfile?
I don't understand this part, that's the opposite of what shown on the repo
It's loading models at build time
Into VRAM
What you're looking for is to download the models at build time, then cache the models in VRAM at runtime
No, don't use RUN, use a script, because RUN isn't going to run at runtime
Or on your runpod serverless
@Minozar
1. Enable snapshot mode to reduce cold start time
2. Consider speeding up model loading, or replacing the model with one that loads faster
For a cold start the model will be reloaded every time, so it's normal for it to be slow.
Enable FlashBoot if you haven't, and if you're using queue delay scaling, set a lower delay to scale faster