Docker image cache
Hi there,
I am quite new to RunPod so I could be wrong, but my Docker image is quite large and before my serverless endpoint actually runs, the endpoint sits in the 'Initializing' state for quite a long time. Is there a way to cache this image across endpoints, or does this already happen? This is the first request I am making, so it might already be cached for this endpoint, but I'm not quite sure.
I'd appreciate it! I am not using the network volume/storage so maybe that's also why.
Serverless images are already cached on workers; only Pod images are not cached, unless someone recently used the same image on the same machine.
Ah I see! That's great. Is this across multiple endpoints?
No, workers, not endpoints.
Different endpoints can have different workers, sometimes they are shared but not always.
Okay I see. It looks like my Delay Time is very high because of this large Docker image. Is this delay also charged?
Don't send requests to your endpoint before the workers are ready
Ah that could be why. I was probably too fast.
I am doing some benchmarks and I sent the request too fast
Yeah wait for it to say
Ready
not Initializing.
Yep, that's my issue here..
Also @ashleyk , I saw in another thread that you said that webhooks are unreliable. Is this really true?
I wanted to build my logic around webhooks to avoid any polling..
They are only as reliable as your webhook receiver. If it's down for an extended period of time, it won't receive the webhook. I guess if you don't have an extended period of downtime it's fine, because I assume there is a retry and backoff mechanism.
I am actually also busy changing my architecture to use webhooks, because my IPs get rate limited when I make too many requests to poll the status.
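Roughly what that looks like when submitting a job (a minimal sketch; I'm assuming the standard /run payload with a top-level webhook field, and the endpoint ID, API key and webhook URL below are placeholders):

import requests

ENDPOINT_ID = "your-endpoint-id"      # placeholder
API_KEY = "your-runpod-api-key"       # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {"prompt": "hello"},               # your normal job input
        "webhook": "https://example.com/job-done",  # RunPod POSTs the result here when the job finishes
    },
    timeout=30,
)
print(resp.json())  # returns the job id; no status polling needed afterwards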
Ahh I see, that does make sense! Thanks a lot!
I just re-created my endpoint and waited until the status at the top right was green and marked as "Ready".
Then I sent a request to the endpoint and it went from Ready to Initializing again and then after some time, the process was actually running. Which resulted in quite a high delay time again..
(This is a different GPU though but still took quite long, the delay time)
Delay time includes the time it takes for your worker to load models etc. before it actually calls runpod.serverless.start().
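For reference, a minimal handler sketch showing what counts where (assuming the usual runpod SDK layout; load_model here is just a stand-in):

import time
import runpod

def load_model():
    time.sleep(5)        # stand-in for loading weights from disk or a network volume
    return lambda x: x   # stand-in for a real model

# everything at module level runs before runpod.serverless.start(),
# so it counts toward cold start / delay time
model = load_model()

def handler(job):
    # only this per-request work is counted as execution time
    return {"output": model(job["input"])}

runpod.serverless.start({"handler": handler})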
Your delay time can also be heavily impacted if all of your workers are throttled.
Hmm, I am not quite sure. I don't think any of my workers were throttled, since I basically just re-created the endpoint from scratch.
Workers can become throttled at any time
Hmmm
I get what you mean but I don't really understand. Because that results in a higher credit consumption for me :/
And it sounds like I am just 'unlucky' because my workers were apparently throttled.
You don't get charged while your requests are in the queue, only while the worker is actually running.
I see but my delay time is still quite high..
Even though I sent the request once it was 'Ready'
That's either because of throttling or cold start time. Check your cold start graph.
The execution time is normal, I measured that before as well
But my delay time was not that high before
Your cold start time is 4 seconds, which is the time it takes for your worker to load everything before calling runpod.serverless.start().
4 seconds cold start time is actually pretty decent.
So the rest was because your workers are throttled.
You can either change to a different GPU tier, or add an active worker.
But you are charged for active workers because they are always running.
Yeah I see. I was also thinking of active workers but I think that will be a very high monthly bill haha, I don't think I am able to afford that just yet
For the GPU tier, I'm not sure. I was running each GPU tier as a benchmark to see how long each GPU would take and how much it would cost.
But the throttling stuff ruins my benchmarks haha. It doesn't seem that reliable for production use if workers can be throttled without me doing anything.
I switched all my endpoints to 48GB tier because too many workers were throttled with 24GB tier
Throttling happens when demand is high because workers are shared between customers.
If you use it in production and need high availability, it's better to set at least 1 active worker.
Hmm yeah exactly. Is there some calculator for how much it would cost me? Active workers are 40% cheaper, but I think it will still be a lot every month, which is a big thing for me.
There is a calculator on this page:
https://www.runpod.io/serverless-gpu
It's a bit basic though; it doesn't seem to count active workers.
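The rough math for an active worker is just the discounted hourly rate times 24/7 uptime. With a made-up rate purely for illustration (check the real per-second price for your GPU tier in the console):

on_demand_per_hour = 1.00                    # hypothetical $1.00/hr, not a real RunPod price
active_per_hour = on_demand_per_hour * 0.60  # active workers are ~40% cheaper
monthly = active_per_hour * 24 * 30          # always on, so billed around the clock
print(monthly)                               # -> 432.0 dollars/month in this example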
i have a large image as well - 18GB, it builds without error locally, but when i initialize it in an endpoint it never gets past initialization. i may have tried to hit it while it was in initialization phase, could that cause it to never fully initialize? is my best course of action to try again and wait however long for it to say ready before i try to hit the endpoint?
using FROM runpod/base:0.4.0-cuda11.8.0
base image
Sounds like a problem with your docker image, did you build it on a Mac?
Also check the logs for your worker to look for any potential issues.
no logs in endpoint logs - i did build it on mac. should i retry building the image specifying --platform linux/amd64 in the cli command? would have thought the base image in the FROM line in the dockerfile would cover the platform.
Yes, you definitely need to add --platform linux/amd64, otherwise an image built on a Mac comes out as arm64 and won't run on RunPod's x86 machines.
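For reference, the build and push would look something like this (image name and tag are placeholders):

docker build --platform linux/amd64 -t yourusername/yourimage:v1 .
docker push yourusername/yourimage:v1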
thx will retry
i rebuilt and redeployed an endpoint with the platform tag, now at >30 mins of initialization. any other tips on how to proceed? can reach out to the official support line on runpod's website too
Don't reuse the same tag; best practice is to use a different tag for each release, otherwise you break your existing workers. With a different tag you can do a new release without affecting your existing workers: they will go into a Stale state and your new workers will start up and gradually replace the stale ones, without causing any downtime on your endpoint.
since im a runpod newbie, and its not critical if i have downtime, ive been deleting the endpoints that have failed (not gotten past initialization), and starting from scratch, using a different tag per new docker container i push
the "tag" i was referring to in my previous question (shouldn't have used the word tag) meant the platform flag. as far as the docker tag goes, i am using a different tag each new try
should i expect multi-hour initialization periods if the image is over 10GB?
im trying to standup my first serverless endpoint here if it wasnt obvious by my questions.
10GB is a small image. If it's taking hours to initialise, there is probably something wrong with your image; check the worker logs as I mentioned previously.
by worker logs do you mean local docker container logs? there are no logs in my endpoint config/status page.
local docker container logs:
2024-03-01 09:13:46 CUDA Version 11.8.0
2024-03-01 09:13:46
2024-03-01 09:13:46 Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2024-03-01 09:13:46
2024-03-01 09:13:46 This container image and its contents are governed by the NVIDIA Deep Learning Container License.
2024-03-01 09:13:46 By pulling and using the container, you accept the terms and conditions of this license:
2024-03-01 09:13:46 https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
2024-03-01 09:13:46
2024-03-01 09:13:46 A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2024-03-01 09:13:46
2024-03-01 09:13:46 WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
2024-03-01 09:13:46 Use the NVIDIA Container Toolkit to start this container with GPU support; see
2024-03-01 09:13:46 https://docs.nvidia.com/datacenter/cloud-native/ .
2024-03-01 09:13:46
2024-03-01 09:13:46 [2024-03-01 14:13:46 +0000] [1] [INFO] Starting gunicorn 21.2.0
2024-03-01 09:13:46 [2024-03-01 14:13:46 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
2024-03-01 09:13:46 [2024-03-01 14:13:46 +0000] [1] [INFO] Using worker: gthread
2024-03-01 09:13:46 [2024-03-01 14:13:46 +0000] [122] [INFO] Booting worker with pid: 122
Click on the boxes for your workers and view their logs, not the logs tab; you can only view logs in the logs tab when your endpoint is actually able to receive requests.
got it
2024-03-01T16:27:09Z error pulling image: Error response from daemon: pull access denied for username/imagename, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
riddled with this. docker image/container is on docker hub. it is set to private. im assuming first step is to set to public and try to re-deploy?
Yes, you either need to make it public or else add credentials.
Add your dockerhub username and an auth token.
Then select the credentials on your serverless template.
Then scale your workers down to zero and back up again so the change can take effect.
just turned dockerhub image to public, tried to re-deploy, getting this now:
2024-03-01T16:32:40Z error pulling image: Error response from daemon: manifest for username/imagename:latest not found: manifest unknown: manifest unknown
before i go and add credentials, could this be pointing to a different problem?
add a tag to your image in the template
then set workers to 0 and back again.
after doing both of those two tasks, active and idle workers are no longer showing up for me to inspect their logs
that is with 1 active and 3 max workers configured in the endpoint settings
This means nothing, what do worker logs say?
Refresh the page if you don't see any workers.
after refresh no workers (no solid or dotted blue squares), cannot see the worker logs as there are no workers to select and view logs for
Try set them down to zero and back up again, don't know why this is happening.
yeah, still nothing unfortunately. will give it another 5 mins, try again from scratch.
hey @ashleyk so ive done a bit more cleanup and i have a container, working locally, that has a server.py with a route name like /dosomething. this process works locally and ive built it on a runpod base image. ive deployed this to a serverless runpod endpoint and i have ready workers. but when i execute the runsync command exactly as provided, the workers run indefinitely and the worker logs do not show anything past "worker is ready"
i also have a question around the endpoint, the local container works as expected when i hit http://localhost:5000/dosomething. when i append /dosomething to the end of the runsync endpoint url, i get a 404 not found error. any chance you know why, or is there any documentation talking about how to handle server routes with runpod serverless endpoint urls?
@flash-singh sorry if youre not the right person to ask...
this is a minimal flask app btw - hence the server.py and app route
Read up on the serverless docs, you are doing it wrong. You don't use routes; you must use the RunPod SDK handler in serverless.
RunPod Blog - Serverless | Create a Custom Basic API: a tutorial that guides you through creating a basic worker and turning it into an API endpoint on the RunPod serverless platform.
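Roughly, instead of a Flask route, the worker looks like this (a minimal sketch; do_something is just a stand-in for whatever your /dosomething route did):

import runpod

def do_something(data):
    # stand-in for the logic currently behind your /dosomething route
    return {"result": data}

def handler(job):
    # RunPod calls this with the payload you send to /run or /runsync; there are no HTTP routes
    return do_something(job["input"])

runpod.serverless.start({"handler": handler})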
Just make sure when you build it you add --platform linux/amd64
something like that, you can google to verify
so i have been able to get the container to run successfully with test_input.json, i have been trying to move to this step:
https://blog.runpod.io/workers-local-api-server-introduced-with-runpod-python-0-9-13/
when i start my container, with the ending dockerfile CMD:
CMD ["python", "handler.py", "--rp_serve_api", "--rp_api_host", "0.0.0.0"]
i get the following in the docker logs:
2024-03-06 07:23:01 --- Starting Serverless Worker | Version 1.6.2 ---
2024-03-06 07:23:01 INFO | Starting API server.
2024-03-06 07:23:01 DEBUG | Not deployed on RunPod serverless, pings will not be sent.
2024-03-06 07:23:01 INFO: Started server process [1]
2024-03-06 07:23:01 INFO: Waiting for application startup.
2024-03-06 07:23:01 INFO: Application startup complete.
2024-03-06 07:23:01 INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
when i should be getting:
--- Starting Serverless Worker ---
INFO: Started server process [32240]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
based on the documentation. am i missing something? primarily concerned with the DEBUG line showing up in my logs but not in the documentation's output. i can worry about the localhost address later
Documentation is outdated, what you are getting is correct for latest SDK version.
when i build the image and run the container with CMD ["python", "handler.py", "--rp_serve_api"] and go to http://localhost:8000/docs i do not see the API documentation page if that helps
i just see page does not exist
There is no docs page
oh I see the blog says there should be one, not sure why its not working
I don't bother with it because the endpoint is pretty simple
Just send a /runsync request to it
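Something like this against the local test server (assuming it's listening on port 8000 as in your logs; the input shape is whatever your handler expects):

import requests

resp = requests.post(
    "http://localhost:8000/runsync",
    json={"input": {"prompt": "hello"}},  # same payload shape you'd send in production
    timeout=60,
)
print(resp.status_code, resp.json())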
ok so when i execute the api against the url given in the container:
http://localhost:8000/runsync
i get
Error: read ECONNRESET
would you guess that thats a me problem specifically and not a runpod process problem? tried all different ports, 127.0.0.0, etc. port is not being used by anything else
Yeah something wrong with your local dev environment, it works fine
cool will keep debugging. thanks for your help
Try this to make it bind to all interfaces:
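python handler.py --rp_serve_api --rp_api_host 0.0.0.0
(--rp_api_host 0.0.0.0 is what makes it listen on all interfaces instead of just 127.0.0.1)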
just wanted to make sure i dont need a runpod api key or something else in my handler that im missing
And if you are running it on a different machine than the one you are accessing it from, you obviously can't use localhost or 127.0.0.1 to access it.
clear on the last one
You do once it's deployed, but not for local testing.
ok i figured out the local port issue and can test successfully through postman.
when i deploy to an endpoint on runpod, what should my container start field be filled with if i want to continue to test from postman, but want to hit the runpod endpoint instead? like what should the command be to overwrite the command in the dockerfile if
python3 -u rp_handler.py --rp_serve_api --rp_api_port 8000 --rp_api_host 0.0.0.0
worked for local testing?
should i just rip the --rp_api_port 8000 from the command and just do --rp_serve_api --rp_api_host='0.0.0.0'?
Don't add a docker start command or use that rp_api stuff; it's for local testing only.
got it, so in the runpod endpoint config, i shouldnt put anything in that field. what about in the dockerfile for the deployed container, should i change that to anything else and re-build and deploy?
What does your dockerfile look like currently?
CMD python3 -u handler.py --rp_serve_api --rp_api_port 8000 --rp_api_host 0.0.0.0
is the last line in the file
proceeding with what i have, with a worker ready, when i execute a request via postman to the provided URL, the task is queued, and remains in queue and never proceeds until it hits my timeout.
streaming on the job ID that pops up in the request tab in my endpoint is empty
request ID and delay time currently:
sync-ce26ad31-3d66...
570.38s
This is wrong.
Should be:
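CMD python3 -u handler.py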
The other stuff is for local testing only and should not be part of your docker image.
@ashleyk i got it to work all the way through and can now replicate that process across the other microservices i am building, thank you so much for your help my friend
cannot seem to find the root cause of an error in one of my tests.
getting back the following error:
Processing error: Expecting value: line 2 column 1 (char 1)
assuming for the time being that most of my tests were successful aside from this one, and this one is an outlier because it is a relatively long-running test; is this a common error to get back from runpod?
is there a way to let runpod know i want to expect the response with Content-Type application/json?
Is this for Serverless or GPU cloud?
serverless
Which API were you calling when you got the error?
happening both in run and runsync - we figured out the root of the error:
Processing error: Expecting value: line 2 column 1 (char 1)
it's a file size problem with the api that we are calling from runpod...
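For what it's worth, a small sketch of how to surface that more clearly inside the handler instead of letting resp.json() blow up with that "Expecting value" message (the downstream URL and payload are placeholders):

import requests

def call_downstream(payload):
    resp = requests.post("https://example.com/api", json=payload, timeout=120)
    content_type = resp.headers.get("Content-Type", "")
    if resp.status_code != 200 or "application/json" not in content_type:
        # e.g. an HTML error page about the file size ends up in the job output
        # instead of a cryptic JSON decode error
        raise RuntimeError(f"downstream returned {resp.status_code}: {resp.text[:200]}")
    return resp.json()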
different question now is how do we get better logs from runpod?
seems as though when a job is failing/has failed, the worker logs will not open to show the log.
not a big deal but hard to perform a traceback when the logs disappear, they are not in the endpoint logs either.
better serverless logs are a big priority for us, it's in development and we plan to roll that out by early april, it's a complete rewrite of it
hey @ashleyk you mind kindly letting me shoot you a dm about another endpoint try?