Docker image cache
Hi there,
I am quite new to RunPod so I could be wrong, but my Docker image is quite large and before my serverless endpoint actually runs, the endpoint sits in the 'Initializing' state for quite a long time. Is there a way to cache this image across endpoints, or does this already happen? This is the first request I am making, so it might already be cached for this endpoint, but I'm not quite sure.
I'd appreciate it! I am not using the network volume/storage so maybe that's also why.
Serverless images are already cached on workers; only Pod images are not cached, unless someone recently used the same image on the same machine.
Ah I see! That's great. Is this across multiple endpoints?
No, workers, not endpoints.
Different endpoints can have different workers, sometimes they are shared but not always.
Okay I see. It looks like my Delay Time is very high because of this large Docker image. Is this delay also charged?
Don't send requests to your endpoint before the workers are ready
Ah that could be why. I was probably too fast.
I am doing some benchmarks and I sent the request too fast
Yeah wait for it to say
Ready
not Initializing.
Yep, that's my issue here..
Also @ashleyk , I saw in another thread that you said that webhooks are unreliable. Is this really true?
I wanted to build my logic around webhooks to avoid any polling..
They are only as reliable as your webhook receiver. If it's down for an extended period of time, it won't receive the webhook. I guess if you don't have an extended period of downtime it's fine, because I assume there is a retry and backoff mechanism.
I am actually also busy changing my architecture to use webhooks, because my IPs get rate limited when I make too many requests to poll the status.
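Roughly what that looks like when submitting a job (a minimal sketch; I'm assuming the standard /run payload with a top-level webhook field, and the endpoint ID, API key and webhook URL below are placeholders):

import requests

ENDPOINT_ID = "your-endpoint-id"      # placeholder
API_KEY = "your-runpod-api-key"       # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {"prompt": "hello"},               # your normal job input
        "webhook": "https://example.com/job-done",  # RunPod POSTs the result here when the job finishes
    },
    timeout=30,
)
print(resp.json())  # returns the job id; no status polling needed afterwards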
Ahh I see, that does make sense! Thanks a lot!
I just re-created my endpoint and waited until the status at the top right was green and marked as "Ready".
Then I sent a request to the endpoint and it went from Ready to Initializing again and then after some time, the process was actually running. Which resulted in quite a high delay time again..
(This is a different GPU though but still took quite long, the delay time)
Delay time includes the time it takes for your worker to load models etc. before it actually calls runpod.serverless.start().
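For reference, a minimal handler sketch showing what counts where (assuming the usual runpod SDK layout; load_model here is just a stand-in):

import time
import runpod

def load_model():
    time.sleep(5)        # stand-in for loading weights from disk or a network volume
    return lambda x: x   # stand-in for a real model

# everything at module level runs before runpod.serverless.start(),
# so it counts toward cold start / delay time
model = load_model()

def handler(job):
    # only this per-request work is counted as execution time
    return {"output": model(job["input"])}

runpod.serverless.start({"handler": handler})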
Your delay time can also be heavily impacted if all of your workers are throttled.
Hmm, I am not quite sure. I don't think any of my workers were throttled, since I basically just re-created the endpoint from scratch.
Workers can become throttled at any time
Hmmm
I get what you mean but I don't really understand. Because that results in a higher credit consumption for me :/
And it sounds like I am just 'unlucky' because my workers were apparently throttled.
You don't get charged while your requests are in the queue, only while the worker is actually running.
I see but my delay time is still quite high..
Even though I sent the request once it was 'Ready'
That's either because of throttling or cold start time. Check your cold start graph.
The execution time is normal, I measured that before as well
But my delay time was not that high before
Your cold start time is 4 seconds, which is the time it takes for your worker to load everything before calling runpod.serverless.start().
4 seconds cold start time is actually pretty decent.
So the rest was because your workers are throttled.
You can either change to a different GPU tier, or add an active worker.
But you are charged for active workers because they are always running.
Yeah I see. I was also thinking of active workers but I think that will be a very high monthly bill haha, I don't think I am able to afford that just yet
For the GPU tier, I'm not sure. I was running each GPU tier as a benchmark to see how long each GPU would take and how much it would cost.
But the throttling stuff ruins my benchmarks haha. It doesn't seem that reliable for production use if workers can be throttled without me doing anything.
I switched all my endpoints to 48GB tier because too many workers were throttled with 24GB tier
Throttling happens when demand is high because workers are shared between customers.
If you use it in production and need high availability, it's better to set at least 1 active worker.
Hmm yeah exactly. Is there some calculator for how much it would cost me? Active workers are 40% cheaper, but I think it will still be a lot every month, which is a big thing for me.
There is a calculator on this page:
https://www.runpod.io/serverless-gpu
It's a bit basic though; it doesn't seem to count active workers.
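The rough math for an active worker is just the discounted hourly rate times 24/7 uptime. With a made-up rate purely for illustration (check the real per-second price for your GPU tier in the console):

on_demand_per_hour = 1.00                    # hypothetical $1.00/hr, not a real RunPod price
active_per_hour = on_demand_per_hour * 0.60  # active workers are ~40% cheaper
monthly = active_per_hour * 24 * 30          # always on, so billed around the clock
print(monthly)                               # -> 432.0 dollars/month in this example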
i have a large image as well - 18GB, it builds without error locally, but when i initialize it in an endpoint it never gets past initialization. i may have tried to hit it while it was in initialization phase, could that cause it to never fully initialize? is my best course of action to try again and wait however long for it to say ready before i try to hit the endpoint?
using FROM runpod/base:0.4.0-cuda11.8.0
base image
Sounds like a problem with your docker image, did you build it on a Mac?
Also check the logs for your worker to look for any potential issues.
no logs in endpoint logs - i did build it on mac. should i retry building the image specifying --platform linux/amd64 in the cli command? would have thought the base image in the FROM line in the dockerfile would cover the platform.
Yes, you definitely need to add --platform linux/amd64, otherwise an image built on a Mac comes out as arm64 and won't run on RunPod's x86 machines.
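For reference, the build and push would look something like this (image name and tag are placeholders):

docker build --platform linux/amd64 -t yourusername/yourimage:v1 .
docker push yourusername/yourimage:v1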
thx will retry
i rebuilt and redeployed an endpoint with the platform tag, now at >30 mins of initialization. any other tips on how to proceed? can reach out to the official support line on runpod's website too
Don't reuse the same tag; best practice is to use a different tag for each release, otherwise you break your existing workers. With a different tag you can do a new release without affecting your existing workers: they will go into a Stale state and your new workers will start up and gradually replace the stale ones, without causing any downtime on your endpoint.
since im a runpod newbie, and its not critical if i have downtime, ive been deleting the endpoints that have failed (not gotten past initialization), and starting from scratch, using a different tag per new docker container i push
the "tag" i was referring to in my previous question (shouldn't have used the word tag) meant the platform flag. as far as the docker tag goes, i am using a different tag each new try
should i expect multi-hour initialization periods if the image is over 10GB?
im trying to standup my first serverless endpoint here if it wasnt obvious by my questions.
10GB is a small image. If it's taking hours to initialise, there is probably something wrong with your image; check the worker logs as I mentioned previously.
by worker logs do you mean local docker container logs? there are no logs in my endpoint config/status page.
local docker container logs:
2024-03-01 09:13:46 CUDA Version 11.8.0
2024-03-01 09:13:46
2024-03-01 09:13:46 Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2024-03-01 09:13:46
2024-03-01 09:13:46 This container image and its contents are governed by the NVIDIA Deep Learning Container License.
2024-03-01 09:13:46 By pulling and using the container, you accept the terms and conditions of this license:
2024-03-01 09:13:46 https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
2024-03-01 09:13:46
2024-03-01 09:13:46 A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2024-03-01 09:13:46
2024-03-01 09:13:46 WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
2024-03-01 09:13:46 Use the NVIDIA Container Toolkit to start this container with GPU support; see
2024-03-01 09:13:46 https://docs.nvidia.com/datacenter/cloud-native/ .
2024-03-01 09:13:46
2024-03-01 09:13:46 [2024-03-01 14:13:46 +0000] [1] [INFO] Starting gunicorn 21.2.0
2024-03-01 09:13:46 [2024-03-01 14:13:46 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
2024-03-01 09:13:46 [2024-03-01 14:13:46 +0000] [1] [INFO] Using worker: gthread
2024-03-01 09:13:46 [2024-03-01 14:13:46 +0000] [122] [INFO] Booting worker with pid: 122
Click on the boxes for your workers and view their logs, not the logs tab; you can only view logs in the logs tab when your endpoint is actually able to receive requests.
got it
2024-03-01T16:27:09Z error pulling image: Error response from daemon: pull access denied for username/imagename, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
riddled with this. docker image/container is on docker hub. it is set to private. im assuming first step is to set to public and try to re-deploy?
Yes, you either need to make it public or else add credentials.
Add your dockerhub username and an auth token.
Then select the credentials on your serverless template.
Then scale your workers down to zero and back up again so the change can take effect.
just turned dockerhub image to public, tried to re-deploy, getting this now:
2024-03-01T16:32:40Z error pulling image: Error response from daemon: manifest for username/imagename:latest not found: manifest unknown: manifest unknown
before i go and add credentials, could this be pointing to a different problem?
add a tag to your image in the template
then set workers to 0 and back again.
after doing both of those two tasks, active and idle workers are no longer showing up for me to inspect their logs
that is with 1 active and 3 max workers configured in the endpoint settings
This means nothing, what do worker logs say?
Refresh the page if you don't see any workers.
after refresh no workers (no solid or dotted blue squares), cannot see the worker logs as there are no workers to select and view logs for
Try set them down to zero and back up again, don't know why this is happening.
yeah, still nothing unfortunately. will give it another 5 mins, try again from scratch.
hey @ashleyk so ive done a bit more cleanup and i have a container, working locally, that has a server.py with a route name like /dosomething. this process works locally and ive built it on a runpod base image. ive deployed this to a serverless runpod endpoint and i have ready workers. but when i execute the runsync command exactly as provided, the workers run indefinitely and the worker logs do not show anything past "worker is ready"
i also have a question around the endpoint, the local container works as expected when i hit http://localhost:5000/dosomething. when i append /dosomething to the end of the runsync endpoint url, i get a 404 not found error. any chance you know why, or is there any documentation talking about how to handle server routes with runpod serverless endpoint urls?
@flash-singh sorry if youre not the right person to ask...
this is a minimal flask app btw - hence the server.py and app route
Read up on the serverless docs, you are doing it wrong. You don't use routes; you must use the RunPod SDK handler in serverless.
RunPod Blog - Serverless | Create a Custom Basic API: a tutorial that guides you through creating a basic worker and turning it into an API endpoint on the RunPod serverless platform.
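Roughly, instead of a Flask route, the worker looks like this (a minimal sketch; do_something is just a stand-in for whatever your /dosomething route did):

import runpod

def do_something(data):
    # stand-in for the logic currently behind your /dosomething route
    return {"result": data}

def handler(job):
    # RunPod calls this with the payload you send to /run or /runsync; there are no HTTP routes
    return do_something(job["input"])

runpod.serverless.start({"handler": handler})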
Just make sure when you build it you add --platform linux/amd64
something like that, you can google to verify
so i have been able to get the container to run successfully with test_input.json, i have been trying to move to this step:
https://blog.runpod.io/workers-local-api-server-introduced-with-runpod-python-0-9-13/
when i start my container, with the ending dockerfile CMD:
CMD ["python", "handler.py", "--rp_serve_api", "--rp_api_host", "0.0.0.0"]
i get the following in the docker logs:
2024-03-06 07:23:01 --- Starting Serverless Worker | Version 1.6.2 ---
2024-03-06 07:23:01 INFO | Starting API server.
2024-03-06 07:23:01 DEBUG | Not deployed on RunPod serverless, pings will not be sent.
2024-03-06 07:23:01 INFO: Started server process [1]
2024-03-06 07:23:01 INFO: Waiting for application startup.
2024-03-06 07:23:01 INFO: Application startup complete.
2024-03-06 07:23:01 INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
when i should be getting:
--- Starting Serverless Worker ---
INFO: Started server process [32240]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
based on the documentation. am i missing something? primarily concerned with the DEBUG line showing up in my logs but not in the documentation's output. i can worry about the localhost address later
Documentation is outdated, what you are getting is correct for latest SDK version.
when i build the image and run the container with CMD ["python", "handler.py", "--rp_serve_api"] and go to http://localhost:8000/docs i do not see the API documentation page if that helps
i just see page does not exist
There is no docs page
oh I see the blog says there should be one, not sure why its not working
I don't bother with it because the endpoint is pretty simple
Just send a /runsync request to it
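Something like this against the local test server (assuming it's listening on port 8000 as in your logs; the input shape is whatever your handler expects):

import requests

resp = requests.post(
    "http://localhost:8000/runsync",
    json={"input": {"prompt": "hello"}},  # same payload shape you'd send in production
    timeout=60,
)
print(resp.status_code, resp.json())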
ok so when i execute the api against the url given in the container:
http://localhost:8000/runsync
i get
Error: read ECONNRESET
would you guess that thats a me problem specifically and not a runpod process problem? tried all different ports, 127.0.0.0, etc. port is not being used by anything else
Yeah something wrong with your local dev environment, it works fine
cool will keep debugging. thanks for your help
Try this to make it bind to all interfaces:
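python handler.py --rp_serve_api --rp_api_host 0.0.0.0
(--rp_api_host 0.0.0.0 is what makes it listen on all interfaces instead of just 127.0.0.1)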
just wanted to make sure i dont need a runpod api key or something else in my handler that im missing
And if you are running it on a different machine than the one you are accessing it from, you obviously can't use localhost or 127.0.0.1 to access it.
clear on the last one
You do once it's deployed, but not for local testing.
ok i figured out the local port issue and can test successfully through postman.
when i deploy to an endpoint on runpod, what should my container start field be filled with if i want to continue to test from postman, but want to hit the runpod endpoint instead? like what should the command be to overwrite the command in the dockerfile if
python3 -u rp_handler.py --rp_serve_api --rp_api_port 8000 --rp_api_host 0.0.0.0
worked for local testing?
should i just rip the --rp_api_port 8000 from the command and just do --rp_serve_api --rp_api_host='0.0.0.0'?
Don't add a docker start command or use that rp_api stuff; it's for local testing only.
got it, so in the runpod endpoint config, i shouldnt put anything in that field. what about in the dockerfile for the deployed container, should i change that to anything else and re-build and deploy?
What does your dockerfile look like currently?
CMD python3 -u handler.py --rp_serve_api --rp_api_port 8000 --rp_api_host 0.0.0.0
is the last line in the file
proceeding with what i have, with a worker ready, when i execute a request via postman to the provided URL, the task is queued, and remains in queue and never proceeds until it hits my timeout.
streaming on the job ID that pops up in the request tab in my endpoint is empty
request ID and delay time currently:
sync-ce26ad31-3d66...
570.38s
This is wrong.
Should be:
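CMD python3 -u handler.py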
The other stuff is for local testing only and should not be part of your docker image.
@ashleyk i got it to work all the way through and can now replicate that process across the other microservices i am building, thank you so much for your help my friend
cannot seem to find the root cause of an error in one of my tests.
getting back the following error:
Processing error: Expecting value: line 2 column 1 (char 1)
assuming for the time being that most of my tests were successful aside from this one, and this one is an outlier because it is a relatively long-running test; is this a common error to get back from runpod?
is there a way to let runpod know i want to expect the response with Content-Type application/json?
Is this for Serverless or GPU cloud?
serverless
Which API were you calling when you got the error?
happening both in run and runsync - we figured out the root of the error:
Processing error: Expecting value: line 2 column 1 (char 1)
it's a file size problem with the api that we are calling from runpod...
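For what it's worth, a small sketch of how to surface that more clearly inside the handler instead of letting resp.json() blow up with that "Expecting value" message (the downstream URL and payload are placeholders):

import requests

def call_downstream(payload):
    resp = requests.post("https://example.com/api", json=payload, timeout=120)
    content_type = resp.headers.get("Content-Type", "")
    if resp.status_code != 200 or "application/json" not in content_type:
        # e.g. an HTML error page about the file size ends up in the job output
        # instead of a cryptic JSON decode error
        raise RuntimeError(f"downstream returned {resp.status_code}: {resp.text[:200]}")
    return resp.json()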
different question now is how do we get better logs from runpod?
seems as though when a job is failing/has failed, the worker logs will not open to show the log.
not a big deal but hard to perform a traceback when the logs disappear, they are not in the endpoint logs either.
better serverless logs are a big priority for us, it's in development and we plan to roll that out by early april, it's a complete rewrite of it
hey @ashleyk you mind kindly letting me shoot you a dm about another endpoint try?