RunPod9mo ago
smoke

Docker image cache

Hi there, I'm quite new to RunPod so I could be wrong, but my Docker image is quite large and before my serverless endpoint actually runs, it sits in the 'Initializing' state for quite a long time. Is there a way to cache this image across endpoints, or does that already happen? This is the first request I'm making, so it might already be cached for this endpoint, but I'm not sure. I'm not using a network volume/storage, so maybe that's also why. I'd appreciate any help!
81 Replies
ashleyk
ashleyk9mo ago
Serverless images are already cached on workers, only Pod images are not cached unless someone recently used the same image on the same machine.
smoke
smokeOP9mo ago
Ah I see! That's great. Is this across multiple endpoints?
ashleyk
ashleyk9mo ago
No, workers, not endpoints. Different endpoints can have different workers, sometimes they are shared but not always.
smoke
smokeOP9mo ago
Okay, I see. It looks like my Delay Time is very high because of this large Docker image. Is this also charged?
(screenshot attached)
ashleyk
ashleyk9mo ago
Don't send requests to your endpoint before the workers are ready
smoke
smokeOP9mo ago
Ah, that could be why. I was probably too fast; I'm running some benchmarks and sent the request too soon.
ashleyk
ashleyk9mo ago
Yeah wait for it to say Ready not Initializing.
smoke
smokeOP9mo ago
Yep, that's my issue here. Also @ashleyk, I saw in another thread that you said webhooks are unreliable. Is this really true? I wanted to build my logic around webhooks to avoid any polling.
ashleyk
ashleyk9mo ago
They are only as reliable as your webhook receiver. If it's down for an extended period of time, it won't receive the webhook. I guess if you don't have an extended period of downtime it's fine, because I assume there is a retry and backoff mechanism. I am actually also busy changing my architecture to use webhooks, because my IPs get rate limited when I make too many requests to poll the status.
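For reference, RunPod's webhook flow works by passing a webhook URL alongside the input when you submit the job; RunPod then POSTs the job result to that URL when the job finishes. A minimal sketch (endpoint ID, API key, and receiver URL below are placeholders, not values from this thread):

import requests

ENDPOINT_ID = "your-endpoint-id"      # placeholder
API_KEY = "YOUR_RUNPOD_API_KEY"       # placeholder

# Submit a job and ask RunPod to POST the result to your receiver when done.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {"prompt": "hello"},                   # your normal job input
        "webhook": "https://example.com/runpod-hook",   # placeholder receiver URL
    },
    timeout=30,
)
print(resp.json())  # returns the job id; the output arrives at the webhook later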
smoke
smokeOP9mo ago
Ahh I see, that does make sense! Thanks a lot!
smoke
smokeOP9mo ago
I just re-created my endpoint and waited until the status at the top right was green and marked as "Ready". Then I sent a request to the endpoint and it went from Ready to Initializing again, and after some time the process was actually running. Which resulted in quite a high delay time again.
(screenshot attached)
smoke
smokeOP9mo ago
(This is a different GPU, though, but the delay time still took quite long.)
ashleyk
ashleyk9mo ago
Delay time includes the time it takes for your worker to load models etc before it actually calls runpod.serverless.start(). Your delay time can also be heavily impacted if all of your workers are throttled.
smoke
smokeOP9mo ago
Hmm I am not quite sure. I don't think any of my workers were throttled, since I just re-created the endpoint as new basically
ashleyk
ashleyk9mo ago
Workers can become throttled at any time
smoke
smokeOP9mo ago
Hmmm, I get what you mean, but I don't really understand, because that results in higher credit consumption for me :/ And it sounds like I am just 'unlucky' because my workers were apparently throttled.
ashleyk
ashleyk9mo ago
You don't get charged while your requests are in the queue, only while the worker is actually running.
smoke
smokeOP9mo ago
I see, but my delay time is still quite high, even though I sent the request once it was 'Ready'.
ashleyk
ashleyk9mo ago
That's either because of throttling or cold start time. Check your cold start graph.
smoke
smokeOP9mo ago
(screenshot attached)
smoke
smokeOP9mo ago
The execution time is normal, I measured that before as well. But my delay time was not that high before.
ashleyk
ashleyk9mo ago
Your cold start time is 4 seconds, which is the time it takes for your worker to load everything before calling runpod.serverless.start(). 4 seconds of cold start time is actually pretty decent, so the rest was because your workers are throttled. You can either change to a different GPU tier, or add an active worker. But you are charged for active workers because they are always running.
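To make that concrete, here is a minimal sketch of a worker (not the poster's actual code): everything at module level runs during cold start, before runpod.serverless.start() begins pulling jobs, and only the handler body counts as execution time. The sleep is a stand-in for model loading.

import time
import runpod

# Module-level work runs once per cold start, before any job is handled.
# Stand-in for downloading/loading model weights, which is what usually
# dominates cold start time.
time.sleep(4)

def handler(job):
    # Only time spent in here counts as execution time per request.
    prompt = job["input"].get("prompt", "")
    return {"echo": prompt}

runpod.serverless.start({"handler": handler})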
smoke
smokeOP9mo ago
Yeah, I see. I was also thinking of active workers, but I think that will be a very high monthly bill haha; I don't think I am able to afford that just yet. For the GPU tier, I'm not sure. I was running each GPU tier as a benchmark to see how long each GPU would take and how much it would cost. But the throttling stuff ruins my benchmarks haha. It doesn't seem that reliable if I want to use it in production, if workers can be throttled without me doing anything.
ashleyk
ashleyk9mo ago
I switched all my endpoints to the 48GB tier because too many workers were throttled with the 24GB tier. Throttling happens when demand is high, because workers are shared between customers. If you use it in production and need high availability, it's better to set at least 1 active worker.
smoke
smokeOP9mo ago
Hmm yeah exactly. Is there some calculator of how much it would cost me? Since it is 40% cheaper but I think it will still be a lot every month, which is a big thing for me.
ashleyk
ashleyk9mo ago
There is a calculator on this page: https://www.runpod.io/serverless-gpu
Serverless GPUs for AI Inference and Training
Serverless GPUs to deploy your ML models to production without worrying about infrastructure or scale.
ashleyk
ashleyk9mo ago
It's a bit basic though; it doesn't seem to count active workers.
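As a rough way to ballpark an always-on active worker yourself (the per-second rate below is a made-up placeholder; plug in the real discounted rate for your GPU tier from the pricing page):

# Hypothetical numbers for illustration only.
active_price_per_second = 0.00030          # placeholder $/s for one active worker
seconds_per_month = 60 * 60 * 24 * 30      # ~2.59 million seconds

monthly_cost = active_price_per_second * seconds_per_month
print(f"~${monthly_cost:,.2f} per month per always-on active worker")  # ~$777.60 with the placeholder rate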
zfmoodydub
zfmoodydub9mo ago
I have a large image as well, 18GB. It builds without error locally, but when I initialize it in an endpoint it never gets past initialization. I may have tried to hit it while it was in the initialization phase; could that cause it to never fully initialize? Is my best course of action to try again and wait however long for it to say Ready before I try to hit the endpoint? I'm using the FROM runpod/base:0.4.0-cuda11.8.0 base image.
ashleyk
ashleyk9mo ago
Sounds like a problem with your docker image, did you build it on a Mac? Also check the logs for your worker to look for any potential issues.
zfmoodydub
zfmoodydub9mo ago
No logs in the endpoint logs. I did build it on a Mac. Should I retry building the image, specifying --platform linux/amd64 in the CLI command? I would have thought the base image in the FROM line of the Dockerfile would cover the platform.
ashleyk
ashleyk9mo ago
Yes, you definitely need to add --platform linux/amd64.
zfmoodydub
zfmoodydub9mo ago
Thanks, will retry. I rebuilt and redeployed an endpoint with the platform flag; now at >30 mins of initialization. Any other tips on how to proceed? I can reach out to the official support line on RunPod's website too.
ashleyk
ashleyk9mo ago
Don't reuse the same tag; best practice is to use a different tag for each release, otherwise you break your existing workers. With a new tag you can do a new release without affecting your existing workers: the existing workers go into a Stale state, and your new workers start up and gradually replace the stale ones, without causing any downtime on your endpoint.
(screenshot attached)
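For example, a typical build-and-push sequence from a Mac that follows both pieces of advice in this thread (image name and version tag are placeholders), forcing the linux/amd64 platform and using a fresh tag per release:

docker build --platform linux/amd64 -t yourdockerhubuser/my-worker:v0.0.2 .
docker push yourdockerhubuser/my-worker:v0.0.2
# then point the endpoint template at :v0.0.2 instead of reusing an old tag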
zfmoodydub
zfmoodydub9mo ago
Since I'm a RunPod newbie, and it's not critical if I have downtime, I've been deleting the endpoints that have failed (not gotten past initialization) and starting from scratch, using a different tag for each new Docker image I push. The 'tag' I was referring to in my previous question (I shouldn't have used the word tag) meant the platform. As far as the Docker tag goes, I am using a different tag on each new try. Should I expect multi-hour initialization periods if the image is over 10GB? I'm trying to stand up my first serverless endpoint here, if that wasn't obvious from my questions.
ashleyk
ashleyk9mo ago
10GB is a small image. If it's taking hours to initialize, there is probably something wrong with your image; check the worker logs as I mentioned previously.
zfmoodydub
zfmoodydub9mo ago
By worker logs, do you mean the local Docker container logs? There are no logs in my endpoint config/status page. Local Docker container logs:

2024-03-01 09:13:46 CUDA Version 11.8.0
2024-03-01 09:13:46 Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2024-03-01 09:13:46 This container image and its contents are governed by the NVIDIA Deep Learning Container License.
2024-03-01 09:13:46 By pulling and using the container, you accept the terms and conditions of this license:
2024-03-01 09:13:46 https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
2024-03-01 09:13:46 A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2024-03-01 09:13:46 WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
2024-03-01 09:13:46 Use the NVIDIA Container Toolkit to start this container with GPU support; see
2024-03-01 09:13:46 https://docs.nvidia.com/datacenter/cloud-native/ .
2024-03-01 09:13:46 [2024-03-01 14:13:46 +0000] [1] [INFO] Starting gunicorn 21.2.0
2024-03-01 09:13:46 [2024-03-01 14:13:46 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
2024-03-01 09:13:46 [2024-03-01 14:13:46 +0000] [1] [INFO] Using worker: gthread
2024-03-01 09:13:46 [2024-03-01 14:13:46 +0000] [122] [INFO] Booting worker with pid: 122
(screenshot attached)
ashleyk
ashleyk9mo ago
Click on the boxes for your workers and view their logs, not the Logs tab. You can only view logs in the Logs tab when your endpoint is actually able to receive requests.
zfmoodydub
zfmoodydub9mo ago
Got it:

2024-03-01T16:27:09Z error pulling image: Error response from daemon: pull access denied for username/imagename, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

The worker logs are riddled with this. The Docker image is on Docker Hub and is set to private. I'm assuming the first step is to set it to public and try to re-deploy?
ashleyk
ashleyk9mo ago
Yes, you either need to make it public or else add credentials.
ashleyk
ashleyk9mo ago
(screenshot attached)
ashleyk
ashleyk9mo ago
(screenshot attached)
ashleyk
ashleyk9mo ago
Add your dockerhub username and an auth token.
ashleyk
ashleyk9mo ago
Then select the credentials on your serverless template.
(screenshot attached)
ashleyk
ashleyk9mo ago
Then scale your workers down to zero and back up again so the change can take effect.
zfmoodydub
zfmoodydub9mo ago
Just turned the Docker Hub image public and tried to re-deploy; getting this now:

2024-03-01T16:32:40Z error pulling image: Error response from daemon: manifest for username/imagename:latest not found: manifest unknown: manifest unknown

Before I go and add credentials, could this be pointing to a different problem?
ashleyk
ashleyk9mo ago
Add a tag to your image in the template, then set workers to 0 and back again.
zfmoodydub
zfmoodydub9mo ago
After doing both of those tasks, active and idle workers are no longer showing up for me to inspect their logs.
zfmoodydub
zfmoodydub9mo ago
(screenshot attached)
zfmoodydub
zfmoodydub9mo ago
That is with 1 active and 3 max workers configured in the endpoint settings.
ashleyk
ashleyk9mo ago
That on its own doesn't tell us much; what do the worker logs say? Refresh the page if you don't see any workers.
zfmoodydub
zfmoodydub9mo ago
After a refresh there are no workers (no solid or dotted blue squares), so I cannot see the worker logs as there are no workers to select and view logs for.
ashleyk
ashleyk9mo ago
Try setting them down to zero and back up again; I don't know why this is happening.
zfmoodydub
zfmoodydub9mo ago
Yeah, still nothing unfortunately. Will give it another 5 mins and try again from scratch.

Hey @ashleyk, so I've done a bit more cleanup and I have a container, working locally, that has a server.py with a route named /dosomething. This process works locally and I've built it on a RunPod base image. I've deployed this to a serverless RunPod endpoint and I have ready workers, but when I execute the runsync command exactly as provided, the workers run indefinitely and the worker logs do not show anything past "worker is ready".

I also have a question around the endpoint: the local container works as expected when I hit http://localhost:5000/dosomething, but when I append /dosomething to the end of the runsync endpoint URL, I get a 404 not found error. Any chance you know why, or is there any documentation talking about how to handle server routes with RunPod serverless endpoint URLs? @flash-singh, sorry if you're not the right person to ask... This is a minimal Flask app btw, hence the server.py and app route.
ashleyk
ashleyk9mo ago
Read up on the serverless docs, you are doing it wrong. You don't use routes; you must use the RunPod SDK in serverless.
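Concretely, the Flask-style route being hit locally has no equivalent on the deployed endpoint: requests go to RunPod's /runsync (or /run) URL, and whatever you put under "input" is what your handler receives as job["input"]. A rough sketch of the call (the endpoint ID, API key, and input fields are placeholders):

import requests

# Instead of POST http://localhost:5000/dosomething ...
resp = requests.post(
    "https://api.runpod.ai/v2/your-endpoint-id/runsync",        # placeholder endpoint id
    headers={"Authorization": "Bearer YOUR_RUNPOD_API_KEY"},     # placeholder key
    json={"input": {"action": "dosomething", "data": "..."}},    # arrives as job["input"] in the handler
    timeout=120,
)
print(resp.json())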
justin
justin9mo ago
RunPod Blog
Serverless | Create a Custom Basic API
RunPod's Serverless platform allows for the creation of API endpoints that automatically scale to meet demand. The tutorial guides you through creating a basic worker and turning it into an API endpoint on the RunPod serverless platform. For this tutorial, we will create an API endpoint that helps us accomplish
justin
justin9mo ago
Just make sure when you build it you add --platform linux/amd64, or something like that; you can Google to verify.
zfmoodydub
zfmoodydub9mo ago
So I have been able to get the container to run successfully with test_input.json, and I have been trying to move to this step: https://blog.runpod.io/workers-local-api-server-introduced-with-runpod-python-0-9-13/

When I start my container with the ending Dockerfile CMD:

CMD ["python", "handler.py", "--rp_serve_api", "--rp_api_host", "0.0.0.0"]

I get the following in the Docker logs:

2024-03-06 07:23:01 --- Starting Serverless Worker | Version 1.6.2 ---
2024-03-06 07:23:01 INFO | Starting API server.
2024-03-06 07:23:01 DEBUG | Not deployed on RunPod serverless, pings will not be sent.
2024-03-06 07:23:01 INFO: Started server process [1]
2024-03-06 07:23:01 INFO: Waiting for application startup.
2024-03-06 07:23:01 INFO: Application startup complete.
2024-03-06 07:23:01 INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

When I should be getting:

--- Starting Serverless Worker ---
INFO: Started server process [32240]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)

based on the documentation. Am I missing something? I'm primarily concerned with the DEBUG line existing in my logs and not in the documentation's; I can worry about the localhost address later.
ashleyk
ashleyk9mo ago
The documentation is outdated; what you are getting is correct for the latest SDK version.
zfmoodydub
zfmoodydub9mo ago
When I build the image and run the container with CMD ["python", "handler.py", "--rp_serve_api"] and go to http://localhost:8000/docs, I do not see the API documentation page, if that helps; I just see that the page does not exist.
ashleyk
ashleyk9mo ago
There is no docs page... oh, I see the blog says there should be one; not sure why it's not working. I don't bother with it because the endpoint is pretty simple. Just send a /runsync request to it.
zfmoodydub
zfmoodydub9mo ago
OK, so when I execute the API against the URL given in the container, http://localhost:8000/runsync, I get Error: read ECONNRESET. Would you guess that's a me problem specifically and not a RunPod process problem? I've tried all different ports, 127.0.0.0, etc.; the port is not being used by anything else.
ashleyk
ashleyk9mo ago
Yeah, something is wrong with your local dev environment; it works fine.
zfmoodydub
zfmoodydub9mo ago
Cool, will keep debugging. Thanks for your help.
ashleyk
ashleyk9mo ago
Try this to make it bind to all interfaces:
python3 -u rp_handler.py --rp_serve_api --rp_api_port 8000 --rp_api_host 0.0.0.0
zfmoodydub
zfmoodydub9mo ago
Just wanted to make sure I don't need a RunPod API key or something else in my handler that I'm missing.
ashleyk
ashleyk9mo ago
And if you are running it on a different machine than the one you are accessing it from, you obviously can't use localhost or 127.0.0.1 to access it.
zfmoodydub
zfmoodydub9mo ago
clear on the last one
ashleyk
ashleyk9mo ago
You do once its deployed but not for local testing.
zfmoodydub
zfmoodydub9mo ago
OK, I figured out the local port issue and can test successfully through Postman. When I deploy to an endpoint on RunPod, what should my container start field be filled with if I want to continue to test from Postman, but hit the RunPod endpoint instead? In other words, what should the command be to override the command in the Dockerfile, if python3 -u rp_handler.py --rp_serve_api --rp_api_port 8000 --rp_api_host 0.0.0.0 worked for local testing? Should I just rip the --rp_api_port 8000 out of the command and just do --rp_serve_api --rp_api_host='0.0.0.0'?
ashleyk
ashleyk9mo ago
Don't add a Docker start command or any of that rp_api stuff; it's for local testing only.
zfmoodydub
zfmoodydub9mo ago
Got it, so in the RunPod endpoint config I shouldn't put anything in that field. What about in the Dockerfile for the deployed container, should I change that to anything else and re-build and deploy?
ashleyk
ashleyk9mo ago
What does your dockerfile look like currently?
zfmoodydub
zfmoodydub9mo ago
CMD python3 -u handler.py --rp_serve_api --rp_api_port 8000 --rp_api_host 0.0.0.0

is the last line in the file. Proceeding with what I have, with a worker ready, when I execute a request via Postman to the provided URL, the task is queued and remains in the queue, never proceeding until it hits my timeout. Streaming on the job ID that pops up in the request tab of my endpoint is empty. Request ID and delay time currently: sync-ce26ad31-3d66... 570.38s
ashleyk
ashleyk9mo ago
This is wrong. Should be:
CMD python3 -u handler.py
The other stuff is for local testing only and should not be part of your docker image.
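Putting the thread's advice together, a minimal deployed-worker Dockerfile might look roughly like this (based on the base image and file names used in this thread; requirements.txt is a placeholder, and the Python alias may differ in your base image):

FROM runpod/base:0.4.0-cuda11.8.0

COPY requirements.txt /requirements.txt
RUN python3 -m pip install -r /requirements.txt   # must include the runpod package

COPY handler.py /handler.py

# No --rp_serve_api / --rp_api_* flags here; those are only for the local test server.
CMD python3 -u /handler.py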
zfmoodydub
zfmoodydub9mo ago
@ashleyk I got it to work all the way through and can now replicate that process across the other microservices I am building, thank you so much for your help my friend.

I cannot seem to find the root cause of an error in one of my tests. I'm getting back the following error:

Processing error: Expecting value: line 2 column 1 (char 1)

Assuming for the time being that, since most of my tests were successful aside from this one, this one is an outlier because it is a relatively longer-running test: is this a common error to receive with RunPod? Is there a way to let RunPod know I want to expect the response with Content-Type application/json?
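For what it's worth, that "Expecting value: line 2 column 1" message is what Python's JSON parser raises when the body it is given isn't JSON at all (for example an HTML error page or an empty body from the upstream API). A hedged sketch of guarding against that inside the handler (the function name, URL, and payload are placeholders):

import requests

def call_upstream(url, payload):
    resp = requests.post(url, json=payload, timeout=300)
    content_type = resp.headers.get("Content-Type", "")
    if resp.ok and content_type.startswith("application/json"):
        return resp.json()
    # Return the raw body instead of letting JSON parsing blow up the job.
    return {"error": f"upstream returned {resp.status_code}", "body": resp.text[:500]}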
ashleyk
ashleyk9mo ago
Is this for Serverless or GPU Cloud?
zfmoodydub
zfmoodydub9mo ago
serverless
ashleyk
ashleyk9mo ago
Which API were you calling when you got the error?
zfmoodydub
zfmoodydub9mo ago
It's happening both in run and runsync. We figured out the root of the error (Processing error: Expecting value: line 2 column 1 (char 1)); it's a file size problem with our API that we are calling from RunPod. A different question now is: how do we get better logs from RunPod? It seems as though when a job is failing/has failed, the worker logs will not open to show the log. Not a big deal, but it's hard to perform a traceback when the logs disappear; they are not in the endpoint logs either.
flash-singh
flash-singh9mo ago
Better serverless logs are a big priority for us. It's in development and we plan to roll that out by early April; it's a complete rewrite.
zfmoodydub
zfmoodydub9mo ago
Hey @ashleyk, would you mind letting me shoot you a DM about another endpoint attempt?