Can't get Warm/Cold status

I have tried "health" endpoint to retrieve Cold/Warm status of an endpoint but even having ready worker didn't mean the endpoint is warm. And it cold started. I need an indicator if the endpoint will cold start or it is still warm. Is it currently available in somewhere to retrieve that information and am I missing it? If not could you suggest a workaround if possible?
9 Replies
3WaD · 2mo ago
A new feature that will automatically pre-warm your workers is in development/testing. Until it's released, I'm adding a prewarm method to my workers. If your app's initialization lives outside the handler (as recommended), it's very easy to add.
import runpod

async def handler(job):
    job_input = process_input(job["input"])  # app-specific input parsing

    # Prewarm FlashBoot request: answer immediately so the worker is
    # spun up (and kept initialized) without doing any real work
    if job_input == "prewarm":
        yield {"warm": True}

if __name__ == "__main__":
    initialize_engine()  # init outside the handler, as recommended
    runpod.serverless.start({"handler": handler, "concurrency_modifier": concurrency_modifier, "return_aggregate_stream": True})
You can then send prewarm requests as needed or periodically to keep the workers warm.
"input": { "prewarm": true }
"input": { "prewarm": true }
hakankaan (OP) · 2mo ago
Thank you for sharing this. I think this levels up the usage of my application. I'm concerned it might have some drawbacks, so I need to experiment with it. I'll save this to my backlog and definitely look at it very soon. In the meantime, my need to check the warm/cold status of the endpoint is still there.
3WaD · 2mo ago
You can't target specific workers. Jobs are dynamically assigned to them, and they're shifting around constantly. You're not guaranteed a warm worker even when you've recently run a job and should have one. The only solution currently is keeping as many of them warm as possible. If you want to check if the whole endpoint has a warm worker available at that exact moment before you send the job itself, you could theoretically edit the code a bit to return the worker info regardless of the state. However, I am not sure how reliable it would be due to the active nature of the job balancer.
# ------------------- #
#   RunPod Handler    #
# ------------------- #
import runpod

engine = None

async def handler(job):
    global engine
    job_input = process_input(job["input"])  # app-specific input parsing

    # Check the warm status: the engine only exists on a worker that
    # has already initialized it
    if job_input == "prewarm":
        yield {"warm": bool(engine)}
    else:  # normal request
        if engine is None:  # lazy init on the first real job
            engine = initialize_engine()
        # ... run the job with the engine

# ------------------ #
#   Entrypoint       #
# ------------------ #
if __name__ == "__main__":
    runpod.serverless.start({"handler": handler, "concurrency_modifier": concurrency_modifier, "return_aggregate_stream": True})
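On the client side, a probe like this could check that flag before dispatching the real job (a sketch under assumptions: the synchronous /runsync endpoint, and that with return_aggregate_stream=True the output is the list of dicts the handler yielded):

import os
import requests

ENDPOINT_ID = os.environ["ENDPOINT_ID"]   # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]    # placeholder
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def endpoint_reports_warm() -> bool:
    # Synchronous probe; returns the handler's {"warm": ...} flag.
    resp = requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
        headers=HEADERS,
        json={"input": {"prewarm": True}},
        timeout=30,
    ).json()
    output = resp.get("output") or []
    return any(c.get("warm") for c in output if isinstance(c, dict))

if endpoint_reports_warm():
    print("the probed worker was warm")
else:
    print("the probe hit a cold worker; expect a cold start")

As noted above, the probe and the real job may land on different workers, so a positive answer is a hint, not a guarantee.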
hakankaan (OP) · 2mo ago
I was thinking about storing the timestamp of each endpoint's last run in my app and using a threshold to assume whether it is warm or not. But from your latest answer, that workaround won't work, as stated in this sentence: "You're not guaranteed a warm worker even when you've recently run a job and should have one." Thank you. I'll implement pre-warming and then experiment with the engine check.
flash-singh · 2mo ago
we do prioritize warm / flashbooted workers first over cold ones. as for this request, we have it in the backlog to allow you to get the warm state along with running, idle, etc
3WaD · 2mo ago
They're prioritized but not guaranteed, right? In testing, I regularly send requests to a warm worker, and after a few requests a cold start suddenly happens on a different one, even though the warm worker is still marked as idle and ready in the endpoint.
flash-singh · 2mo ago
yes, prioritized but not guaranteed. are you using queue delay scaling?
3WaD · 2mo ago
Yes, queue delay. Does the request count behave differently in deciding which worker to choose? I didn't think about that.
flash-singh · 2mo ago
nope, it's similar, but the scale target for request count is easily determined, so we can scale up faster if it falls behind, since it's simple math with the count
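To illustrate the "simple math with count" point (my own illustrative formula, not RunPod's published autoscaler):

import math

def workers_needed(queued_requests: int, requests_per_worker: int) -> int:
    # Request-count scaling can compute a target directly from the
    # queue length, whereas queue-delay scaling has to wait and
    # observe how long jobs actually sit in the queue.
    return math.ceil(queued_requests / max(requests_per_worker, 1))

print(workers_needed(17, 4))  # -> 5 workers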
