Can't get Warm/Cold status

I have tried the "health" endpoint to retrieve the cold/warm status of an endpoint, but even having a ready worker didn't mean the endpoint was warm, and it cold started anyway. I need an indicator of whether the endpoint will cold start or is still warm. Is that information currently available somewhere and am I just missing it? If not, could you suggest a workaround?
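For context, this is roughly the check I was making (a minimal sketch; the endpoint ID and API key are placeholders, and the exact response fields may differ):

import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
API_KEY = "your-runpod-api-key"   # placeholder

# /health returns aggregate counts (e.g. idle/running workers and queued jobs),
# but nothing that says whether an idle worker is still warm or will cold start.
resp = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(resp.json())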
8 Replies
3WaD (4d ago)
A new feature that will automatically pre-warm your workers is in development/testing. Before it's released, I'm adding a prewarm method to my workers. If you have initialization of your app outside the handler (as recommended), it's very easy to add.
import runpod  # assumes the RunPod Python SDK; process_input, initialize_engine and concurrency_modifier are your own code

async def handler(job):
    job_input = process_input(job["input"])

    # Prewarm Flashboot request
    if job_input == "prewarm":
        yield {"warm": True}
    # ... normal job handling continues here

if __name__ == "__main__":
    initialize_engine()  # init outside the handler
    runpod.serverless.start({"handler": handler, "concurrency_modifier": concurrency_modifier, "return_aggregate_stream": True})
You can then send prewarm requests as needed or periodically to keep the workers warm.
"input": { "prewarm": true }
"input": { "prewarm": true }
hakankaan (OP, 4d ago)
Thank you for sharing this. I think it levels up the usage of my application. I'm concerned it might have some drawbacks, so I need to experiment with it. I will save it to my backlog and definitely look at it very soon. In the meantime, my need for checking the warm/cold status of an endpoint is still there.
3WaD (4d ago)
You can't target specific workers. Jobs are dynamically assigned to them, and they're shifting around constantly. You're not guaranteed a warm worker even when you've recently run a job and should have one. The only solution currently is keeping as many of them warm as possible. If you want to check if the whole endpoint has a warm worker available at that exact moment before you send the job itself, you could theoretically edit the code a bit to return the worker info regardless of the state. However, I am not sure how reliable it would be due to the active nature of the job balancer.
# ------------------- #
#   RunPod Handler    #
# ------------------- #
import runpod  # process_input, initialize_engine and concurrency_modifier are your own code

engine = None

async def handler(job):
    global engine
    job_input = process_input(job["input"])

    # Check the warm status
    if job_input == "prewarm":
        yield {"warm": engine is not None}
    else:  # normal request
        engine = initialize_engine()
        # ...

# ------------------ #
#   Entrypoint       #
# ------------------ #
if __name__ == "__main__":
    runpod.serverless.start({"handler": handler, "concurrency_modifier": concurrency_modifier, "return_aggregate_stream": True})
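On the client side it could then be used roughly like this before sending the real job (just a sketch; the endpoint ID, API key and response parsing are assumptions, and as said, the answer only reflects whichever worker happens to pick up the probe):

import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
API_KEY = "your-runpod-api-key"   # placeholder
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def endpoint_is_warm() -> bool:
    # Send the "prewarm" probe and read back the {"warm": ...} flag the
    # handler yields. With return_aggregate_stream the output should be a
    # list of everything the generator yielded.
    resp = requests.post(
        f"{BASE_URL}/runsync",
        headers=HEADERS,
        json={"input": {"prewarm": True}},
    )
    output = resp.json().get("output", [])
    if isinstance(output, dict):
        output = [output]
    return any(chunk.get("warm") for chunk in output if isinstance(chunk, dict))

# Usage: decide whether to warn the user about a likely cold start.
if not endpoint_is_warm():
    print("Expect a cold start for the next request.")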
hakankaan (OP, 3d ago)
I was thinking about storing the timestamp of the last run of the endpoint in my app and giving it a threshold to assume whether it is warm or not. But from your latest answer, that workaround won't work, as stated in this sentence: "You're not guaranteed a warm worker even when you've recently run a job and should have one." Thank you. I'll implement pre-warming and then experiment with the engine checking.
flash-singh (3d ago)
we do prioritize warm / flashbooted workers first over cold ones. as for this request, we have it in the backlog to allow you to get the warm state along with running, idle, etc
3WaD (3d ago)
They're prioritized but not guaranteed, right? In testing I regularly send requests to a warm worker, and after a few requests a cold start suddenly happens on a different one, even though the warm worker is still marked as idle and ready in the endpoint.
flash-singh (3d ago)
yes prioritized but not guaranteed, are you using queue delay scale?
3WaD (3d ago)
Yes, queue delay. Does the request count behave differently in deciding which worker to choose? I didn't think about that.
