Can't get Warm/Cold status

I have tried "health" endpoint to retrieve Cold/Warm status of an endpoint but even having ready worker didn't mean the endpoint is warm. And it cold started. I need an indicator if the endpoint will cold start or it is still warm. Is it currently available in somewhere to retrieve that information and am I missing it? If not could you suggest a workaround if possible?
9 Replies
3WaD · 2mo ago
A new feature that will automatically pre-warm your workers is in development/testing. Until it's released, I'm adding a prewarm method to my workers. If your app's initialization lives outside the handler (as recommended), it's very easy to add.
import runpod

async def handler(job):
    job_input = process_input(job["input"])  # app-specific input parsing

    # Prewarm FlashBoot request: answer immediately so the worker is
    # spun up (and kept initialized) without doing any real work
    if job_input == "prewarm":
        yield {"warm": True}

if __name__ == "__main__":
    initialize_engine()  # init outside the handler, as recommended
    runpod.serverless.start({"handler": handler, "concurrency_modifier": concurrency_modifier, "return_aggregate_stream": True})
You can then send prewarm requests as needed or periodically to keep the workers warm.
"input": { "prewarm": true }
"input": { "prewarm": true }
hakankaan (OP) · 2mo ago
Thank you for sharing this. I think this levels up the usage of my application. I'm concerned it might have some drawbacks, so I need to experiment with it. I'll save this to my backlog and definitely look at it very soon. In the meantime, my need to check the warm/cold status of the endpoint is still there.
3WaD · 2mo ago
You can't target specific workers. Jobs are dynamically assigned to them, and they're shifting around constantly. You're not guaranteed a warm worker even when you've recently run a job and should have one. The only solution currently is keeping as many of them warm as possible. If you want to check if the whole endpoint has a warm worker available at that exact moment before you send the job itself, you could theoretically edit the code a bit to return the worker info regardless of the state. However, I am not sure how reliable it would be due to the active nature of the job balancer.
# ------------------- #
#   RunPod Handler    #
# ------------------- #
import runpod

engine = None

async def handler(job):
    global engine
    job_input = process_input(job["input"])  # app-specific input parsing

    # Check the warm status: the engine only exists on a worker that
    # has already initialized it
    if job_input == "prewarm":
        yield {"warm": bool(engine)}
    else:  # normal request
        if engine is None:  # lazy init on the first real job
            engine = initialize_engine()
        # ... run the job with the engine

# ------------------ #
#   Entrypoint       #
# ------------------ #
if __name__ == "__main__":
    runpod.serverless.start({"handler": handler, "concurrency_modifier": concurrency_modifier, "return_aggregate_stream": True})
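On the client side, a probe like this could check that flag before dispatching the real job (a sketch under assumptions: the synchronous /runsync endpoint, and that with return_aggregate_stream=True the output is the list of dicts the handler yielded):

import os
import requests

ENDPOINT_ID = os.environ["ENDPOINT_ID"]   # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]    # placeholder
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def endpoint_reports_warm() -> bool:
    # Synchronous probe; returns the handler's {"warm": ...} flag.
    resp = requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
        headers=HEADERS,
        json={"input": {"prewarm": True}},
        timeout=30,
    ).json()
    output = resp.get("output") or []
    return any(c.get("warm") for c in output if isinstance(c, dict))

if endpoint_reports_warm():
    print("the probed worker was warm")
else:
    print("the probe hit a cold worker; expect a cold start")

As noted above, the probe and the real job may land on different workers, so a positive answer is a hint, not a guarantee.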
hakankaan (OP) · 2mo ago
I was thinking about storing the timestamp of each endpoint's last run in my app and using a threshold to assume whether it is warm or not. But from your latest answer, that workaround won't work, as stated in this sentence: "You're not guaranteed a warm worker even when you've recently run a job and should have one." Thank you. I'll implement pre-warming and then experiment with the engine check.
flash-singh · 2mo ago
we do prioritize warm / flashbooted workers first over cold ones. as for this request, we have it in the backlog to allow you to get the warm state along with running, idle, etc
3WaD · 2mo ago
They're prioritized but not guaranteed, right? In testing, I regularly send requests to a warm worker, and after a few requests a cold start suddenly happens on a different one, even though the warm worker is still marked as idle and ready in the endpoint.
flash-singh · 2mo ago
yes, prioritized but not guaranteed. are you using queue delay scaling?
3WaD · 2mo ago
Yes, queue delay. Does the request count behave differently in deciding which worker to choose? I didn't think about that.
flash-singh · 2mo ago
nope, it's similar, but the scale target for request count is easily determined, so we can scale up faster if it falls behind, since it's simple math with the count
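To illustrate the "simple math with count" point (my own illustrative formula, not RunPod's published autoscaler):

import math

def workers_needed(queued_requests: int, requests_per_worker: int) -> int:
    # Request-count scaling can compute a target directly from the
    # queue length, whereas queue-delay scaling has to wait and
    # observe how long jobs actually sit in the queue.
    return math.ceil(queued_requests / max(requests_per_worker, 1))

print(workers_needed(17, 4))  # -> 5 workers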
