Status endpoint only returns "COMPLETED" but no answer to the question
I'm currently using the v2/model_id/status/run_id endpoint and the result I get is as follows:
{"delaytime": 26083, "executionTime":35737, "id": **, "status": "COMPLETED"}
My stream endpoint works fine but for my purposes I'd rather wait longer and retrieve the entire result at once, how am I supposed to do that?
Thank you
What kind of endpoint are you running? This is an issue with your endpoint, not with the status API.
https://docs.runpod.io/serverless/endpoints/invoke-jobs
Run and status should be correct
Ur main issue is maybe that ur handler is not returning output properly
If u want a reference, here are functions that I made to make a /run call and just keep polling the status:
https://github.com/justinwlin/runpod_whisperx_serverless_clientside_code/blob/main/runpod_client_helper.py
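the basic pattern is roughly this (a sketch with placeholder endpoint id / API key, not the exact helper in that repo):

import time
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"       # placeholder
ENDPOINT_ID = "your_endpoint_id"      # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def run_and_wait(payload, poll_seconds=2):
    # submit the job asynchronously
    job = requests.post(f"{BASE}/run", json={"input": payload}, headers=HEADERS).json()
    job_id = job["id"]
    # poll /status until the job leaves IN_QUEUE / IN_PROGRESS
    while True:
        status = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS).json()
        if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
            return status             # on COMPLETED the result should be under "output"
        time.sleep(poll_seconds)

# result = run_and_wait({"prompt": "Hello"})
# print(result.get("output"))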
I was using runsync instead of run, is that incorrect? I changed it to run and now I'm receiving IN_QUEUE instead
So I'm supposed to keep polling that?
Yes, /run is asynchronous, but changing it will most likely not make any difference
if it does, then /runsync is broken
Just tested and both work fine for me.
/run is great b/c with /runsync I find I get a network timeout :))) but certainly /runsync is also great if it's short enough
but also /run gives u a 30 min cache on runpod's end to store ur answer vs /runsync I forget how long but it's <1 min i think
so i find the 30 min cache nice
also u can add a webhook if u want it to call back to ur webhook with the response when done, instead of polling
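e.g. a sketch of passing a webhook on the /run call (endpoint id, API key, and webhook URL below are placeholders):

import requests

resp = requests.post(
    "https://api.runpod.ai/v2/your_endpoint_id/run",
    headers={"Authorization": "Bearer YOUR_RUNPOD_API_KEY"},
    json={
        "input": {"prompt": "Hello"},
        "webhook": "https://your-server.example.com/runpod-callback",  # runpod POSTs the finished job here
    },
)
print(resp.json())  # {"id": "...", "status": "IN_QUEUE"}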
Yea im still not getting the output, just a value that says "COMPLETED"
How do you get a network timeout with runsync? You are doing something wrong; it eventually goes to IN_QUEUE or IN_PROGRESS if the request takes too long, it doesn't time out.
response:
{"delayTime":662,"executionTime":9823,"id":"1d227fac-78f9-4e22-bb2e-1ff79718704a-u1","status":"COMPLETED"}
Yes, I knew it would not make a difference
Your worker is most likely throwing an error, and you are most likely capturing a dict in the error key, which causes this to happen. error only accepts a str and not a dict; RunPod made a shitty breaking change to the SDK that causes this.
So now you have to do something like:
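(a rough sketch of the kind of change meant here; do_work stands in for whatever your worker actually runs)

def do_work(job_input):              # placeholder for your worker's real logic
    return {"result": job_input}

def handler(event):
    try:
        return do_work(event["input"])
    except Exception as e:
        # previously something like: return {"error": {"type": ..., "message": ...}}
        return {"error": str(e)}     # the error value has to be a plain string now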
I had this exact same issue and had to change my error handling to fix it.
Sorry, where does this change need to be made?
thank you for the response
in your endpoint handler file
Sorry I don't think I've ever modified that file, do I need the runpod python package to use it? I only have an endpoint that I set up
Are you using the vllm worker?
Im not sure, how can I find that out?
Not sure if that makes sense?
I get that response repeatedly
I can share my code, but as far as I can see looking from what you’ve posted, your output should be in the ’tokens’ part of the json that you get back. Try just printing everything you get back. If it’s completed, it should be there…
elif status == "COMPLETED":
    tokens = json_response['output'][0]['choices'][0]['tokens']
    return tokens
here's the relevant part of mine. if the status is COMPLETED, the output you want is in 'tokens'. hope this helps!
...so if I'm reading yours right, you'll want something like
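a rough guess, adapting my snippet to your status call (the exact nesting under "output" depends on the worker, so print the whole response first; URL and job id are placeholders):

import requests

url = "https://api.runpod.ai/v2/your_endpoint_id/status/your_job_id"   # placeholders
json_response = requests.get(url, headers={"Authorization": "Bearer YOUR_RUNPOD_API_KEY"}).json()

if json_response["status"] == "COMPLETED":
    print(json_response)                              # see what actually came back
    output = json_response.get("output")
    if output:                                        # vllm-style workers nest it like this
        tokens = output[0]['choices'][0]['tokens']
        print(tokens)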
I think, lol
…unless the problem really is that all you’re getting back is ’completed’ and no tokens at all anywhere. In which case forget all I said 😅
I will try this, thank you. Sorry, I didn't see this earlier
Hey the object I'm getting back doesn't have the "tokens" key. Did you use a handler function?
I just used the ready made vllm endpoint. 🤷♂️ I’m not really the one to ask. 👀
Hi @kingclimax7569 , what are you looking to deploy?
Hey I already have a serverless endpoint deployed
I'm just trying to use the status endpoint to retrieve the entire result of a query instead of using the stream endpoint to retrieve the results gradually
Is it for a LLM?
Yes
Have you tried our https://github.com/runpod-workers/worker-vllm?
We’re adding full OpenAI compatibility this week
I'm sorry how would that help? The problem seems to be with the runpod endpoints
Not the LLM
Ah i think ik why, do u have return_aggregate set to true?
U prob need return_aggregate_stream = true, so that if u are streaming, the streaming results become available on /run
also i think he's just sharing that if u wanna use vllm, runpod has a pretty good setup if it's not a custom model
It seems more like an issue in your worker code, because status should return the latest stream
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless/blob/main/handler.py
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless/blob/main/stream_client_side.py
Here is an example of my own handler.py that I wrote for my own custom LLM stuff. You can ignore all the bash stuff, that is just for my own sake; but ik this works great for streaming / retrieving, + i have the clientside code that I've tested and validated
https://docs.runpod.io/serverless/workers/handlers/handler-async
Here is the doc for if you want to stream / have your end result available aggregated under the /status endpoint when the stream is done, which is what I based my handler off of.
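the basic shape from that doc is roughly this (just a sketch; the toy handler below only echoes the prompt back word by word):

import runpod

def handler(job):
    prompt = job["input"].get("prompt", "")
    # yield chunks as they are produced; swap this loop for your model's streamer
    for word in prompt.split():
        yield word + " "

runpod.serverless.start({
    "handler": handler,
    "return_aggregate_stream": True,   # makes the aggregated stream show up as the job output
})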
Thank you so much I'm gonna go through this and report back
Sorry I'm confused here, I don't see you using runpod's serverless endpoints, but instead you're using openllm?
I added return_aggregate_stream in my code but to no avail? Can you see what's wrong in the code I posted?
Even if I do this:
I get this:
sorry
i actually misunderstood ur question
I thought u wanted to deploy ur own LLM
submit_job_and_stream_output
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless/blob/main/stream_client_side.py
Is maybe what you want?
Sorry, I thought before, b/c of ur code, that u were sharing ur python handler; i dont use the runpod endpoints that often, but stream_client_side.py is prob something u can try
Ik it works for the way I defined stream, with yielding, so i imagine it should work for runpod
it looks like in the check_job_status function you're doing what I need. But the difference is I'm only receiving {"delaytime": 26083, "executionTime":35737, "id": **, "status": "COMPLETED"} back when I do that
And when I add the handler, I get this response
Sorry let me ask
is this an endpoint by runpod or by u?
Ive been very confused by this
this looks like an LLM
I guess I am confused bc u share this:
https://discord.com/channels/912829806415085598/1208117793925373983/1208143617814700032
Which is different than the below where u ask me:
https://discord.com/channels/912829806415085598/1208117793925373983/1209961564199845999
But if this is ur code
Like ur own deployed code
1) U dont wrap the runpod.start in main
2) Ur function is not a generator
Yea I'm not sure at all how to use the handler function lol
Ok
Yeah, could you please elaborate on what your end goal is and we can go from there
I just want to be able to retrieve the entire result of a query at once
Instead of streaming it
What llm do u wanna use?
What model would you like to deploy, quantized or not, if quantized what quantization
I think the problem is ur code isnt correct 😅 but there is existing code u can just deploy
and have it working
And only worry about calling it
The main problem is ur code isnt correctly defined as a generator, + also bc u printed and didnt return (or actually it should be yield), the return aggregate stream sees nothing, which is why u get nothing
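roughly the difference, as a tiny sketch (toy handler, not ur code):

import runpod

def handler(job):
    text = "generated text"
    print(text)    # only goes to the worker logs; /status output stays empty
    yield text     # this is what return_aggregate_stream actually collects

runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})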
if you look up further I posted the original code
Yes but this doesnt tell us
what model u want
Do u want llama? mistral? so on
And if there is an end goal? Like u want a custom model, u wanna do custom logic later? so on
Also this is weird bc ur mixing up clientside and server side code
Ur defining a url to call inside of what looks to be the handler definition
vs it should just be calling the model
hommayushi3/exllama-runpod-serverless:latest is the docker container im using
If im mixing it up please correct it lol
im not sure exactly what server any of this is supposed to go on
Ok great! So, a few options:
1) If u have no reason to deploy ur own model I recommend using runpod's managed endpoint
https://doc.runpod.io/reference/llama2-13b-chat
2) Or u can use runpod's vllm worker instead and deploy that for official support
https://github.com/runpod-workers/worker-vllm
Use Option #1 in that repo and deploy it by just modifying the env variables
3) Use my model which is UNOFFICIAL but what I use:
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless
And i have a picture of how to set it up. Instead of serverlessllm u would point it at justinwlin/whatever i have the mistral image name as
Yeah here is a tutorial
https://discord.com/channels/912829806415085598/948767517332107274/1209990744094408774
I recommend doing the tutorial first
cause i think u have how u define ur server code (what gets put on runpod) confused with what u call from ur computer or client
ok i don't know if there's something im missing here but when my company set up a serverless endpoint it was on your website
im not sure what server im supposed to be uploading code to
All I want to do is query my existing endpoints so I can retrieve the result of a prompt all at once instead of streaming it
Im supposed to have access to these endpoints, the one I am trying to get working is
/status
Okay this is a veryyyy diff situation then
yes im getting that impression lol
I thought it was straight forward
I didnt realize ur at a company
bc that means u arent the one deploying it
https://github.com/hommayushi3/exllama-runpod-serverless/blob/master/handler.py
The problem is the handler ur code is using
when I say company it was just my boss who did it
return aggregate has to be defined here
On the server
correct
Yup so u can tell ur boss to add the return aggregate stuff we talked about
and ull be able to have the end result in the future
The problem isnt something u on the clientside (who is calling the function) can fix
The problem is what got deployed
I thought the problem is:
1) U deployed it
2) Ur looking for how to change the deployment to do what u want
but the problem is:
1) someone else deployed it
2) u want a different behavior than what is defined
well I have full access so I can do it. Are you saying there was an issue when setting up the new endpoint?
I could just set up a new one with new configs no?
Yes, the
runpod.serverless.start({"handler": inference})
needs to have the return aggregate stuff we talked about (see Generator Handler | RunPod Documentation: a handler that can stream fractional results)
It needs to be:
runpod.serverless.start({"handler": inference, "return_aggregate_stream": True})
Yea I was looking at that and was confused at where that goes
yea hopefully it's clear now
okay so does that code go somewhere in the config on the website??
Solution
Okay…
1) What is deployed to runpod is:
https://github.com/hommayushi3/exllama-runpod-serverless/blob/master/handler.py
2) U need to change the line i specified at the bottom of the file. u should have a copy of this github repo locally
3) U have to rebuild the image and redeploy it to runpod
4) When u call it in the future it will work
Run through the toy tutorial here if confused:
https://discord.com/channels/912829806415085598/948767517332107274/1209990744094408774
Okay forking the repo and editing the file then deploying on runpod makes sense
Hey sorry to bother you again, got it deployed and I'm not getting the same error but I'm continuously getting "IN_QUEUE" as a response
It means its in queue unless ur thing is responding
if its not responding u gotta check ur own logs
if its crashing somewhere or something
gotta check the UI console on runpod
yea im trying to view the logs but apparently there aren't any available
when building the container image I just reference my repo with the following format correct?: "<github-username>/exllama-runpod-serverless:latest"
yea it looks like ur stuff is still initializing
so u gotta check the initialization logs
also its usually not ur github username
its usually ur *dockerhub* username
it should be pushed to dockerhub
if its stuck initializing it usually means u didnt push it, u tagged it wrong, or u built for the wrong platform
ahh that makes more sense, I thought using git was too easy but I couldn't find the original repo on dockerhub so I tried git. Ill give that a shot thank you
Just to reiterate:
https://discord.com/channels/912829806415085598/948767517332107274/1209990744094408774
I highly recommend again to check out this tutorial
I think will be helpful 🙂
main thing is, if ur on a mac, as i said in that thread, append a --platform flag to your docker build command
Cause it goes through the process of taking code + building + shipping it to dockerhub + then using it on runpod
Thank you I'll commit it to docker hub and try it out
Does my docker hub repo need to be public? My endpoint still seems to be stuck on initializing
If it is not public u need to add docker registry credentials under settings
otherwise it is impossible for it to find it lol
If there is no reason to have it private, id just have it public too, personally
unless u bundled some sort of trademark secret sauce
but it seems ur just using like normal llama and modified the handler.py a bit
Hey I changed it to public right after I asked that and it's running now
Stupid question haha
Thank you for your patience bro
Same result unfortunately
Same when I use the console
Can you share ur github repo?
or ur handler.py
exllama-runpod-serverless/handler.py at master · enpro-github/exllama-runpod-serverless (GitHub)
what input are u sending to it? weird
try with /run so that the output is actually persisted for a bit longer.
When u do it, do u see the results coming out when u click the stream button?
yup im using run
ill try the stream one sec
I find it weird that its completed
but ur UI isn't green
results of stream
which part should be green? I assume it works for you?
Nvm, I get it, b/c it wont turn green till the stream is done
wtf why am I not getting that from /status lol
what variable are u passing down?
well this is my own stuff
can i see ur input?
Are you passing down a stream: True? variable?
def inference(event) -> Union[str, Generator[str, None, None]]:
    logging.info(event)
    job_input = event["input"]
    if not job_input:
        raise ValueError("No input provided")
    prompt: str = job_input.pop("prompt_prefix", prompt_prefix) + job_input.pop("prompt") + job_input.pop("prompt_suffix", prompt_suffix)
    max_new_tokens = job_input.pop("max_new_tokens", 100)
    stream: bool = job_input.pop("stream", False)
    generator, default_settings = load_model()
    settings = copy(default_settings)
    settings.update(job_input)
    for key, value in settings.items():
        setattr(generator.settings, key, value)
    if stream:
        output: Union[str, Generator[str, None, None]] = generate_with_streaming(prompt, max_new_tokens)
        for res in output:
            yield res
    else:
        output_text = generator.generate_simple(prompt, max_new_tokens=max_new_tokens)
        yield output_text[len(prompt):]

runpod.serverless.start({"handler": inference, "return_aggregate_stream": True})
you mean that?
It seems like ur code needs it
specifically this part
U might have been just running it as an all in one output
I find it weird that your output is not persisted still
but that is a great place to start first
sorry what line am I supposed to change exactly?
am I supposed to change it to equal True?
I assume something like the above
Since ur code is looking for a stream variable
Otherwise it says ur just going to get it all in one shot
Other than that, Idk why ur code is going wrong, u can refer to my code and try to break it down if you want:
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless/blob/main/handler.py
My code is a bit weird cause I wrote it to work on both serverless / gpu pod depending on env variables
but yeah other than that, I really dont know what else is going on with ur code
1) Should do /run
2) Need to pass down a stream variable that is true
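e.g. for #2, a sketch of the request (endpoint id / API key are placeholders):

import requests

requests.post(
    "https://api.runpod.ai/v2/your_endpoint_id/run",
    headers={"Authorization": "Bearer YOUR_RUNPOD_API_KEY"},
    json={"input": {"prompt": "Hello", "max_new_tokens": 100, "stream": True}},  # the handler pops "stream" from input
)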
I actually want to wait to get it all in one shot
ah got it
HM
I recommend to maybe just do a print statement on the output_text / what you are yielding before you slice it out
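something like this inside the else branch of inference(), reusing the names from the code u posted (just a sketch):

output_text = generator.generate_simple(prompt, max_new_tokens=max_new_tokens)
print("full output_text:", repr(output_text))             # raw generation, shows up in the logs
print("after slicing:", repr(output_text[len(prompt):]))  # make sure this isn't an empty string
yield output_text[len(prompt):]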
yea that sounds like a good idea
I find it weird your code can work like this tbh lol
idk how it is ending up in /stream
will I be able to see the print output in the logs
Or what is this output?
yea
Tbh, if you're just doing it all in one shot and u dont care about stream, u could just return directly and not yield
yea me neither I honestly just followed a tutorial and it used this
Ah.
Very weird
Or if ur company might use stream then nvm
But yeah honestly cannot say, my best bet is u can look to my repo for guidance, ik mine works
And mine implements both stream / one shot
it isnt the same library
but the structure will be the same
otherwise i really got no clue without diving deeper and debugging myself; but ull prob need to check thro all that
u can probably just return directly is my guess too for the one shot 🤔
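something like this, as a sketch (load_model() and generator come from the existing handler.py; the start call drops the generator/stream bits):

import runpod

def inference(event):
    job_input = event["input"]
    prompt = job_input["prompt"]
    max_new_tokens = job_input.get("max_new_tokens", 100)
    generator, _ = load_model()                      # load_model() is defined in the existing handler.py
    output_text = generator.generate_simple(prompt, max_new_tokens=max_new_tokens)
    return output_text[len(prompt):]                 # a plain return lands under "output" on /status

runpod.serverless.start({"handler": inference})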
So I printed off the output:
It does indeed look like it's actually generating the text result
Does handler.py need to return a dictionary or something? I think Ashley alluded to this, I noticed your handler.py does
Hm i think its that runpod doesnt handle dictionaries well but im not too sure. maybe try to json.dumps() what ur yielding out. but honestly not too sure
yield json.dumps(xxxx) or something like this.
but maybe make a new post and see if runpod staff can help u out with ur repo / handler, im not too sure what the issue may be.
i dont return a dictionary
i yield back out a string which gets put into the output by runpod automatically
honestly ur code doesnt seem too bad to me so not too sure without deep diving myself and trying diff structures out
RunPod's error field does not handle dictionaries; output is fine.
when ur printing, are u printing it with the output[len(prompt):] thing or just printing the output_text?
i wonder if maybe ur string slicing is yielding an empty string
but i find it weird it shows in stream for u but not normally
sounds like something to ask runpod tho if u can see in /stream
but not under /run
the only other advice i can give is strip the code to a simple ex from the docs and build it back up until it breaks
Hey sorry for the late reply, im trying this out now but the logs keep ending here
although im still getting the "COMPLETED" response, but sometimes the logs don't seem to finish all the way through
to be honest, im not too sure, this seems like something u should ask runpod in a new question / investigate and play around with different structures. I shared how my code works before and ik that works
But I dont know why urs doesnt work, it looks about the same to me
which is why i said maybe u just gotta run it in a GPU pod, and not just in serverless
and build it up step by step
hmm ok
thanks again you helped a lot. I feel like im a lot closer
Yeah, I would consider if u dont need streaming
just do a direct return
If u do a direct return
then less things to worry about
so dont yield
Tried that but I havent tried returning the output text without the slicing so ill try that next
Yeah, I mean theoretically, even if this ends up not working, you should be able to do like:
return "Hello World" and if that doesn't work something reallllllyyy is wrong somewhere. and there's something being missed
cause that'd be the most fundamental thing
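i.e. the bare-minimum version, as a sketch:

import runpod

def handler(job):
    return "Hello World"   # if even this never shows up under "output", the problem is outside the handler

runpod.serverless.start({"handler": handler})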
Yea that's exactly what I'm thinking, at the very least I should be able to affect the outcome, if it means breaking it
If not I'm just gonna use a new LLM lol
if u fall to that point can just use mine xD
Will do lol
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless/blob/main/README.md
Yeh, i mean my runpod stuff, as long as u set the env variable as my readme describes, u have the clientside code / docker image ready to go lol
just gotta change the way ur prompting it if u want to have that system / user thing, but that is an easy thing u can preprompt it with
anyways gl gl lol