RunPod•4mo ago

Some basic confusion about the `handlers`

Hi everyone! 👋 I'm currently using RunPod's serverless option to deploy an LLM. Here's my setup: - I've deployed the vLLM with a serverless endpoint (runpod.io/v2/<endpoint>/run). - I built a FastAPI backend that forwards frontend requests to the RunPod endpoint. - This works fine since FastAPI is async and handles requests efficiently. However, I came across the Handler feature in the RunPod docs and am unsure if I should switch to using it. My questions are: 1. Is using the Handler feature necessary, or is it okay to stick with FastAPI as the middleware? 2. Are there any advantages to adopting Handlers, such as reduced latency or better scaling, compared to my current setup? 3. Would switching simplify my architecture, or am I overcomplicating things by considering it? Basically my architecture is: 1. Frontend 2. FastAPI (different endpoints and pre/post processing -- async requests) 3. Runpod vLLM 4. FastAPI (final processing) 5. Return to frontend I am not able to grasp the handler feature, is it a replacement of such FastAPI like frameworks or is it handelled automatically on the runpod side? Any advice or insights would be much appreciated! Thanks in advance. 😊

7 Replies

3WaD•4mo ago

Do you mean this Handler? That's what the serverless containers (including your vLLM) are running on. When you want to develop a container image for RunPod serverless, you use their SDK and put the execution code inside the Handler functions. So you are already using it just didn't have to write it since you're using someone else's image (the VLLM template).

jackson holeOP•4mo ago

Yes, that handler. So I am basically utilizing that already!? Wow. Actually my code structure looks like this:

app = FastAPI()

@app.post("/get_response")
async def get_response(request: Request) -> Response:
    request_dict = await request.json()
    print("JSON:", request_dict)
    
    payload = {"topic": request_dict["story_topic"]}
    request_id = random_uuid()
    story = await functions.generate_story(payload)
   
    return {"response":story}

app = FastAPI()

@app.post("/get_response")
async def get_response(request: Request) -> Response:
    request_dict = await request.json()
    print("JSON:", request_dict)
    
    payload = {"topic": request_dict["story_topic"]}
    request_id = random_uuid()
    story = await functions.generate_story(payload)
   
    return {"response":story}

And that calls the appropriate async function:

async def generate_story(payload):
  
  sampling_params =  SamplingParams(temperature=0.42,
                                    max_tokens=2048,
                                    top_p=0.734,
                                    repetition_penalty=1.0,
                                    stop=["Note", "note"])
  
  prompt = "<>"

  ### the below will call the runpod openai endpoint ###
  completion = client.completions.create(
          model="iqbalamo93",
          prompt=prompt,
          **sampling_params)
return completion.choices[0].text

async def generate_story(payload):
  
  sampling_params =  SamplingParams(temperature=0.42,
                                    max_tokens=2048,
                                    top_p=0.734,
                                    repetition_penalty=1.0,
                                    stop=["Note", "note"])
  
  prompt = "<>"

  ### the below will call the runpod openai endpoint ###
  completion = client.completions.create(
          model="iqbalamo93",
          prompt=prompt,
          **sampling_params)
return completion.choices[0].text

So, basically this FastAPI is implemented in this async way and I guess it should do that job!

3WaD•4mo ago

Yes, the handler runs on the serverless worker. Once it's built and deployed, you don't see or interact with it. It helps to think about your serverless endpoint as a stateless computation process that expects some input and returns some output. That's it. And you can call it in many ways. With raw http requests via CURL or Postman, from your frontend with Fetch/Ajax or backend with any http library. RunPod can do both async (with /run), sync with (/runsync), or even stream the responses if the handler utilizes the Generator Handler. It also offers an /openai path which allows you to receive raw outputs, thanks to this the VLLM image can work plug-and-play with OpenAI clients.

xeith_•4mo ago

@jackson hole switching to using runpod handler is pretty straight forward. It will take you less than an hour 🙂 Make your Dockerfile point to a new runpod.handler.py file.

# Command to run the app
CMD ["python3", "-u", "runpod.handler.py"]

# Command to run the app
CMD ["python3", "-u", "runpod.handler.py"]

#runpod.handler.py
import runpod

from backgroundremover import remove_background # Some Python library
from image import crop_image, load_image # Local File (./image.py)

async def handler(event):
    try:
        input_data = event.get("input", {})
        callback = input_data.get("callback", {})
        image_url = input_data.get("image_url",input_data.get("image_data_url",""))

        # Load the image  
        image = load_image(image_url)

        processed_image, mask = await remove_background(image)

        # Crop the image
        image = crop_image(processed_image)

        return {
            "url": image['dataUrl'],
            "position": image['position'],
            "callback": callback
        }
        
    except Exception as e:
        return {'error': str(e)}
    

if __name__ == '__main__':
    runpod.serverless.start({'handler': handler})

#runpod.handler.py
import runpod

from backgroundremover import remove_background # Some Python library
from image import crop_image, load_image # Local File (./image.py)

async def handler(event):
    try:
        input_data = event.get("input", {})
        callback = input_data.get("callback", {})
        image_url = input_data.get("image_url",input_data.get("image_data_url",""))

        # Load the image  
        image = load_image(image_url)

        processed_image, mask = await remove_background(image)

        # Crop the image
        image = crop_image(processed_image)

        return {
            "url": image['dataUrl'],
            "position": image['position'],
            "callback": callback
        }
        
    except Exception as e:
        return {'error': str(e)}
    

if __name__ == '__main__':
    runpod.serverless.start({'handler': handler})

Here is a small example

3WaD•4mo ago

Just to clarify the "switching", the handler topic is relevant only if you want to develop your custom container images. When using the official RunPod VLLM template, there's no need to care about it.

xeith_•4mo ago

Yea of course, using runpod handler is only relevant for serverless. Otherwise the API structure is great!

jackson holeOP•4mo ago

EXACTLY! Thanks @3WaD !! And of course @xeith_ for your detailed walkthrough!! 🤗

Gaming

Programming

Some basic confusion about the `handlers`

Did you find this page helpful?