Is /stream a POST endpoint or a GET endpoint (locally)?
Is /stream a POST endpoint or a GET endpoint? I'm trying to run the handler locally with streaming before hosting it on RunPod, but it's not working.
Noticed the example here: https://doc.runpod.io/reference/llama2-13b-chat#streaming-token-outputs
where /stream is a GET endpoint
But when I check the FastAPI Swagger UI, it shows /stream as a POST endpoint, which of course doesn't stream but rather returns all the results together.
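I'm serving the handler locally through the SDK's FastAPI test server (assuming the standard flag; that's where the Swagger UI comes from):

```
python handler.py --rp_serve_api
```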
Here is my simplified handler:
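Something like this minimal sketch (assuming the runpod SDK's generator-handler streaming pattern; the token list is just a stand-in for real model output):

```python
import runpod

def handler(job):
    prompt = job["input"].get("prompt", "")
    # Yielding from the handler makes it a generator, which is what
    # enables token-by-token streaming on the hosted /stream endpoint.
    for token in ["echo:", " ", prompt]:
        yield token

runpod.serverless.start({
    "handler": handler,
    # Aggregate the streamed chunks so /run and /runsync still
    # return the full output once the job finishes.
    "return_aggregate_stream": True,
})
```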
and my client, which fails locally with a 405 Method Not Allowed error when calling /stream via GET.
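For comparison, the hosted flow from that docs page looks roughly like this (a sketch; the endpoint ID and API key are placeholders):

```python
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
API_KEY = "your-api-key"          # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Submit the job asynchronously via POST /run...
job = requests.post(f"{BASE}/run", headers=HEADERS,
                    json={"input": {"prompt": "Hello"}}).json()

# ...then fetch partial output from GET /stream/{job_id}.
chunk = requests.get(f"{BASE}/stream/{job['id']}", headers=HEADERS).json()
print(chunk)
```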
12 Replies
@ashleyk can you help?
Nope
do you know anyone who can? RunPod team?
@flash-singh ?
The RunPod devs are in the US so they will probably only be online in a few hours.
@Merrell will probably be able to advise.
both GET and POST work
neither of them works for me
Don't think it works locally.
Then I'll try to deploy it. But if so, I guess that's not ideal behavior, since testing on-prem is not very convenient.
https://blog.runpod.io/runpod-dockerless-cli-innovation/
@goku Can try this. It uses GPU pods to live-reload against a developer env and puts your endpoints there for testing. Haven't tried it for /stream, but it might work for your situation.
what does dockerless mean?
does it build in a dev env and then upload?
It's bad naming; in the docs it's called "Projects" under runpodctl.
Docs still need to be drastically improved imo.
But all it means is they want you to be able to develop a handler.py without worrying too much about setting up your own custom Docker image.
Though arguably I think it's still not the best workflow: if you need dependencies outside of Python, they don't handle it the best right now, either requiring you to apt-get install them through another terminal, or to use "your own custom Docker image" as a base with the other special dependencies installed.
But overall I think it's a nice development flow, since the endpoints were automatically there to test when I played with it before; just personally, my workflows are simple, so I test against GPU Cloud manually.
The idea is you write a handler.py and it live-refreshes against a pod in the background for you to test against,
and the pod also auto-stops / shuts down if you disconnect for a certain amount of time.
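From memory the Projects flow is roughly this (exact command names may differ by runpodctl version):

```
runpodctl project create    # scaffold a project with a handler.py
runpodctl project dev       # live-reload the handler against a dev pod
runpodctl project deploy    # push it to a serverless endpoint
```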
Ah ye