Streaming LLM output via a Google Cloud Function
Has anyone been able to figure this out? User inputs go through a Google Cloud Function, which then calls the RunPod model's inference endpoint. This pipeline works, but I now want the output streamed back as it is generated instead of waiting ages for the complete answer. My attempts to implement this have been unsuccessful so far, and Google's docs only show examples of streaming LLM outputs through their Vertex AI service, not this specific setup.
Just in case anyone looks for this in the future: it is possible using Python's `requests` library. Call the upstream endpoint with `stream=True`, then return the chunks to the client through a generator wrapped in a streaming response.
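A minimal sketch of what that could look like, assuming a 2nd-gen (Flask-based) Cloud Function; `RUNPOD_URL`, `relay`, and `stream_inference` are hypothetical names, and only 2nd-gen functions actually flush chunks to the caller as they arrive:

```python
import requests
from flask import Response

# Hypothetical placeholder -- substitute your own RunPod endpoint URL.
RUNPOD_URL = "https://example.com/your-runpod-endpoint"


def relay(lines):
    """Yield each non-empty upstream line as a newline-terminated chunk."""
    for line in lines:
        if line:
            yield line + "\n"


def stream_inference(request):
    """HTTP entry point: forward the user's JSON payload and stream the reply."""
    upstream = requests.post(
        RUNPOD_URL,
        json=request.get_json(silent=True),
        stream=True,  # keep the connection open and read the body incrementally
        timeout=300,
    )
    upstream.raise_for_status()
    # Passing a generator to Response makes Flask send chunks as they arrive
    # instead of buffering the whole body.
    return Response(
        relay(upstream.iter_lines(decode_unicode=True)),
        mimetype="text/plain",
    )
```

The key pieces are `stream=True` on the outbound request (so `requests` doesn't download the full body up front) and the generator-backed `Response` (so the function forwards each chunk immediately).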