Streaming LLM output via a Google Cloud Function

Has anyone been able to figure this out? User inputs go through a Google Cloud Function, which then calls the RunPod model's inference endpoint. This pipeline works, but I now want the output streamed back instead of waiting ages for the complete answer. My attempts to implement this have so far been unsuccessful, and Google's docs only show examples for streaming LLM output through their Vertex AI service, not the specific case I'm dealing with.
VidimusWolf (OP) · 5mo ago
Just in case anyone looks for this in the future: it is possible to do this using Python's requests library, by streaming the upstream response and relaying it from the Cloud Function chunk by chunk.
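A minimal sketch of what that could look like, assuming a 2nd-gen (Cloud Run-based) function that supports streaming responses. The RunPod URL, payload shape, and environment variable names here are placeholders, not the OP's actual setup:

```python
import os
import requests
import functions_framework
from flask import Response, stream_with_context

# Placeholder values -- substitute your own RunPod streaming endpoint and key.
RUNPOD_STREAM_URL = os.environ.get("RUNPOD_STREAM_URL", "https://example.invalid/stream")
RUNPOD_API_KEY = os.environ.get("RUNPOD_API_KEY", "")

@functions_framework.http
def relay(request):
    prompt = (request.get_json(silent=True) or {}).get("prompt", "")

    def generate():
        # stream=True makes requests read the body incrementally instead of
        # buffering the whole completion before returning.
        with requests.post(
            RUNPOD_STREAM_URL,
            headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
            json={"input": {"prompt": prompt}},
            stream=True,
            timeout=300,
        ) as upstream:
            upstream.raise_for_status()
            for chunk in upstream.iter_content(chunk_size=None):
                if chunk:
                    yield chunk  # forward each chunk to the client as it arrives

    # Returning a generator-backed Response lets the function stream instead of
    # sending one buffered body at the end.
    return Response(stream_with_context(generate()), mimetype="text/plain")
```

The key pieces are `stream=True` on the outbound request and returning a generator-backed `Response`, so tokens flow through the function as they arrive rather than after the full answer is ready.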
