Streaming LLM output via a Google Cloud Function

Has anyone figured this out? User inputs go through a GCloud Function, which then calls the RunPod model's inference endpoint. The pipeline works, but I now want the output streamed back instead of waiting ages for the complete answer. My attempts so far have been unsuccessful, and Google's docs only show examples for streaming LLM output from their Vertex AI service, not this specific case.
VidimusWolf · 3w ago
Just in case anyone looks for this in the future: it is possible using Python's requests library.
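A minimal sketch of the idea, assuming a 2nd-gen Cloud Function (which can stream HTTP responses) written with functions_framework/Flask. The inference URL, payload shape, and chunk format below are placeholders, not RunPod's actual API; the point is that `requests` with `stream=True` lets you read the upstream response incrementally and forward it through a Flask generator response:

```python
# Sketch only: INFERENCE_URL and the JSON payload are hypothetical placeholders.
import functions_framework
import requests
from flask import Response, stream_with_context

INFERENCE_URL = "https://example.com/your-model/stream"  # placeholder endpoint


@functions_framework.http
def stream_llm(request):
    data = request.get_json(silent=True) or {}
    prompt = data.get("prompt", "")

    def generate():
        # stream=True keeps the connection open so we can read the upstream
        # response chunk by chunk instead of waiting for the full body.
        with requests.post(
            INFERENCE_URL,
            json={"prompt": prompt},
            stream=True,
            timeout=300,
        ) as upstream:
            upstream.raise_for_status()
            for chunk in upstream.iter_content(chunk_size=None, decode_unicode=True):
                if chunk:
                    yield chunk  # forward each chunk to the caller as it arrives

    # Returning a generator wrapped in a Flask Response streams the output
    # to the client instead of buffering the whole answer.
    return Response(stream_with_context(generate()), mimetype="text/plain")
```

The client then also needs to read the function's response incrementally (e.g. `requests.get(..., stream=True)` and `iter_content`), otherwise it will still appear to wait for the complete answer.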