Streaming LLM output via a Google Cloud Function
Has anyone been able to figure this out? User inputs go through a Google Cloud Function, which then calls the RunPod model's inference endpoint. This pipeline works, but I now want the output streamed back as it is generated instead of waiting ages for the complete answer. My attempts to implement this have been unsuccessful so far, and Google's docs only show examples of streaming LLM outputs through their Vertex AI service, not this specific setup.
Just in case anyone looks for this in the future: it is possible using Python's `requests` library. Call the upstream endpoint with `stream=True`, then return the chunks to the client through a generator wrapped in a streaming response.
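A minimal sketch of what that could look like, assuming a 2nd-gen (Flask-based) Cloud Function; `RUNPOD_URL`, `relay`, and `stream_inference` are hypothetical names, and only 2nd-gen functions actually flush chunks to the caller as they arrive:

```python
import requests
from flask import Response

# Hypothetical placeholder -- substitute your own RunPod endpoint URL.
RUNPOD_URL = "https://example.com/your-runpod-endpoint"


def relay(lines):
    """Yield each non-empty upstream line as a newline-terminated chunk."""
    for line in lines:
        if line:
            yield line + "\n"


def stream_inference(request):
    """HTTP entry point: forward the user's JSON payload and stream the reply."""
    upstream = requests.post(
        RUNPOD_URL,
        json=request.get_json(silent=True),
        stream=True,  # keep the connection open and read the body incrementally
        timeout=300,
    )
    upstream.raise_for_status()
    # Passing a generator to Response makes Flask send chunks as they arrive
    # instead of buffering the whole body.
    return Response(
        relay(upstream.iter_lines(decode_unicode=True)),
        mimetype="text/plain",
    )
```

The key pieces are `stream=True` on the outbound request (so `requests` doesn't download the full body up front) and the generator-backed `Response` (so the function forwards each chunk immediately).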