"Error decoding stream response" on Completed OpenAI compatible stream requests
Context
I have a custom worker on serverless, streaming a response from the async OpenAI Python client.
Error
When making requests to the OpenAI-compatible API endpoint, non-streaming requests are fine, but streaming requests always return with:
- Response code: 200
- Body is just the text:
"Error decoding stream response"
I attached the run status results, which show the expected output and a Completed status.
Example Command
Response data
Relevant Code
Here's a simplified sketch of the handler's shape (assuming the runpod SDK and the async openai client; the model name, input keys, and client config are placeholders rather than my exact code):
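```python
import runpod
from openai import AsyncOpenAI

# client config is a placeholder; the real one points at my model backend
client = AsyncOpenAI()

async def handler(job):
    stream = await client.chat.completions.create(
        model="placeholder-model",          # placeholder model name
        messages=job["input"]["messages"],  # simplified input shape
        stream=True,
    )
    # yield each chunk's JSON to runpod as it arrives
    async for chunk in stream:
        yield chunk.model_dump_json()

# return_aggregate_stream lets /runsync return the collected chunks too
runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})
```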
I cannot reproduce the error when running the pod locally and making requests to /runsync with the matching input data. Any insight would be helpful; I'm not sure whether there's an additional layer of decoding or deserializing in the API that isn't happy with the streaming responses 🥲
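For reference, the local test looks roughly like this (assuming the handler is started with `python handler.py --rp_serve_api`, which serves on localhost:8000 by default; the input shape is simplified):

```python
import requests

# send the same input shape the handler expects to the local /runsync endpoint
resp = requests.post(
    "http://localhost:8000/runsync",
    json={"input": {"messages": [{"role": "user", "content": "Hello"}]}},
)
print(resp.status_code)
print(resp.json())
```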
2 Replies
Have a look here:
https://github.com/SvenBrnn/runpod-worker-ollama/blob/master/wrapper/src/engine.py#L88
I compared the default OpenAI response (from vLLM) to the one I produced; the difference was that the stream wasn't returning plain JSON but server-sent-event lines of the form "data: <json string>".
With that change it worked for me.
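Roughly, the change is to wrap every chunk in an SSE "data: <json>" line before yielding it, instead of yielding the bare JSON string. A minimal sketch, assuming the runpod SDK and the async openai client (model name and input shape are placeholders):

```python
import runpod
from openai import AsyncOpenAI

client = AsyncOpenAI()  # placeholder config; point it at your backend

async def handler(job):
    stream = await client.chat.completions.create(
        model="placeholder-model",          # placeholder model name
        messages=job["input"]["messages"],  # assumed input shape
        stream=True,
    )
    # wrap each chunk as an SSE "data: <json>" line, like the real
    # OpenAI streaming API does
    async for chunk in stream:
        yield f"data: {chunk.model_dump_json()}\n\n"
    # terminate the stream the same way the OpenAI API does
    yield "data: [DONE]\n\n"

runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})
```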
Thank you!! I did not realize it was returning plain strings.