T3 Chat state
Created by Sink on 1/17/2025 in #questions (Theo's Typesafe Cult)
I'm building an AI chat application where LLM responses are streamed to the frontend over WebSockets. The frontend appends each incoming chunk to the current AI message using React's useState.
However, I've run into an issue with faster responses. The LLM can stream 25+ tokens at once, but the frontend doesn't end up applying a state update for each token in the burst. As a result, only the last token is appended and the earlier ones are lost.
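For reference, here is a minimal sketch of the kind of handler that produces this behavior. It assumes react-use-websocket (mentioned below), one plain-text token per message, and a state update computed from the closed-over value; the component and prop names are placeholders, not code from the actual app.

```tsx
import { useState } from "react";
import useWebSocket from "react-use-websocket";

function AiMessage({ socketUrl }: { socketUrl: string }) {
  const [message, setMessage] = useState("");

  useWebSocket(socketUrl, {
    onMessage: (event) => {
      const token = event.data as string;
      // Building the next value from the closed-over `message` means every
      // token in the same burst starts from the same stale string, so only
      // the last call "wins" and earlier tokens are dropped.
      setMessage(message + token);
      // React's functional updater form, setMessage(prev => prev + token),
      // applies each token on top of the previously queued update instead,
      // though each token still triggers its own render pass.
    },
  });

  return <p>{message}</p>;
}
```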
I've explored two potential solutions, but neither feels optimal:
Frontend Solution 1: Using react-use-websocket, I can buffer incoming messages (this copes fine with 25+ payloads arriving at once) and concatenate them into a single string every X milliseconds before updating the state. The downside is that if the interval is too short, some browsers may struggle with the render load and potentially crash; if it is too long, the chat experience becomes less smooth and responsive.
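A minimal sketch of that interval-flush approach, again assuming react-use-websocket and plain-text tokens; the 50 ms interval and all names are placeholders.

```tsx
import { useEffect, useRef, useState } from "react";
import useWebSocket from "react-use-websocket";

function BufferedAiMessage({ socketUrl }: { socketUrl: string }) {
  const [message, setMessage] = useState("");
  const buffer = useRef("");

  useWebSocket(socketUrl, {
    onMessage: (event) => {
      // Accumulate tokens in a ref so bursts never touch React state directly.
      buffer.current += event.data as string;
    },
  });

  useEffect(() => {
    // Flush the buffer into state on a fixed interval, so a burst of tokens
    // costs one re-render per tick instead of one per token.
    const id = setInterval(() => {
      if (buffer.current) {
        const chunk = buffer.current;
        buffer.current = "";
        setMessage((prev) => prev + chunk);
      }
    }, 50);
    return () => clearInterval(id);
  }, []);

  return <p>{message}</p>;
}
```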
Backend Solution 2: I could move the buffering logic to the backend, but this might become computationally expensive when handling high traffic.
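And a minimal sketch of the backend variant, assuming a Node server using the ws package and an async iterable of tokens coming from the LLM client; the flush window is an arbitrary placeholder and the function is hypothetical, not part of any existing codebase.

```ts
import { WebSocket } from "ws";

async function streamWithBuffer(
  socket: WebSocket,
  tokens: AsyncIterable<string>,
  flushMs = 50,
) {
  let buffer = "";
  const timer = setInterval(() => {
    if (buffer) {
      // Send one frame per window instead of one per token.
      socket.send(buffer);
      buffer = "";
    }
  }, flushMs);

  try {
    for await (const token of tokens) {
      buffer += token;
    }
  } finally {
    clearInterval(timer);
    // Flush whatever is left once the stream ends.
    if (buffer) socket.send(buffer);
  }
}
```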
Any suggestions? I’m curious how T3 Chat handles rendering these messages efficiently.