Created by madiator on 8/21/2024 in #⚡|serverless
Long latencies
I have a 7B model that is supposed to be very fast (it checks whether a claim is supported by a context and gives a yes/no answer). If I rent an H100, I can process my prompt and get a response in about 100 ms (for a prompt of roughly 1,400 words). But a much shorter prompt (about 200 words) takes 1.3 to 1.5 seconds on serverless. I tried adding "active workers", but that didn't help. Any tips on how to reduce the latency?
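For reference, the usual RunPod serverless pattern is to load the model once at module import time, outside the handler, so a warm or active worker keeps it in GPU memory instead of reloading it per request. A minimal sketch of that pattern is below; the model name and generation call are placeholders, not the actual deployment:

```python
import runpod
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder for the actual 7B claim-checking model.
MODEL_ID = "my-org/claim-checker-7b"

# Load once at import time so a warm (active) worker keeps the model
# in GPU memory; loading inside the handler would add seconds per call.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="cuda"
)

def handler(job):
    prompt = job["input"]["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        # Only a short yes/no answer is needed, so cap new tokens.
        out = model.generate(**inputs, max_new_tokens=2)
    answer = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(answer, skip_special_tokens=True)

runpod.serverless.start({"handler": handler})
```

With this layout, an active worker's remaining latency should be queueing plus inference, so if requests still take over a second, the time is presumably going to request routing/queueing rather than model loading.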