Load Balancer Least Outstanding Requests Steering: How does Cloudflare know when a request is finished?

I have 3 endpoints. Traffic steering is set to Least Outstanding Requests. Each request takes around 2 minutes to process after being accepted by the server (generative AI). But I often see that 1 endpoint has 3 requests (1 active, 2 in queue), while the other 2 endpoints don't have any active requests. How exactly are outstanding requests measured? I see this in the documentation: "LORS uses the number of unanswered HTTP requests to influence steering". Does this mean that as soon as the server accepts the request, it's considered "complete", even if the server hasn't finished processing it? https://developers.cloudflare.com/reference-architecture/architectures/load-balancing/#least-outstanding-requests-steering-lors
1 Reply
Chaika (2mo ago)
The dashboard and docs call them "In-Flight requests", which would include pending requests and the like. The docs you linked explain more as well:
"In situations with lighter load conditions, there will be more variation in the steering results, which may not precisely match the configured weights. However, as the load increases, the actual steering results will closely match the configured weights."
I think you need far more than 3 requests. I would also keep in mind that all of that data is likely specific to each colo (PoP), so if your requests hit different colos, one location won't know that another already has some number of pending requests.
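To make the per-colo point concrete, here is a toy Python sketch. It is not Cloudflare's implementation: it only assumes that each colo tracks its own in-flight counts and picks the endpoint with the fewest, breaking ties by list order. The endpoint names, colo names, and the tie-break rule are all made up for illustration.

```python
# Hypothetical endpoint names, for illustration only.
ENDPOINTS = ["endpoint-a", "endpoint-b", "endpoint-c"]


class Colo:
    """Toy model of one location that only sees its own in-flight requests."""

    def __init__(self, name):
        self.name = name
        self.in_flight = {endpoint: 0 for endpoint in ENDPOINTS}

    def steer(self):
        # Pick the endpoint with the fewest in-flight requests as seen by
        # this colo. Ties go to the first endpoint in the list (an assumption).
        target = min(ENDPOINTS, key=lambda endpoint: self.in_flight[endpoint])
        self.in_flight[target] += 1
        return target


def simulate(colo_per_request):
    """Return the total load per endpoint after steering one request per entry.

    No request finishes during the simulation, mirroring long-running jobs.
    """
    colos = {}
    load = {endpoint: 0 for endpoint in ENDPOINTS}
    for colo_name in colo_per_request:
        colo = colos.setdefault(colo_name, Colo(colo_name))
        load[colo.steer()] += 1
    return load


# All three requests enter through the same colo: counts are shared.
same_colo = simulate(["AMS", "AMS", "AMS"])
# Each request enters through a different colo: no colo sees the others.
different_colos = simulate(["AMS", "FRA", "LHR"])

print(same_colo)
print(different_colos)
```

In this model, three requests through one colo are spread across the three endpoints, while three requests through three different colos all land on the same endpoint, because each colo starts from zero. That would be consistent with the pattern described in the question, assuming the counts really are local to each location.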