geomaster
Cloudflare Developers
• Created by geomaster on 2/20/2024 in #workers-help
15s delays when writing to R2 from Workers
We're seeing R2 writes from a Worker to buckets in WEUR and EEUR sporadically take 15000+ ms. There doesn't seem to be a normal distribution - it's strongly bimodal, i.e. they take either ~800ms or ~15800ms. This sounds like some additional delay is being applied to our requests for some reason.
How can we debug this? It's incredibly disruptive, as it affects a user-facing API call.
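For reference, the write path is essentially just a timed put() (simplified sketch; the binding and key names below are placeholders):

```ts
// Simplified sketch of the write path; the bucket binding and key are placeholders.
interface Env {
  UPLOADS_WEUR: R2Bucket; // R2 bucket located in WEUR
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const body = await request.arrayBuffer();

    const start = Date.now();
    await env.UPLOADS_WEUR.put("some-object-key", body);
    const elapsedMs = Date.now() - start;

    // elapsedMs is usually ~800ms, but sporadically lands at 15000+ms.
    return new Response(`put took ${elapsedMs}ms`);
  },
};
```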
3 replies
Cloudflare Developers
• Created by geomaster on 1/29/2024 in #workers-help
Poor Worker<->R2 performance even within same region
We're trying to serve assets stored in R2 via a Worker. The assets are distributed across 4 R2 buckets, each in a different region, and we even tried issuing 2 read requests in parallel, one to each of the 2 nearest regions, to smooth out the latency.
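The parallel read is essentially a race between the two nearest buckets (simplified sketch; the binding names are placeholders):

```ts
// Simplified sketch of the parallel read; ASSETS_WEUR / ASSETS_EEUR are placeholder binding names.
interface Env {
  ASSETS_WEUR: R2Bucket;
  ASSETS_EEUR: R2Bucket;
}

async function getFromNearestRegions(key: string, env: Env): Promise<R2ObjectBody> {
  const read = (bucket: R2Bucket) =>
    bucket.get(key).then(obj => obj ?? Promise.reject(new Error(`${key} not found`)));

  // Issue both reads at once and take whichever completes first;
  // in theory this should hide a slow path in either single region.
  return Promise.any([read(env.ASSETS_WEUR), read(env.ASSETS_EEUR)]);
}
```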
None of this seems to help much. We’re still seeing R2 TTFB latencies to the nearest region of up to 400-600ms in about 10% of requests, often after a long period of no requests (even though the worker boots up quickly - almost as if the storage itself is “cold”). Since it’d be highly unlikely that both R2 regions we’re reading from would take a slow path or get congested at the same time, it’s likely that there is an issue on the workers side that stalls our R2 requests.
Are these numbers expected? They seem really high. In comparison, S3 operations from an AWS Lambda in the same region have 10x lower TTFB (based on my crude eyeballing).
These assets are on the critical path for page loads, and with client RTT + CF cache lookups sporadically taking close to 100ms, we often see end-user response times over 1s. This is obviously really high.
Has anyone experienced this behavior with R2 when accessed from a Worker, and has any advice on how to mitigate it? Relying only on the Cloudflare cache for performance has proven tricky, since it evicts quickly and very often doesn't even seem to commit new data to the cache at all, causing a miss on the next request.
1 reply
Cloudflare Developers
• Created by geomaster on 1/19/2024 in #workers-help
Unpredictable latency spikes for Workers, R2 and KV
I'm currently looking at a request to a Worker which had a TTFB of 693ms measured on my side. On the Worker side, the time is measured as 492ms, the majority of which is a same-region R2 request. That's a network overhead of almost 200ms (to say nothing of the nearly half a second just to get the metadata of a stored object)!
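To be concrete, the worker-side number comes from timing the R2 call itself (simplified sketch; the binding and key are placeholders):

```ts
// Simplified sketch of how the worker-side time is measured; binding and key are placeholders.
interface Env {
  ASSETS: R2Bucket;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const start = Date.now();
    const object = await env.ASSETS.head("some-object-key"); // same-region metadata lookup
    const workerMs = Date.now() - start;                     // ~492ms in the request described above

    // Client-measured TTFB (693ms here) minus this value is the ~200ms of network overhead.
    return new Response(object ? "found" : "missing", {
      headers: { "Server-Timing": `r2;dur=${workerMs}` },
    });
  },
};
```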
(It seems that I was randomly reassigned to a different colo, to which I have a very high RTT. https://speed.cloudflare.com still shows the "optimal" colo for me (RTT=27ms), but workers go through a different one. I did observe R2 latencies in the neighborhood of 100-200ms with the old colo as well.)
There is a cache in front of these requests and it helps a lot; however, sometimes the cache is poorly-behaved in the sense that it takes 5 or 6 writes before it actually stores a response, and evicts even frequently-requested resources.
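For context, the cache in front of these reads is the usual Cache API pattern (simplified sketch; the helper and its names are mine, not a library API):

```ts
// Simplified sketch of the caching layer in front of the R2/KV reads.
async function cachedFetch(
  request: Request,
  ctx: ExecutionContext,
  origin: () => Promise<Response>
): Promise<Response> {
  const cache = caches.default;

  // Look for a previously stored response in the colo-local cache.
  const hit = await cache.match(request);
  if (hit) return hit;

  // Miss: fetch from R2/KV, then store a copy asynchronously.
  // This put() is the step that frequently appears to be ignored,
  // so the next request misses again.
  const response = await origin();
  ctx.waitUntil(cache.put(request, response.clone()));
  return response;
}
```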
I've observed KV read latencies upwards of 300ms, and just half an hour ago I had 2 same-region R2 requests in a row each take 900ms (as measured by the Worker).
These occurrences make Workers very difficult to use for anything user-facing, as combining even a few of these requests can lead to responses that take seconds, which is very noticeable.
Are these kinds of numbers expected, and is there anything that can be done to improve them?
1 reply
Cloudflare Developers
• Created by geomaster on 12/12/2023 in #workers-help
Any way to abort a chunked HTTP response such that the receiving end knows an error occurred?
Hi all! When streaming an unknown-length response to the client, calling WritableStream.abort() will still gracefully end the stream, so the client will not see any error. Instead, it'll just receive a truncated response. Is there any way to forcibly close the connection without concluding the stream, so that the other side can interpret this as an error? As it stands, streaming is very risky to use due to the potential for data corruption, but it is also the only viable approach for large payloads.
1 reply
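For reference, the streaming setup in question is roughly this (simplified sketch; the bucket binding and key are placeholders):

```ts
// Simplified sketch of the streaming path; bucket binding and key are placeholders.
interface Env {
  ASSETS: R2Bucket;
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const object = await env.ASSETS.get("some-large-object");
    if (!object) return new Response("not found", { status: 404 });

    const { readable, writable } = new TransformStream();

    // Pump the R2 body into the response body. If the source fails mid-stream,
    // writable.abort() is called, but the client still sees a "successfully"
    // terminated (just truncated) chunked body instead of an error.
    ctx.waitUntil(
      object.body
        .pipeTo(writable, { preventAbort: true })
        .catch(err => writable.abort(err))
    );

    // Content length is unknown, so the transfer is chunked.
    return new Response(readable);
  },
};
```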