CDN-introduced data corruption for 206 responses
We have noticed corrupted data being returned from the CDN for 206 responses. We have extensively validated that the upstream server is healthy and has valid data, by checking against known-good backup archives and RAID redundancy.
The URL in question is: https://cjrtnc.leaningtech.com/3_20250225_603/8/jre/lib/resources.jar, but other copies of the same file in different builds seem to be affected too.
The responses contain, as expected,
Content-Range: bytes 0-131071/1121910
since the first 128k of the file were requested.
Intermittently we have noticed data being returned that corresponds to a range of the same length, but starting not at 0 but at offset 0x7E22, which I cannot justify off the top of my head. This offset was identified by searching for the returned data in the original file; the header always reports the starting offset as 0.
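For reference, a rough Node sketch of that check (a minimal illustration on my side; the local file name is a placeholder, not the exact tooling we used) looks like this:

// Hypothetical offset check (Node 18+ with global fetch): compare the bytes returned by the CDN
// against a local, known-good copy of resources.jar.
const fs = require("fs");

async function findOffset() {
  const res = await fetch("https://cjrtnc.leaningtech.com/3_20250225_603/8/jre/lib/resources.jar", {
    headers: { range: "bytes=0-131071" },
  });
  console.log("Content-Range:", res.headers.get("content-range"));
  const returned = Buffer.from(await res.arrayBuffer());
  const original = fs.readFileSync("resources.jar"); // known-good copy of the file
  const offset = original.indexOf(returned);         // where do the returned bytes really start?
  if (offset === 0) console.log("OK: data starts at offset 0, as the header claims");
  else if (offset > 0) console.log("CORRUPTED: data actually starts at offset 0x" + offset.toString(16));
  else console.log("Returned bytes not found in the original file at all");
}

findOffset();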
Please find attached a text dump that contains all the relevant information: the file URL, the requested range, and the returned data for the good and bad cases in BASE64 format.
Hi!
I know this error has already been reported, and even though there's no response on the topic, someone should already be looking into it.
I think it would be best if you could add your report to the same community topic.
https://community.cloudflare.com/t/r2-range-responses-rarely-0-5-off-by-64kb/772463
The issue does not seem to be the same. In particular, the offset in my case is not a power of two; otherwise I would have assumed a cosmic ray or some other random bit flip. In the linked community discussion the offset is 64k.
Moreover, I tried using the same strategy to reproduce the issue by downloading the range 1000s of times in a row, but I am not able to trigger the bad behavior.
The issue also refers to R2, but I am unsure how much infrastructure is shared between the CDN and R2
The same error, for another URL and another offset, has now been reported by a different user in a different location.
There seem to be serious data corruption problems that we suspect are connected to the handling of byte-range requests when the data first needs to be pulled from the upstream server. The upstream request is always made without byte ranges, which makes sense, but this range manipulation could be connected to the problem.
We also suspect that the whole problem might be connected to QUIC, which we have now disabled while we continue monitoring.
have you got a repro curl? From my tests, data always seems intact
I don't have a repro, unfortunately. I saw it happen twice today with different resources, so it must be relatively common. I think it could be connected to cache misses, so maybe trying with randomized query strings could help. I also suspect QUIC to be part of the problem, and I am not sure curl can use it.
Maybe I can actually provide a repro, and I can also see that QUIC is not the problem
There is something reproducible and really peculiar that I can report and could be connected.
fetch("https://cjrtnc.leaningtech.com/3.0/8/jre/lib/jce.jar?hack="+((Math.random()*100000)|0), {"headers": { "range": "bytes=0-131071", }, "method": "GET", }).then(async function(r){var data=new Uint8Array(await r.arrayBuffer());console.log(String.fromCharCode(data[0],data[1]));})
This is a JS oneliner that I am using from the Chrome console. The resource is CORS-enabled, so you can do this from anywhere. For reproducibility I am using the about:blank page.
Notice that there is a hack query string parameter to force a cache miss.
When the range header is present, roughly 20% of the downloads fail with a net::ERR_HTTP2_PROTOCOL_ERROR 206 (Partial Content) error. I can see that there are no errors on the upstream server.
On the other hand, if the byte range header is removed I can never trigger a single failure.
This seems to confirm that the upstream server is healthy and able to provide the whole data, but somehow the CDN trips on byte ranges.
I'd like to stress that this is not the original data corruption issue, but I wonder if it could be connected. Something is definitely amiss in the handling of byte ranges.
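For what it's worth, a sketch of how the failure rate could be quantified outside the browser (my assumption: in Node the same failure shows up as a rejected fetch or a body read error rather than Chrome's net::ERR_HTTP2_PROTOCOL_ERROR):

// Sketch only, not the exact script used: issue N cache-missing ranged requests and count failures.
async function countFailures(n = 50) {
  let failures = 0;
  for (let i = 0; i < n; i++) {
    const url = "https://cjrtnc.leaningtech.com/3.0/8/jre/lib/jce.jar?hack=" + ((Math.random() * 100000) | 0);
    try {
      const r = await fetch(url, { headers: { range: "bytes=0-131071" } });
      await r.arrayBuffer(); // the protocol error may only surface while reading the body
    } catch (e) {
      failures++;
    }
  }
  console.log(failures + "/" + n + " ranged requests failed");
}

countFailures();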
Another user has reported the same type of issue, which confirms the problem is serious and widespread.
with the corruption?
Yes. I am judging from symptoms since I don't have a HAR file from the user.
It also dawned on me that another intermittent issue reported ~2 weeks ago could have the same origin
Did you have the opportunity of reproducing the intermittent errors with the script above?
I'd like to add that I understand that from your perspective it's more likely that the origin server is at fault; I'd honestly prefer that too, since it would be easy to fix, but I've checked everything I can think of and everything seems healthy. I am more than willing to test more things if you have suggestions.
I was able to reproduce the h2 protocol err but haven't dug into what is happening there yet
Glad to hear you could repro, it's a first step. Let me know if you need further info.
I am trying to gather additional data points that could help in debugging this/these problems.
I had unfortunately no luck yet in consistently reproducing the data corruption issue, but our users have provided more logs that confirm the problem exists.
I can reproduce the h2 protocol err part while pointing the CDN (via another domain) to a completely different upstream server located elsewhere. This seems to confirm the issue is introduced by the CDN and not by a faulty upstream server.
I have found a oneliner that can often demonstrate the data corruption issue. The count parameter can be freely altered. This should always print 'PK' on the console, since the file is a JAR/ZIP file. See the screenshot for an example of a failure.
The problem seems to happen more frequently with smaller download sizes; the new test case is based on 16k requests instead of the original 128k requests.
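For anyone who cannot see the screenshot, this is a sketch of the idea (my reconstruction, not the exact oneliner, which is in the community post):

// Reconstruction for illustration: request the first 16k of the JAR `count` times,
// busting the cache each time, and check that the ZIP magic bytes 'PK' are present.
async function checkMagic(count = 100) {
  for (let i = 0; i < count; i++) {
    const url = "https://cjrtnc.leaningtech.com/3.0/8/jre/lib/jce.jar?hack=" + ((Math.random() * 100000) | 0);
    const r = await fetch(url, { headers: { range: "bytes=0-16383" } });
    const data = new Uint8Array(await r.arrayBuffer());
    const magic = String.fromCharCode(data[0], data[1]);
    if (magic !== "PK") console.log("corruption on iteration " + i + ", first bytes: " + magic);
  }
  console.log("done");
}

checkMagic();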
I have also added this test case in my community.cloudflare.com report for future record.
I can also confirm this oneliner can trigger corruption from a different domain/host combination on the CDN.
The corruption is indeed an unexpected offset; I already stated this in the first entry of this report.
Since two completely independent origins, with different hardware, different software and different locations, show the problem, I think it's safe to exclude an upstream issue.
The offset is not always the same across different URLs.
It might be the same for the same URL; I don't have strong data about this.
https://d3415aa6bfa4.leaningtech.com/jce.jar is my second test URL. This one is not CORS-enabled though, so testing from Chrome is not as straightforward. I have done my testing by disabling CORS checks at the browser level.
This domain points to a completely different upstream server.
This seems to suggest the offset is actually random or garbage data coming from somewhere
Using node is a good approach to get rid of the CORS issue, nice idea 👍
It's an interesting discovery, but I don't see what that would mean
It's important to note that the upstream server never actually receives byte-range headers; Cloudflare will download the whole resource and then serve the requested range to the user.
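If anyone wants to verify that on their own setup, a minimal sketch (assuming you can point a test domain at a throwaway Node origin) is to log whether the CDN ever forwards a Range header:

// Minimal test origin: log the Range header (if any) of every incoming request.
const http = require("http");

http.createServer((req, res) => {
  console.log(req.method, req.url, "Range:", req.headers.range || "(none)");
  res.writeHead(200, { "Content-Type": "application/octet-stream", "Accept-Ranges": "bytes" });
  res.end(Buffer.alloc(1121910)); // dummy payload, roughly the size of resources.jar
}).listen(8080);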
It's a sensible design
(Working on it)
Yep, I think I have a working setup.
URLs starting with https://d3415aa6bfa4.leaningtech.com/jce.jar (so as to include any query string) now bypass the cache.
I have also added another copy of the file, jce2.jar, without this rule for comparison.
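A sketch of how I am comparing the two copies (assuming the cf-cache-status response header reflects the bypass rule):

// Fetch a 16k range of both copies with a cache-busting query string and report
// the cache status plus the ZIP magic bytes. jce.jar bypasses the cache, jce2.jar does not.
async function compare() {
  for (const name of ["jce.jar", "jce2.jar"]) {
    const url = "https://d3415aa6bfa4.leaningtech.com/" + name + "?hack=" + ((Math.random() * 100000) | 0);
    const r = await fetch(url, { headers: { range: "bytes=0-16383" } });
    const data = new Uint8Array(await r.arrayBuffer());
    console.log(name, "cf-cache-status:", r.headers.get("cf-cache-status"),
                "magic:", String.fromCharCode(data[0], data[1]));
  }
}

compare();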
The one bypassing the cache works reliably in my tests so far
Can you clarify which ones have the header?
To the best of my understanding this is also indeed a fairly serious cache problem.
I think this is also connected to another issue I mentioned above that also only happens when the data is first loaded from the origin. https://discord.com/channels/595317990191398933/1344748548247392387/1345048200503103518
Ok, thanks for your help in clarifying the problem. Hopefully Cloudflare engineers will take the issue seriously thanks to the data we have gathered so far.
repro script:
this was a very helpful discussion this morning, thank you both!
I've escalated this
Thanks, appreciated
Is there a ticket/reference number that you can share in case I need to ask about this problem later on?
CUSTESC-49209
is the internal ref
did you have a support ticket open already? if so i can link
https://community.cloudflare.com/t/serious-cdn-introduced-silent-data-corruption-for-byte-range-requests/774359
Unsure if this is what you mean by a support ticket
nah i meant a proper ticket - https://developers.cloudflare.com/support/contacting-cloudflare-support/#getting-help-with-an-issue
I'll try to quickly write one up
I suspect I can't open a ticket since we don't have a paid plan for the CDN, although we have one for Workers
open it as an account ticket
ty