CDN-introduced data corruption for 206 responses
We have noticed corrupted data being returned from the CDN for 206 responses. We have extensively validated that the upstream server is healthy and has valid data, by checking against known-good backup archives and RAID redundancy.
The URL in question is: https://cjrtnc.leaningtech.com/3_20250225_603/8/jre/lib/resources.jar, but other copies of the same file in different builds seem to be affected too.
The responses contain, as expected,
Content-Range: bytes 0-131071/1121910
since the first 128k of the file were requested.
Intermittently we have noticed data being returned that corresponds to a range of the same length, but starting not at 0 but at offset 0x7E22, which I cannot justify off the top of my head. This offset was identified by searching for the returned data in the original file; the header always reports the starting offset as 0.
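For reference, a rough Node sketch of that check (a minimal illustration on my side; the local file name is a placeholder, not the exact tooling we used) looks like this:

// Hypothetical offset check (Node 18+ with global fetch): compare the bytes returned by the CDN
// against a local, known-good copy of resources.jar.
const fs = require("fs");

async function findOffset() {
  const res = await fetch("https://cjrtnc.leaningtech.com/3_20250225_603/8/jre/lib/resources.jar", {
    headers: { range: "bytes=0-131071" },
  });
  console.log("Content-Range:", res.headers.get("content-range"));
  const returned = Buffer.from(await res.arrayBuffer());
  const original = fs.readFileSync("resources.jar"); // known-good copy of the file
  const offset = original.indexOf(returned);         // where do the returned bytes really start?
  if (offset === 0) console.log("OK: data starts at offset 0, as the header claims");
  else if (offset > 0) console.log("CORRUPTED: data actually starts at offset 0x" + offset.toString(16));
  else console.log("Returned bytes not found in the original file at all");
}

findOffset();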
Please find attached a text dump that contains all the relevant information: the file URL, the requested range, and the returned data for the good and bad cases in BASE64 format.
Hi!
I know this error has already been reported, and even though there's no response on the topic, someone should already be looking into it.
I think it would be best if you could add your report to the same community topic.
https://community.cloudflare.com/t/r2-range-responses-rarely-0-5-off-by-64kb/772463
The issue does not seem to be the same. In particular, the offset in my case is not a power of two; otherwise I would have assumed a cosmic ray or some other random bit flip. In the linked community discussion the offset is 64k.
Moreover, I tried using the same strategy to reproduce the issue by downloading the range 1000s of times in a row, but I am not able to trigger the bad behavior.
The issue also refers to R2, but I am unsure how much infrastructure is shared between the CDN and R2
The same error, for another URL and another offset, has now been reported by a different user in a different location.
There seem to be serious data corruption problems that we suspect are connected to the handling of byte-range requests when the data first needs to be pulled from the upstream server. The upstream request is always made without byte ranges, which makes sense, but this range manipulation could be connected to the problem.
We also suspect that the whole problem might be connected to QUIC, which we have now disabled while we continue monitoring.
have you got a repro curl? From my tests, data always seems intact
I don't have a repro, unfortunately. I saw it happen twice today with different resources, so it must be relatively common. I think it could be connected to cache misses, so maybe trying with randomized query strings could help. I also suspect QUIC to be part of the problem, and I am not sure curl can use it.
Maybe I can actually provide a repro, and I can also see that QUIC is not the problem
There is something reproducible and really peculiar that I can report and could be connected.
fetch("https://cjrtnc.leaningtech.com/3.0/8/jre/lib/jce.jar?hack="+((Math.random()*100000)|0), {"headers": { "range": "bytes=0-131071", }, "method": "GET", }).then(async function(r){var data=new Uint8Array(await r.arrayBuffer());console.log(String.fromCharCode(data[0],data[1]));})
This is a JS oneliner that I am using from the Chrome console. The resource is CORS-enabled, so you can do this from anywhere. For reproducibility I am using the about:blank page.
Notice that there is a hack query string parameter to force a cache miss.
When the range header is present, roughly 20% of the downloads fail with a net::ERR_HTTP2_PROTOCOL_ERROR 206 (Partial Content) error. I can see that there are no errors on the upstream server.
On the other hand, if the byte range header is removed I can never trigger a single failure.
This seems to confirm that the upstream server is healthy and able to provide the whole data, but somehow the CDN trips on byte ranges.
I'd like to stress that this is not the original data corruption issue, but I wonder if it could be connected. Something is definitely amiss in the handling of byte ranges.
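For what it's worth, a sketch of how the failure rate could be quantified outside the browser (my assumption: in Node the same failure shows up as a rejected fetch or a body read error rather than Chrome's net::ERR_HTTP2_PROTOCOL_ERROR):

// Sketch only, not the exact script used: issue N cache-missing ranged requests and count failures.
async function countFailures(n = 50) {
  let failures = 0;
  for (let i = 0; i < n; i++) {
    const url = "https://cjrtnc.leaningtech.com/3.0/8/jre/lib/jce.jar?hack=" + ((Math.random() * 100000) | 0);
    try {
      const r = await fetch(url, { headers: { range: "bytes=0-131071" } });
      await r.arrayBuffer(); // the protocol error may only surface while reading the body
    } catch (e) {
      failures++;
    }
  }
  console.log(failures + "/" + n + " ranged requests failed");
}

countFailures();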
Another user has reported the same type of issue, which confirms the problem is serious and widespread.
with the corruption?
Yes. I am judging from symptoms since I don't have a HAR file from the user.
It also dawned on me that another intermittent issue reported ~2 weeks ago could have the same origin
Did you have the opportunity of reproducing the intermittent errors with the script above?
I'd like to add that I understand that from your perspective it's more likely that the origin server is at fault; I'd honestly prefer that too, since it would be easy to fix, but I've checked everything I can think of and everything seems healthy. I am more than willing to test more things if you have suggestions.
I was able to reproduce the h2 protocol err but haven't dug into what is happening there yet
Glad to hear you could repro, it's a first step. Let me know if you need further info.
I am trying to gather additional data points that could help in debugging this/these problems.
I had unfortunately no luck yet in consistently reproducing the data corruption issue, but our users have provided more logs that confirm the problem exists.
I can reproduce the h2 protocol err part while pointing the CDN (via another domain) to a completely different upstream server located elsewhere. This seems to confirm the issue is introduced by the CDN and not by a faulty upstream server.
I have found a oneliner that can often demonstrate the data corruption issue. The count parameter can be freely altered. This should always print 'PK' on the console, since the file is a JAR/ZIP file. See the screenshot for an example of a failure.
The problem seems to happen more frequently with smaller download sizes; the new test case is based on 16k requests instead of the original 128k requests.
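For anyone who cannot see the screenshot, this is a sketch of the idea (my reconstruction, not the exact oneliner, which is in the community post):

// Reconstruction for illustration: request the first 16k of the JAR `count` times,
// busting the cache each time, and check that the ZIP magic bytes 'PK' are present.
async function checkMagic(count = 100) {
  for (let i = 0; i < count; i++) {
    const url = "https://cjrtnc.leaningtech.com/3.0/8/jre/lib/jce.jar?hack=" + ((Math.random() * 100000) | 0);
    const r = await fetch(url, { headers: { range: "bytes=0-16383" } });
    const data = new Uint8Array(await r.arrayBuffer());
    const magic = String.fromCharCode(data[0], data[1]);
    if (magic !== "PK") console.log("corruption on iteration " + i + ", first bytes: " + magic);
  }
  console.log("done");
}

checkMagic();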
I have also added this test case in my community.cloudflare.com report for future record.
I can also confirm this oneliner can trigger corruption from a different domain/host combination on the CDN.
The corruption is indeed an unexpected offset; I already stated this in the first entry of this report.
Since two completely independent origins, with different hardware, different software and different locations, show the problem, I think it's safe to exclude an upstream issue.
The offset is not always the same across different URLs.
It might be the same for the same URL; I don't have strong data about this.
https://d3415aa6bfa4.leaningtech.com/jce.jar is my second test URL. This one is not CORS-enabled though, so testing from Chrome is not as straightforward. I have done my testing by disabling CORS checks at the browser level.
This domain points to a completely different upstream server.
This seems to suggest the offset is actually random or garbage data coming from somewhere
Using node is a good approach to get rid of the CORS issue, nice idea 👍
It's an interesting discovery, but I don't see what that would mean
It's important to note that the upstream server never actually receives byte-range headers; Cloudflare will download the whole resource and then serve the requested range to the user.
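If anyone wants to verify that on their own setup, a minimal sketch (assuming you can point a test domain at a throwaway Node origin) is to log whether the CDN ever forwards a Range header:

// Minimal test origin: log the Range header (if any) of every incoming request.
const http = require("http");

http.createServer((req, res) => {
  console.log(req.method, req.url, "Range:", req.headers.range || "(none)");
  res.writeHead(200, { "Content-Type": "application/octet-stream", "Accept-Ranges": "bytes" });
  res.end(Buffer.alloc(1121910)); // dummy payload, roughly the size of resources.jar
}).listen(8080);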
It's a sensible design
(Working on it)
Yep, I think I have a working setup.
URLs starting with https://d3415aa6bfa4.leaningtech.com/jce.jar (so as to include any query string) now bypass the cache.
I have also added another copy of the file, jce2.jar, without this rule for comparison.
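A sketch of how I am comparing the two copies (assuming the cf-cache-status response header reflects the bypass rule):

// Fetch a 16k range of both copies with a cache-busting query string and report
// the cache status plus the ZIP magic bytes. jce.jar bypasses the cache, jce2.jar does not.
async function compare() {
  for (const name of ["jce.jar", "jce2.jar"]) {
    const url = "https://d3415aa6bfa4.leaningtech.com/" + name + "?hack=" + ((Math.random() * 100000) | 0);
    const r = await fetch(url, { headers: { range: "bytes=0-16383" } });
    const data = new Uint8Array(await r.arrayBuffer());
    console.log(name, "cf-cache-status:", r.headers.get("cf-cache-status"),
                "magic:", String.fromCharCode(data[0], data[1]));
  }
}

compare();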
The one bypassing the cache works reliably in my tests so far
Can you clarify which ones have the header?
To the best of my understanding this is also indeed a fairly serious cache problem.
I think this is also connected to another issue I mentioned above that also only happens when the data is first loaded from the origin. https://discord.com/channels/595317990191398933/1344748548247392387/1345048200503103518
Ok, thanks for your help in clarifying the problem. Hopefully Cloudflare engineers will take the issue seriously thanks to the data we have gathered so far.
repro script:
this was a very helpful discussion this morning, thank you both!
I've escalated this
Thanks, appreciated
Is there a ticket/reference number that you can share in case I need to ask about this problem later on?
CUSTESC-49209
is the internal ref
did you have a support ticket open already? if so i can link
https://community.cloudflare.com/t/serious-cdn-introduced-silent-data-corruption-for-byte-range-requests/774359
Unsure if this is what you mean by a support ticket
nah i meant a proper ticket - https://developers.cloudflare.com/support/contacting-cloudflare-support/#getting-help-with-an-issue
I'll try to quickly write one up
I suspect I can't open a ticket since we don't have a paid plan for the CDN, although we have one for Workers
open it as an account ticket
ty