Email obfuscated when using HTMLRewriter
Hi, I'm using HTMLRewriter to parse some HTML data. When deployed, the parsed text has the email obfuscated, but on my laptop the email shows just fine. I couldn't find this behavior documented anywhere. Is it possible to disable it? This is on a workers.dev domain. The exact text is
[email protected]
5 Replies
This is fetching data from a 3rd party website. I tried disabling Scrape Shield but still no luck.
It seems like the fetch is being proxied through the Cloudflare cache.
If the 3rd party site is using Cloudflare then you can’t override their scrape shield setting on your fetch.
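If the markup your Worker receives does carry Cloudflare's email protection, the address is usually still recoverable from the page itself. The sketch below assumes the commonly observed encoding: a hex string in a data-cfemail attribute (or after /cdn-cgi/l/email-protection# in the link), where the first byte is an XOR key for the remaining bytes. The function name decodeCfEmail is made up for illustration, and this is not an officially documented API, so treat it as a starting point rather than a guarantee.

```typescript
// Hypothetical helper: decodes the hex payload of a data-cfemail attribute.
// Assumption: byte 0 is the XOR key, every following byte is one character
// of the address XORed with that key.
function decodeCfEmail(encoded: string): string {
  const key = parseInt(encoded.slice(0, 2), 16);
  let decoded = "";
  for (let i = 2; i < encoded.length; i += 2) {
    const byte = parseInt(encoded.slice(i, i + 2), 16);
    decoded += String.fromCharCode(byte ^ key);
  }
  return decoded;
}

// Example payload built by hand with key 0x42 for the address "a@b.c".
const example = decodeCfEmail("422302206c21"); // "a@b.c"
```

Inside HTMLRewriter you could register an element handler on the [data-cfemail] selector, read the attribute with getAttribute, and write the decoded address back with setInnerContent.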
They are not using Cloudflare, judging from the response headers,
and I can see the email in plain text when running locally.
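For anyone wanting to repeat the header check from inside the Worker, here is a rough sketch. It relies on the fact that responses proxied by Cloudflare normally include a cf-ray header and report cloudflare in the server header. The names HeaderReader and looksProxiedByCloudflare are invented for this example, and the check is only a heuristic: a negative result does not rule out some other email protection service.

```typescript
// Minimal structural type so the function accepts the headers of a fetch
// Response as well as simple test doubles.
type HeaderReader = { get(name: string): string | null };

// Heuristic only: looks for the cf-ray header or a server header that
// mentions "cloudflare".
function looksProxiedByCloudflare(headers: HeaderReader): boolean {
  const server = (headers.get("server") ?? "").toLowerCase();
  return headers.get("cf-ray") !== null || server.includes("cloudflare");
}
```

In a Worker this could be called as looksProxiedByCloudflare(response.headers) right after the fetch, and the result logged alongside the parsed text to compare local and deployed runs.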
The difference between local and deployed is likely due to the IP address being used. When running locally, the request uses your own IP address, which is less likely to be flagged as automated than one of Cloudflare's. If you run with --remote then you'll probably see the email address as protected, because the request will come from Cloudflare.
I’d imagine there are other services that protect email addresses from scraping, which they could be using.
Interesting, I didn't expect that. I tried using Colab too and it grabs the email.