juanferreras
Cloudflare Developers
•Created by juanferreras on 9/3/2024 in #general-help
Intermittent slowness on one specific ISP using Worker as Origin (proxy) / maybe all Cloudflare
Hi - we've got two reports now (myself being one) of https://bgp.tools/as/7303 being intermittently very slow when users browse our sites through that specific ISP (no reports yet from any other ISPs/regions). This has been the case for a few weeks now, so it's not a recent event.
We've mostly reproduced it on our own sites (hosted on Pages; we have a Worker as origin, which is a very simple proxy that just passes the request through and adds a header with the hostname for routing purposes).
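For readers unfamiliar with this setup, here is a minimal sketch of what such a pass-through proxy typically computes. It is not the poster's actual code: the header name `x-original-host` and the origin `https://example-project.pages.dev` are hypothetical placeholders.

```typescript
// Hypothetical values: neither the header name nor the Pages origin
// comes from the original post.
const ROUTING_HEADER = "x-original-host";
const PAGES_ORIGIN = "https://example-project.pages.dev";

interface ProxyTarget {
  url: string;
  headers: Record<string, string>;
}

// Build the upstream URL and headers for an incoming request:
// same path and query, different origin, plus a header that carries
// the hostname the visitor actually requested.
function buildProxyTarget(
  incomingUrl: string,
  incomingHeaders: Record<string, string>
): ProxyTarget {
  const incoming = new URL(incomingUrl);
  const upstream = new URL(PAGES_ORIGIN);
  // Assign path and query separately so an odd path (e.g. one starting
  // with "//") cannot change the upstream host.
  upstream.pathname = incoming.pathname;
  upstream.search = incoming.search;
  return {
    url: upstream.toString(),
    headers: { ...incomingHeaders, [ROUTING_HEADER]: incoming.hostname },
  };
}
```

In a Worker, the `fetch` handler would pass the resulting URL and headers to `fetch()` and return the upstream response unchanged, so the proxy itself adds very little work per request.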
Here's a video showing more details https://www.loom.com/share/26eaa5343cf54fa2a66b36539e795206.
Things I've tried:
- Using 1.1.1.1 as resolver or not (said WARP but it wasn't enabled) – same result
- Proxying against our own zone vs *.pages.dev – same result (so not zone-specific settings)
- Not related to any Cache HIT/MISS. For example, for media we use the Cache API and send a server-timing header (which also includes the worker startup time) – you can see the time wasn't spent on any app logic of ours
- When things work as expected, most of those requests that took seconds resolve in XX/XXX ms.
- Not seeing any security events (e.g. thought we could be triggering some) when proxying against our zone
- Not seeing anything abnormal doing traceroute / ping to either our site, cloudflare.com or 1.1.1.1 (example in the video)
- Not seeing related errors or traces anywhere (a few Client Disconnected on the proxy, which could be related but could also be normal)
- Not seeing anything odd when opening speed.cloudflare.com (e.g. 0% packet loss)
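One way to put a number on the Server-Timing observation above is to subtract the server-reported durations from the client-measured total. The sketch below assumes the header uses the standard `name;dur=<ms>` form and that the reported metrics do not overlap; the metric names in the usage example are made up. It splits on commas, so a `desc` value containing a comma would not parse correctly.

```typescript
interface TimingMetric {
  name: string;
  dur: number; // milliseconds
}

// Parse a Server-Timing header value into its metrics that carry a duration.
function parseServerTiming(header: string): TimingMetric[] {
  const metrics: TimingMetric[] = [];
  for (const entry of header.split(",")) {
    const [name, ...params] = entry.trim().split(";");
    if (!name) continue;
    for (const param of params) {
      const [key, value] = param.trim().split("=");
      if (key?.toLowerCase() === "dur" && value !== undefined) {
        const dur = Number(value);
        if (!Number.isNaN(dur)) metrics.push({ name: name.trim(), dur });
      }
    }
  }
  return metrics;
}

// Time the client waited that the server-side metrics do not account for.
function unexplainedMs(totalMs: number, header: string): number {
  const reported = parseServerTiming(header).reduce((sum, m) => sum + m.dur, 0);
  return Math.max(0, totalMs - reported);
}

// Usage: a request that took 3000 ms on the client while the server
// reported 15.5 ms of work leaves 2984.5 ms spent elsewhere.
const gap = unexplainedMs(3000, 'worker;dur=12.5, cache;desc="hit";dur=3');
```

A large gap on slow requests and a small one on fast requests would support the claim that the delay sits between the client and the edge rather than in the application.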
What would be the recommended ways to debug or understand this issue more deeply? Is there any known security mechanism that could leave requests stuck, as opposed to returning a 429 / challenge / etc.? We're funneling quite a few requests to the same worker, but it's still odd to see it only happening with one specific ISP so far.
Has the time come where I'll need to learn how to use Wireshark? 😄
Thanks!
9 replies
•Created by juanferreras on 6/6/2024 in #general-help
Custom Hostname DCV Delegation does not work (Pending Validation TXT) for domain with DNSSEC
47 replies
•Created by juanferreras on 5/16/2024 in #general-help
With Delegated DCV validation, do I need just the root CNAME or one per each hostname?
If I want to point site.com, www.site.com and subdomain.site.com to my CF for SaaS (cname.zone.com):
A. Would adding CNAME _acme-challenge be enough for the certs?
B. Would I need 3: _acme-challenge, _acme-challenge.www and _acme-challenge.subdomain?
If www.site.com is currently using a CNAME to somewhere else, would the sequence of steps to minimize downtime be...
1. Add the DCV _acme-challenge (or DCVs, depending on A or B)
2. Add the hostname/s to the Cloudflare Dashboard
3. Wait for the Cloudflare Dashboard to say the Certificate is Active
4. (Potentially) add any missing CAA records necessary (although, if it's currently CNAMEing somewhere else, are you even able to add CAA records before you change the CNAME in step 7 and it propagates?)
5. Meanwhile, www.site.com continues reaching the old service provider successfully
6. Consider doing pre-validation on the hostname (e.g. https://developers.cloudflare.com/cloudflare-for-platforms/cloudflare-for-saas/domain-support/hostname-validation/pre-validation/) before DNS changes.
7. Change the www. CNAME to cname.zone.com, and with no downtime, www.site.com will point to our zone.
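The "wait until active" gate in steps 3 and 6 could be checked in code before touching DNS in step 7. The sketch below is only an illustration: it assumes each custom hostname object (for example, as returned by the Cloudflare custom hostnames API) exposes a hostname-level `status` and a certificate-level `ssl.status`, both reading `"active"` when ready. Verify the field names and values against the API reference before relying on it.

```typescript
// Assumed shape of a custom hostname object; field names are an assumption.
interface CustomHostname {
  hostname: string;
  status: string;           // hostname validation status (step 6)
  ssl?: { status: string }; // certificate status (step 3)
}

// List the reasons a hostname is not yet safe to cut over.
function blockersFor(h: CustomHostname): string[] {
  const blockers: string[] = [];
  if (h.status !== "active") {
    blockers.push(`hostname status is "${h.status}"`);
  }
  const sslStatus = h.ssl ? h.ssl.status : "missing";
  if (sslStatus !== "active") {
    blockers.push(`certificate status is "${sslStatus}"`);
  }
  return blockers;
}

// Hostnames whose CNAME should not be changed yet.
function hostnamesNotReady(hostnames: CustomHostname[]): string[] {
  return hostnames
    .filter((h) => blockersFor(h).length > 0)
    .map((h) => h.hostname);
}
```

Changing the CNAME only once `hostnamesNotReady` returns an empty list keeps the old provider serving traffic until both the certificate and the hostname validation are in place.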
Are there any alternative workflows anyone's using? DCV seems invasive, but the automatic renewal makes it a great option. Curious to see how often, in practice, the old/current service provider can prevent it from working (outside of CAA differences).
Thanks!
7 replies
•Created by juanferreras on 4/19/2024 in #workers-help
Small % of 500 errors logged in Cache Analytics but 0 Errors in Workers/Pages metrics
Hi all!
We have a very basic Worker proxying a Pages project (to use CF for SaaS) and everything works great. However, yesterday during a (very loose) stress test of a certain workflow, we saw ~66 requests out of 15k fail with status code 500.
1. Zone > Cache analytics (<zone>/caching?status-code=500) shows the errors https://share.cleanshot.com/B1YVRcRp
2. Zone > Workers Route also shows the same errors (<zone>/caching?status-code=500) https://share.cleanshot.com/VdL4W6G2
3. Workers & Pages > Workers (the simple proxy) however shows 0 errors during the time (/workers/services/view/nyla-site-proxy/production) https://share.cleanshot.com/s0SjL9hC
4. Workers & Pages > Pages analytics also shows 0 errors (pages/view/<pages-project-name>/analytics/production) https://share.cleanshot.com/58Bz9w2Z
Oddly enough, Worker Trace Events is configured on the simple proxy and I do see those requests in our logs (https://share.cleanshot.com/vZvDRPy7).
The only hint of the error category is that the Pages project does show this in the real-time logs area (https://share.cleanshot.com/F9TfKGq1) - but why would it not appear as an Error under metrics? 🤔
Is there any easy way to detect which layer is swallowing the error, or to get more information? Not seeing any odd firewall events/threats (e.g. no sign this was a request blocked due to misdetection). The zone also has basically nothing besides this setup.
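One possibility worth checking (an assumption, not something confirmed in this thread) is that the proxy's error metric counts invocations that threw, while a 500 returned by the upstream and passed through still counts as a successful invocation. The sketch below separates the two cases in exported trace events. The field names `Outcome` and `Event.Response.Status` are assumed from the Workers Trace Events log format and should be checked against the actual log schema.

```typescript
// Assumed subset of a Workers trace event record; field names are an assumption.
interface TraceRecord {
  Outcome: string;
  Event?: { Response?: { Status?: number } };
}

interface TraceSummary {
  total: number;
  nonOkOutcome: number; // invocation itself failed (exception, limit, etc.)
  okWith5xx: number;    // invocation succeeded but returned a 5xx response
}

function summarizeTraces(records: TraceRecord[]): TraceSummary {
  const summary: TraceSummary = { total: 0, nonOkOutcome: 0, okWith5xx: 0 };
  for (const record of records) {
    summary.total += 1;
    const status = record.Event?.Response?.Status ?? 0;
    if (record.Outcome !== "ok") {
      summary.nonOkOutcome += 1;
    } else if (status >= 500) {
      summary.okWith5xx += 1;
    }
  }
  return summary;
}
```

If the failing requests all land in `okWith5xx`, the proxy ran without error and simply relayed a 500 from the Pages project, which would be consistent with zero errors in the Worker metrics.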
Thanks!
6 replies