@Chaika Hey! I don't know if you remember, but a few weeks ago you helped me serve wildcard SSL certs on customers' custom domains. Recently I've been having an issue: if you repeatedly reload a custom-domain site without cache (Ctrl + Shift + R), there's a ~50% chance of an "Invalid SSL Cert" error. Do you have any idea why this could be happening? I can DM you the affected website if needed.
@x03 I don't have that bad of a memory lol. You don't have more than one server/Traefik container, right? Is this happening on the same CF for SaaS setup, on a customer's custom domain?
Apologies for the late reply, something came up. I only have one server and use a single container with the catch-all traefik config.
Yes this is happening on the same exact one that we configured.
The only thing that has changed since then is that the domain got a heavy load of traffic.
On launch there were 20,000 unique visitors to the domain; nowadays it's ~300.
I don't know if the traffic is what caused this, but nothing else really changed.
oh actually it's about 1k daily, not 300
just for that one domain
And is the issue on any custom hostname using CF for SaaS/the fallback cert, on any domain using Traefik at all, or just on your own hostnames that you have other configs for?
So this is only for custom SaaS domains.
My *.tsar.app domains use the same Traefik config too and don't have this issue.
Hmm yeah, that is interesting. Do you see anything in the logs about the certificate? If you bypass the proxy and hit Traefik directly, do you see the cert being served reliably? https://discord.com/channels/595317990191398933/1268644381418848326/1268647595421601874
weird that it would suddenly stop serving it reliably
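One way to run that check from a script: connect straight to the origin's IP, send the customer hostname as SNI, and count which certificate comes back over many handshakes. This is a minimal Python sketch, assuming you know the origin IP; verification is turned off deliberately so that a wrong or expired cert is still reported instead of aborting the handshake.

```python
import hashlib
import socket
import ssl
from collections import Counter
from typing import Callable


def served_cert_fingerprint(origin_ip: str, hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Handshake directly with the origin, sending `hostname` as SNI, and
    return the SHA-256 fingerprint of the certificate it presents."""
    ctx = ssl.create_default_context()
    # Verification is off on purpose: we want to see *which* cert is served,
    # even an invalid one, instead of failing the handshake.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((origin_ip, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            der = tls.getpeercert(binary_form=True)
    if der is None:
        return "no certificate"
    return hashlib.sha256(der).hexdigest()


def tally(fetch: Callable[[], str], attempts: int) -> Counter:
    """Call `fetch` `attempts` times and count each distinct result.
    Connection and TLS errors (ssl.SSLError subclasses OSError) are counted too."""
    seen: Counter = Counter()
    for _ in range(attempts):
        try:
            seen[fetch()] += 1
        except OSError as exc:
            seen[f"error: {type(exc).__name__}"] += 1
    return seen
```

Usage would look like `tally(lambda: served_cert_fingerprint("203.0.113.10", "shop.customer.example"), 20)`, where the IP and hostname are hypothetical placeholders. More than one fingerprint in the result would mean the origin itself is alternating certificates; a single stable fingerprint suggests the origin is consistent and the problem lies elsewhere.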
I'll test this out in a bit
@Chaika sorry for the long wait, I had a busy week and I'm only now taking a look at this.
Since my last time checking (same day as my initial message), I have not tested this issue at all. Looking at it today, I cannot reproduce the issue that I was having. Normally when I would reload without cache, there would be ~50% chance of it throwing an SSL error. Now it's 0%, every reload succeeds.
I don't remember changing any settings on Cloudflare OR the server. The only thing I've done is deploy a few updates, which should not have affected anything SSL-related. Usage is about the same: 1k unique visitors and 30k requests, 60% of them cached.
I also went through all my Coolify configs, and everything for my traefik proxy settings is default besides the container-specific config that we set up for the .app domain.
The wildest thing is that even though the custom domains started working properly again, for whatever reason my https://tsar.dev domain now throws SSL errors 100% of the time 😠😠 I've literally changed NOTHING, and this domain doesn't even use any fancy reverse proxy configs; it's set up exactly like all my other domains. The only thing special about it is that it uses CF Zero Trust. This .dev domain worked a few days ago and now it's not working for whatever reason.
My domain settings for the .dev seem to be fine, with the mode being set to strict.
I'm very confused as to why I get these random SSL errors out of nowhere, and I'm not too sure if it's a problem on my end or perhaps with Coolify. I'll definitely keep a lookout for anything related to this from now on.
Ah, seems like my .dev certificate expired. Do you know of any way to avoid this, or at least automate renewal?
Also here's the output for the (now working) custom domain:
Looking a bit more into this issue I'm like 80% sure it's Coolify. I've opened a post in their Discord to hopefully get some help with this.
Coolify/Traefik should automate renewal for you unless it's failing, would have to check logs and see why it did fail
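To catch a failing renewal before visitors do, you could monitor how many days the served cert has left. This is a minimal sketch using only Python's standard library, not something Coolify or Traefik provide. Let's Encrypt certs are valid for 90 days and ACME clients normally renew well before expiry, so a number that keeps shrinking is a hint to go read the Traefik logs.

```python
import socket
import ssl
import time
from typing import Optional


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining until `not_after`, a timestamp in the format
    getpeercert() uses, e.g. "Jan  5 09:34:43 2018 GMT"."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the served cert's notAfter. The default context verifies the
    chain, so an already-expired cert raises ssl.SSLCertVerificationError here."""
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]
```

Running something like `days_left(fetch_not_after("tsar.dev"))` from a cron job and alerting on a low value (or on the verification error) would have flagged the .dev cert before it expired.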
I did try warning you lol. CF origin certs are still an option if you can figure out how to get them to work with Traefik: https://discord.com/channels/595317990191398933/1268644381418848326/1268655682966782125
i might have to look into that
@Chaika so I did a bit of digging, and I've set up Cloudflare as my traefik cert provider (not sure if this relates to the origin cert stuff or not).
This still didn't work, so I kept digging through the docs and found this, so I guess I'm going to try to set it up.
Never mind, there's no "setup" for this; I think it should work out of the box once the provider is set. Sadly it's still not working, though.
Here's what I added to my global proxy config:
Any way to check if the provider change worked? Running the curl command still shows 'Let's Encrypt', I assume because the old cert still hasn't expired.
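The issuer alone probably can't confirm the switch: assuming the change was Traefik's DNS-challenge provider setting, it only changes how the ACME challenge is answered, and the CA is still Let's Encrypt, so curl will keep showing Let's Encrypt. The issue date (notBefore) is a better signal that a fresh cert was obtained. A minimal Python sketch that reports both; the hostname you pass in is up to you.

```python
import socket
import ssl


def flatten_name(rdns) -> dict:
    """getpeercert() returns names as nested ((key, value),) tuples; flatten them."""
    return {key: value for rdn in rdns for key, value in rdn}


def cert_summary(peercert: dict) -> dict:
    """Pick out the fields that show who issued the cert and when."""
    issuer = flatten_name(peercert.get("issuer", ()))
    return {
        "issuer_org": issuer.get("organizationName"),
        "issuer_cn": issuer.get("commonName"),
        "not_before": peercert.get("notBefore"),
        "not_after": peercert.get("notAfter"),
    }


def served_cert_summary(hostname: str, port: int = 443, timeout: float = 5.0) -> dict:
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            return cert_summary(tls.getpeercert())
```

If `not_before` moves to the date you changed the config (or forced a refresh), the new setup issued that cert, even though `issuer_org` still reads Let's Encrypt.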
I'm also looking into CF's origin certs: is there any way to let all the SaaS domains use the cert, or do I need to add them all manually?
Man working with this stuff is not fun...
Okay so the issue with the .dev domain turned out to be the fact that it was behind Zero Trust so ACME failed to refresh the certificate 😅
This took me way too long to realize
@Chaika okay so the issue with the first domain was Zero Trust blocking ACME requests, and the issue with the second domain was that I had "Under Attack" mode on (I use it as a bootleg anti-scrape on newer projects), which was also blocking the ACME requests.
Turns out this issue was pretty simple, but I had no idea how any of this certificate stuff worked and didn't even know what ACME was until I read deeper into it.
DNS challenges, e.g. through the CF API, would get around both of those.
It would be the same as your certs right now; you'd just need one covering your own zone, like
example.com,*.example.com
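The reason that list has both entries: a wildcard only stands in for exactly one left-most label, so *.example.com covers app.example.com but neither example.com itself nor a.b.example.com. (Let's Encrypt also only issues wildcard certs via the DNS-01 challenge.) A small Python sketch of that matching rule, simplified: it ignores partial wildcards like f*.example.com and IDNA handling.

```python
from typing import Iterable


def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Simplified RFC 6125 matching: '*' is allowed only as the whole
    left-most label and stands in for exactly one label."""
    p = pattern.lower().rstrip(".").split(".")
    h = hostname.lower().rstrip(".").split(".")
    if len(p) != len(h):
        return False  # '*' never spans zero or several labels
    if p[0] == "*":
        return p[1:] == h[1:]
    return p == h


def covered(sans: Iterable[str], hostname: str) -> bool:
    """True if any SAN entry on the certificate covers `hostname`."""
    return any(wildcard_covers(san, hostname) for san in sans)
```

With `sans = "example.com,*.example.com".split(",")`, both the apex and any single-level subdomain are covered; dropping either entry leaves a gap.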
Oh I see, thanks for the clarification
I'll look into setting up the Cloudflare origin certs later; I'm just glad everything works at the moment.
Wdym by this?
if you still wanted to use the CF Resolver, might need to set that for the default as well, something like
- "traefik.tls.stores.default.defaultgeneratedcert.resolver=cloudflare"
I assume you could test it by changing the sans and adding something else/forcing it to refresh but might cause downtime
ACME has two main verification modes: HTTP and DNS. HTTP fetches a special path (.well-known/acme-challenge/ttt), which can be blocked by things, as you've noticed, and is generally a bit more unstable. DNS adds TXT records via the Cloudflare API (or whatever DNS provider you use) and doesn't care about firewall/HTTP reachability.
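To make the difference concrete, here is a sketch of what each challenge asks you to publish, following RFC 8555. HTTP-01 serves the key authorization as a plain file under /.well-known/acme-challenge/ on port 80, which is exactly the request a Zero Trust login page or Under Attack mode's JS challenge intercepts. DNS-01 instead publishes a SHA-256 digest of it as a TXT record at _acme-challenge.<domain>. The token and thumbprint values are hypothetical; in practice Traefik's ACME library (lego) computes these and cleans them up for you.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Unpadded base64url, the encoding ACME (RFC 8555) uses throughout."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def key_authorization(token: str, account_thumbprint: str) -> str:
    """RFC 8555 section 8.1: token + "." + base64url JWK thumbprint of the account key."""
    return f"{token}.{account_thumbprint}"


def http01(token: str, thumbprint: str) -> tuple[str, str]:
    """HTTP-01: (path, body) the CA fetches over plain HTTP on port 80."""
    return f"/.well-known/acme-challenge/{token}", key_authorization(token, thumbprint)


def dns01(domain: str, token: str, thumbprint: str) -> tuple[str, str]:
    """DNS-01: (TXT record name, TXT value). For a wildcard identifier the
    authorization is for the base domain, so the leading '*.' is dropped."""
    base = domain[2:] if domain.startswith("*.") else domain
    digest = hashlib.sha256(key_authorization(token, thumbprint).encode("ascii")).digest()
    return f"_acme-challenge.{base}", b64url(digest)
```

Note that example.com and *.example.com both validate at the same _acme-challenge.example.com name, so one zone's API token is all the DNS provider integration needs.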
https://letsencrypt.org/docs/challenge-types/
Ohhhh interesting
So I'd need to add TXT records to all my domains?
What about SaaS domains?
well not manually lol, you'd have the integration do that for you using the cf api
oh
and you wouldn't need to verify your saas domains in this context, just need certs for your own
Alright I see, I'll read some docs to find out how to swap to DNS instead of HTTP
Traefik Let's Encrypt Documentation - Traefik
looks simple enough, I'll try and set it up
@Chaika any way to verify that swapping to DNS was a success?
I ran a dig TXT _acme-challenge.yourdomain.com command and it all checks out
Thanks for all your help, everything is perfect now
logs?
You'd have to be really quick to see it lol, it adds, verifies, and deletes
There were no logs, which I guess is a good sign