Chava
Nuxt
Created by Chava on 7/19/2024 in #❓・help
Spiking SSR response times causing outages
Hello, I'm trying to solve a problem that will be hard to provide a reproduction repo for. We have a very large internal app with a lot of users. Recently we've had production outages where SSR requests take over 60 seconds to respond, and when that happens our AWS load balancer returns 504s. We've ruled out memory and CPU, since both look normal. The SSR servers do recover on their own: there is normal activity before and after the response-time spikes. The 504s only occur on requests handled by Nitro, not on static assets. We also tried lowering our HTTP client timeout to 10 seconds for the outbound requests the SSR server makes, but the problem persisted. Looking at the load balancer logs, no particular route stands out as the offender, and we're having a hard time pinning down the cause. Any suggestions on how to instrument this further or track down the cause?
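For context, an outbound timeout like the 10-second one mentioned above generally works along the lines of this minimal sketch. It assumes Node 18+ (where `AbortController` is global), and `withTimeout` and `TimeoutError` are hypothetical names, not necessarily how our HTTP client implements it:

```typescript
// Hypothetical error type raised when the deadline passes.
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`Request timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Hypothetical helper: run an async call with a hard deadline.
// `work` receives an AbortSignal so the underlying request can be cancelled.
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      // Reject first so the race settles with TimeoutError,
      // then abort so the in-flight call is actually cancelled.
      reject(new TimeoutError(ms));
      controller.abort();
    }, ms);
  });

  try {
    // Whichever settles first wins: the real work or the deadline.
    return await Promise.race([work(controller.signal), deadline]);
  } finally {
    // Clear the timer so a fast response doesn't leave it pending.
    clearTimeout(timer);
  }
}
```

In practice `work` would be something like `(signal) => fetch(url, { signal })`. Passing the signal matters: without it, `Promise.race` only stops waiting, and the outbound request keeps running in the background.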