Railway•3mo ago
Jean

Major slowdown with all non-cached requests pending

Since around 3 pm (Paris time), our users have been encountering major slowdowns in our app. All requests to our api service remain in "pending" status for a while, and the page needs several minutes to load fully.
- This is happening on a day when we have a lot of new users and users beginning to use our app intensively (as they are back to school)
- My api service serves a Directus app with PRESSURE_LIMITER_ENABLED, but in our case we do not get any 503 errors like the ones the pressure limiter throws when overloaded: https://docs.directus.io/self-hosted/config-options.html#pressure-based-rate-limiter
- The metrics don't seem to be saturated (see attachment)
- I have no unusual errors in my logs
- My other service, Grafana, works like a charm querying the same Postgres DB
- I have been using the v2 runtime on all my services since last night
- Our project is hosted in Amsterdam, mostly for French users
- We subscribed to a Pro plan
It looks like there is some bottleneck or throttle somewhere on the network that I cannot access. So, after checking everything I could, I need your help. Thanks in advance!!
Video of the user experience: https://youtu.be/e8wVv_bSaXM
Project ID: 65aff0db-6586-4be0-8420-b2e67ae4378d
Solution:
can you go ahead and add 3 replicas to the api?
56 Replies
Percy
Percy•3mo ago
Project ID: 65aff0db-6586-4be0-8420-b2e67ae4378d
Brody
Brody•3mo ago
hello, do you have any idea on how many RPS you may be seeing?
Jean
JeanOP•3mo ago
Not at the moment, but I will dig. I may have it soon.
Brody
Brody•3mo ago
perfect! backend is the directus service right?
Jean
JeanOP•3mo ago
Yes
I will have the request count soon
Brody
Brody•3mo ago
perfect
Jean
JeanOP•3mo ago
As the service just restarted, some users may have been logged out.
Btw, the experience is smooth at the moment (for the first time in 5 hours)
Brody
Brody•3mo ago
the current RPS that is being reported would be lower than our RPS limit, so you aren't running into any kind of platform limitations at the moment. keep an eye on this RPS number when / if you see issues again and feel free to ping me with that info
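For anyone wanting to track that RPS number themselves, here is a minimal sketch that derives average and peak requests per second from request timestamps. It assumes you can export arrival times (in seconds) from your own access logs; the function name and sample data are hypothetical, and it says nothing about Railway's actual limit, which is not given in this thread.

```python
import math
from collections import Counter


def requests_per_second(timestamps):
    """Return (average_rps, peak_rps) for request arrival times given in seconds."""
    if not timestamps:
        return 0.0, 0
    # Bucket every request into the whole second it arrived in.
    per_second = Counter(math.floor(t) for t in timestamps)
    # Number of wall-clock seconds covered, counting both end buckets.
    span = math.floor(max(timestamps)) - math.floor(min(timestamps)) + 1
    return len(timestamps) / span, max(per_second.values())


# Hypothetical sample: 6 requests spread over 3 wall-clock seconds.
sample = [0.1, 0.4, 0.9, 1.2, 2.5, 2.7]
average, peak = requests_per_second(sample)
print(f"average={average:.1f} rps, peak={peak} rps")
```

The peak is usually the more useful figure to compare against a limit, since a low average can hide short bursts.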
Jean
JeanOP•3mo ago
(some requests still stay pending, but not for as long as before the redeploys)
Brody
Brody•3mo ago
at this time, i'd have to say this is an application level issue
maybe you could try something like increasing the postgres pool count?
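A rough illustration of why pool size can matter: when every request needs a database connection and the pool is smaller than the number of concurrent requests, the extras wait in line instead of failing, which shows up as "pending" in the browser. The sketch below assumes all requests arrive at once and every query takes the same time; the numbers and the function name are hypothetical and are not Directus defaults.

```python
def completion_times(n_requests, pool_size, query_seconds):
    """Finish time of each request when only pool_size queries can run at once.

    Simplified model: all requests arrive at t=0 and each holds one
    connection for exactly query_seconds.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    # Request i runs in "wave" i // pool_size; each wave lasts query_seconds.
    return [((i // pool_size) + 1) * query_seconds for i in range(n_requests)]


# Hypothetical burst of 30 requests with 0.5 s queries.
small_pool = completion_times(30, pool_size=10, query_seconds=0.5)
large_pool = completion_times(30, pool_size=30, query_seconds=0.5)
print("slowest with pool of 10:", max(small_pool), "s")
print("slowest with pool of 30:", max(large_pool), "s")
```

In this model nothing ever errors; a small pool only makes the tail slower, which matches the symptom of slow pages with clean logs.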
Jean
JeanOP•3mo ago
But Directus has no query limit by default, they shouldn't be in pending mode, right?
Brody
Brody•3mo ago
im sure there are more factors at play here, can you help me to understand your infra more?
Jean
JeanOP•3mo ago
Can you give me the RPS limit so I know whether this is at the app level or not, please?
Sure, but don't you have access to my project?
Brody
Brody•3mo ago
i don't know if i can give out the current values for that, sorry, but you are currently well under the limit
i do, but i'd like to understand how it all works together
for example, i'm now seeing the rps for the api, but you said requests to directus are pending, not the api?
Jean
JeanOP•3mo ago
Yes, I guess. I'm very surprised by this issue 🙂
Requests to api.hiphiphip.app are pending
the backend service is private
Brody
Brody•3mo ago
the api calls directus via the private network?
Jean
JeanOP•3mo ago
- xxxx.hiphiphip.app is a Directus instance with the admin panel enabled.
- api.hiphiphip.app is the same Directus (cloned) without the admin panel enabled
- www.hiphiphip.app calls api.hiphiphip.app (never the bo directly)
- the 2 Directus services, bo and api, access Postgres and Redis through the private network only
- there is also a Grafana service using Postgres and Redis (via the private network), and a last service backing up Postgres to AWS at 5am
Brody
Brody•3mo ago
are you absolutely positive you are doing all the communication that you can over the private network?
Jean
JeanOP•3mo ago
yes, it cost us too much 🙂 I changed all this over the past week
Brody
Brody•3mo ago
haha yeah that can happen
Jean
JeanOP•3mo ago
[Attachment: chart of network egress per service]
Jean
JeanOP•3mo ago
As you can see, the last 3 remaining services with egress are Frontend (green), api (yellow) and grafana (red, which had metrics published on our blog until today).
Postgres is violet (with the backups every day) and redis is red.
If you look closely, you can see the backup being sent to AWS every night in blue.
Brody
Brody•3mo ago
gotcha, thank you for the rundown
is there anywhere i could go to see these pending requests?
Jean
JeanOP•3mo ago
I'll create you a demo account
Brody
Brody•3mo ago
thanks!
Jean
JeanOP•3mo ago
The account is being created but the app is quite slow again... :/
Brody
Brody•3mo ago
not seeing anything that would indicate an issue on our side of things, perhaps you could give the api more replicas? start with 3
Jean
JeanOP•3mo ago
- https://www.hiphiphip.app/login
- id: [email protected]
- pwd: DoYouMakeEgressDiscounts?123
Brody
Brody•3mo ago
we do not lol, but kudos for trying 😆
Jean
JeanOP•3mo ago
Thank you, my brain is totally out of use presently :oop:
Solution
Brody
Brody•3mo ago
can you go ahead and add 3 replicas to the api?
Brody
Brody•3mo ago
if one of your api services is not able to handle your volume of traffic, 3 might be able to
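As a back-of-the-envelope version of that reasoning, the sketch below estimates a replica count from a peak request rate and a per-replica capacity. Both inputs are assumptions you would have to measure yourself (for example with a load test); the function name, the headroom factor, and the sample numbers are hypothetical.

```python
import math


def replicas_needed(peak_rps, rps_per_replica, headroom=0.2):
    """Estimate how many replicas cover peak_rps with some spare capacity.

    headroom=0.2 means planning for 20% more traffic than the observed peak.
    """
    if rps_per_replica <= 0:
        raise ValueError("rps_per_replica must be positive")
    target = peak_rps * (1 + headroom)
    # Always keep at least one replica running.
    return max(1, math.ceil(target / rps_per_replica))


# Hypothetical numbers: 120 rps at peak, one replica comfortable at 50 rps.
print(replicas_needed(peak_rps=120, rps_per_replica=50))
```

This only holds if the replicas are the bottleneck; if a shared database is the limit, adding replicas will not help as much.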
Jean
JeanOP•3mo ago
also can i disable the legacy proxy?
Brody
Brody•3mo ago
you would want that off, yes
off on everything, the new proxy is far superior
at the same time go ahead and add those 3 replicas
Jean
JeanOP•3mo ago
it's deploying
Brody
Brody•3mo ago
2 is close enough to 3 haha
Jean
JeanOP•3mo ago
^^
Brody
Brody•3mo ago
can you disable the legacy proxy on your other services too please
Jean
JeanOP•3mo ago
hmm, railway seems to be buggy: it doesn't offer to deploy when i disable the legacy proxy
Brody
Brody•3mo ago
thats normal, that change is not a part of the staged changes
Jean
JeanOP•3mo ago
ok so we're good
Brody
Brody•3mo ago
nope, you'd only ever see the page for one replica since incoming requests are round robin
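To picture what round robin means here, a minimal sketch: each incoming request goes to the next replica in a fixed rotation, so any single request is served by exactly one replica and the load ends up evenly split. The replica names and request IDs are hypothetical, and this is a generic illustration rather than Railway's proxy code.

```python
from collections import Counter
from itertools import cycle


def round_robin(request_ids, replicas):
    """Assign each request to the next replica in rotation."""
    if not replicas:
        raise ValueError("at least one replica is required")
    rotation = cycle(replicas)
    return {request_id: next(rotation) for request_id in request_ids}


# Hypothetical: 9 requests spread over 3 replicas.
assignment = round_robin(range(9), ["api-1", "api-2", "api-3"])
print(assignment[0], assignment[1], assignment[2], assignment[3])
print(Counter(assignment.values()))
```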
Jean
JeanOP•3mo ago
Yes! The service is perfect now!
understood
Brody
Brody•3mo ago
okay cool so it seems directus was just a little stressed out is all
Jean
JeanOP•3mo ago
Yeah, thank you very much! The odd thing is that it doesn't return the expected 503
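One possible explanation, offered as an assumption rather than a diagnosis: the Directus docs linked in the first post describe the limiter as reacting to event loop and memory pressure. If so, a process whose requests are mostly waiting on something else would look healthy to it and never return a 503. The sketch below illustrates that idea; the class names, fields, and threshold values are hypothetical and are not Directus's implementation or defaults.

```python
from dataclasses import dataclass


@dataclass
class PressureSample:
    """One reading of process health (hypothetical fields)."""
    event_loop_utilization: float  # 0.0 (idle) to 1.0 (fully busy)
    event_loop_delay_ms: float
    heap_used_mb: float


@dataclass
class PressureLimits:
    """Illustrative thresholds only, not real defaults."""
    max_event_loop_utilization: float = 0.99
    max_event_loop_delay_ms: float = 500.0
    max_heap_used_mb: float = 1024.0


def should_reject(sample, limits):
    """Return True when the process itself is under pressure (respond 503)."""
    return (
        sample.event_loop_utilization > limits.max_event_loop_utilization
        or sample.event_loop_delay_ms > limits.max_event_loop_delay_ms
        or sample.heap_used_mb > limits.max_heap_used_mb
    )


limits = PressureLimits()
# Many requests pending on I/O, but the process itself is mostly idle.
waiting = PressureSample(event_loop_utilization=0.2, event_loop_delay_ms=5.0, heap_used_mb=300.0)
# CPU-bound overload: the event loop is saturated.
saturated = PressureSample(event_loop_utilization=0.995, event_loop_delay_ms=40.0, heap_used_mb=300.0)
print(should_reject(waiting, limits), should_reject(saturated, limits))
```

Under this model, slow responses and a quiet limiter are not contradictory: the limiter measures the process, not how long each request has been waiting.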
Brody
Brody•3mo ago
if you gain more userbase, you can always add another replica!
Jean
JeanOP•3mo ago
We'll probably need to
I didn't expect that to come so early 🙂
Brody
Brody•3mo ago
happy it was an easy fix!
Jean
JeanOP•3mo ago
yes! I feel a little dumb actually :mildpanic:
Brody
Brody•3mo ago
nah dont worry about it, it took me until now to suggest it too lol
Jean
JeanOP•3mo ago
hahaha
Let's stop taking up your time, or we risk having to pay some extra fees :p
Thank you very much for your fast and clever support
Brody
Brody•3mo ago
happy to help! i wish you all the best with your service and its growth!
Jean
JeanOP•3mo ago
:salute:
Jean
JeanOP•3mo ago
Just a question: is there an autoscaling feature in the pipeline? Depending on the time of day, this could save resources and money.
Brody
Brody•3mo ago
we do not have any immediate plans for auto h-scaling
Jean
JeanOP•3mo ago
Ok, thank you!