Novu is slow even after creating indexes
Hi everyone, we've self-hosted Novu and it's quite slow.
Thanks to the Novu team's older responses in some of the threads, we found and created indexes based on libs/dal -- https://github.com/novuhq/novu/tree/next/libs/dal/src/repositories
However, even after creating indexes, it's still quite slow and uses a lot of CPU. It uses 100% CPU (on an M50 with 8 vCores and 32 GB of RAM), and the query times are >20s many times.
The main spike we see in the screenshot is when we have a schedule. On average there are 20-25k triggers throughout the day that go out at the scheduled time.
I've noticed that the activity feed on the frontend loads quickly compared to what we had without indexes.
Is this expected? If not, how can we improve this?
Any help would be appreciated
Thank you for your time and efforts.
(Lmk if I should move this to #⚓│community-self-host)
Another thing we've noticed is that the DB name is test by default, which is a bit weird, so just adding it here.
Here are some indexes on jobs
And RN there are ~150k docs in jobs
This keeps getting worse. 😦
Any help would be greatly appreciated 🙏
@Support
Sorry for tagging. Again, I'm aware that support for self-hosting is not provided, but I just wanted to get your eyes on it and confirm whether this happens w/ everyone or not (in case we have some misconfiguration or something).
Thank you.
Frequent operations that we do:
- trigger (20-25k triggers for a specific workflow every day, sent to topics)
- update topics and subscribers in topics (e.g. insertion/deletion)
- subscriber creation
- reading notifications
Hi @AN1RUDH, you can find the suggested indexes in the schema.ts files. Apply those indexes for all the available collections. This is how I fixed this issue!
Thank you for responding.
I have created indexes from here -- https://github.com/novuhq/novu/tree/next/libs/dal/src/repositories (on every collection).
And it's still pretty slow (check out my message (screenshot) above from yesterday, and the initial message).
If its possible, can you share how many notifications you send daily?
@AN1RUDH do you have a Redis caching service enabled as well? (This is not required, but I do recommend having a dedicated caching service running.)
yes I have. I have these environment variables set on API and worker
- REDIS_HOST
- REDIS_PASSWORD
- REDIS_PORT
- REDIS_CACHE_SERVICE_HOST
- REDIS_CACHE_SERVICE_PORT
This is max memory usage by Redis.
This Redis instance is used for the queuing system. I would suggest also running a separate caching Redis cluster, which you can configure with those env variables:
Our cloud implementation runs with memory db and a multi node cluster, but this should also work with a single node I believe.
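One way to sanity-check the setup discussed above is a small script that verifies the cache variables are set and do not point at the same instance as the queue. This is only a minimal sketch, not part of Novu: `check_redis_env` is a hypothetical helper, and it relies solely on the variable names mentioned in this thread.

```python
import os

# Variable names are the ones listed in this thread; the helper itself is
# hypothetical and is not something Novu ships.
QUEUE_VARS = ("REDIS_HOST", "REDIS_PORT")
CACHE_VARS = ("REDIS_CACHE_SERVICE_HOST", "REDIS_CACHE_SERVICE_PORT")


def check_redis_env(env):
    """Return a list of human-readable problems found in the given env mapping."""
    problems = []
    for name in QUEUE_VARS + CACHE_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    queue = (env.get("REDIS_HOST"), env.get("REDIS_PORT"))
    cache = (env.get("REDIS_CACHE_SERVICE_HOST"), env.get("REDIS_CACHE_SERVICE_PORT"))
    # Only compare the two targets when both are fully specified.
    if all(queue) and all(cache) and queue == cache:
        problems.append("cache and queue point at the same Redis instance")
    return problems


if __name__ == "__main__":
    for problem in check_redis_env(os.environ):
        print(problem)
```

Running it inside the API and worker containers would show whether both services actually see the same values.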
We send about 15 to 17 thousand notifications per day
Thank you, appreciate your response.
I've added these env variables (this is a separate Redis instance now) --
(it's been some time, and I don't see anything in the cache -- not sure if that's expected)
EDIT: it's still empty after ~12hr
Also, I have indexes on all collections based on libs/dal (screenshot has indexes for jobs). Yet, the max query times have skyrocketed to ~2 minutes. (For reference, we have 20-25k triggers throughout the day and RN we have >600k jobs, >190k topicsubscribers and ~32k topics, if it's helpful.)
Thank you for sharing. If you don't mind, can you share what's the avg (or, if possible, p90/p99) and max mongo request duration?
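For anyone who wants to report these numbers from raw samples (for example, durations copied out of slow-query logs), a minimal sketch using only the Python standard library could look like this. `summarize_durations` is a hypothetical helper, and the inclusive percentile method is just one reasonable choice.

```python
import statistics


def summarize_durations(durations_ms):
    """Return avg, p90, p99 and max for a list of request durations in ms."""
    if len(durations_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return {
        "avg": statistics.fmean(durations_ms),
        "p90": cuts[89],
        "p99": cuts[98],
        "max": max(durations_ms),
    }
```

A large gap between avg and p99 usually means a few very slow queries rather than uniformly slow ones.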
Also, do you send notifications to topics? -- We send notifications to topics having many subscribers.
This is still an issue --
- I see the cache service using this shared provider -- https://github.com/novuhq/novu/blob/next/libs/application-generic/src/services/in-memory-provider/providers/redis-provider.ts#L47-L55
- not sure, but is this not reading the cache service credentials?
- I think at some point, Novu logs the host of the Redis that it's connecting to. I see a log for the Redis instance used for bullmq, but I do not see the host set by REDIS_CACHE_SERVICE_HOST
- I've set IS_SELF_HOSTED=true, yet I see logs where the pattern is [Provider: Elasticache] ... and not [Provider: Redis] -- https://github.com/novuhq/novu/blob/next/libs/application-generic/src/services/in-memory-provider/cache-in-memory-provider.service.ts#L44-L50
btw we use Azure Cache for Redis. Tho I don't think it should be an issue -- I did see Azure Cache for Redis related env vars in Novu's codebase, but not sure if I should set those.
@AN1RUDH could you send me a DM, and we will try to find some time to jump on a quick call with screen sharing? I will try to help out if I can.
Yes sure, Thank you very much!
hey @Dima Grossman @AN1RUDH, we are running into the same issue. Did you find any solution? We have created indexes on Mongo but still have the same issue. We host an external Mongo with 8 cores and 12 GB, CPU utilization is 100%, and our daily triggers are around 10,000. Please help 😥
Hi Vikram, sorry I missed this notification.
We noticed that we have Find and FindAndUpdate queries, in particular on the jobs collection, that are taking a lot of time. I'll be trying to find what those queries are (exactly) and whether we have the correct indexes.
Do you have the same issue? If you're on MongoDB Atlas, you should be able to see slow queries I think. 🤔
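When checking whether a slow query has a matching index, one rule of thumb is that a MongoDB compound index can serve a query that filters on a leading prefix of its keys. The sketch below applies that rule to plain Python data. It is a simplified heuristic that ignores sorts, range predicates and the query planner's actual choice, and the index and field names are hypothetical, not taken from Novu's schema.

```python
def usable_prefix_len(index_keys, filter_fields):
    """Count how many leading index keys appear in the query's equality filter."""
    count = 0
    for key in index_keys:
        if key not in filter_fields:
            break
        count += 1
    return count


def best_index(indexes, filter_fields):
    """Return the index name with the longest usable prefix, or None if none helps."""
    best_name, best_len = None, 0
    for name, keys in indexes.items():
        n = usable_prefix_len(keys, filter_fields)
        if n > best_len:
            best_name, best_len = name, n
    return best_name


# Hypothetical indexes and filter, for illustration only.
indexes = {
    "by_env_status": ["env_id", "status", "created_at"],
    "by_subscriber": ["subscriber_id"],
}
print(best_index(indexes, {"env_id", "status"}))
print(best_index(indexes, {"status"}))
```

A query for which this returns None is a candidate for a collection scan, which is worth confirming with explain output on the real database.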