.loadAll() blocks all shards?
Okay, I have finally isolated my problem which I've opened multiple questions about.
The problem is that
(my old bad way of getting all the pieces and reloading them in a loop does the same)
Requirements for reproduction:
- 20+ shards with ~15k guilds (not entirely sure that it's required actually)
- active users using commands
- run the code on all of them
What happens:
1. About half of the shards complete the process within seconds.
2. All clients "freeze" for 10-20 seconds; the shards do not respond to any Discord message or interaction. However, the event loop does not seem to be blocked, because independent functions still run.
3. All shards resume normal operation and the rest of the loadAlls finish.
djs: 14.11.0
node: 16.13.1
sapphire/framework: 4.4.3
21 Replies
A workaround for now is to wait 3 seconds between reloading on each shard... This seems like it causes a small freeze each time, but it's less disruptive
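The staggered workaround could look roughly like the sketch below. `runStaggered` is a hypothetical helper; the commented usage assumes discord.js v14's `client.shard.broadcastEval` with the `shard` option, and `reloadFn` is a placeholder for whatever reload function the bot uses.

```javascript
const { setTimeout: sleep } = require('node:timers/promises');

// Hypothetical helper: runs `task(id)` for each id in order, pausing
// `delayMs` between consecutive runs so they never overlap.
async function runStaggered(ids, delayMs, task) {
  const results = [];
  for (let i = 0; i < ids.length; i++) {
    if (i > 0) await sleep(delayMs);
    results.push(await task(ids[i]));
  }
  return results;
}

// Assumed usage inside a sharded bot (not executed here):
// await runStaggered(shardIds, 3000, (id) =>
//   client.shard.broadcastEval(reloadFn, { shard: id }));
```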
@kyra 🩵🩷🤍🩷🩵 you're more knowledgeable about this kinda stuff
I might have been falling for a red herring this whole time. It might as well be that each loadAll only blocks its own client.
At this point, bomi, share your bot's entire code
Also, what does your bot do that you're using the sharder? Practically speaking, the sharding manager leads to countless easily avoidable headaches that could be solved by not using it at all. SM is for a tiny fraction of bots, and if you're using it because of sharding requirements, it's not needed. @Skyra runs 13k guilds just fine in one process, using internal sharding alone
Multithreading
I'm not sure what you mean by entire here, sorry
You're pointing at the wrong thing
What you're having is a feedback loop, shards sending messages to other shards, those sending to the rest, and so on, spamming broadcasts from each shard to the rest of the shards without stop
1 sec, I'll boil it down as much as possible
Active users using commands has no effect here because Sapphire loads pieces atomically: the old piece is only unloaded when the new piece is ready and inserted
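The load-then-swap idea described above can be illustrated with a minimal sketch. `PieceStore` here is a hypothetical stand-in written for illustration, not Sapphire's actual implementation.

```javascript
// Hypothetical minimal store, NOT Sapphire's real code.
class PieceStore {
  constructor() {
    this.pieces = new Map();
  }

  // Build the replacement first. The old piece stays registered (and keeps
  // handling commands) until the new one is fully constructed; only then
  // is the Map entry swapped. If `factory` throws, nothing changes.
  async reload(name, factory) {
    const next = await factory();
    const previous = this.pieces.get(name);
    this.pieces.set(name, next);
    return previous;
  }
}
```

With this ordering there is no window in which a command name resolves to nothing, which is why user activity alone should not make a reload stall.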
It's not a problem with Sapphire because otherwise a lot more people would have run into this, especially those who use the HMR plugin as it spams reloads
And ShardingManager may be unreliable at times, but it doesn't cause full app freezes
If I had a feedback loop it'd have multiple console logs, right?
You're running other stuff that's called at the same time as the pieces reload, check for the code at the top level of your commands and the onLoad methods, you have something that's very CPU draining
The reason I say that it's reliant on active users is that I have the same code running on another bot account, and it does not happen there. I have not managed to get it to happen once with that account... T_T
I also recall you did `delete require.cache[]`s (albeit in the wrong order, doing it after commands rather than before)
It's possible that whatever modules you're reloading, they're running very expensive operations
Oh that would explain why it needs to be done twice
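For context, here is a minimal sketch of what deleting a `require.cache` entry does in CommonJS: the next `require` executes the module's top-level code again. The temporary file and the `loadCount` global are assumptions made up for the demo.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Write a throwaway module whose top-level code has a visible side effect.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'reload-demo-'));
const file = path.join(dir, 'piece.js');
fs.writeFileSync(
  file,
  'globalThis.loadCount = (globalThis.loadCount || 0) + 1;\n' +
    'module.exports = { id: globalThis.loadCount };\n'
);

const first = require(file);  // executes the module body
const cached = require(file); // served from the cache, body not re-run
delete require.cache[require.resolve(file)];
const reloaded = require(file); // cache entry gone, body runs again
```

Any expensive work at the top level of a module is therefore paid again on every reload that clears its cache entry first.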
I can't say anything for sure. I know neither Sapphire nor discord.js is the issue, but something in your code is
Something in your code is being called when pieces reload, and they're incredibly expensive and/or have an infinite loop that never breaks for some reason
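A simple way to hunt for the expensive call is to time each hook individually. The sketch below assumes a plain array of objects with a `name` and an async `onLoad` method; `timeOnLoad` is a hypothetical helper, not a Sapphire API.

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical helper: runs each piece's onLoad() once, in sequence, and
// returns the measured durations sorted slowest first.
async function timeOnLoad(pieces) {
  const timings = [];
  for (const piece of pieces) {
    const start = performance.now();
    await piece.onLoad();
    timings.push({ name: piece.name, ms: performance.now() - start });
  }
  return timings.sort((a, b) => b.ms - a.ms);
}
```

Logging the first few entries of the result usually makes a single outlier obvious.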
But here's the most basic boiled down version
(This is in a command)
For unknown reasons (Which is what I think is user activity), it takes 8000ms sometimes
For example you could have multiple modules depending on each other and reading their state, and if you load them out-of-order, you end up executing logic with a cache state of finished rather than a cache state of uninitialised, and that kind of (impure) behaviour can lead to infinite loops
btw you shouldn't use `botclient.stores`, but `container.stores`
Keep in mind as well that you can run into split cached modules, resulting not only in a memory leak, but also in duplicated systems. For example, if some files load your database init function, and commands do too, then you reload the database init module (making it re-initialize) and then reload commands without reloading the rest, you end up running 2 database pools/connections simultaneously. If you depend on them to behave like they're the same object, with its internal queues and everything, things can go south really fast
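The split-module situation can be reproduced in a few lines. In this sketch, `db.js` and `service.js` are throwaway files invented for the demo; the "pool" is just a plain object standing in for a real database pool.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'split-demo-'));
const dbFile = path.join(dir, 'db.js');
const serviceFile = path.join(dir, 'service.js');

// db.js creates a fresh "pool" object every time its body executes.
fs.writeFileSync(dbFile, 'module.exports = { pool: { queue: [] } };\n');
// service.js captures whatever db.js exported at the time it was loaded.
fs.writeFileSync(
  serviceFile,
  "const db = require('./db.js');\n" +
    'module.exports = { getPool: () => db.pool };\n'
);

const service = require(serviceFile);
const dbBefore = require(dbFile);

// Reload only db.js, leaving service.js cached with its old reference.
delete require.cache[require.resolve(dbFile)];
const dbAfter = require(dbFile);

// Two distinct pools now exist at the same time.
console.log(service.getPool() === dbBefore.pool); // true
console.log(service.getPool() === dbAfter.pool);  // false
```

Reloading dependents together with the module they depend on, or not reloading shared infrastructure at all, avoids ending up with two live instances.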
8000ms sounds like hella heavy initialization logic. On Skyra, loading all pieces (400-500) took around 1 second, and they have a lot of heavy initialization, such as rendering 6+ canvases to PNG
Yeah it "should" take 300-400ms, as it does when it does not take 8000+ms
Maybe check that, whether or not you have a very heavy or long-running `onLoad` hook
oh my god
Yeah, the times it took 8000ms were when the require cache was empty
I've red-herringed everyone because I deleted some old junk on the test bot
That old junk needing to be loaded in is what took 7600ms in loadall
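The cold-versus-warm difference is easy to see in isolation. This sketch times `require` on a throwaway module that burns roughly 50ms at its top level; the file and the busy loop are assumptions standing in for the real heavy modules.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { performance } = require('node:perf_hooks');

// Throwaway module with deliberately slow top-level code.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cold-warm-'));
const file = path.join(dir, 'heavy.js');
fs.writeFileSync(
  file,
  'const end = Date.now() + 50;\nwhile (Date.now() < end) {}\n' +
    'module.exports = {};\n'
);

// Returns how long a single require() of `target` takes, in milliseconds.
function timeRequire(target) {
  const start = performance.now();
  require(target);
  return performance.now() - start;
}

const coldMs = timeRequire(file); // cache empty: module body executes
const warmMs = timeRequire(file); // cache hit: module body is skipped
console.log({ coldMs, warmMs });
```

A reload measured right after the cache was emptied includes the full cost of re-executing every module, so it is not comparable to one measured with a warm cache.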
Sorry about that, lesson learnt I hope
Thanks for all the help
I'm not sure which message to mark as the solution
Perhaps this
Or this
Whichever the "old junk on the test bot" falls into
The lesson here for me is to not modify anything else on the test bot ever because then I'll have a red herring like this