Kord doesn't reconnect
I've had issues with my bot reconnecting pretty much since day 1; sometimes it just spams RetryLimitReachedEvent in my log
I don't know whether this is just an issue with my logging, but the bot actually doesn't work until a full restart
Also in the process of connecting 14 shards it pretty much happens all the time that when the last shard got connected other shards have been disconnected again
I haven't noticed this with an unsharded bot so it might be related to that
Hm, can you call the GatewayBotGet endpoint to see the max_concurrency for your bot? Maybe it's because we don't properly handle it.
It's hard to experience that issue with an unsharded bot as it only connects one shard
I think we do connect shards in parallel
and you are only allowed to connect one concurrently
so it's this issue again, you already opened https://github.com/kordlib/kord/issues/625
GitHub
Properly implement login rate limiting · Issue #625 · kordlib/kord
Currently we connect all shards at once and rate limit the identify command, which causes session resets and timeouts (see #624) A better solution would be to implement rate limiting for logins
IG that's what that means yes
yes
Kord seems to eventually give up reconnecting
I have an all shards ready event
And it consumes all other ready events and if all shards on the current instance are ready it fires
And it has logging and that logging tells me "waiting for [<all shards>]"
Kord doesn't log anything
So it could be that which is broken
But that shouldn't affect commands
That event doesn't change some state
2022-09-19T20:28:01.550055599Z 2022-09-19 20:28:01.546 [DefaultDispatcher-worker-8] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 4 RetryLimitReachedEvent, Awaiting login from: [4, 12]
@LustigerLurch this also happens
does this log come from kord?
It doesn't
It's my logging
but that basically means a RetryLimitReachedEvent was fired
And it never fires a ready event for that shard again
what does the trace logging for gateway events show?
gonna download .4 GB of LOGS real quick
however since enabling trace logging the issue hasn't occurred yet
Fleet doesn't want to open it http://rice.by.devs-from.asia/u/4869Mq.png
srv-captain--votebot.1.kyi5nfvreqag@v220210987031163663 | 2022-09-19 21:41:12.302 [DefaultDispatcher-worker-6] TRACE dev.kord.gateway.DefaultGateway - Gateway >>> {"op":2,"d":{"token":"token","properties":{"os":"Linux","browser":"Kord","device":"Kord"},"compress":false,"large_threshold":250,"shard":[12,15],"presence":{"status":"dnd","afk":false,"game":{"name":"Starting ...","type":0}},"intents":"3243773"}}
srv-captain--votebot.1.kyi5nfvreqag@v220210987031163663 | 2022-09-19 21:41:12.521 [DefaultDispatcher-worker-22] TRACE dev.kord.gateway.DefaultGateway - Gateway <<< {"t":null,"s":null,"op":9,"d":false}
srv-captain--votebot.1.kyi5nfvreqag@v220210987031163663 | 2022-09-19 21:41:12.540 [DefaultDispatcher-worker-10] TRACE dev.kord.gateway.DefaultGateway - gateway connection closing
srv-captain--votebot.1.kyi5nfvreqag@v220210987031163663 | 2022-09-19 21:41:12.543 [DefaultDispatcher-worker-10] TRACE dev.kord.gateway.DefaultGateway - Gateway closed: 4900 reconnecting
srv-captain--votebot.1.kyi5nfvreqag@v220210987031163663 | 2022-09-19 21:41:12.544 [DefaultDispatcher-worker-4] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 12 SessionReset, Awaiting login from: [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14]
srv-captain--votebot.1.kyi5nfvreqag@v220210987031163663 | 2022-09-19 21:41:12.544 [DefaultDispatcher-worker-10] TRACE dev.kord.gateway.DefaultGateway - handled gateway connection closed
srv-captain--votebot.1.kyi5nfvreqag@v220210987031163663 | 2022-09-19 21:41:12.546 [DefaultDispatcher-worker-13] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 12 RetryLimitReachedEvent, Awaiting login from: [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14]
srv-captain--votebot.1.kyi5nfvreqag@v220210987031163663 | 2022-09-19 21:41:12.701 [DefaultDispatcher-worker-10] TRACE dev.kord.gateway.DefaultGateway - Gateway <<< {"t":null,"s":null,"op":10,"d":{"heartbeat_interval":41250,"_trace":["["gateway-prd-main-gh00",{"micros":0.0}]"]}}
srv-captain--votebot.1.kyi5nfvreqag@v220210987031163663 | 2022-09-19 21:41:12.703 [DefaultDispatcher-worker-12] TRACE dev.kord.gateway.DefaultGateway - Gateway >>> {"op":1,"d":null}
Okay so op 9 is "The session has been invalidated. You should reconnect and identify/resume accordingly." That's what happens during startup
yeah, pretty sure it's because you are only allowed to login one shard at a time
we should fix this!
yeah
maybe this also causes the other disconnects
because if one shard disconnects
I get rate limited
@I love Gradle files this is what I have so far as a replacement for the existing MasterGateway.startWithConfig:
I am on my phone rn
But looks good at first glance
Could you try out with feature-login-rate-limiting-SNAPSHOT?
sure
I did all of this in a hurry but I still have session resets
do you use kordex?
it might have to be recompiled
cause there was an inline function somewhere in this PR
@MrPowerGamerBR are you familiar with sharding? If yes, could you take a look at this PR too? https://github.com/kordlib/kord/pull/693
this is the example from the docs, right? I've just pushed a test for the example and it passes (assuming I didn't write a wrong test :kek: )
ok but why would you say the code I wrote is incorrect for that example?
exactly
but I'm still thinking if we should look at the actual bucket instead (see https://github.com/kordlib/kord/pull/693#discussion_r974822101)
that's part of the Gateway (individual connections) code, this PR is about the MasterGateway
oh wait I think I know what you mean
yeah, rn we don't do this properly
as it is rn, this is just about the initial startup
they are rate_limit_key 0 but bucket 0 and 1
that's what this is about
ah wait, they wouldn't, we wait when the key is <= the previous
so which one should we use: shardId / maxConcurrency or shardId % maxConcurrency?
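For reference on the two formulas: Discord's sharding docs define the identify bucket as rate_limit_key = shard_id % max_concurrency. The integer division then tells you in which startup round a shard falls, because shards with the same key have to identify one after another. A minimal sketch of both; the helper names rateLimitKey and startupRound are made up for illustration and are not Kord API, and this is not necessarily what the PR ended up doing:

```kotlin
// rate_limit_key as defined in Discord's sharding documentation.
fun rateLimitKey(shardId: Int, maxConcurrency: Int): Int = shardId % maxConcurrency

// Hypothetical helper: shards sharing a key identify one after another,
// so the integer division gives the round a shard starts in.
fun startupRound(shardId: Int, maxConcurrency: Int): Int = shardId / maxConcurrency

fun main() {
    val maxConcurrency = 16
    for (shardId in listOf(0, 15, 16, 31)) {
        val key = rateLimitKey(shardId, maxConcurrency)
        val round = startupRound(shardId, maxConcurrency)
        println("shard $shardId -> key $key, round $round")
    }
}
```

With max_concurrency 16, shard 15 gets key 15 and shard 16 wraps around to key 0, so shard 16 shares a bucket with shard 0 and has to wait for the next round.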
still don't understand why we would need a mutex there
so for the case a shardId is used twice?
that's not possible
gateways is a Map<Int, Gateway>
but in the impl I have rn, shard 16 with rateLimitKey 0 would wait too, because the rateLimitKey is <= the previous one
shard 15 has key 15, 16 has key 0 -> 0 <= 15 -> same thing
wait, I just noticed that we already have identifyRateLimiter in DefaultGateway
@I love Gradle files do you use any custom gateway logic?
or does kordex do?
I don't
the thing is that individual reconnects also use the identifyRateLimiter
The issues might be related as I stated earlier
I'm wondering if I could reproduce your issues with a bot in 4 servers that I force to use 15 shards :D
yeah, I can
receiving session resets too
that's good
That's actually good yes
ok I think I know what's wrong: the limit on concurrent Identify requests per 5 seconds specified here https://discord.com/developers/docs/topics/gateway#rate-limiting actually means the limit on concurrently opened connections that then send an Identify request
oh, nevermind, still getting session resets
I was right, just delayed in the wrong place
hm, now I actually need to implement a new RateLimiter that gets consumed before starting a connection and released when the identify command was sent
@I love Gradle files could you try again, I've pushed a change that fixes this and can no longer reproduce session resets with it
Tmr
sure 👍
@I love Gradle files did you have time to try it yet?
Nope
Will notify you asap
After thinking about this some more, it might actually be better to do something similar to this and replace the identifyRateLimiter with it. This would then not only work for initial login but also reconnects that require sending a new identify. So essentially like your Microservice but with a coroutine boundary using e.g. channels.
Nonetheless, the impl in the PR rn should work for the initial login
I have no idea whether this uses the latest build though
because I cannot control the CI dependency cache
2022-09-21T14:28:40.912679797Z 2022-09-21 14:28:40.912 [DefaultDispatcher-worker-29] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 0 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-21T14:28:40.914200194Z 2022-09-21 14:28:40.914 [DefaultDispatcher-worker-22] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 1 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-21T14:28:40.915295882Z 2022-09-21 14:28:40.915 [DefaultDispatcher-worker-10] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 2 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-21T14:28:40.916031643Z 2022-09-21 14:28:40.915 [DefaultDispatcher-worker-26] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 3 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-21T14:28:40.916503271Z 2022-09-21 14:28:40.916 [DefaultDispatcher-worker-25] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 4 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-21T14:28:40.917879113Z 2022-09-21 14:28:40.917 [DefaultDispatcher-worker-17] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 5 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-21T14:28:40.917889241Z 2022-09-21 14:28:40.917 [DefaultDispatcher-worker-29] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 7 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-21T14:28:40.917892347Z 2022-09-21 14:28:40.917 [DefaultDispatcher-worker-29] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 8 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-21T14:28:40.917895144Z 2022-09-21 14:28:40.917 [DefaultDispatcher-worker-14] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 6 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-21T14:28:40.917897839Z 2022-09-21 14:28:40.917 [DefaultDispatcher-worker-29] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 9 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-21T14:28:40.921890224Z 2022-09-21 14:28:40.921 [DefaultDispatcher-worker-25] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 10 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-21T14:28:40.921929910Z 2022-09-21 14:28:40.921 [DefaultDispatcher-worker-8] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 11 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
Yeah this happens as well
It just disconnects all shards
I could do that if nexus would not be a piece of shi
Nexus is so shitty it's unbelievable
Just give me the directory listing
but thanks
It's always such a hassle to access these
okay it seems to reconnect
which is good
@LustigerLurch I now receive ShardGotDisconnected with a ReconnectingEvent
But shards seem to be connected
any idea why this wouldn't work https://github.com/DRSchlaubi/mikbot/blob/main/src/main/kotlin/dev/schlaubi/musicbot/core/Bot.kt#L169-L183
GitHub
mikbot/Bot.kt at main · DRSchlaubi/mikbot
A modular framework for building Discord bots in Kotlin using Kordex and Kord - mikbot/Bot.kt at main · DRSchlaubi/mikbot
did you also recompile kordex? cause the change depends on an inline function that kordex depends on :/
also what's ShardGotDisconnected? that's not from kord
GitHub
mikbot/Bot.kt at main · DRSchlaubi/mikbot
A modular framework for building Discord bots in Kotlin using Kordex and Kord - mikbot/Bot.kt at main · DRSchlaubi/mikbot
Actually not, but both of those events seem to get fired
yeah, but you need to have a recompiled kordex to see the effects of the changes
but if you have no way to do this right now, I can also hardcode your max concurrency temporarily into the PR (in the non-inlined function that gets called)
@I love Gradle files
recompiling kordex could be kinda difficult rn
yeah, then I can do this
it's 1, right?
pushed it @I love Gradle files, could you try once again, when the build is ready?
will do rn
Apart from not knowing where this comes from it looks good
2022-09-22T15:18:10.034921428Z 2022-09-22 15:18:10.034 [DefaultDispatcher-worker-12] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 0 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-22T15:18:10.038597081Z 2022-09-22 15:18:10.038 [DefaultDispatcher-worker-12] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 3 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-22T15:18:10.038958810Z 2022-09-22 15:18:10.038 [DefaultDispatcher-worker-12] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 4 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-22T15:18:10.039505892Z 2022-09-22 15:18:10.039 [DefaultDispatcher-worker-12] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 5 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-22T15:18:10.043609431Z 2022-09-22 15:18:10.043 [DefaultDispatcher-worker-7] DEBUG org.pf4j.AbstractExtensionFinder - Finding extensions of extension point 'dev.schlaubi.mikbot.core.gdpr.api.GDPRExtensionPoint'
2022-09-22T15:18:10.045478922Z 2022-09-22 15:18:10.045 [DefaultDispatcher-worker-12] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 6 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-22T15:18:10.045873364Z 2022-09-22 15:18:10.037 [DefaultDispatcher-worker-1] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 2 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-22T15:18:10.046375681Z 2022-09-22 15:18:10.039 [DefaultDispatcher-worker-32] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 1 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-22T15:18:10.047121782Z 2022-09-22 15:18:10.047 [DefaultDispatcher-worker-12] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 7 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-22T15:18:10.047558204Z 2022-09-22 15:18:10.047 [DefaultDispatcher-worker-18] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 8 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
2022-09-22T15:18:10.047844849Z 2022-09-22 15:18:10.047 [DefaultDispatcher-worker-5] WARN dev.schlaubi.musicbot.core.Bot - Shard got disconnected 9 DetachEvent, Awaiting login from: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
DetachEvent is fine, it means that the gateway was detached aka finally stopped (Gateway.detach()), when do these events happen?
This should be since startup
They happen during startup, which is confusing me
Could it be that the previous version was still running?
All shards are detached during a shutdown hook
@I love Gradle files
so that actually two instances of your bot are writing to the logs
That log is a docker logs output
which means?
I am quite sure a docker container can't run twice
can you stop, wait a little and then restart the container? if you then don't see these events again, they are probably from the previous running container anyway
Caprover summarizes the log
So that's probably the case
I will see whether disconnect issues appear again
Otherwise this looks great
if all shards disconnect and need to reidentify, the current code will still ratelimit, but I'm already working on a solution based on what I've got so far that can handle that situation too
so it's only for initial startup so far
Oh I see
I've pushed that now
ok will try to test it this weekend ig
yes, the new identify rate limiter in #693 also does this:
wait for rate limiter to give permission -> open ws connection and send identify -> wait for ready and notify rate limiter it can allow other shards to identify -> rate limiter waits 5s before giving next permission
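To make the timing of that flow concrete, here is a small simulation of it. It is only a sketch under assumptions, not the code from #693: it uses a simulated clock instead of real connections, assumes shards are bucketed by shardId % maxConcurrency, and the names identifyTimes and readyLatencyMs are invented for the example.

```kotlin
// Cooldown the rate limiter waits after Ready before giving the next permission.
const val IDENTIFY_DELAY_MS = 5_000L

/**
 * Simulates the flow: permission -> connect and identify -> Ready -> 5s cooldown.
 * Returns, per shard, the simulated time (ms) at which it may identify.
 * readyLatencyMs is a made-up constant for how long Ready takes to arrive.
 */
fun identifyTimes(shardIds: List<Int>, maxConcurrency: Int, readyLatencyMs: Long): Map<Int, Long> {
    // rate-limit key -> earliest time the next permission can be given
    val nextPermission = mutableMapOf<Int, Long>()
    val result = linkedMapOf<Int, Long>()
    for (shardId in shardIds.sorted()) {
        val key = shardId % maxConcurrency
        val start = nextPermission[key] ?: 0L
        result[shardId] = start
        // The bucket is blocked until Ready arrived and the cooldown has passed.
        nextPermission[key] = start + readyLatencyMs + IDENTIFY_DELAY_MS
    }
    return result
}

fun main() {
    // max_concurrency 1: every shard shares one bucket and logs in strictly in sequence.
    println(identifyTimes(listOf(0, 1, 2), maxConcurrency = 1, readyLatencyMs = 300))
}
```

With max_concurrency 1 and a 300 ms Ready latency, shard 0 identifies at 0 ms, shard 1 at 5300 ms and shard 2 at 10600 ms, so 15 shards need well over a minute to log in.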
also faced this issue, that's why :D
@MrPowerGamerBR what would be a max concurrency typically for large bots?
the reason I'm asking is that I now actually have something similar to this and was wondering if it should be using a dynamically sized map or a fixed size array
(thread safety is no concern)
also is this really max_concurrency from here https://discord.com/developers/docs/topics/gateway#session-start-limit-object 🤔
alright
@I love Gradle files
I would say my fix for this is ready and should work for both initial start and reconnects.
Ok will try next week I think
amazing 👍
@MrPowerGamerBR do you think you could try the identify rate limiting from feature-login-rate-limiting-SNAPSHOT on your bot?
This should do the job:
thanks :)
seems like, yeah
the thing is I have no idea if this is needed anymore (the commit that added this used ktor 1.5.2 and now we are on 2.1.2)
even for Dispatchers.Default: shouldn't coroutines just ignore thread count?
you can actually:
the dispatcher has a pool of threads, not coroutines, coroutines can be created arbitrarily
that would be a problem
but ideally it shouldn't happen because everything uses suspension not blocking
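A quick way to see the point about threads vs coroutines, assuming kotlinx.coroutines is on the classpath: launch far more coroutines than Dispatchers.Default has threads and let each one suspend. The function name runSuspendingJobs is made up for this sketch and has nothing to do with Kord's internals.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.atomic.AtomicInteger

// Launches `count` coroutines on Dispatchers.Default and returns how many finished.
fun runSuspendingJobs(count: Int): Int = runBlocking {
    val finished = AtomicInteger()
    val jobs = List(count) {
        launch(Dispatchers.Default) {
            // delay() suspends and frees the thread; Thread.sleep() here would block it instead.
            delay(100)
            finished.incrementAndGet()
        }
    }
    jobs.joinAll()
    finished.get()
}

fun main() {
    // Far more coroutines than the dispatcher has threads, yet all of them complete.
    println(runSuspendingJobs(10_000))
}
```

Because every coroutine suspends instead of blocking, the small thread pool is enough; swapping delay for a blocking call is the case that would starve the dispatcher.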
hm, we log this about client thread count:
try this:
cause maxConnectionsCount is 1000 by default - which is not enough for your shards but still more than 100
oh, I know why 100:
should probably have a warning like this for maxConnectionsPerRoute
@I love Gradle files do you remember anything about this: https://github.com/kordlib/kord/pull/198/files#diff-5ca725c196c4f33561f5f5e1485f518fe93f8e64c0c0deba6b947a8b5f5ceeb1
do you know if the fix is still needed?
I don't haha
thought so 😂
but it seems it is no longer needed, so should we just yeet it out? :kek:
and you weren't limited by the maxConnectionsCount of 1000?
:)
hm, what uses that much memory? there shouldn't be a ton of events
(and cached entities)
yay
yeah, we can't really do anything about this :pained_smile:
thanks again for testing this :)
Sure, would be amazing if you could use this to test feature-login-rate-limiting-SNAPSHOT too @ToxicMushroom
also how many shards does your bot have (cause if it's more than 100, the http client needs config changes)
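For anyone with more than 100 shards, the config change could look roughly like this. It is a sketch that assumes the Ktor CIO engine (where maxConnectionsCount and maxConnectionsPerRoute live) and assumes one connection per shard is the right sizing rule; httpClientFor is a made-up helper, and how the client is then handed to Kord is left out here.

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.engine.cio.CIO

// Hypothetical helper: sizes the CIO connection limits from the shard count.
fun httpClientFor(shardCount: Int): HttpClient = HttpClient(CIO) {
    engine {
        // Total connection cap; never go below the default of 1000.
        maxConnectionsCount = maxOf(1000, shardCount)
        endpoint {
            // Per-route cap; the default of 100 is what limits bots with more than 100 shards.
            maxConnectionsPerRoute = maxOf(100, shardCount)
        }
    }
}
```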
Could one of you give #693 a review as a sanity check, then it would be good to go
@MrPowerGamerBR @I love Gradle files? You don't have to but it would be really nice :)
I will try at my hotel if mrpowergamer wasn't faster
alright
merged and in 0.8.x-SNAPSHOT
@MrPowerGamerBR do you remember whether you had to increase maxConnectionsCount and maxConnectionsPerRoute or was it just maxConnectionsPerRoute?
i'm trying to implement this now so i want to know what should be taken into consideration for this warning
then we should check both, thanks anyway :)