java.io.EOFException on dev.kord.gateway.DefaultGateway
Hey!
I am getting some spam of errors on Kord, and after a while it seems to just render my application dead and I have to restart it.
As the stacktrace doesn't show much, I was wondering if anyone could have a better idea of how I could troubleshoot or fix this 🙂
I am a newbie with Kord/Kotlin, so I might ask a lot of follow ups
67 Replies
this is just okhttp being bad
personally i use the java ktor client for gateway stuff
ktor-client-java
ok, atm I can not change it
would you think it has any chance to be related to long latency times to respond to commands?
my bot is having issues sometimes where it takes up to 3 minutes for command events to be received (since their createdAt timestamp)
shouldn't
unless the entire jvm freezes during that period and the connection drops
but I think that would be connection reset instead
i've gotten EOF exceptions just running my private bot locally
https://canary.discord.com/channels/556525343595298817/587324906702766226/1233451713613135982
just looked at okhttp's github and many other people have reported similar issues but the team can't reproduce so there's nothing for them to do :clueless:
one of the contributors said v5 includes a fix for websocket concurrency issues
ktor-client-okhttp uses okhttp v4 though
https://github.com/square/okhttp/issues/3113
I think your server is kicking you!
can you elaborate on
"after a while it seems to just render my application dead and I have to restart it."
is there anything else in your logs to indicate why it might be "rendering your application dead"? Kord attempts to reconnect on exceptions
well i just notice nothing else was being handled, no new events
we have a gateway microservice handling some kord events, it's usually spammy
and sometimes it would just stop writing anything to logs
but this picture i sent above is from another application, the actual bot, which also handles other events
and these should mostly be commands from users, the ones with high interaction latency above
we have this issue every Sunday, somewhat linked it to some Brazilian discord servers (our bot is for a game with a heavy Brazilian population)
it seems these servers have a lot of crypto bots that spammed MemberUpdateEvents and it seemed handling these was causing this latency spike in our bot
however, after a lot of support and discussion on kordex side https://discord.com/channels/1121419906995458098/1323320267031838791
we are running out of ideas
we have improved the current code to skip events we don't care about, and for the ones we do, we also skip if they come from bot users
but still, our bot behaves like this
i'm fairly certain the eof exceptions are separate from your other issues (but dont quote me on that :p)
i do recommend swapping to the ktor-client-java engine though
will note it as an enhancement for us to do
however we are trying to focus on figuring out this latency issue as it affects us and all users every Sunday for around 24h (00:00 to 23:59 BRT)
@Tschis Hello
enable the stackTraceRecovery flag, it might be eaten up by coroutine
we already have this set
is it a private server?
if you mean bot, it is private, yes
it can be added to any server
is it in many servers?
can you please share the logs after and before the error too?
many is relative, but it is in more than 16k
There is nothing relevant, just normal logic logging
are you sending custom events down the gateway? (asking since that's what I got from the info logs)
the addMembertodb here
no, this gateway only listens to Kord events
this addMemberToDatabase is just a reaction to a MemberJoinEvent
does it happen after certain events?
I became rusty with replies, excuse me
I do not think so, it seems to happen after some time the gateway is running
I have not noticed any specific trends
what is your logging level @Tschis ?
at the moment it is running on INFO
set it to trace
and let it run for a bit
it might show some interesting trends / discord errors that may help us find the actual issue
my suspicion is that discord is sending no-reconnect error codes that render a connection dead over time
I will need to get back to you on that because trace is quite spammy and I'm not sure if this affects some of our grafana quotas
Mmmmm
i can run it locally and see if it's reproducible
yes, that will do
kord-gateway logs error codes though
so unless there are "Gateway closed: <code>" logs this is literally just okhttp being bad
@viztea long time no see
do i know you?
I'll leave that to your imagination
ok
you're weird
I too am regularly coming across this
it is slightly irking me
What engine are you using?
whatever the kord default is
:)
can you set your trace level to trace
this is prod and i can't repro locally so it'll take me a little while to get back to you
and it seems to take a long time to appear
I did realize that
Ok bot has restarted , I'll let you know when i get one
didn't get the error over night on my local
Thank y'all
The error has come up in my logs, I'll look at the surrounding trace once I finish work 👍
@Moon
@NoComment is it possible for you to swap to java client?
possibly
I don't know how but I can try tomorrow?
that fixed the problem for me
should be able to just do something like this
websocket_client is just my HttpClient(Java) with WebSockets installed
can you show plz
?
HttpClient(Java) { install(WebSockets) }
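Spelled out with imports, the one-liner above might look like the sketch below. It assumes the `io.ktor:ktor-client-java` and `io.ktor:ktor-client-websockets` artifacts are on the classpath and a JDK 11+ runtime (the Java engine wraps `java.net.http.HttpClient`). The function name `buildWebsocketClient` is made up for illustration. How the client gets handed to Kord's gateway is not shown here, since the builder mentioned in this thread turned out to be fork-specific.

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.engine.java.Java
import io.ktor.client.plugins.websocket.WebSockets

// Hypothetical helper name; the thread only calls the result "websocket_client".
fun buildWebsocketClient(): HttpClient =
    HttpClient(Java) {
        // The gateway connection is a WebSocket, so the plugin has to be installed.
        install(WebSockets)
    }
```

Selecting `Java` explicitly keeps Ktor from picking whichever engine happens to be on the classpath, which is one way an OkHttp engine can end up in use without being asked for.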
thenk
I will try tomorrow
actually fuck it i'll do it over night
Where is the gateway builder?
kord block
idea disagrees
yeah wait thats specific to my fork
oop
:wubble_explode:
one sec
i should really pr this into kord
what lib is the http client
there's like 4 options and none of them want java lmao
ktor-client-java
and what is the java?
doesn't exist for me
its this artifact
ah duh
io.ktor:ktor-client-java:<ktor version>
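In a Gradle Kotlin DSL build, that coordinate could be declared roughly as follows. This is a sketch for a `build.gradle.kts` file. The version value is an assumption taken from the "kord uses 3.0.0" remark in this thread and should match whatever Ktor version your Kord/KordEx release actually pulls in.

```kotlin
// build.gradle.kts (fragment)
// Assumption: 3.0.0 comes from the thread; check your own dependency tree.
val ktorVersion = "3.0.0"

dependencies {
    // Ktor client engine backed by the JDK's built-in java.net.http.HttpClient (JDK 11+).
    implementation("io.ktor:ktor-client-java:$ktorVersion")
}
```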
time to find out what kordex uses
kord uses 3.0.0
a
cool, overnight test begins momentarily
@NoComment any update?
Seems to be good
Irritatingly I went to bed before discovering that application commands failed to sync
So like 10 hours of it was errors of people trying commands
But since then all seems well
:omegalul:
Looks like the java client has fixed the funny
I am not sure where the okhttp client is coming from
Our gateway should be using the CIO engine from ktor 🤔