R2 errors

Interesting 🤔 Not sure if anyone else monitors as much R2 data as I do, but if you do, I'm curious whether you've also noticed that starting 12/13 the error rate has risen a bit from what it used to be.
Unknown User•2y ago
Message Not Public
UnsmartOP•2y ago
So I run every operation type (get/put/delete) with files of varying size (0MB, 1MB, 5MB, 25MB). I have a load balancer health check, set to all regions, that calls an API which does 2 operations: one to a US bucket and one to an EU bucket. After each operation completes or errors, I send latency/error data to AE for tracking purposes. The most common error I get is Client Disconnect (10054), which happens pretty equally across all operations. But I assume the client is still connected just fine, otherwise the data wouldn't be in AE, since it only lands in AE if the error is handled by the request. A much less common error is "We encountered an internal error. Please try again." (10001), which happens mostly on put operations, but not that often.
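A minimal sketch of the record-after-every-operation pattern described in the message above, not Unsmart's actual code. `recordOperation`, `DatasetLike`, and the blob/double layout are hypothetical names chosen for illustration; `writeDataPoint` is modelled on the Workers Analytics Engine binding, assuming that is what AE refers to here. The point it illustrates is that a data point is written whether the operation succeeds or throws, which is why a caught error still shows up in the data.

```typescript
// Hypothetical stand-in for an Analytics Engine dataset binding; only the
// writeDataPoint call is modelled here.
interface DataPoint {
  blobs: string[];   // string dimensions: bucket label, operation, error message
  doubles: number[]; // numeric values: latency in ms, file size in MB
  indexes: string[]; // sampling key
}

interface DatasetLike {
  writeDataPoint(point: DataPoint): void;
}

type Op = "get" | "put" | "delete";

// Runs one storage operation, times it, and records the outcome.
// Resolves to true on success and false if the operation threw.
async function recordOperation(
  dataset: DatasetLike,
  bucketLabel: string,
  op: Op,
  sizeMB: number,
  run: () => Promise<unknown>,
): Promise<boolean> {
  const start = Date.now();
  let ok = true;
  let errorMessage = "";
  try {
    await run();
  } catch (err) {
    // The error is handled inside the request, so it can still be recorded.
    ok = false;
    errorMessage = err instanceof Error ? err.message : String(err);
  }
  dataset.writeDataPoint({
    blobs: [bucketLabel, op, errorMessage],
    doubles: [Date.now() - start, sizeMB],
    indexes: [bucketLabel],
  });
  return ok;
}
```

In a Worker, `run` would wrap the real call, for example `() => env.US_BUCKET.put(key, body)`, where `US_BUCKET` is an assumed binding name. If the request dies before the `catch` runs, nothing is written, so errors of that kind would not appear in the data at all.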
andrew•2y ago
This kind of thing fascinates me, since I do worry about things like increased error rates if I ever do an S3->R2 switchover... please do report any more findings back to the channel 😄 Thanks for doing that monitoring and reporting it
Unknown User•2y ago
Message Not Public
UnsmartOP•2y ago
Yeah sure, the account is dc941e8156f4a1336ca08481cb6d4222.
@sdnts just curious if this ever got looked at? I noticed a user complaining about elevated 500 error rates: https://discord.com/channels/595317990191398933/940663374377783388/1069076313517858888 I just wanted to say I also see another spike in error rates starting at around 2023-01-27 17:00:00 UTC.
Unknown User•2y ago
Message Not Public
UnsmartOP•2y ago
Sounds good and yeah definitely a small % 🙂
andrew•2y ago
@sdnts Just curious, did this end up getting pushed?
Unknown User•2y ago
Message Not Public
UnsmartOP•2y ago
So my error rate in the last 24 hours has dropped (image 1), but the overall error rate is now at an even higher peak than the jump that happened on 12/13, having jumped up again on 1/27 (image 2). Pre 12/13, the average was about 100 errors per 12 hours. From 12/13 to 1/26 it was about 900 per 12 hours. From 1/27 to 1/30 it's been 1500-3000 per 12 hours.
UnsmartOP•2y ago
It looks like the error rate should be going down to around 600-800 every 12 hours thanks to the release that happened today. But that's still pretty far from the pre-12/13 average of 100 every 12 hours. I will say each 12-hour point represents about 300,000 operations, so the error rate is still extraordinarily low even with the recent jumps I am seeing.
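To put the counts quoted in this thread in proportion, here is a small worked calculation. It assumes a flat 300,000 operations per 12-hour window, which is only the approximate figure given above, and `errorRatePercent` is a hypothetical helper name.

```typescript
// Converts an error count into a percentage of total operations.
function errorRatePercent(errors: number, operations: number): number {
  if (operations <= 0) {
    throw new RangeError("operations must be positive");
  }
  return (errors / operations) * 100;
}

// Approximate operations per 12-hour window, as quoted in the thread.
const OPS_PER_WINDOW = 300000;

// Error counts per 12-hour window, as quoted in the thread.
const windows: [string, number][] = [
  ["pre 12/13", 100],
  ["12/13 to 1/26", 900],
  ["1/27 to 1/30 (low)", 1500],
  ["1/27 to 1/30 (high)", 3000],
  ["after release (low)", 600],
  ["after release (high)", 800],
];

for (const [label, errors] of windows) {
  console.log(`${label}: ${errorRatePercent(errors, OPS_PER_WINDOW).toFixed(3)}%`);
}
```

Under that assumption the rates work out to roughly 0.03% before 12/13, 0.3% from 12/13 to 1/26, 0.5% to 1% from 1/27 to 1/30, and 0.2% to 0.27% expected after the release.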
UnsmartOP•2y ago
Over the last 6 hours, these are the top errors by operation type and error message. Mostly client disconnects, followed by internal errors. (The top one about network connection lost can be ignored; that's a DO error that isn't included in the R2 graphs.)
Unknown User•2y ago
Message Not Public
UnsmartOP•2y ago
Yes that's correct. I only save data in AE if I actually get a response back during the request.
Unknown User•2y ago
Message Not Public