Kian finding possible R2 bugs

Kian solving R2 bugs
59 Replies
Isaac McFadyen
Isaac McFadyenOP•3y ago
@kian Now you don't clutter up the main R2 channel 🙂
kian
kian•3y ago
Aaaaaaaaaaaaand it was because I didn't include a body in my replayed requests. So where does NoSuchBucket come from in Ben's case :blobcatthinking: Is it the aws s3 tool that you're using, or rclone? The NoSuchBucket that I was getting was just a bad request (no body in the PUT) - I thought I'd copied everything over, but evidently not. A few other people have reported NoSuchBucket in the past with multipart uploads though, including rclone 1.59, so I'm still curious as to how it happens
Unknown User
Unknown User•3y ago
Message Not Public
kian
kian•3y ago
Sure, go for it
Unknown User
Unknown User•3y ago
Message Not Public
kian
kian•3y ago
My assumption is that it has a hiccup somewhere - possibly something gets lost in transit - and a bad request causes NoSuchBucket, but I honestly can't understand how a client would do it. I can cause that easily, but only by sending a bad request. How big are the files?
Unknown User
Unknown User•3y ago
Message Not Public
kian
kian•3y ago
I expect it to be a pretty niche bug. I think multipart concurrency in R2 can be a bit temperamental past a certain level, which could cause it to intermittently say no - but in my experience that returns a signature error
Unknown User
Unknown User•3y ago
Message Not Public
kian
kian•3y ago
Does that upload failure exit the command entirely?
Unknown User
Unknown User•3y ago
Message Not Public
beelzabub
beelzabub•3y ago
Make sure you're using a concurrency of <= 3 for multipart
> How does S3 'know' what operation I ran, other than the HTTP method and whatever it can determine from the query params?
I don't understand the question. The route is fully determined by the query params & method
kian
kian•3y ago
Fair enough - that was pretty much what I was wondering, whether that's how it works or not. Lemme try the same scenario against S3 and see what they return
beelzabub
beelzabub•3y ago
Let me know how you replicate the NoSuchBucket. It might be representative of a deeper error
kian
kian•3y ago
Seems to be Content-Length related. Since I sent with no body, it was sending Content-Length: 0. Ehhhh, maybe not so simple - I mean, I can cause it by setting Content-Length to 1 when I'm not actually sending 1 byte, too. The NoSuchBucket scenario doesn't actually give me anything but a 200 OK from AWS. So resending an existing part is fine (200), and an aborted one returns NoSuchUpload.
kian
kian•3y ago
(image attachment)
kian
kian•3y ago
To clarify:
S3 returns NoSuchUpload when the multipart upload is cancelled, regardless of body.
R2 returns NoSuchBucket if there is no body and NoSuchUpload if there is a body.
As for resending an UploadPart with a different body to the original:
S3 accepts it with a 200 OK and returns the ETag in a header.
R2 returns InternalError if there is a body and NoSuchBucket if there is no body.
kian
kian•3y ago
(image attachment)
kian
kian•3y ago
So I can just resend any random UploadPart to S3 with whatever body I want, or no body at all, and it'll accept it with a 200 OK status.
kian
kian•3y ago
(image attachment)
kian
kian•3y ago
Doing the same in R2 returns InternalError if I provide a body, or NoSuchBucket if I don't provide a body. I don't know if this is an actual bug since it seems like a pretty niche case that is entirely made-up with my Postman requests, or if it's just a design decision that's up to the provider.
beelzabub
beelzabub•3y ago
Pretty sure the NoSuchBucket is an error on our end. Very strange. The code internally thinks it's returning NoSuchUpload but somehow the client is seeing NoSuchBucket rendered
kian
kian•3y ago
:blobcatthinking:
beelzabub
beelzabub•3y ago
(tracing through with console messages)
kian
kian•3y ago
It'd explain why you couldn't line up the error messages (but could line up the timestamps) earlier
beelzabub
beelzabub•3y ago
Well, our logs only capture the HTTP status, annoyingly. NoSuchUpload and NoSuchBucket are both 404. So tracing through the code, it was clearly going to be NoSuchUpload
kian
kian•3y ago
Funnily enough, if I do actually send an invalid bucket name, I get NoSuchUpload rather than NoSuchBucket
beelzabub
beelzabub•3y ago
That's because of the decryption piece again 🙂 Can't trick me
kian
kian•3y ago
:notlikecatgoogly:
beelzabub
beelzabub•3y ago
(the bucket name is part of the key material used to encrypt the upload id) Anyway, the mystery deepens
kian
kian•3y ago
The InternalError I get when I'm re-sending an UploadPart in Postman, I also can't replicate in aws-sdk-js:
[AWS s3 200 0.97s 0 retries] uploadPart({
Bucket: 'sdk-example',
Key: 'test.dat',
PartNumber: 1,
Body: 'abc',
UploadId: 'AE9eaMe15Tv7HIxpQiVGBoLqsLc4wZQz+JxvRcCgu7NbZEBhM9z7RrHsI5Xrpp24Y/qmEz+TiFpIoJiprVrUpn2RaX+hCxpk//vMsHN10Vt3t8Hh6n33bwtOFJbQ9wdzfm2rFzPnq2qQsz0dpGpEdpk/r6TlvqnGfvWU60EtwJgym8tSLrXS+Px8W/URohF3VPaYlE883FBwQY2dETnTx81S7F41JxuP8ngcl/WFjlXH/q7D6QFpY8OsamPwLBRKvqKE/ZfUrSK6LuHkNraxfQ0ERthLMjTWGRjkn5zepQ2s2HC+or/ZBOlPAuGl29GTEA=='
})
uploadPart1 status code: 200
[AWS s3 200 1.034s 0 retries] uploadPart({
Bucket: 'sdk-example',
Key: 'test.dat',
PartNumber: 1,
Body: 'aaa',
UploadId: 'AE9eaMe15Tv7HIxpQiVGBoLqsLc4wZQz+JxvRcCgu7NbZEBhM9z7RrHsI5Xrpp24Y/qmEz+TiFpIoJiprVrUpn2RaX+hCxpk//vMsHN10Vt3t8Hh6n33bwtOFJbQ9wdzfm2rFzPnq2qQsz0dpGpEdpk/r6TlvqnGfvWU60EtwJgym8tSLrXS+Px8W/URohF3VPaYlE883FBwQY2dETnTx81S7F41JxuP8ngcl/WFjlXH/q7D6QFpY8OsamPwLBRKvqKE/ZfUrSK6LuHkNraxfQ0ERthLMjTWGRjkn5zepQ2s2HC+or/ZBOlPAuGl29GTEA=='
})
replayPart1 status code: 200
I think Postman is just cursed, somewhere. What else is a part of the uploadId? User-Agent or anything like that?
beelzabub
beelzabub•3y ago
No. Nothing you would mess up with Postman. For the InternalError - I know what that is. It should be returning BadUpload
kian
kian•3y ago
Is what's a part of the uploadId up to the implementation, or is it defined in the S3 spec? i.e. AWS just says "Upload ID identifying the multipart upload whose part is being uploaded."
beelzabub
beelzabub•3y ago
Although it's on my list of things to fix in the next month or so when multipart gets revamped (I won't be doing the work but I'll be overseeing it). UploadId is an opaque string. It's tied to the account + bucket + object name
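That would line up with the earlier "decryption piece" comment. A minimal sketch of how an opaque, identity-bound upload ID could work - all names and the AES-GCM/HKDF choices here are assumptions for illustration, not R2's actual scheme:

import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from 'node:crypto'

// Hypothetical: bind the encryption key to the identifiers the upload is tied to.
function deriveKey(account: string, bucket: string, object: string): Buffer {
  const info = `${account}/${bucket}/${object}`
  return Buffer.from(hkdfSync('sha256', 'service-master-secret', '', info, 32))
}

// Mint an upload ID: encrypt internal upload state under the derived key.
function mintUploadId(account: string, bucket: string, object: string, state: string): string {
  const iv = randomBytes(12)
  const cipher = createCipheriv('aes-256-gcm', deriveKey(account, bucket, object), iv)
  const ct = Buffer.concat([cipher.update(state, 'utf8'), cipher.final()])
  // iv || ciphertext || tag, base64-encoded: an opaque string to the client.
  return Buffer.concat([iv, ct, cipher.getAuthTag()]).toString('base64')
}

// Open an upload ID: a wrong bucket name derives a different key, so the
// authentication check fails - which would explain why a bogus bucket name
// surfaces as NoSuchUpload rather than NoSuchBucket.
function openUploadId(account: string, bucket: string, object: string, id: string): string {
  const raw = Buffer.from(id, 'base64')
  const iv = raw.subarray(0, 12)
  const tag = raw.subarray(raw.length - 16)
  const ct = raw.subarray(12, raw.length - 16)
  const decipher = createDecipheriv('aes-256-gcm', deriveKey(account, bucket, object), iv)
  decipher.setAuthTag(tag)
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString('utf8')
}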
kian
kian•3y ago
Gotcha
beelzabub
beelzabub•3y ago
The InternalError you're getting is because the part size is changing, but that's not really allowed. All parts except the last must be the same size
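In other words, a constraint roughly like this - a sketch of the rule as described above, with hypothetical names, not R2's actual code:

// Part sizes in part-number order, as known when the upload completes.
function checkPartSizes(partSizes: number[]): void {
  if (partSizes.length === 0) return
  const expected = partSizes[0]
  // Every part except the last must match the first part's size;
  // only the final part is allowed to differ (a short tail).
  for (let i = 0; i < partSizes.length - 1; i++) {
    if (partSizes[i] !== expected) {
      // Per the discussion above, this should surface as BadUpload.
      throw new Error('BadUpload: all parts except the last must be the same size')
    }
  }
}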
kian
kian•3y ago
Oh yeah, I think I recall that discussion when ncw was getting the R2 provider into rclone - or at least I remember seeing it then. Could just be losing my mind
beelzabub
beelzabub•3y ago
Which is technically different from S3, but meh. The interpretation I wrote for that logic is more complex than it needs to be, I think. That's part of the cleanup work scheduled in the next month or so - by GA for sure. Anyway - back to figuring out why 'Upload Not Found' == 'Bucket Not Found' returns true
kian
kian•3y ago
R2 needs to have a deep-dive blog post, or something like the 'How Workers Work' pages in the documentation - it all sounds pretty :blobcatthinking: Don't know why I'm googling the two now, since I'm fairly certain I won't be finding R2 on Google.
beelzabub
beelzabub•3y ago
I have 3 deep-dive blogs written but they haven't been scheduled. OMG. I found the bug. FML
kian
kian•3y ago
The question is how many keystrokes it'd take to fix the bug haha. Just a bad comparison/match?
beelzabub
beelzabub•3y ago
Kind of. Turns out we don't have strict-boolean-expressions on for our linter, and I think there was a bad merge conflict or something. But the error conditions look like this:
if (uploadEmptyPartResp.err) {
if (
(uploadEmptyPartResp.val.category == BucketErrorCategory.BucketNotFound,
errContext)
) {
return error.fail(request, error.NoSuchBucket)
}
if (
(uploadEmptyPartResp.val.category == BucketErrorCategory.UploadNotFound,
errContext)
) {
return error.fail(request, error.NoSuchUpload)
}
if (
(uploadEmptyPartResp.val.category ===
BucketErrorCategory.TooMuchConcurrency,
errContext)
) {
return error.fail(request, error.TooMuchConcurrency, errContext)
}
throw uploadEmptyPartResp.val
}
Let me know if you spot the problem. uploadEmptyPartResp.val.category is indeed BucketErrorCategory.UploadNotFound, but we're also taking that first conditional path
kian
kian•3y ago
uploadEmptyPartResp.val.category == BucketErrorCategory.BucketNotFound, errContext admittedly confuses me
beelzabub
beelzabub•3y ago
So most likely @b3nw I'm guessing you were getting TooMuchConcurrency sent back
kian
kian•3y ago
So aws s3 defaults to 10 concurrency, uploadEmptyPartResp.val.category was BucketErrorCategory.TooMuchConcurrency but ended up going down the first path & returning error.NoSuchBucket?
beelzabub
beelzabub•3y ago
Yeah
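For anyone hitting this before the fix lands, the practical workaround is to cap multipart concurrency, as advised earlier in the thread (with the aws CLI that's the max_concurrent_requests setting, which defaults to 10). With aws-sdk-js v2, which the log above uses, the knob is queueSize on the managed uploader - a sketch, with the endpoint and file names as placeholders:

import S3 from 'aws-sdk/clients/s3'
import { createReadStream } from 'node:fs'

// Placeholder endpoint - R2's S3 API endpoint is per-account.
const s3 = new S3({ endpoint: 'https://<account-id>.r2.cloudflarestorage.com' })

s3.upload(
  { Bucket: 'sdk-example', Key: 'test.dat', Body: createReadStream('test.dat') },
  // queueSize caps how many UploadPart requests run in parallel
  // (the SDK default is 4); 2-3 per the advice in this thread.
  { partSize: 10 * 1024 * 1024, queueSize: 2 },
  (err, data) => {
    if (err) console.error('upload failed:', err)
    else console.log('uploaded to', data.Location)
  }
)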
Isaac McFadyen
Isaac McFadyenOP•3y ago
K, I gotta ask. What's the a == b, c syntax do?
kian
kian•3y ago
So in my pseudo-JS to replicate the same stuff (with console.log instead of return), somehow it matches every if statement because of errContext. errContext being 'truthy' is matching the if blocks, regardless of uploadEmptyPartResp.val.category :blobcatoh: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Comma_Operator
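For anyone else who hadn't met it: the comma operator evaluates both operands and yields the rightmost one, so each parenthesized condition above collapses to errContext, which is truthy. A quick self-contained illustration:

const a: number = 1
const b: number = 2
const errContext = 'some request context' // any truthy value

// (a == b, errContext) evaluates a == b, discards the result,
// and yields errContext - so the branch runs even though a != b.
if ((a == b, errContext)) {
  console.log('branch taken anyway')
}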
Isaac McFadyen
Isaac McFadyenOP•3y ago
Huh
Unknown User
Unknown User•3y ago
Message Not Public
kian
kian•3y ago
(image attachment)
kian
kian•3y ago
(image attachment)
kian
kian•3y ago
Bless eslint. Welp, interesting bug haha - I'd never heard of the comma operator before. 2 or 3, I believe - 3 works but could be a little flaky? I'd use 2
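So the fix is presumably just deleting the stray ", errContext" from each condition so that each if tests the comparison itself - a sketch of the likely intent, not the actual patch:

if (uploadEmptyPartResp.err) {
  if (uploadEmptyPartResp.val.category === BucketErrorCategory.BucketNotFound) {
    return error.fail(request, error.NoSuchBucket)
  }
  if (uploadEmptyPartResp.val.category === BucketErrorCategory.UploadNotFound) {
    return error.fail(request, error.NoSuchUpload)
  }
  if (uploadEmptyPartResp.val.category === BucketErrorCategory.TooMuchConcurrency) {
    return error.fail(request, error.TooMuchConcurrency, errContext)
  }
  throw uploadEmptyPartResp.val
}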
kian
kian•3y ago
(image attachment)
kian
kian•3y ago
Found this when searching for concurrency :^) I guess my InternalError for a changed part size will be BadUpload eventually, and the NoSuchBucket will return the proper NoSuchUpload.
beelzabub
beelzabub•3y ago
It was a regression introduced May 30 (not that message)
kian
kian•3y ago
The word regression just reminds me of Sentry emails - "Sentry marked FOO-123 as a regression" - constantly. I couldn't live without Sentry though. Our apps are PHP/Laravel, and the stack traces & context it brings with exceptions are invaluable
Unknown User
Unknown User•3y ago
Message Not Public
kian
kian•3y ago
Will archive this
