Read stream twice without missing data
I want to grab the first byte from a file and then read the entire file (max. 128 bytes) starting from the beginning, including that first byte. However, the problem is that the first byte I initially read is missing. Here's the code:
Any ideas on how to fix this?
51 Replies
Note that bytesRead is the number of bytes which were read, not the value of the first byte
Yes, I know.
That's intended
The first byte shouldn't be missing in your subsequent read though. I've done similar things plenty of times
It does. Another weird case is that sometimes the second buffer only contains the first byte and the rest of the data is lost.
Note that Read reads at most the specified number of bytes. It's expected that it can read less
(but if it reads 0, that means you've reached the end)
There's ReadExactly if you want to read exactly a specified number of bytes
That's the output with the code from above.
And what value does the second file.ReadAsync return?
This is the second one
What value does it return?
I.e. the number of bytes read
int bytesRead = await file.ReadAsync(buffer, 0, buffer.Length);
1
Right, so all as expected
Call ReadExactly, or call Read multiple times until either it returns 0, or you've got the total number of bytes you want
(afk, back in half an hour)
Not really 😅. If I remove the file.Seek() call, I get everything I want except the first byte. Like so:
What about just setting the position with file.Position?
Does the location you're reading from support seeking?
Yes
The behaviour is the same.
No, Stream is documented to work this way. There's no guarantee that it returns exactly the number of bytes you asked for
Steam?
Well, it is if you use ReadExactly
Please, fix this bug. If you still see an issue, then let's investigate. But there's no point digging into one thing when we know you're doing something wrong
(Quoting canton7: "There's ReadExactly if you want to read exactly a specified number of bytes")
I've told him to use ReadExactly twice now, explicitly
I think it's entirely expected the first 1-byte read will read 1 byte into an internal cache, and then when you rewind / re-read it will first read the entire cache (all 1 byte of it), then the next read will actually read new data from file
(I'd have to re-read the code again to be sure)
But I don't understand the why. And I also don't have access to ReadExactly, I am on .NET 6.
Why is because that's how Stream.Read is documented to work. And various Stream implementations take advantage of this for various efficiency reasons. See for example my previous message
If you don't have ReadExactly, then you can do something like:
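For reference, a minimal sketch of such a loop on .NET 6. This is not necessarily the code that was posted; StreamHelpers and ReadUntilFull are hypothetical names, not .NET APIs. It keeps calling Read until the buffer is full or Read returns 0.

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Hypothetical helper (not part of .NET): keeps calling Read until the
    // buffer is full or the stream signals end-of-stream by returning 0.
    // Returns the total number of bytes actually placed in the buffer.
    public static int ReadUntilFull(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            // Ask only for the space that is still free, starting at 'total'.
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0)
                break; // end of stream: nothing more will arrive
            total += read;
        }
        return total;
    }
}
```

Usage would be something like int n = StreamHelpers.ReadUntilFull(file, buffer); after which only the first n elements of buffer hold data from the file.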
I'll remember how to do it eventually 😛
Thank you.
I believe there might be a misunderstanding between us. My intention is not to remove the trailing zeros from the byte array; that is perfectly acceptable. Instead, my goal is to read one byte, perform certain checks on it, and then read from the start of the original stream until it reaches a maximum of 128 bytes.
Example:
Stream: 1, 2, 3, 4, 5, 6, 7, 8, 9
var tmpBuffer = new byte[1];
Stream.Read(tmpBuffer) // tmpBuffer: 1
var buffer = new byte[128];
Stream.Read(buffer, 0, buffer.Length) // buffer: 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, ...
This is expected because the "cursor" moves one position to the right side (right?).
Stream.Read(buffer, 0, 1) will only read 1 byte. I'm not sure why your buffer isn't 2, 0, 0, 0, ...
Sorry I made a typo
Now I thought: why not just reset the "cursor" to the starting position using Stream.Position = 0? Then, if I read using Stream.Read, the output should be the following:
Stream.Read(buffer, 0, buffer.Length) // buffer: 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, ...
Note that the second Stream.Read can return anywhere from 1 byte to 128 bytes. So your buffer might be 2, 0, 0, 0, ... or it might be 2, 3, 0, 0, ... or... You need to look at the return value to see how many bytes it actually read
Yes, it depends on how many bytes there are in the data I would like to read. The remaining elements of the array will be "filled up" with zeros. Correct?
No. It will read between 1 byte and the maximum number of bytes you requested. It will write to the section of the array that you tell it to (the second parameter). If it doesn't write to an array element, it just leaves it alone: it doesn't fill anything with zeros
From the docs:
Implementations of this method read a maximum of buffer.Length bytes from the current stream and store them in buffer. The current position within the stream is advanced by the number of bytes read; however, if an exception occurs, the current position within the stream remains unchanged. Implementations return the number of bytes read. If more than zero bytes are requested, the implementation will not complete the operation until at least one byte of data can be read (if zero bytes were requested, some implementations may similarly not complete until at least one byte is available, but no data will be consumed from the stream in such a case). Read returns 0 only if zero bytes were requested or when there is no more data in the stream and no more is expected (such as a closed socket or end of file). An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.
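A small sketch of the "it just leaves it alone" point, under the assumption that a MemoryStream stands in for the file. The buffer is pre-filled with 0xFF so that untouched elements are easy to spot. ReadLeavesRestUntouched and Run are made-up names for this illustration.

```csharp
using System;
using System.IO;

public static class ReadLeavesRestUntouched
{
    public static (int BytesRead, byte[] Buffer) Run()
    {
        // Three bytes of "file" content (a MemoryStream is used for the demo).
        var stream = new MemoryStream(new byte[] { 1, 2, 3 });

        // Pre-fill with 0xFF instead of the default zeros.
        var buffer = new byte[8];
        Array.Fill(buffer, (byte)0xFF);

        // Read writes only the elements it fills and returns how many that was.
        int bytesRead = stream.Read(buffer, 0, buffer.Length);
        return (bytesRead, buffer);
    }

    public static void Main()
    {
        var (bytesRead, buffer) = Run();
        Console.WriteLine($"bytesRead = {bytesRead}");  // bytesRead = 3
        Console.WriteLine(string.Join(", ", buffer));   // 1, 2, 3, 255, 255, 255, 255, 255
    }
}
```

The zeros seen after the data in a fresh buffer come from new byte[128], not from Read.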
I mean this line fills the whole array with zeros: byte[] buffer = new byte[128];
So yeah makes sense
How can I reset the position back to the start once I read 1 byte?
stream.Position = 0, or stream.Seek(0, SeekOrigin.Begin), as you have been doing
Ok, but then I can only re-read the 1 byte that I already read and the rest of the stream is lost?
Please please listen to what I'm telling you, repeatedly. I even gave you the code to fix your problem
The end of the stream is not lost. You simply need to call stream.Read again (and repeatedly) until it's read all the bytes you want to read
This is not hard to understand. Please try and listen to what I'm telling you
It's just weird that without using Seek it reads every byte (except the first one, because the cursor starts behind it). 😅
Yes. This is internal buffering behaviour within FileStream. I even gave you the link to the code which does this, and the comment saying that this is how it behaves
I've also quoted the documentation saying that Stream is allowed to behave in this way. I'm not sure what else I can do
Thank you very much. I'll try to visualize it to understand it better.
(Quoting canton7: "I think it's entirely expected the first 1-byte read will read 1 byte into an internal cache, and then when you rewind / re-read it will first read the entire cache (all 1 byte of it), then the next read will actually read new data from file")
What I expected: ( | = cursor)
Stream: | 1, 2, 3, 4, 5, 6, 7, 8
- Read 1 byte
Stream: 1 | 2, 3, 4, 5, 6, 7, 8
- Reset to start using Seek
Stream: | 1, 2, 3, 4, 5, 6, 7, 8
- Read the whole stream
Stream: 1, 2, 3, 4, 5, 6, 7, 8 |
What actually happens:
Stream: | 1, 2, 3, 4, 5, 6, 7, 8
Internal cache:
- Read 1 byte
Stream: | 2, 3, 4, 5, 6, 7, 8
Internal cache: 1
- Reset to start using Seek
Stream: | 1, EOF + 2, 3, 4, 5, 6, 7, 8 ( cache + stream)
- Read the whole stream
Stream: 1, EOF + | 2, 3, 4, 5, 6, 7, 8
So now I have to read it again to get the rest of it. Is this kinda accurate as a mental model?
I don't know where that internal "EOF" came from
There's no "end of stream". EOF only happens when stream.Read returns 0, which doesn't happen here
The best mental model is that the stream might be doing some caching internally, or it might be waiting for more data to arrive (e.g. over a pipe or network filesystem), and that therefore stream.Read will only give you bytes that it has readily available. Only if you call stream.Read and it doesn't have any bytes readily available will it go and look for more bytes to give you
It serves as a separator. There is no End of File (EOF), and the internal cache is not concatenated with the stream. 😅
This is the solution I came up with.
No, that is wrong
Because the second call to ReadAsync might only return 1 byte
You need to call ReadAsync in a loop
As long as I only get 1 byte it works
I've already given you the code which does exactly that
It might work right now, but it's not guaranteed to work in the future
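To make the "not guaranteed" part concrete, here is a rough sketch of the whole peek, rewind, and looped re-read. OneByteAtATimeStream is a hypothetical test stream (not a .NET type) that deliberately returns at most one byte per call, which the Stream contract permits. PeekThenReread and RunAsync are likewise made-up names.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical test stream: hands out at most one byte per Read call.
sealed class OneByteAtATimeStream : Stream
{
    private readonly MemoryStream _inner;
    public OneByteAtATimeStream(byte[] data) => _inner = new MemoryStream(data);

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }

    public override int Read(byte[] buffer, int offset, int count) =>
        _inner.Read(buffer, offset, Math.Min(count, 1));

    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

public static class PeekThenReread
{
    public static async Task<(byte First, int Total, byte[] Buffer)> RunAsync(Stream file)
    {
        // 1. Peek at the first byte.
        var first = new byte[1];
        int peeked = await file.ReadAsync(first, 0, 1);
        if (peeked == 0)
            throw new EndOfStreamException("The stream is empty.");

        // 2. Rewind to the start (requires a seekable stream).
        file.Position = 0;

        // 3. Read up to 128 bytes, looping because each call may return fewer
        //    bytes than requested. Only a return value of 0 means end of stream.
        var buffer = new byte[128];
        int total = 0;
        while (total < buffer.Length)
        {
            int read = await file.ReadAsync(buffer, total, buffer.Length - total);
            if (read == 0)
                break;
            total += read;
        }
        return (first[0], total, buffer);
    }
}
```

With this test stream, a single ReadAsync call after the rewind would fill only one element of the buffer, while the loop collects all of the data.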
Yep
I will use your version. Thank you very much for your help.
What's so hard about this? I really don't get it. I've explained what Stream will and won't do. I've explained what you need to do. I've even shown you the code which works properly and will continue working in the future
It's just my brain. Sorry!
And you're still doing things which I've previously explained are not guaranteed to work
It just clicked in my head. Now, most things make sense! Thank you soo much! 🙏