C# · 3y ago
Surihia

✅ Finding the position of a string in a binary file

I have a binary file in which I want to search for a specific string that I will pass in via a textbox. If that string exists in the binary file, I want to get the byte position of the string's first character in that file.
22 Replies
mtreit · 3y ago
The exact answer might depend on the encoding of the string. If it's a simple ASCII string, just scan through the file until you find the first character of your search string, mark that as a possible match, then check whether the next n bytes match the rest of the string.
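As a rough sketch of the scan described above, assuming the search string is plain ASCII so each character occupies exactly one byte, the comparison can be done directly on the raw bytes without decoding anything. `FindAsciiNaive` is a hypothetical name used only for illustration:

```csharp
using System;
using System.Text;

// Hypothetical helper: naive scan for an ASCII string inside raw bytes.
// Returns the byte offset of the first match, or -1 if there is none.
static int FindAsciiNaive(byte[] data, string target)
{
    byte[] needle = Encoding.ASCII.GetBytes(target); // assumes ASCII: 1 byte per char
    if (needle.Length == 0) return 0;

    // Stop where a full-length match could still fit.
    for (int i = 0; i <= data.Length - needle.Length; i++)
    {
        if (data[i] != needle[0]) continue; // not a possible start

        // Possible start: check the remaining n - 1 bytes.
        int j = 1;
        while (j < needle.Length && data[i + j] == needle[j]) j++;
        if (j == needle.Length) return i; // every byte matched
    }
    return -1; // not found
}

byte[] sample = Encoding.ASCII.GetBytes("xxneedxneedlexx");
Console.WriteLine(FindAsciiNaive(sample, "needle")); // 7 (the partial "need" at offset 2 is rejected)
```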
Surihia (OP) · 3y ago
I did try reading the whole file with a StreamReader using ReadToEnd, stored the result in a variable, and then used the IndexOf method on that variable to get the position. The issue is that the position value is sometimes off by three or four bytes.
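One possible explanation for the drift, sketched here with made-up data rather than the actual file: StreamReader decodes bytes as UTF-8 by default, so a multi-byte sequence turns into a single char and the string index stops matching the byte offset. The sample puts the three-byte UTF-8 encoding of € in front of "needle":

```csharp
using System;
using System.IO;
using System.Text;

// Made-up data: the UTF-8 bytes for '€' (E2 82 AC) followed by ASCII "needle".
byte[] needleBytes = Encoding.ASCII.GetBytes("needle");
byte[] data = new byte[3 + needleBytes.Length];
data[0] = 0xE2;
data[1] = 0x82;
data[2] = 0xAC;
needleBytes.CopyTo(data, 3);

// Text approach: StreamReader decodes the bytes (UTF-8 by default), then we search the string.
string text;
using (var reader = new StreamReader(new MemoryStream(data)))
{
    text = reader.ReadToEnd();
}
int charIndex = text.IndexOf("needle", StringComparison.Ordinal);

// Byte approach: search the raw bytes directly.
ReadOnlySpan<byte> needle = needleBytes;
int byteOffset = data.AsSpan().IndexOf(needle);

// The three bytes of '€' became a single char, so the two positions differ.
Console.WriteLine($"char index: {charIndex}, byte offset: {byteOffset}");
// prints "char index: 1, byte offset: 3"
```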
mtreit · 3y ago
Well, you said it's a "binary" file. I assume that means it contains bytes that are not valid text. StreamReader is intended to read text files: it decodes the bytes into characters, so I don't think it will work properly with arbitrary binary data, and a position in the decoded string isn't necessarily a byte position in the file. You can use a FileStream to read binary files. (StreamReader, for the record, is a terrible name for something that only reads text.)
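A minimal sketch of reading a binary file with FileStream, assuming the whole file fits in memory. The temporary file and its contents are hypothetical, created only so the example runs on its own:

```csharp
using System;
using System.IO;

// Hypothetical test file: two binary bytes followed by "needle" in ASCII.
string path = Path.GetTempFileName();
File.WriteAllBytes(path, new byte[] { 0x00, 0xFF, 0x6E, 0x65, 0x65, 0x64, 0x6C, 0x65 });

byte[] bytes;
using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
{
    bytes = new byte[stream.Length];
    int total = 0;
    // Read can return fewer bytes than requested, so keep reading until the array is full.
    while (total < bytes.Length)
    {
        int read = stream.Read(bytes, total, bytes.Length - total);
        if (read == 0) break; // end of stream reached early
        total += read;
    }
}
File.Delete(path);

Console.WriteLine($"Read {bytes.Length} bytes; byte 2 is 0x{bytes[2]:X2}");
// prints "Read 8 bytes; byte 2 is 0x6E"
```

No decoding happens here, so index i in the array is exactly byte offset i in the file.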
Surihia (OP) · 3y ago
How do I read it with a FileStream? Yup, it's that type of file.
mtreit · 3y ago
If you are OK with just reading the entire file into memory, a simpler option is to call File.ReadAllBytes().
Surihia (OP) · 3y ago
It's a small file, so it shouldn't matter much. How do I get the string position, then?
mtreit · 3y ago
Well, you would have to implement the algorithm I originally described...
Surihia (OP) · 3y ago
This one?
mtreit · 3y ago
Yes
Surihia (OP) · 3y ago
Do I copy the data from the file into a buffer and then check the buffer for the string?
mtreit · 3y ago
That is a straightforward way; I would probably do that as a first attempt. This code might have bugs, as I just dashed it off very quickly, but it would look something like this:
using System;
using System.IO;
using System.Text;

var file = args[0];
var bytes = File.ReadAllBytes(file);
var targetString = "needle";
int startIndex = -1; // -1 means "not found"
var buffer = new byte[targetString.Length];

// Stop early enough that a full-length candidate always fits;
// otherwise Buffer.BlockCopy throws when a first-character match
// occurs within the last (length - 1) bytes of the file.
for (int i = 0; i <= bytes.Length - buffer.Length; i++)
{
    if ((char)bytes[i] == targetString[0])
    {
        // Potential match!

        // Get the bytes that match the length of the string.
        Buffer.BlockCopy(bytes, i, buffer, 0, buffer.Length);
        var str = Encoding.ASCII.GetString(buffer, 0, buffer.Length);

        if (str == targetString)
        {
            startIndex = i;
            break;
        }
    }
}

if (startIndex < 0)
{
    Console.WriteLine("Not found.");
}
else
{
    Console.WriteLine($"Found {targetString} at offset {startIndex}");
}
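On .NET Core 2.1 or later, a shorter alternative (a different technique from the loop above, not what was posted in the thread) is to encode the target string once and let `MemoryExtensions.IndexOf` search the byte span for the whole sequence. It keeps the same -1 convention for "not found". `FindAscii` and the sample bytes are hypothetical:

```csharp
using System;
using System.Text;

// Hypothetical helper: byte offset of the ASCII-encoded target in data, or -1 if absent.
static int FindAscii(byte[] data, string target)
{
    ReadOnlySpan<byte> needle = Encoding.ASCII.GetBytes(target);
    // MemoryExtensions.IndexOf searches for the whole byte sequence.
    return data.AsSpan().IndexOf(needle);
}

byte[] sample = { 0x00, 0x01, (byte)'n', (byte)'e', (byte)'e', (byte)'d', (byte)'l', (byte)'e', 0xFF };
Console.WriteLine(FindAscii(sample, "needle")); // 2
Console.WriteLine(FindAscii(sample, "pin"));    // -1
```

This avoids the per-candidate copy and string allocation, and there is no end-of-array bound to get wrong.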
If you have to deal with UTF-16 or other encodings, you would have to do things slightly differently.
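For UTF-16, a minimal sketch assuming the file stores the string as UTF-16 little-endian (what `Encoding.Unicode` produces): encode the target with the same encoding the file uses and search for those bytes. `FindEncoded` and the sample buffer are made up for illustration:

```csharp
using System;
using System.Text;

// Hypothetical helper: byte offset of target, encoded with the given encoding, or -1.
static int FindEncoded(byte[] data, string target, Encoding encoding)
{
    // GetBytes never emits a byte-order mark, only the encoded characters.
    ReadOnlySpan<byte> needle = encoding.GetBytes(target);
    return data.AsSpan().IndexOf(needle);
}

// Made-up buffer: four junk bytes, then "needle" as UTF-16LE (6E 00 65 00 ...).
byte[] junk = { 0xDE, 0xAD, 0xBE, 0xEF };
byte[] utf16 = Encoding.Unicode.GetBytes("needle");
byte[] sample = new byte[junk.Length + utf16.Length];
junk.CopyTo(sample, 0);
utf16.CopyTo(sample, junk.Length);

Console.WriteLine(FindEncoded(sample, "needle", Encoding.Unicode)); // 4
Console.WriteLine(FindEncoded(sample, "needle", Encoding.ASCII));   // -1
```

The ASCII search fails because in UTF-16LE every one of these characters is followed by a zero byte. The result is still a byte offset, not a character index.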
Surihia (OP) · 3y ago
Why is the start index -1? This works.
mtreit · 3y ago
-1 is used as a sentinel value to indicate we didn't find it.
Surihia (OP) · 3y ago
So it's kind of like assigning an empty value to a variable, like string name = "";?
mtreit · 3y ago
You can think of it like that. For local variables you have to assign them some kind of value before you use them. So the code wouldn't compile if we didn't assign some value to it.
Surihia (OP) · 3y ago
Can I assign it 0 instead? I usually set empty int variables to 0.
mtreit · 3y ago
If the thing you are searching for is the first thing in the file then you won't be able to distinguish between that and "not found".
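If you want to avoid a sentinel altogether, one common .NET alternative is the "Try" pattern used by methods such as `int.TryParse`: return a bool for found/not found and pass the offset back through an `out` parameter, so offset 0 stays unambiguous. `TryFindOffset` is a hypothetical helper, sketched assuming an ASCII search string:

```csharp
using System;
using System.Text;

// Hypothetical helper following the "Try" convention:
// the bool reports success, the out parameter carries the byte offset.
static bool TryFindOffset(byte[] data, string target, out int offset)
{
    ReadOnlySpan<byte> needle = Encoding.ASCII.GetBytes(target);
    offset = data.AsSpan().IndexOf(needle); // -1 when absent, but callers check the bool
    return offset >= 0;
}

byte[] sample = Encoding.ASCII.GetBytes("needle at the very start");
if (TryFindOffset(sample, "needle", out int at))
    Console.WriteLine($"Found at offset {at}"); // Found at offset 0
else
    Console.WriteLine("Not found.");
```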
Surihia (OP) · 3y ago
It's always somewhere in the middle; if it were at the start, I wouldn't need to do any of this.
mtreit · 3y ago
In your particular case, maybe, but it's better to make code like this more general so you could potentially reuse it for other scenarios. The pattern I used is a pretty common one in C#. For instance, if you call string.IndexOf, it will return -1 if the string you are searching for is not found.
MODiX · 3y ago
mtreit#6470
REPL Result: Success
var s = "abc";
Console.WriteLine(s.IndexOf("f"));
Console Output
-1
Compile: 481.003ms | Execution: 28.098ms
Surihia (OP) · 3y ago
Oh, I remember this. It was in a tutorial I was following way back.
Accord · 3y ago
Was this issue resolved? If so, run /close; otherwise I will mark this as stale and this post will be archived until there is new activity.
Closed!
