Text Manipulation
Evening all, is anyone free to help with a text manipulation issue?
So I'm reading a log file that is in hex. I've used BitConverter to push this output to an RTB with no issues. My next task, after looking at how the hex is structured, is to break the records into individual lines (records vary in length).
Here is a sample of the file entry and 1 record:
There is some null data at the start that needs to be cleared out: the first 47 characters.
Once that is done, I need to search for "00-00-E", count back 6 characters from the first "0" and insert a line break ("\n"). Ideally I need to loop this until the end of the file, then display the output in an RTB to verify it looks correct, and then save it to a new .txt file so I can decode it later.
Output for 2 records:
I know the theory but lack the ability at present. Any help appreciated.
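The text rule described in this post can be sketched roughly as below. This is a minimal Python sketch, not the poster's code. It assumes the whole file has already been turned into one dash-separated hex string, the `AA-BB-CC` style that .NET's `BitConverter.ToString` produces. In that style 47 characters is exactly 16 bytes (32 hex digits plus 15 dashes). The function name `split_hex_records` and the sample string are hypothetical. The sketch also cannot tell a real record start from a stray "00-00-E" inside the data.

```python
def split_hex_records(hex_text, skip_chars=47, marker="00-00-E", back=6):
    """Split a dash-separated hex dump (e.g. "AA-BB-CC") into one string per record.

    Follows the rule from the post: drop the first `skip_chars` characters,
    then start a new record `back` characters before each `marker`.
    """
    body = hex_text[skip_chars:]
    cuts = []
    idx = body.find(marker)
    while idx != -1:
        cuts.append(max(idx - back, 0))  # "count back 6 chars" from the marker
        idx = body.find(marker, idx + 1)
    bounds = [0] + cuts + [len(body)]
    records = []
    for start, end in zip(bounds, bounds[1:]):
        piece = body[start:end].strip("-")  # drop the dash left at each cut
        if piece:
            records.append(piece)
    return records

# Made-up input: 16 zero bytes (47 characters) followed by two short "records".
sample = "00-" * 15 + "00" + "-AA-BB-00-00-E4-07-CC-DD-00-00-E5-07"
print("\n".join(split_hex_records(sample)))
# AA-BB-00-00-E4-07
# CC-DD-00-00-E5-07
```

Joining the result with `"\n"` gives the one-record-per-line text for the RTB or the .txt file.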
32 Replies
What's RTB?
Rich text box.
Is there a reason you are dealing with this as text and not binary data?
I was told it's in hex, and I have managed to decode some of it, for example, from record 1, the first:
There are some bytes in between, but these values do match.
I do have to flip the yea record and year bytes.
Hexadecimal is just a human-readable way to represent bytes. I feel like if I was working on this I wouldn't use the text representation at all and just work with the bytes instead.
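To make that concrete, here is a small Python illustration. The dash-separated hex string is just one way of printing the underlying bytes, and it converts back to the same bytes losslessly. The sample text is made up.

```python
def hex_text_to_bytes(hex_text):
    # bytes.fromhex does not accept dashes, so remove them first.
    return bytes.fromhex(hex_text.replace("-", ""))

def bytes_to_hex_text(data):
    # Reproduce the uppercase, dash-separated style shown in the RTB.
    return "-".join(f"{b:02X}" for b in data)

raw = hex_text_to_bytes("E4-07-0A")
print(list(raw))               # [228, 7, 10]
print(bytes_to_hex_text(raw))  # E4-07-0A
```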
E407 = 07E4; lose the 0 and you get 7E4, which is 2020.
Well you don't need to lose the zero.
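The "flip the bytes" step is reading a 16-bit little-endian integer, where the low byte comes first. Python's `int.from_bytes` does the flip and the conversion in one step. The leading zero of 07E4 makes no difference to the value:

```python
year_bytes = bytes.fromhex("E407")           # as it appears in the file
year = int.from_bytes(year_bytes, "little")  # low byte first: 0x07E4
print(year)             # 2020
print(0x07E4 == 0x7E4)  # True: the leading zero does not change the value
```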
So, a BinaryReader?
mtreit#6470
REPL Result: Success
Result: int
I do, because on other records it comes out wrong otherwise, lol.
E4 37, for example:
37E4 = 14308
Someone did suggest that this could be compressed, hence the bytes all over the place.
It looks like you are trying to adjust for big-endian / little-endian conversions?
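Reading the same two bytes both ways shows why byte order matters. Neither value for E4 37 looks like a year. That suggests, but does not prove, that those bytes belong to a different field or that the field boundaries are off. A Python sketch:

```python
import struct

pair = bytes.fromhex("E437")
little = int.from_bytes(pair, "little")  # 0x37E4
big = int.from_bytes(pair, "big")        # 0xE437
print(little, big)  # 14308 58423

# struct gives the same answers; '<H' / '>H' = unsigned 16-bit, little/big-endian.
assert struct.unpack("<H", pair)[0] == little
assert struct.unpack(">H", pair)[0] == big
```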
Yes, I believe so. I'm still very new to coding, so complex stuff throws me, but if I don't try I'll never know.
Is your original data file in binary or is it text?
Because manipulating byte streams as text is usually not the right way to go.
It's a .log file, so I assumed it's text, but it's not readable.
Sorry for the typos, I need to replace this keyboard.
Not readable meaning it looks like garbage in a text editor?
Like this kind of thing?
Yeah, massively.
Yes, so your source data is raw binary
Don't try to manipulate it as text
Process it as a stream of bytes. You need to understand the binary format that was used to write it in order to actually parse the data.
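Here is a minimal Python sketch of working on the file as raw bytes instead of text. Opening in binary mode (`"rb"`) returns the bytes untouched, with no character decoding. The file name `equipment.log` and its contents are hypothetical stand-ins for the real log.

```python
import os
import tempfile

def read_log_bytes(path):
    # "rb" = read binary: no decoding, no newline translation.
    with open(path, "rb") as f:
        return f.read()

def hex_preview(data, count=16):
    # Show the first `count` bytes in the same dash-separated style as the RTB.
    return "-".join(f"{b:02X}" for b in data[:count])

# Demo with a throwaway file standing in for the real log (made-up content).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "equipment.log")
    with open(path, "wb") as f:
        f.write(bytes([0x00, 0x00, 0xE4, 0x07, 0x0D, 0x0A]))
    data = read_log_bytes(path)

print(len(data))          # 6
print(hex_preview(data))  # 00-00-E4-07-0D-0A
```

In .NET, `File.ReadAllBytes` or a `BinaryReader` serves the same purpose.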
Not sure how I will work that out.
Where did the data come from?
It's a log file for some third-party software on equipment we use.
We used to have a decoder that worked fine, but they look to have changed it recently.
I'm sure the format is still the same as the old output file.
Ask the 3rd party for the documentation on the file format
If they changed the decoder presumably the file format also changed
We did; they just came back and said it's binary and left us with that.
Old output, but this was the input:
As you can see, there was plain text, and we worked out from there what was what.
Do you have this binary data as the "text" representation you showed earlier?
With those 5 records?
I have a full file, yes.
@mtreit PM'd.
I meant just the part that matches the 5 records you show above.
Pasting the binary data as text doesn't help as it doesn't preserve the underlying bytes correctly
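Here is a small Python illustration of why pasted "text" can't stand in for the bytes. Once arbitrary binary data goes through a text decoding, bytes that aren't valid in that encoding get replaced, and the original values are lost. UTF-8 is only an example here. Which bytes get mangled depends on the encoding the editor or chat client assumes.

```python
original = bytes([0xE4, 0x07])                      # e.g. the "year" bytes
as_text = original.decode("utf-8", errors="replace")
round_trip = as_text.encode("utf-8")

print(as_text == "\ufffd\x07")  # True: 0xE4 became the replacement character
print(round_trip.hex())         # efbfbd07 (not e407)
print(round_trip == original)   # False
```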
Anyway, I was just curious to see the actual binary data that matches the text data you showed. But in any event you really should try to get details on the actual file format that is being used (is it a regular binary serializer? Is it a completely custom encoding?) and then you should be able to decode it correctly from the byte stream.
I'll speak to the company again. I can't see them being too forthcoming, tbh, lol.
I may just have to watch the software and see how it writes it.
OK, no luck with the third party. I plan on sticking it out and continuing as I currently am.
If anyone can help with this further, that would be great. I am thinking about changing direction and using a memory stream instead.
I now know the first 16 bytes are pointless and not required, and the next 4 bytes after that are the record number. I need to use these to identify each record and break accordingly, but I haven't a clue where to start, lol.
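Based on that description (16 header bytes to skip, then a 4-byte record number), a first step could look like the Python sketch below. The byte order and signedness of the record number are not known. Reading it as an unsigned little-endian integer (`'<I'`) is an assumption, chosen only because the year field earlier in the thread was little-endian. `io.BytesIO` plays the role of a memory stream, and the sample data is made up.

```python
import io
import struct

HEADER_SIZE = 16  # leading bytes described as "not required"

def read_first_record_number(stream):
    stream.seek(HEADER_SIZE)  # skip the header
    raw = stream.read(4)      # the 4-byte record number
    if len(raw) < 4:
        raise ValueError("file too short for a record number")
    # ASSUMPTION: unsigned 32-bit little-endian; the real layout is unconfirmed.
    (record_number,) = struct.unpack("<I", raw)
    return record_number

sample = bytes(16) + bytes([0x01, 0x00, 0x00, 0x00]) + b"\xAA\xBB"
print(read_first_record_number(io.BytesIO(sample)))  # 1
```

Finding where each later record starts still needs the record lengths or a marker, which the format spec would settle.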
The problem is that there's not much anyone can tell you, you're dealing with a particular format and you need to know what the actual layout is. That usually means tracking down a spec.
Hi bud, at the moment I'm not fussed about the format; we are still working on that in the files. My current goal is simply this:
Break the records onto individual lines in their hex format. I know that currently I've got 4 record lengths, which ties in with the old format.
This is how I'm looking to output it currently. Consider this my end goal for now:
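The one-record-per-line goal can also be reached working on the bytes themselves, which avoids counting characters. The Python sketch below is a byte-level version of the rule from the first post. "00-00-E" means two 0x00 bytes followed by a byte whose high nibble is E, and "6 characters back" is 2 bytes back. The 16-byte skip and the marker rule come from the thread. Whether they hold for every record type is unverified, and the function names are hypothetical.

```python
def split_records(data, header_size=16, back=2):
    body = data[header_size:]
    cuts = []
    for i in range(len(body) - 2):
        # Marker: 0x00, 0x00, then a byte in the range 0xE0-0xEF.
        if body[i] == 0 and body[i + 1] == 0 and body[i + 2] >> 4 == 0xE:
            cuts.append(max(i - back, 0))  # record starts 2 bytes before the marker
    bounds = [0] + cuts + [len(body)]
    return [body[s:e] for s, e in zip(bounds, bounds[1:]) if e > s]

def to_hex_lines(records):
    # One record per line, uppercase dash-separated hex, ready for an RTB or a .txt file.
    return "\n".join(r.hex("-").upper() for r in records)

sample = bytes(16) + bytes.fromhex("AABB0000E407CCDD0000E507")
print(to_hex_lines(split_records(sample)))
# AA-BB-00-00-E4-07
# CC-DD-00-00-E5-07
```

`bytes.hex` with a separator needs Python 3.8 or later. Writing the result to a file is then `open(path, "w").write(to_hex_lines(records))`.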
No, I mean... that file must have some kind of format, someone wrote it according to some recipe. If you plan to get anything useful out of it, you need to know how it works. The most logical way to do that is to get whoever designed it to specify what goes where in that format.
So it's not just text manipulation; that bit is simple. Your problem is that you have a bunch of hexadecimal that represents some kind of apparently binary data.