C
C#3y ago
Gibbo

Text Manipulation

Evening all, anyone free to help with a text manipulation issue? I'm reading a log file that is in hex. I've used BitConverter to push the output to an RTB with no issues. My next task, after looking at how the hex is structured, is to break the records into individual lines (records vary in length). Here is a sample of the file entry and 1 record:
00-00-00-00-00-00-00-00-24-00-00-00-24-00-00-00-00-00-00-00-E4-07-09-00-02-09-16-00-11-00-00-00-1A-00-77-01-48-A9-B9-13-6E-56-03-40-D7-56-03-40-48-8D-AF-1E-E7-07-00-00-E4-17-09-DF-02-09-16-75-11-90-00-5C-1A-91-77-01-61-28-F4-18
There is some null data at the start that needs to be cleared out (the 1st 47 characters). Once that is done I need to search for "00-00-E", count back 6 chars from the 1st "0" and insert a line break "\n". Ideally I need to loop this until the end of the file, then display the output in an RTB to verify it's looking correct, and then save it to a new txt file so I can decode it later. 2-record output:
00-00-00-00-E4-07-09-00-02-09-16-00-11-00-00-00-1A-00-77-01-48-A9-B9-13-6E-56-03-40-D7-56-03-40-48-8D-AF-1E-E7-07-00-00-E4-17-09-DF-02-09-16-75-11-90-00-5C-1A-91-77-01-61-28-F4-18
01-00-00-00-E4-A7-09-96-02-00-16-79-11-E1-00-E6-22-CD-B4-00-A9-18-3D-10-F9-96-7E-89-E9-93-B4-3C-70-B1-C2-9F-ED-BD-6F-7A-5A-45-00-0B-7E-16-04-DC-3F-DB-40-F0-87-FE-81-35-DB-C4-93-DA-86-E1-8D-31-93-71-51-88-4F-91-12-73-B5-FF-99-40-3F-66-AE-06-B9-DD-75-52-79-CE-37-B5-91-77-FA-39-02-61-EE-67-15
I know the theory but lack the ability at present. Any help appreciated.
32 Replies
mtreit
mtreit3y ago
What's RTB?
Gibbo
GibboOP3y ago
Rich text box.
mtreit
mtreit3y ago
Is there a reason you are dealing with this as text and not binary data?
Gibbo
GibboOP3y ago
Was told it's hex data, which I have managed to decode some of. For example, the 1st bytes from record 1:
Record Empty Year Month Day Hour Minute Second
00 00 00 00 E4 07 09 00 02 09 16 00 11 00 00 00 1A
There are some bytes in between, but these values do match. I do have to flip the record and year bytes.
mtreit
mtreit3y ago
Hexadecimal is just a human-readable way to represent bytes. I feel like if I was working on this I wouldn't use the text representation at all and just work with the bytes instead.
Gibbo
GibboOP3y ago
E407 = 07E4, lose the 0 and you get 7E4, which is 2020.
mtreit
mtreit3y ago
Well you don't need to lose the zero.
Gibbo
GibboOP3y ago
so binary reader?
MODiX
MODiX3y ago
mtreit#6470
REPL Result: Success
0x07E4
Result: int
2020
Gibbo
GibboOP3y ago
I do, as on other records it's wrong otherwise lol. E4 37 for example: 37E4 = 14308. Someone did suggest that this could be compressed, hence bytes all over the place.
mtreit
mtreit3y ago
It looks like you are trying to adjust for big-endian / little-endian conversions?
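A minimal sketch of the byte-order point, assuming the two-byte fields are unsigned 16-bit little-endian values (low byte first). That assumption is only inferred from the `E4 07` = 2020 example in this thread and is not confirmed by the vendor. The class and method names are hypothetical.

```csharp
using System;
using System.Buffers.Binary;

public static class EndianDemo
{
    // Reads two bytes at 'offset' as an unsigned 16-bit little-endian value.
    // No manual byte "flipping" or dropping of leading zeros is needed.
    public static ushort ReadLittleEndian16(byte[] data, int offset) =>
        BinaryPrimitives.ReadUInt16LittleEndian(data.AsSpan(offset, 2));

    public static void Main()
    {
        byte[] year = { 0xE4, 0x07 };
        Console.WriteLine(ReadLittleEndian16(year, 0));  // 0x07E4 = 2020

        byte[] other = { 0xE4, 0x37 };
        Console.WriteLine(ReadLittleEndian16(other, 0)); // 0x37E4 = 14308
    }
}
```

If a field decodes to something implausible such as 14308, that suggests the field is not where it was assumed to be, rather than a byte-order problem.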
Gibbo
GibboOP3y ago
Yes, I believe so. I'm still very new to coding so complex stuff throws me, but if I don't try I'll never know.
mtreit
mtreit3y ago
Is your original data file in binary or is it text? Because manipulating byte streams as text is usually not the right way to go.
Gibbo
GibboOP3y ago
It's a .log file so I assumed it's text, but it's not readable. Sorry for typos, I need to replace this keyboard.
mtreit
mtreit3y ago
Not readable meaning it looks like garbage in a text editor? Like this kind of thing?
Gibbo
GibboOP3y ago
Yeah, massively.
mtreit
mtreit3y ago
Yes, so your source data is raw binary. Don't try to manipulate it as text. Process it as a stream of bytes. You need to understand the binary format that was used to write it in order to actually parse the data.
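As a rough illustration of working with the bytes directly, the sketch below loads the file with `File.ReadAllBytes` and only converts to hex text at the point of display. The file name `device.log` is a placeholder, and the class and method names are hypothetical.

```csharp
using System;
using System.IO;

public static class RawBytesDemo
{
    // Formats a slice of the buffer as dash-separated hex, for display only.
    // The parsing itself should keep operating on the byte array.
    public static string ToHex(byte[] data, int offset, int count) =>
        BitConverter.ToString(data, offset, count);

    public static void Main(string[] args)
    {
        // Placeholder path: pass the real log file as the first argument.
        string path = args.Length > 0 ? args[0] : "device.log";

        // ReadAllBytes preserves every byte exactly; reading the file as text
        // would run it through a character decoder and can corrupt the data.
        byte[] data = File.ReadAllBytes(path);

        Console.WriteLine($"{data.Length} bytes read");
        Console.WriteLine(ToHex(data, 0, Math.Min(32, data.Length)));
    }
}
```

The hex string is useful for eyeballing the layout in a text box, but offsets and field values are easier to reason about on the array itself.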
Gibbo
GibboOP3y ago
not sure how i will work that out
mtreit
mtreit3y ago
Where did the data come from?
Gibbo
GibboOP3y ago
It's a log file for some 3rd party software on equipment we use. We used to have a decoder that worked fine, but they look to have changed it recently. I'm sure the format is still the same as the old output file.
mtreit
mtreit3y ago
Ask the 3rd party for the documentation on the file format. If they changed the decoder, presumably the file format also changed.
Gibbo
GibboOP3y ago
We did, they just came back and said it's binary and left us with that.
1, 03/04/2017, 13:06:27, RPNT-01-PcThermal, RECEIPT_PRINTER, 1, 4, , , , , 7, 0A04000000
2, 03/04/2017, 13:06:47, RPNT-01-PcThermal, RECEIPT_PRINTER, 1, 4, , , , , 7, 0A04000000
3, 03/04/2017, 14:02:52, RPNT-01-PcThermal, RECEIPT_PRINTER, 1, 4, , , , , 7, 0004000000
4, 08/04/2017, 08:02:27, RPNT-01-PcThermal, RECEIPT_PRINTER, 1, 4, , , , , 7, 0004000000
5, 08/04/2017, 08:02:50, RPNT-01-PcThermal, RECEIPT_PRINTER, 1, 4, , , , , 7, 0004000000
That's the old output, but this was the input:
$ lö Ðö $ Ò Þ    "  Åm  á  
  ô‘        
 RPNT-01-PcThermal RECEIPT_PRINTER m  á  
 / ¥þ $        
 RPNT-01-PcThermal RECEIPT_PRINTER m  á     4 8k ‘          RPNT-01-PcThermal RECEIPT_PRINTER m  á      ] Ø þ          RPNT-01-PcThermal RECEIPT_PRINTER m  á     2 E k          RPNT-01-PcThermal RECEIPT_PRINTER ƒ  á      îÈ Ø   
  `
As you can see there was plain text, and we worked out from there what was what.
mtreit
mtreit3y ago
Do you have this binary data as the "text" representation you showed earlier? With those 5 records?
Gibbo
GibboOP3y ago
I have a full file, yes. @mtreit PM'd.
mtreit
mtreit3y ago
I meant just the part that matches the 5 records you show above. Pasting the binary data as text doesn't help as it doesn't preserve the underlying bytes correctly. Anyway, I was just curious to see the actual binary data that matches the text data you showed. But in any event you really should try to get details on the actual file format that is being used (is it a regular binary serializer? Is it a completely custom encoding?) and then you should be able to decode it correctly from the byte stream.
Gibbo
GibboOP3y ago
I'll speak to the company again. I can't see them being too forthcoming tbh lol. May just have to watch the software and see how it writes it.
OK, no luck with the 3rd party. I plan on sticking it out and continuing as I currently am. If anyone can help with this further that would be great.
I am thinking about changing direction and using a MemoryStream instead. I now know the 1st 16 bytes are pointless and not required. The next 4 bytes after that are for the record number. I need to use these to identify each record and break accordingly. Haven't a clue where to start lol.
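One possible starting point for splitting on the record number, working on bytes rather than hex text. This is a heuristic sketch and not the vendor's format. It assumes a 16-byte header to skip, a 4-byte little-endian record number that increases by 1 per record, and that the byte right after the record number is always `0xE4`, as in the two sample records. All names and the file paths are hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;

public static class RecordSplitter
{
    const int HeaderLength = 16;    // from the thread: the first 16 bytes are not needed
    const byte MarkerByte = 0xE4;   // ASSUMPTION: byte after the record number, as in the samples

    // Returns one dash-separated hex line per record found.
    public static List<string> SplitToHexLines(byte[] data)
    {
        var lines = new List<string>();
        if (data.Length < HeaderLength + 5) return lines;

        // Take the first record number from the data, then expect +1 each time.
        uint next = BinaryPrimitives.ReadUInt32LittleEndian(data.AsSpan(HeaderLength, 4));
        var starts = new List<int>();

        for (int i = HeaderLength; i + 5 <= data.Length; i++)
        {
            uint candidate = BinaryPrimitives.ReadUInt32LittleEndian(data.AsSpan(i, 4));
            if (candidate == next && data[i + 4] == MarkerByte)
            {
                starts.Add(i);
                next++;
                i += 4; // jump past the matched number and marker
            }
        }

        // Each record runs from its start to the next start (or end of file).
        for (int r = 0; r < starts.Count; r++)
        {
            int end = r + 1 < starts.Count ? starts[r + 1] : data.Length;
            lines.Add(BitConverter.ToString(data, starts[r], end - starts[r]));
        }
        return lines;
    }

    public static void Main(string[] args)
    {
        // Placeholder paths: pass the real input and output files as arguments.
        string input = args.Length > 0 ? args[0] : "device.log";
        string output = args.Length > 1 ? args[1] : "records.txt";
        File.WriteAllLines(output, SplitToHexLines(File.ReadAllBytes(input)));
    }
}
```

Matching on the expected next record number avoids breaking on the `00-00-E4` sequence that appears in the middle of the first sample record. It can still produce false matches if the same five bytes occur by chance inside a record, so the output needs checking against known records.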
amio
amio3y ago
The problem is that there's not much anyone can tell you, you're dealing with a particular format and you need to know what the actual layout is. That usually means tracking down a spec.
Gibbo
GibboOP3y ago
Hi bud, atm I'm not fussed about format, we are still working on that on the files. My current goal is simply this: break the records onto individual lines in their hex format. I know I've currently got 4 record lengths, which ties in with the old format.
Gibbo
GibboOP3y ago
This is how I'm looking to output currently. Consider this my end goal for now.
amio
amio3y ago
No, I mean... that file must have some kind of format, someone wrote it according to some recipe. If you plan to get anything useful out of it, you need to know how it works. The most logical way to do that is to get whoever designed it to specify what goes where in that format. So it's not just text manipulation - that bit is simple. Your problem is that you have a bunch of hexadecimal that represents some kind of apparently binary data.
