Hexdump
Repo: https://github.com/KeithBrown39423/Hexdump
My hexdump program that I started like 2 years ago. I'm now starting to add a lot of new features with some help from @earth's bird
GitHub
GitHub - KeithBrown39423/Hexdump: The alternative cross platform he...
The alternative cross-platform hex dumping utility
Currently making a PR adding more testing to optimize_test.py
Making it a cli
Example of verbose logging
GitHub
Release Version 1.2.0 · KeithBrown39423/Hexdump
What's Changed
Speed Optimization
Drastically improved the speed of Hexdump. See the graph attached for a comparison.
Logging (#20)
Hexdump now has built-in logging and will log the following i...
Also, just a little comparison of the execution times
Yep, looks very good, I can actually read it now
😲
Yeah, now I can work on this https://github.com/KeithBrown39423/Hexdump/issues/10
Wow that's cool!
The craziest part was it was such small changes that made such a big difference
What changes for example?
I'll list them out in the morning, it's 1am rn and I'm tired as hell
haha good night 😂
Colored output is one thing
Converting something to hex too?
I forget :)
One of the biggest ones was that I was reading the entire file into a buffer, then writing the output to a stringstream, and then writing that stringstream to either stdout or a file stream
Now, it reads the file 16 bytes at a time, converts it to the proper display format, and then outputs it directly to the output stream, whether that be a file stream or stdout
There was also a function that took every byte as an integer, created a string stream, passed the byte through std::hex into that stream, and then returned the string from the stream.
This was done for every single byte, so if a file was 1 MiB (1,048,576 bytes) the function was run over a million times
I was also adding the ANSI color escape sequence after every single ASCII character as opposed to just once per line, and that also slowed things down quite a bit
Ah nice, very interesting
And as expected it's always IO that is the most expensive
yeah, very true
mainly its the terminal displaying it all that takes a while
on average, file output takes around 1/3 the time of terminal output
I've added the different display types (octal, decimal, etc.)
I'll add a screenshot once I get back to my computer
Here's a video with the features
Im not sure if cxx-ops allows this, but separate the commands in the help menu, into groups
Like for example what I did with optimize_test.py
1: I assume you mean cxxopts,
2: I can, give me a sec and I'll show you what that looks like
offsets have been added
the header also adapts based on the offset
same for ascii
done
Very nice 👏
Much more readable now in my opinion
Yea, I had to mess with the cxxopts code a tad bit
Coming back to hexdump, after taking a break for a couple days, I now forgot what was ever wrong and why I hadn't made a release yet.
An update: Release v2.0 will be remade and written in zig as opposed to c or c++
The main reason behind this is zig has amazing stdout write speed
i have a working hexdump now in 22 sloc
slightly longer, but now over 6 times faster than the original hexdump
(still needs some modification)
* whereas the old hexdump would take 5 minutes on about a 512MB file, the zig version takes under a minute
1,048,576 bytes in 58 seconds as opposed to around 400
(6.5 minutes)
Note that write should be replaced with writeAll, as the usize is being discarded anyway
The exception is the line _ = try stdout.write("\n"); where write should be writeChar
Dev branch of Hexdump is now a clean slate with a more up-to-date readme.md
GitHub
GitHub - KeithBrown39423/Hexdump at dev
Here are those changes
Also make sure to capture and display the error in a more useful way on this line
var file = try std.fs.cwd().openFile("test.zig", .{});
Maybe use a catch |err| and not a try
@earth's bird should I set a max file size (i.e. return if the file is too big in order to not spend three hours)?
V1.2 had this feature with any file larger than 4 GiB
Sure, but send a warning out, something like: "This file would take approx. ... hours to complete. If you'd like to run anyway, use --run-anyway"
Crappy example but you get the point
Yeah, that actually makes sense
What should I set a max to?
Well wait till you can run tests and see the projected time for a file of n size to take
I would say for anything that takes over 10 minutes, hexdump should show a warning that the projected time is over 10 minutes
or something like that
But the max, unless --run-anyway or something is passed, should be an hour
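A rough C++ sketch of that policy, under stated assumptions: the throughput figure would come from real benchmarks, the 10 minute and 1 hour thresholds are the placeholder numbers from this discussion, and `SizeCheck` and `check_projected_time` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical result of the pre-run check.
enum class SizeCheck { Ok, Warn, Refuse };

// Estimate the run time from the file size and a measured throughput, then
// apply the thresholds discussed above. `run_anyway` stands in for a flag
// such as --run-anyway; all numbers are placeholders, not tuned values.
inline SizeCheck check_projected_time(std::uint64_t file_bytes,
                                      double bytes_per_second,
                                      bool run_anyway) {
    constexpr double warn_after_s = 10.0 * 60.0;    // warn past 10 minutes
    constexpr double refuse_after_s = 60.0 * 60.0;  // refuse past 1 hour
    const double projected_s =
        static_cast<double>(file_bytes) / bytes_per_second;
    if (projected_s > refuse_after_s && !run_anyway) return SizeCheck::Refuse;
    if (projected_s > warn_after_s) return SizeCheck::Warn;
    return SizeCheck::Ok;
}
```

With the override flag set, an over-the-limit file still produces a warning instead of a refusal.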
Those aren't concrete numbers but should work to get you started
Working on hexdump on the road :)
I'm more doing logical work than writing actual code
Oh ok good, that would be hell
@earth's bird I have the following options, but don't know what short opts to define them to
Here is what I currently have.
(Ignore the numbers)
Well short flags aren't required
Did you mean arent?
yes
I want to have a short opt for all of them though, I just don't know what to assign them to
Uh idk but what you're doing now is ugly
I know
The one byte octal and one byte decimal idk what to do with
maybe just
The problem there is -o and -h are already in use
Shit, I have to change two byte octal now
Let me do long opts first and I'll come back
oh true
x for hex
and for octal I have no clue
Here is my list of options I have right now
a for ascii, s for skip, n for length, h for help, and v for version are the only ones I can think of that are for sure
Maybe l or L for disable-color
Or maybe just a long option
Maybe I could just have the format options long only
You don't need a short for everything
True
Well short flags aren't required
I think I'll just have to go with that
Also note you can do something like --color=true or --color=false
That seems a bit redundant because color is true by default
Yeah I was just using it as an example
Although it would allow for enabling outputting VT100 codes to a file
Then disable color when outputting
Well yea, if outputting to a file, vt100 is disabled
By default
Although I doubt you need an output flag
It seems useless
Output for outputting to file
>
?
hexdump somefile.txt > file.txt
vs
hexdump somefile.txt -o file.txt
it just creates more work for you and removes the letter o for octal
Wasn't it your suggestion to add -o in the first place?
Not as an output flag
Yea, when I first started creating it
Oh then whoops
Now I've changed my mind
less work for you just letting the user use a pipe to direct stdout to a file
Having an output flag in this case is just weird as the content they want is in stdout anyway
Not exactly, if they pipe it, they have to remember to do --color=false
Then detect if stdout is being piped to a file
Rather maybe have something like: true is always color, false is never color, and auto decides based on where stdout is going to, a TTY or a file
For security purposes, can you even detect if it's been piped?
Yes
I mean things like tee can do it
So I'm sure you can
Nothing insecure about knowing where stdout is going to
Unix & Linux Stack Exchange
How does a program know if stdout is connected to a terminal or a p...
I'm having trouble debugging a segfaulting program because the ouput right before the segfault is what I need, but this is lost if I'm piping the output to a file. According to this answer: https:/...
Yeah I knew you could
It's not anything about security because you only need to know what type stdout is connected to: in your case a TTY or a file
Hexdump shows up on Google Images
Yeah it's what shows up for the topic hexdump
A versatile cross-platform hex-dumping alternative
@earth's bird
Vs
The alternative cross-platform hex dumping utility
The alternative cross-platform hex dumping utility.
The alternative cross-platform hexdump utility
Can now check if stdout is tty or not
Also here is the list of options
If output is not a tty, color is disabled and only text is written to stdout
This is per @earth's bird's suggestion: rather than having an --output flag, just detect whether stdout is piped
There are three possible conditions to disable color: either stdout is not a tty, the tty does not support ANSI escape codes, or --color=false is supplied
which actually brings the question, @earth's bird, should i replace --color with --disable-color ?
You can't override the first two. If output isn't a tty, doing --color=true will not override it, and since --color is true by default, you would never need to call it
Yes, but I also suggest maybe having a --force-color too, so someone can get the ANSI codes in the final output no matter what, even if it's not a tty, and even if it doesn't support the codes. For example you can actually display cat with color by doing echo -e $(cat test.txt)
Nice to see you implemented it, how is this done in zig? Just curious, I know how it's done with C so I doubt it's much different
its not much different although i think its a tad bit more verbose in zig
Oh, well no different than using termios by the looks of it
Just Zig doesn't make your code platform dependent for using it this way
Well it makes sense, I wouldn't even really call this verbose it just makes sense
i think in c you have to do isatty with a file descriptor
which would be something like
isatty(x)
Yeah
which makes zig more verbose than the c version
well
getStdOut().isTty() is better than isatty(1) imo
or this: isatty(fileno(stdout))
it just looks so much better and more readable to someone who might not be familiar with zig
Yeah
or low level in general
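For comparison with the Zig call, a minimal C++ sketch of the same check on the C side. It assumes POSIX `isatty` from `<unistd.h>`, or `_isatty` from `<io.h>` on Windows. The wrapper names `fd_is_tty` and `stdout_is_tty` are made up for illustration.

```cpp
#include <cstdio>
#if defined(_WIN32)
#include <io.h>      // _isatty, _fileno
#else
#include <unistd.h>  // isatty, STDOUT_FILENO
#endif

// Hypothetical wrapper: true when the given file descriptor refers to a
// terminal rather than a file or a pipe.
inline bool fd_is_tty(int fd) {
#if defined(_WIN32)
    return _isatty(fd) != 0;
#else
    return isatty(fd) != 0;
#endif
}

// Hypothetical wrapper: the "is stdout a terminal?" question from the thread.
inline bool stdout_is_tty() {
#if defined(_WIN32)
    return fd_is_tty(_fileno(stdout));
#else
    return fd_is_tty(STDOUT_FILENO);
#endif
}
```

Running `hexdump somefile.txt > file.txt` would make `stdout_is_tty()` return false, which is the signal for turning color off by default.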
so should i have a --disable-color and --force-color or just --color=true (i.e. --force*) and --color=false (i.e. --disable*)
the second would be easier because i don't have to figure out whether to prioritize disable or force if both are supplied
I prefer --disable-color and --force-color, as to me it's more descriptive than --color=true and --color=false
okay, which one should take priority
force color
okay
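The priority rule agreed on here can be written as one small function, which also avoids the ugly if statements. This is a C++ sketch under assumptions: `ColorInputs` and `use_color` are hypothetical names, and the actual Zig code may be structured differently.

```cpp
// Hypothetical bundle of everything the color decision depends on.
struct ColorInputs {
    bool force_color;     // --force-color was passed
    bool disable_color;   // --disable-color was passed
    bool stdout_is_tty;   // stdout is attached to a terminal
    bool ansi_supported;  // the terminal understands ANSI escape codes
};

// Force wins over disable when both are supplied; with neither flag, color
// is used only when stdout is a terminal that supports ANSI escape codes.
inline bool use_color(const ColorInputs& in) {
    if (in.force_color) return true;
    if (in.disable_color) return false;
    return in.stdout_is_tty && in.ansi_supported;
}
```

Keeping the decision in a pure function means it can be checked without a real terminal.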
I love ugly if statements
I now have a working hexdump
just need to make it pretty
and add ascii support as well as clean up the other formats
So far they all work except for onebyte char and the two byte ones (those have a unique case for handling)
I also need to work on speed
whats slow?
and how slow compared to the original hexdump
1.5x slower than the linux hexdump
Approximately
Eek
What happened there?
...
Idk
Outputting a 1MB file takes 1.56 seconds
Whereas coreutils hexdump takes 1.03
And that's with optimize ReleaseFast, I think
Yeah
That's with ReleaseFast
I think thats because I'm running like 2 million switch statements though
Woah!? Why?
I mean faster than using an if statement at least
Formatting each and every byte
I wonder how you'd go about optimizing that
I have to handle each byte separately
Why is the C impl. faster? Doesn't it also format every byte
Doing the format switch before printing everything, and then just formatting each byte via a variable instead of running a switch for each byte
Because it doesn't have support for multiple formats
Ah
I'm not too sure I'm following, but I think I have an idea of what you mean
When I get home from school, I can show you
You just mean you're doing the same computation over and over for every byte when you really only need to do it once?
Kind of
I couldn't figure out how to pass a format into stdout print via variable and then pass an argument into that format
I'll need code to understand this lol
what time do you get home from school?
Probably around 4:30 (ur time)
I have to head to town to see about them fixing my laptop and how much it will cost (I won't be dropping it off today though)
@earth's bird
im not home but i got the chance to pull out my laptop
putting this here for how to handle odd numbered two byte counts
to be fair, my hexdump does have a couple extra characters too (extra spaces, an extra offset half-byte)
I'm currently running into issues with handling the offset for the maximum file size
I'm now debating doing it the way core-util hexdump does
if you didnt see the jump, all it does is add another digit and shift everything over one character
Seems like the best idea
@earth's bird removing the catch and doing try does not make it quicker
im going to rework my algorithm for how i read and print data
I told you I assumed it wouldn't
im going to look into having a strict algorithm and order to the entire program to clean up some minor inconsistencies
What's a "strict algorithm"? Didn't you mention the speed is about the same and on average slightly lower than the original hexdump? Maybe that's for a reason? Maybe it doesn't get much faster?
Well it averages 10 milliseconds faster, but I'm going to be adding some things that might slow it down so I'm trying to speed it up now while I'm still in early dev stages of this version
I have decided to modularize handling different formats. I have also made the decision to postpone two byte formats until I have a stable release, and then will add them on later.
The way I am doing this will also allow for custom formatting add-ons (the only cost being you have to recompile it yourself)
So after my and @earth's bird's argument last night, he mentioned an amazing way of doing the formats. The formats are handled by passing the byte to a function, and the function is picked beforehand based on the format. I have also completed handling the offset and length options so you can skip the first x bytes and read only y bytes
As a result of rewriting it, and doing it in zig, the program is now 33% faster than the current release and around 250-500 ms faster than core-utils hexdump
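The "pick the function once" idea can be sketched in C++ with a plain function pointer. This illustrates the technique and is not the Zig implementation; `Format`, `ByteFormatter`, `pick_formatter`, `format_row` and the fixed-width output styles are all assumptions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical set of display formats.
enum class Format { Hex, Octal, Decimal };

// Every formatter has the same shape: append one byte's text to `out`.
using ByteFormatter = void (*)(std::string& out, unsigned char byte);

inline void format_hex(std::string& out, unsigned char b) {
    static constexpr char digits[] = "0123456789abcdef";
    out += digits[b >> 4];
    out += digits[b & 0x0F];
}

inline void format_octal(std::string& out, unsigned char b) {
    out += static_cast<char>('0' + (b >> 6));
    out += static_cast<char>('0' + ((b >> 3) & 7));
    out += static_cast<char>('0' + (b & 7));
}

inline void format_decimal(std::string& out, unsigned char b) {
    out += static_cast<char>('0' + b / 100);
    out += static_cast<char>('0' + (b / 10) % 10);
    out += static_cast<char>('0' + b % 10);
}

// The switch runs once per program run, not once per byte.
inline ByteFormatter pick_formatter(Format f) {
    switch (f) {
        case Format::Octal:   return format_octal;
        case Format::Decimal: return format_decimal;
        case Format::Hex:     break;
    }
    return format_hex;
}

// The hot loop only calls through the pointer chosen beforehand.
inline std::string format_row(const unsigned char* data, std::size_t n,
                              ByteFormatter fmt) {
    std::string line;
    for (std::size_t i = 0; i < n; ++i) {
        if (i != 0) line += ' ';
        fmt(line, data[i]);
    }
    return line;
}
```

A custom format would then be one more function with the `ByteFormatter` signature plus one extra case in `pick_formatter`, which matches the recompile-it-yourself add-on idea.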
I can in about 10 minutes
Alright
Panic on file not found?
temporary
You can still easily print out to stderr
Ok
i havent reworked the whole error yet
i just panic for any error
?
i dont handle options yet...
Oh right, whoops
you need to change the filename variable in the main function
for some reason i decided it was smart to do an absolute path
Does it work with relative paths?
i dont know
it should
I'll find out then
It does
GitHub
GitHub - sharkdp/hyperfine: A command-line benchmarking tool
Something you might want to keep note of
using hyperfine, I have done some benchmarking and found a slightly quicker (albeit more verbose) way to format text
When benchmarking, I found that the biggest impact on speed is formatting. When adding formatting, the execution time increases by around 170%
After analyzing the zig source code, the format functions are quite optimized, just intensive
especially when running several million times
(these benchmarks were done on a 8 MB file with a complete drop of all file caches)
Here is an analysis of the speed
Eek, can you send the graphs separately too
I'm having trouble seeing the data
Or export it some other way
It seems like you may have just taken a screenshot?
that was a pdf to image
Ah
Could you add a comparison for the c++ hexdump too?
i meant to do that and forgot
Silly bird
@earth's bird
zig is all time superior
I knew that was the case, I just wanted to have them graphed out
per @earth's bird's suggestion, I will be deploying hexdump to package managers upon release of v2
I don't quite know how exactly I will do this in linux just yet but I will figure that out
I have opened 3 new issues. 2 feature requests and 1 bug
I know :(
I didn't look at it too closely, I was looking at it on my phone and I saw a lot of cxxopts.h
Ah
Version 2.0.0 has been released
https://github.com/KeithBrown39423/Hexdump/releases/tag/v2.0.0
GitHub
Release Release v2.0.0 · KeithBrown39423/Hexdump
What's Changed
Just some small README changes by @ZackeryRSmith in #24
Minor README changes by @ZackeryRSmith in #25
change the note formatting by @ZackeryRSmith in #26
Fix build href by @Zack...
@earth's bird here is the graph
Looks good
@earth's penguin did you also make sure to compile the binary with optimization flags?
Also take a little look at https://github.com/KeithBrown39423/Hexdump/issues/35#issuecomment-2155432228
GitHub
[Bug] Silent Fail Windows · Issue #35 · KeithBrown39423/Hexdump
Describe the bug When running the Windows binary, if you specify a filename, it silently fails with the error code -1073741819. It also takes around 15-25 seconds before it actually fails. When run...
Thankfully your issue seems to be right in front of your eyes after you fix the silent failing problem.
At least your issue is pretty traceable and should lead to the solution
I will look into it more later if you feel too lazy to do so (which tbh I feel)
also the graph you sent me would be nice to see in the README or release page
I may also get around to improving that README as it's missing a lot of the selling points of this Hexdump clone and it's a bit messy rn
Also your graph is lacking units of time which is quite annoying as someone trying to look at the graph
I like this graph quite a bit better
yes. its compiled with release fast
GitHub
Release Release v2.1.0 · KeithBrown39423/Hexdump
What's Changed
stop treating u64 as usize by @ZackeryRSmith in #40
🐛 Fix ascii print when last line is less than 8 bytes by @PauloCampana in #41
Add --squeeze option
Speed optimization to null...
I love making progress
I love when the issues I made don't end up on the todo list :(
this to-do list is older than the repo itself
i havent even added to it in a while
i probably should add those on here though
You mean you will add those on there
Well you don't have to
no, i might
Honestly little things
Gotta love a maintainer who doesn't care about issues made on the project :(
:)
I'll just sob
I want to install hexdump on all my devices, so get some stuff working!
you can install it
you just have to do so manually
Well I do have it installed on my Windows, MacOS, and Linux machines
But it would be nice to have it on some package managers
yea, ill add it to package managers, i just dk when
It shouldn't be all that hard really
At least for brew I know it's not
Hexdump is now getting yet again another rewrite
I have a function that detects whether or not the NO_COLOR or CLICOLOR_FORCE env var is set. Should I also include program options (--no-color and --force-color) or force users to use env vars?
Forcing the use of environment variables is never a good idea. Most people, if not everyone, hate it.
There's rarely a time where it's more valuable than just passing a flag
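One way to support both is to let flags override the environment. A C++ sketch under assumptions: every name here is hypothetical, the precedence order (flags first, force before disable) is a design choice rather than anything settled above, and a variable counts as set when it is present and non-empty. The force variable's name is passed in by the caller rather than hard-coded.

```cpp
#include <cstdlib>

// Hypothetical outcome: Auto defers to the "is stdout a terminal?" check.
enum class ColorChoice { Auto, Always, Never };

// True when an environment value is present and non-empty.
inline bool env_is_set(const char* value) {
    return value != nullptr && value[0] != '\0';
}

// Flags win over environment variables; within each level, force wins over
// disable. The environment values are passed in so the logic stays testable.
inline ColorChoice resolve_color(bool flag_force, bool flag_no_color,
                                 const char* env_force,
                                 const char* env_no_color) {
    if (flag_force) return ColorChoice::Always;
    if (flag_no_color) return ColorChoice::Never;
    if (env_is_set(env_force)) return ColorChoice::Always;
    if (env_is_set(env_no_color)) return ColorChoice::Never;
    return ColorChoice::Auto;
}

// Thin wrapper that reads the real environment. NO_COLOR is looked up by
// name; the force variable's name is supplied by the caller.
inline ColorChoice resolve_color_from_env(bool flag_force, bool flag_no_color,
                                          const char* force_var_name) {
    return resolve_color(flag_force, flag_no_color,
                         std::getenv(force_var_name), std::getenv("NO_COLOR"));
}
```

With this layout the environment variables act as defaults, and a flag on the command line always has the last word.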