remove or minify variable names from the compiled exe
I don't see why variable names are compiled into the exe file data when, if the target is debugging, there is the .pdb file with all the debugging information. It not only makes the file larger (I know that won't make much difference) but probably also slower to load (also probably no palpable difference)
my main goal in using .NET is being able to do complex tasks on Windows with very small executables, and there should be a "minify" option in the compile options
that trims all the useless information out of the compiled project
for example, all these methods could perfectly well be an array of pointers instead of a whole class and method
also why???? are normal strings in UTF-16 when UTF-8 can also support all the special chars and emojis of UTF-16, by using a special delimiter char indicating that the next char is a UTF-16 char
these are the few things that I will never understand about .NET
I can surely say that more than 95% of strings are fully utf8
Also a very long string would be double in size
it is completely impossible
most of these names are for accessing other libraries
the only way to access classes and members from other libraries is by name, because they are resolved at runtime by the assembly loader. you cannot minify any of the text in your screenshot
in principle you can minify the names of your own members but they are accessible by reflection so it would break your program if it uses reflection
the names of local variables are not in the dll
this is a whole new class of premature optimization
also, strings support "wrong" UTF-16 like e.g. unpaired surrogates, which cannot be represented by UTF-8. i assume this is why the user string table is not encoded that way
I don't see the point, cause emojis are also 16 bits and still in UTF-8
no, there is text that can only be written in UTF-16
Like?
:harold:
unpaired surrogates, like i said. surrogate pairs are a concept which exists only in UTF-16. there is a dedicated range of code points between U+D800 and U+DFFF which are used in UTF-16 and which, when encoded as a pair, actually represent some other character
if you use one of them alone it is kind of invalid, and there is no way to encode that in UTF-8. surrogate code points cannot be encoded in UTF-8
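A minimal sketch of the unpaired-surrogate point, written in Python rather than C# purely for illustration (the variable names are made up for this example). It assumes nothing about how the .NET user string table is actually read; it only shows that a lone surrogate is a legal 16-bit code unit that a strict UTF-8 encoder rejects.

```python
# A lone high surrogate: half of a UTF-16 pair, with no partner.
lone = "\ud800"

# Strict UTF-8 has no encoding for surrogate code points.
try:
    lone.encode("utf-8")
    utf8_ok = True
except UnicodeEncodeError:
    utf8_ok = False

# As a raw 16-bit code unit it is still representable
# ("surrogatepass" tells Python to write the unit through as-is).
raw_utf16 = lone.encode("utf-16-le", "surrogatepass")

# A proper pair (D83D DE00, little-endian bytes) decodes to one character, U+1F600.
pair = b"\x3d\xd8\x00\xde".decode("utf-16-le")

print(utf8_ok, raw_utf16.hex(), hex(ord(pair)))
```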
FF D0 00
There's your answer
The only point of using that encoding is if you were typing in Chinese
Cause then every character will be three bytes instead of two
But average people won't be constantly typing emojis nor Chinese nor those weird chars
the average chinese person would
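For a rough sense of the sizes being argued about, here is a small Python sketch (the sample strings and the `sizes` helper are made up for this example). It only counts encoded bytes and says nothing about how .NET metadata is laid out.

```python
samples = {"ascii": "hello", "chinese": "中文", "emoji": "😀"}

def sizes(text):
    # Returns (UTF-8 byte count, UTF-16 byte count without a BOM).
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

for label, text in samples.items():
    print(label, sizes(text))
```

ASCII text is half the size in UTF-8, Chinese text is larger in UTF-8 (three bytes per character instead of two), and this emoji takes four bytes in both encodings.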
Ok, too much would need to change for it, maybe for .NET 10?
Then make a compiler option to select default compiled string encoding
the text in your screenshot will never go away and it is impossible for it to go away
Defaulting to utf8
Sadly yes, I don't think that's changing soon
what's the use case for this level of file size optimization?
It's like asking what's the point of those minimal performance optimizations
they are also low implementation cost
changing the metadata format to save kilobytes in an above average case is extremely high cost for extremely low reward
with modern levels of storage capacity this seems like a non-issue for all but the most resource constrained systems, and at that point C# may not be the right choice anyway
most applications have assets that will have a massively larger impact on application size than the code itself
I load them separately from the internet, caching them to disk
they're still on disk though
what do you gain by saving a few kb in string encoding?
The problem is accepting something that could be made better just by changing the .NET runtime, with no other issue.
And adding a setting to expressly turn it on
Or visual studio compile option
a setting that will break binary compatibility with libraries that use the opposite option
and cause encoding issues like reflectronic already mentioned
It won't if it's released on a newer version of a runtime that is not out yet
it will if it's optional
¯\_(ツ)_/¯
I'm saying to change it in .NET 9 or .NET 10
Also I would probably strip out variable names too, cause they are basically useless
if they were useless they wouldn't be there
They could easily be memory pointers
to where?
To their values?
and where would the values be stored?
...
In memory? :blobthumbsup:
so you're adding the memory cost of a pointer and the string is still there
Just in time compiler already does it
To run the code
or even better, just a variable minifier that wouldn't need any change apart from the compiler
which would break reflection
I know that only a few would use it but
No lol
It's like instead of naming your function helloworld, it's now a
so then you have consuming code that tries to look up "helloworld" with reflection and finds nothing because you changed the name
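A loose analogy in Python, not C# (the class names and the `call_by_name` helper are invented for this sketch): any code that looks a member up by its string name stops finding it once a renaming pass has run, which is the same failure mode being described for System.Reflection.

```python
class Greeter:
    def helloworld(self):
        return "hi"

class MinifiedGreeter:
    # The same method after a hypothetical renaming pass.
    def a(self):
        return "hi"

def call_by_name(obj, name):
    # Name-based lookup: returns None when no member has that name.
    method = getattr(obj, name, None)
    return method() if method else None

print(call_by_name(Greeter(), "helloworld"))
print(call_by_name(MinifiedGreeter(), "helloworld"))
```

The first call finds the method; the second returns None because the string "helloworld" no longer matches anything.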
As C# and JS have a lot in common
?
Minify a JS script with top level and see what happens to variable names and if the code still works
i don't have to do that to tell you what would happen if you were to do this in C# in the situation i'm talking about
I see that you aren't understanding it, let me try explaining it more simply
Imagine that you have your code with all your variable names and functions
And when you press compile, before compiling, a bot renames all the variables to different ones, as if it was your code but the variables and functions are named differently
and if you have code that refers to those symbols using string literals (like reflection), what happens?
hi, I'm going to be the third person to come in here and tell you the idea will not work because C# is not JS
Both have a JIT
and?
they are not the same language
what does the jit have to do with this
The point I'm explaining applies here
and JS minification usually leaves names alone
That doesn't matter for the variable names
because they are accessible with strings
Not top level
last i checked the names of locals aren't actually preserved
so are you talking about local variables or all symbol names?
Cause you're assuming the code will not be shared nor accessed from anywhere else
they are in debug, I'm pretty sure
Which could perfectly well be in the pdb 🥰
but you don't distribute debug builds
Fine you win
I'm going to sleep
If I find something else that makes no sense I will gladly debate it
and if you want no names at all, publishing it with Native AOT will resolve that right (with no stack traces etc)
ah no it's in release too
you can see it there in the IL
Yes, it will translate everything into machine code, but the point of having a small runtime-dependent executable would be lost
you can do this stripping yourself, if you'd like
That's what I don't like
i mean, i already don't see the point of the size optimizations you're proposing
just edit the resulting assembly to remove the local var names
no, the local variable names are not in the assembly
they are only in the PDB
the decompiler gets them from the PDB
That's called obfuscating or minifying, but Visual Studio could already do that
mmm
...
if there is no PDB the decompiler makes them up, and ILSpy has some heuristics to generate them
I thought that LocalVariable whatever had a Name prop
i've done enough decompiling to be familiar with my friends num0, num1, num2, etc :KEKW:
Try it yourself with HxD
maybe it doesn't
And do binary search
why don't you try it
show me where your local variable names are
Ok then probably my framework is too old
not field names. not method names. local variables. since that is what we are talking about
if you would remove the pdb, then reflection would break
ah no, refl is right as always, LocalVariableInfo does not have a name prop and I simply misremembered
incorrect
doubt it, this holds even in .NET Framework
it works fine
that's why it's stored in the dll/exe
reflection is entirely based upon data in the assembly
I can remember it does
there is no local table, it is stored as a signature
right
It's in the executable file, which is what matters
it's not
refl is correct that it is not
?
there is no place for that data to go in the file format
Tomorrow I will try with a different .NET version
i'm pretty sure it has worked this way for long enough that your version won't affect it
if not forever
the file format for .net DLLs hasn't really ever changed, but sure
look, prove to me that abcdefg is inside of the EXE file when you compile this file
Probably .NET 6 upwards yes, but .NET Framework I don't know
you cannot, because it's not there, but i welcome you to try
for .NET Framework i do know, because i did a lot of decompiling of .NET Framework assemblies when i was game modding
local names aren't in there
like what was said previously, if you don't have the debug symbols the decompiler will just make up names
i can't find it but maybe it's just a me problem
and do you have the pdbs?
oh wait
ok so when i said "it's in the PDB and the decompiler gets that from the PDB" did you not understand or not care
ok nevermind
there is pdb
what about that
that is a field
it is a field because you used that variable inside of a lambda
the variable names could perfectly well be a, b, c
also the function name
and if you want to debug the code or have readable exceptions, just get the pdb and store it all there, cause that's why it exists
also i still see variable names without the pdb (i don't think the decompiler is caching it, that would make no sense)
the decompiler knows how to find the pdb
its quite smart about it
yea, also when i remove it?
wait, i will recompile without the pdb
i agree that field names could be optimized further but it is not trivial https://github.com/dotnet/linker/issues/1282
GitHub
Consider stripping names of fields that don't matter · Issue #1282 ...
Now that linker has a pretty good idea what fields are accessed through reflection (and warns whenever its not sure), we could consider stripping names of fields that are not observable. A field na...
yes they're still there
i know but some cases people want it
did you actually name them request and res
yes
if you change their names does the output change
is this an async method?
request no but res yes
thats weird
hm, yeah, in that case the names are stored in the fields again
well. some of the names are stored
res is declared the same as request
if request does not change then that's because it was generated by ILSpy and it just happened to pick the name you have
whether it's stored as a field depends on how you use it later in the method
i both modify them and give them as params to other functions
the only difference is that i am not calling a function within the request class
is request used across an await though
that looks like its captured in that lambda
so again, a field
and why do they still have to have the variable name
that's just how Roslyn chooses to emit the names
you're free to write a tool to rename private members
i'm not that much of an expert
but that will break reflection
if i was i'd probably find another solution
define reflection
reflection
System.Reflection
there is a whole namespace with that name
ok then it could just be used only if i don't need to access parts from other assemblies
for example a simple console app
and this could also be possible with libraries
but would be more complex
Because variables outside the lambda are captured into a class, which makes a field, which gets stored:
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA+ABATARgLABQhAbgIZQAEAdgK4C2wMlAvBTgNyGEY5YUAOUAJZUALgDl6jFhQAUASgrMAfBQDehCloo8AnLNoMm8zgQC+pwoJESpTBeyA==
> > >
Is the compiled version of the lambda.
As you can see, the variable "number" is outside the lambda and gets captured
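To make the capture step concrete, here is a rough hand-lowering written in Python for illustration only. `DisplayClass`, `make_adder` and the value 5 are invented for this sketch, and the real C# compiler output differs in detail; the point is just that a captured local turns into a named field on a generated class.

```python
# Source form: a lambda that captures the local "number".
def make_adder():
    number = 5
    return lambda x: x + number

# Hand-lowered form, roughly the shape a compiler gives a capture.
class DisplayClass:  # hypothetical name for the generated class
    def __init__(self):
        self.number = 0  # the captured local, now a field

    def invoke(self, x):
        # The lambda body, reading the field instead of a local.
        return x + self.number

def make_adder_lowered():
    closure = DisplayClass()
    closure.number = 5
    return closure

lowered = make_adder_lowered()
print(make_adder()(1), lowered.invoke(1), vars(lowered))
```

Both forms compute the same result, but only the lowered one carries "number" as a member name that tooling can see.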
Presumably because Windows is natively utf16.