C#3mo ago
glacinefrox

Remove or minify variable names in the compiled exe

I don't see why variable names are compiled into the exe file when, if the goal is debugging, there is the .pdb file with all the debugging information. It not only makes the file larger (I know it won't make much of a difference) but probably also slower to load (also probably no noticeable difference).
137 Replies
glacinefrox
glacinefroxOP3mo ago
My main goal in using .NET is being able to do complex tasks on Windows with very small executables, and there should be a "minify" option in the compile options that trims all the useless information out of the compiled project.
glacinefrox
glacinefroxOP3mo ago
For example, all these methods could perfectly well be an array of pointers instead of the whole class and method names.
[screenshot]
glacinefrox
glacinefroxOP3mo ago
Also, why are normal strings stored as UTF-16 when UTF-8 can also support all the special chars and emojis of UTF-16, by using a special delimiter char to indicate that the next char is a UTF-16 char?
[attached image]
glacinefrox
glacinefroxOP3mo ago
These are the few things that I will never understand about .NET. I can safely say that more than 95% of strings are fully UTF-8. Also, a very long string would be double the size.
reflectronic
reflectronic3mo ago
it is completely impossible. most of these names are for accessing other libraries. the only way to access classes and members from other libraries is by name, because they are resolved at runtime by the assembly loader. you cannot minify any of the text in your screenshot. in principle you can minify the names of your own members, but they are accessible by reflection, so it would break your program if it uses reflection. the names of local variables are not in the dll
Jimmacle
Jimmacle3mo ago
this is a whole new class of premature optimization
reflectronic
reflectronic3mo ago
also, strings support "wrong" UTF-16 like e.g. unpaired surrogates, which cannot be represented by UTF-8. i assume this is why the user string table is not encoded that way
glacinefrox
glacinefroxOP3mo ago
I don't see how that's the point, because emojis are also 16 bits and still work in UTF-8
reflectronic
reflectronic3mo ago
no, there is text that can only be written in UTF-16
glacinefrox
glacinefroxOP3mo ago
Like? :harold:
reflectronic
reflectronic3mo ago
unpaired surrogates, like i said. surrogate pairs are a concept which exists only in UTF-16. there is a dedicated range of code points, U+D800 to U+DFFF, which is used in UTF-16 and which, when encoded as a pair, actually represents some other character. if you use one of them alone it is kind of invalid, and there is no way to encode that in UTF-8. a lone surrogate cannot be encoded in UTF-8
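The point about unpaired surrogates can be checked with a small round trip. This is only a sketch: `SurrogateDemo` and `RoundTripsThroughUtf8` are hypothetical names, and it assumes the default `Encoding.UTF8` instance, which substitutes U+FFFD for anything it cannot encode instead of throwing.

```csharp
using System;
using System.Text;

public static class SurrogateDemo
{
    // Returns true if the string survives a UTF-8 encode/decode round trip unchanged.
    public static bool RoundTripsThroughUtf8(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        return Encoding.UTF8.GetString(bytes) == s;
    }

    public static void Main()
    {
        string paired = "\uD83D\uDE00"; // valid surrogate pair (U+1F600)
        string lone = "\uD83D";         // high surrogate with no partner

        Console.WriteLine(RoundTripsThroughUtf8(paired)); // True
        Console.WriteLine(RoundTripsThroughUtf8(lone));   // False: replaced by U+FFFD
    }
}
```

Both values are legal .NET strings, but only the paired one has a UTF-8 form, so a UTF-8 encoded string table could not hold every string a C# literal can express.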
glacinefrox
glacinefroxOP3mo ago
FF D0 00. There's your answer. The only point of using that encoding is if you were typing in Chinese, because then every character would be three bytes instead of two. But average people won't be constantly typing emojis, Chinese, or those weird chars.
Jimmacle
Jimmacle3mo ago
the average chinese person would
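The size trade-off being argued here can be measured directly. A minimal sketch, assuming nothing beyond `System.Text.Encoding`; `EncodingSizeDemo` and `ByteCounts` are hypothetical names, and the sample strings are written as escapes so the source file stays ASCII.

```csharp
using System;
using System.Text;

public static class EncodingSizeDemo
{
    // Byte counts for the same text in UTF-8 and in UTF-16 (Encoding.Unicode).
    public static (int Utf8, int Utf16) ByteCounts(string s) =>
        (Encoding.UTF8.GetByteCount(s), Encoding.Unicode.GetByteCount(s));

    public static void Main()
    {
        string[] samples =
        {
            "hello",         // ASCII: 5 bytes in UTF-8, 10 in UTF-16
            "\u4E2D\u6587",  // two Chinese characters: 6 bytes in UTF-8, 4 in UTF-16
            "\uD83D\uDE00",  // one emoji (U+1F600): 4 bytes in both
        };

        foreach (string sample in samples)
        {
            var (utf8, utf16) = ByteCounts(sample);
            Console.WriteLine($"{sample.Length} code units: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes");
        }
    }
}
```

ASCII-heavy text halves in size under UTF-8, CJK text grows by half, and characters outside the Basic Multilingual Plane cost four bytes either way.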
glacinefrox
glacinefroxOP3mo ago
Ok, too much would need to change for it, maybe for .NET 10? Then make a compiler option to select the default compiled string encoding.
reflectronic
reflectronic3mo ago
the text in your screenshot will never go away and it is impossible for it to go away
glacinefrox
glacinefroxOP3mo ago
Defaulting to UTF-8. Sadly yes, I don't think that is changing soon.
Jimmacle
Jimmacle3mo ago
what's the use case for this level of file size optimization?
glacinefrox
glacinefroxOP3mo ago
It's like asking what's the point of those minimal performance optimizations
reflectronic
reflectronic3mo ago
they are also low implementation cost. changing the metadata format to save kilobytes in an above average case is extremely high cost for extremely low reward
Jimmacle
Jimmacle3mo ago
with modern levels of storage capacity this seems like a non-issue for all but the most resource constrained systems, and at that point C# may not be the right choice anyway. most applications have assets that will have a massively larger impact on application size than the code itself
glacinefrox
glacinefroxOP3mo ago
I load them separately from the internet, caching them to disk
Jimmacle
Jimmacle3mo ago
they're still on disk though. what do you gain by saving a few kb in string encoding?
glacinefrox
glacinefroxOP3mo ago
The problem is accepting something that could be better just by changing the .NET runtime, with no other issues, and adding a setting to explicitly turn it on. Or a Visual Studio compile option.
Jimmacle
Jimmacle3mo ago
a setting that will break binary compatibility with libraries that use the opposite option and cause encoding issues like reflectronic already mentioned
glacinefrox
glacinefroxOP3mo ago
It won't if it's released on newer version of a runtime that is not out yet
Jimmacle
Jimmacle3mo ago
it will if it's optional
glacinefrox
glacinefroxOP3mo ago
¯\_(ツ)_/¯ I'm saying to change it in .NET 9 or .NET 10. Also, I would probably strip out variable names too, because they are basically useless.
Jimmacle
Jimmacle3mo ago
if they were useless they wouldn't be there
glacinefrox
glacinefroxOP3mo ago
They could easily be memory pointers
Jimmacle
Jimmacle3mo ago
to where?
glacinefrox
glacinefroxOP3mo ago
To their values?
Jimmacle
Jimmacle3mo ago
and where would the values be stored?
glacinefrox
glacinefroxOP3mo ago
... On memory? :blobthumbsup:
Jimmacle
Jimmacle3mo ago
so you're adding the memory cost of a pointer and the string is still there
glacinefrox
glacinefroxOP3mo ago
The just-in-time compiler already does it to run the code. Or even better, just a variable minifier that wouldn't need any change apart from the compiler.
Jimmacle
Jimmacle3mo ago
which would break reflection
glacinefrox
glacinefroxOP3mo ago
I know that only a few would use it, but no lol. It's like instead of naming your function helloworld it is now a
Jimmacle
Jimmacle3mo ago
so then you have consuming code that tries to look up "helloworld" with reflection and finds nothing because you changed the name
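The failure mode described here can be sketched in a few lines. `Greeter`, `HelloWorld`, `ReflectionLookupDemo` and `CanFind` are hypothetical names used only for illustration; the reflection call itself is the standard `Type.GetMethod(string, BindingFlags)`.

```csharp
using System;
using System.Reflection;

public static class Greeter
{
    // Imagine a minifier renaming this method to "a" in the compiled assembly.
    public static string HelloWorld() => "hello";
}

public static class ReflectionLookupDemo
{
    // Looks up a public static method on Greeter by name, the way consuming code might.
    public static bool CanFind(string methodName) =>
        typeof(Greeter).GetMethod(methodName, BindingFlags.Public | BindingFlags.Static) != null;

    public static void Main()
    {
        Console.WriteLine(CanFind("HelloWorld")); // True: the name is in the metadata
        Console.WriteLine(CanFind("a"));          // False: no method with that name exists
    }
}
```

If a tool renamed `HelloWorld` to `a`, the two results would swap, and every caller that still passes the string "HelloWorld" would get `null` back at runtime with no compile-time warning.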
glacinefrox
glacinefroxOP3mo ago
Since C# and JS have a lot in common? Minify a JS script with top level and see what happens to the variable names and whether the code still works
Jimmacle
Jimmacle3mo ago
i don't have to do that to tell you what would happen if you were to do this in C# in the situation i'm talking about
glacinefrox
glacinefroxOP3mo ago
I see that you aren't understanding it, let me try explaining it more simply. Imagine that you have your code with all your variable names and functions. When you press compile, before compiling, a bot renames all the variables into different ones, as if it were your code but with the variables and functions named differently.
Jimmacle
Jimmacle3mo ago
and if you have code that refers to those symbols using string literals (like reflection), what happens?
Aaron
Aaron3mo ago
hi, I'm going to be the third person to come in here and tell you the idea will not work because C# is not JS
glacinefrox
glacinefroxOP3mo ago
Both have a JIT
Aaron
Aaron3mo ago
and? they are not the same language
Jimmacle
Jimmacle3mo ago
what does the jit have to do with this
glacinefrox
glacinefroxOP3mo ago
The point I'm explaining applies here
Aaron
Aaron3mo ago
and JS minification usually leaves names alone
glacinefrox
glacinefroxOP3mo ago
The variable names don't matter for that
Aaron
Aaron3mo ago
because they are accessible with strings
glacinefrox
glacinefroxOP3mo ago
Not top level
Jimmacle
Jimmacle3mo ago
last i checked the names of locals aren't actually preserved. so are you talking about local variables or all symbol names?
glacinefrox
glacinefroxOP3mo ago
Because you're assuming the code will not be shared nor accessed from anywhere else
Aaron
Aaron3mo ago
they are in debug, I'm pretty sure
glacinefrox
glacinefroxOP3mo ago
Which could perfectly well be in the pdb 🥰
Jimmacle
Jimmacle3mo ago
but you don't distribute debug builds
glacinefrox
glacinefroxOP3mo ago
Fine, you win. I'm going to sleep. If I find something else that makes no sense I will gladly debate it.
gerard
gerard3mo ago
and if you want no names at all, publishing it with Native AOT will resolve that, right? (with no stack traces etc)
Aaron
Aaron3mo ago
ah no it's in release too
MODiX
MODiX3mo ago
Aaron
sharplab.io (click here)
public class C {
    public unsafe void* M() {
        int a = 0;
        return &a;
    }
}
Aaron
Aaron3mo ago
you can see it there in the IL
glacinefrox
glacinefroxOP3mo ago
Yes, it will translate everything into machine code, but the point of having a small runtime-dependent executable would be lost
Aaron
Aaron3mo ago
you can do this stripping yourself, if you'd like
glacinefrox
glacinefroxOP3mo ago
That's what I don't like
Jimmacle
Jimmacle3mo ago
i mean, i already don't see the point of the size optimizations you're proposing
Aaron
Aaron3mo ago
just edit the resulting assembly to remove the local var names
reflectronic
reflectronic3mo ago
no, the local variable names are not in the assembly. they are only in the PDB. the decompiler gets them from the PDB
glacinefrox
glacinefroxOP3mo ago
That's called obfuscating or minifying, but Visual Studio could already do that
Aaron
Aaron3mo ago
mmm
glacinefrox
glacinefroxOP3mo ago
...
reflectronic
reflectronic3mo ago
if there is no PDB the decompiler makes them up, and ILSpy has some heuristics to generate them
Aaron
Aaron3mo ago
I thought that LocalVariable whatever had a Name prop
Jimmacle
Jimmacle3mo ago
i've done enough decompiling to be familiar with my friends num0, num1, num2, etc :KEKW:
glacinefrox
glacinefroxOP3mo ago
Try it yourself with hxd
Aaron
Aaron3mo ago
maybe it doesn't
glacinefrox
glacinefroxOP3mo ago
And do binary search
reflectronic
reflectronic3mo ago
why don't you try it. show me where your local variable names are
glacinefrox
glacinefroxOP3mo ago
Ok, then probably my framework is too old
reflectronic
reflectronic3mo ago
not field names. not method names. local variables. since that is what we are talking about
gerard
gerard3mo ago
if you would remove the pdb, then reflection would break
Aaron
Aaron3mo ago
ah no, refl is right as always, LocalVariableInfo does not have a name prop and I simply misremembered
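What reflection does expose about locals can be listed with `MethodBody.LocalVariables`. A rough sketch, reusing the `M`/`D`/`B` shape from this thread; `LocalsDemo` and `DescribeLocals` are hypothetical names, and it assumes a JIT-based runtime where `GetMethodBody()` returns a body (it may not under Native AOT). The number of locals can differ between Debug and Release builds.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class M
{
    public void D()
    {
        int abcdefg = 100;
        B(ref abcdefg);
    }

    public void B(ref int x) { }
}

public static class LocalsDemo
{
    // Lists what the assembly metadata records for each local of M.D:
    // a slot index and a type, taken from the local variable signature.
    public static List<string> DescribeLocals()
    {
        var result = new List<string>();
        MethodBody body = typeof(M).GetMethod("D").GetMethodBody();
        foreach (LocalVariableInfo local in body.LocalVariables)
        {
            // LocalVariableInfo offers LocalIndex, LocalType and IsPinned, but no name.
            result.Add($"{local.LocalIndex}: {local.LocalType}");
        }
        return result;
    }

    public static void Main()
    {
        foreach (string line in DescribeLocals())
        {
            Console.WriteLine(line);
        }
    }
}
```

Nothing in the output mentions `abcdefg`; the name exists only in the source and, if one is produced, the PDB.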
Jimmacle
Jimmacle3mo ago
doubt it, this holds even in .NET Framework
Aaron
Aaron3mo ago
it works fine
gerard
gerard3mo ago
that's why it's stored in the dll/exe
Aaron
Aaron3mo ago
reflection is entirely based upon data in the assembly
glacinefrox
glacinefroxOP3mo ago
I can remember it does
reflectronic
reflectronic3mo ago
there is no local table, it is stored as a signature
Aaron
Aaron3mo ago
right
glacinefrox
glacinefroxOP3mo ago
It's in the executable file, which is what matters
Aaron
Aaron3mo ago
it's not. refl is correct that it is not
glacinefrox
glacinefroxOP3mo ago
?
Aaron
Aaron3mo ago
there is no place for that data to go in the file format
glacinefrox
glacinefroxOP3mo ago
Tomorrow I will try with a different .NET version
Jimmacle
Jimmacle3mo ago
i'm pretty sure it has worked this way for long enough that your version won't affect it, if not forever
Aaron
Aaron3mo ago
the file format for .net DLLs hasn't really ever changed, but sure
reflectronic
reflectronic3mo ago
look,
class M
{
    public void D()
    {
        int abcdefg = 100;
        B(ref abcdefg);
    }

    public void B(ref int x) { }
}
prove to me that abcdefg is inside of the EXE file when you compile this file
glacinefrox
glacinefroxOP3mo ago
Probably .NET 6 upwards yes, but .NET Framework I don't know
reflectronic
reflectronic3mo ago
you cannot, because it's not there, but i welcome you to try
Jimmacle
Jimmacle3mo ago
for .NET Framework i do know, because i did a lot of decompiling of .NET Framework assemblies when i was game modding. local names aren't in there. like what was said previously, if you don't have the debug symbols the decompiler will just make up names
reflectronic
reflectronic3mo ago
i can't find it but maybe it's just a me problem
[attached image]
Jimmacle
Jimmacle3mo ago
and do you have the pdbs?
glacinefrox
glacinefroxOP3mo ago
oh wait
reflectronic
reflectronic3mo ago
ok so when i said "it's in the PDB and the decompiler gets that from the PDB" did you not understand or not care
glacinefrox
glacinefroxOP3mo ago
ok nevermind, there is a pdb
glacinefrox
glacinefroxOP3mo ago
[attached image]
glacinefrox
glacinefroxOP3mo ago
what about that
reflectronic
reflectronic3mo ago
that is a field. it is a field because you used that variable inside of a lambda
glacinefrox
glacinefroxOP3mo ago
The variable names could perfectly well be a, b, c, and the function names too. If you want to debug the code or have readable exceptions, just get the pdb and store it all there, because that's why it exists. Also, I still see variable names without the pdb (I don't think the decompiler is caching it, that would make no sense).
DaNike
DaNike3mo ago
the decompiler knows how to find the pdb. it's quite smart about it
glacinefrox
glacinefroxOP3mo ago
yeah, even when I remove it? wait, I will recompile without the pdb
reflectronic
reflectronic3mo ago
i agree that field names could be optimized further but it is not trivial https://github.com/dotnet/linker/issues/1282
GitHub
Consider stripping names of fields that don't matter · Issue #1282 ...
Now that linker has a pretty good idea what fields are accessed through reflection (and warns whenever its not sure), we could consider stripping names of fields that are not observable. A field na...
glacinefrox
glacinefroxOP3mo ago
yes, they're still there
[attached image]
glacinefrox
glacinefroxOP3mo ago
I know, but in some cases people want it
reflectronic
reflectronic3mo ago
did you actually name them request and res
glacinefrox
glacinefroxOP3mo ago
yes
reflectronic
reflectronic3mo ago
if you change their names does the output change
DaNike
DaNike3mo ago
is this an async method?
glacinefrox
glacinefroxOP3mo ago
request no, but res yes. that's weird
reflectronic
reflectronic3mo ago
hm, yeah, in that case the names are stored in the fields again. well, some of the names are stored
glacinefrox
glacinefroxOP3mo ago
res is declared the same as request
reflectronic
reflectronic3mo ago
if request does not change then that's because it was generated by ILSpy and it just happened to pick the name you have. whether it's stored as a field depends on how you use it later in the method
glacinefrox
glacinefroxOP3mo ago
I both modify them and pass them as params to other functions. The only difference is that I am not calling a function on the request class
DaNike
DaNike3mo ago
is request used across an await though
glacinefrox
glacinefroxOP3mo ago
[attached image]
glacinefrox
glacinefroxOP3mo ago
[attached image]
DaNike
DaNike3mo ago
that looks like it's captured in that lambda, so again, a field
glacinefrox
glacinefroxOP3mo ago
and why do they still have to have the variable name?
DaNike
DaNike3mo ago
that's just how Roslyn chooses to emit the names. you're free to write a tool to rename private members
glacinefrox
glacinefroxOP3mo ago
I'm not that much of an expert
DaNike
DaNike3mo ago
but that will break reflection
glacinefrox
glacinefroxOP3mo ago
if I were, I would probably find another solution. define reflection
DaNike
DaNike3mo ago
reflection
reflectronic
reflectronic3mo ago
System.Reflection
DaNike
DaNike3mo ago
there is a whole namespace with that name
glacinefrox
glacinefroxOP3mo ago
ok, then it could be used only if I don't need to access parts from other assemblies, for example in a simple console app. This could also be possible with libraries, but would be more complex
gerard
gerard3mo ago
Because variables outside the lambda are captured into a class, which makes a field, which gets stored: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA+ABATARgLABQhAbgIZQAEAdgK4C2wMlAvBTgNyGEY5YUAOUAJZUALgDl6jFhQAUASgrMAfBQDehCloo8AnLNoMm8zgQC+pwoJESpTBeyA==
var number = 1;

Action printNumber = () => {
    Console.WriteLine(number);
};
> > >
[CompilerGenerated]
private sealed class <>c__DisplayClass0_0
{
    public int number;

    internal void <<Main>$>b__0()
    {
        Console.WriteLine(number);
    }
}
This is the compiled version of the lambda. As you can see, the variable "number" is outside the lambda and gets captured.
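The capture can also be observed at runtime by inspecting the delegate's target object. This is a sketch under the assumption that the compiler hoists the captured local into a field that keeps the source name, which is what current Roslyn does as shown in the decompiled output; it is an implementation detail, not a language guarantee. `ClosureDemo` and `CapturedFieldNames` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ClosureDemo
{
    // Returns the instance field names of the compiler-generated closure class.
    public static string[] CapturedFieldNames()
    {
        int number = 1;
        Action printNumber = () => Console.WriteLine(number);

        // For a capturing lambda, Target is the closure ("display class") instance.
        object closure = printNumber.Target;
        return closure.GetType()
            .GetFields(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance)
            .Select(field => field.Name)
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CapturedFieldNames()));
    }
}
```

Because the local became a field, its name now lives in the assembly's metadata and survives even when no PDB is shipped, which matches what the screenshots in this thread showed.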
jcotton42
jcotton423mo ago
Presumably because Windows is natively UTF-16.