Help me understand the GC in depth
I'm interested in the inner workings of the CLR GC, but the source code looks very complicated to me and uses a lot of jargon I'm not familiar with. I've read the "surface level" design doc (https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/garbage-collection.md) but it doesn't contain the information I'm looking for. I'm particularly interested in the data structures it uses to keep track of objects during the compaction phase. Here are some questions I'm trying to find the answer to (along with my own guesses, which could be completely wrong):
- How does it know the type of an object at a pointer? Is it present in some metadata header before each object in memory, or does it know based on where it got the pointer from (i.e., it knows the type of the object at the root, so it knows the types of its fields, and then the types of those fields, etc.)? I don't see how that second approach would fit with polymorphism, though, so my guess is it's the first option. If it does use a header for each object, how big is it? If I allocate a class containing one byte of data, is it going to add hundreds of bytes of metadata to it?
- How does it know how to iterate through all objects on the heap? Again, one thing it could do is start at the roots and recursively explore the fields of those objects, but I have a feeling that's not the most efficient approach. My other idea is that every object is prefixed with metadata (some sort of header again) giving its size, so that it can start at the beginning of the heap and advance one object at a time.
- When compacting, it needs to keep track of where each pointer will end up after compaction, so that it can then correct the references in every field of every object. Does it just use a hashmap from old pointer to new pointer for that? What if I have a huge number of very small objects on the heap? Wouldn't that hashmap use more memory than my objects?
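For the first question, the header guess appears to be right: in CoreCLR, every object starts with a pointer to its type's MethodTable (that's how the GC and the runtime recover the type from a bare pointer, and it works with polymorphism since the pointer travels with the object), plus one extra pointer-sized header word just before it. So the fixed per-object overhead on 64-bit is two pointers, and the minimum object size is 24 bytes, not hundreds. A rough sketch of the arithmetic (names and layout simplified, not CoreCLR's actual code):

```python
PTR = 8  # pointer size on 64-bit

# Fixed per-object overhead in a CLR-style layout (my naming, not CoreCLR's):
# one object header word (sync block / hash bits) + one MethodTable pointer.
FIXED_OVERHEAD = 2 * PTR      # 16 bytes
MIN_OBJECT_SIZE = 3 * PTR     # 24 bytes: the minimum object size on 64-bit

def object_size(payload_bytes: int) -> int:
    """Total heap size of an object with `payload_bytes` of instance data."""
    raw = FIXED_OVERHEAD + payload_bytes
    aligned = (raw + PTR - 1) // PTR * PTR   # round up to pointer alignment
    return max(aligned, MIN_OBJECT_SIZE)

print(object_size(1))    # a class with one byte field -> 24 bytes total
```

Note that the MethodTable itself is allocated once per *type*, not per instance, so its cost doesn't scale with the number of objects.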
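For the second question, the size-in-metadata guess is right too: since each object's size can be computed from its MethodTable (a base size, plus an element count for arrays and strings), the GC can start at the beginning of a heap segment and bump a cursor forward object by object, with no root traversal needed. A toy simulation (this heap representation is invented purely for illustration):

```python
def walk_heap(heap, heap_end):
    """Linearly walk a toy heap.

    `heap` maps each object's start offset to (type_name, size_in_bytes);
    a real GC would instead read the size via the MethodTable pointer
    stored in the object's first word.
    """
    offset = 0
    found = []
    while offset < heap_end:
        type_name, size = heap[offset]   # stand-in for reading the header
        found.append((offset, type_name))
        offset += size                   # bump cursor to the next object
    return found

toy_heap = {0: ("String", 40), 40: ("Node", 24), 64: ("int[]", 56)}
print(walk_heap(toy_heap, 120))
# [(0, 'String'), (40, 'Node'), (64, 'int[]')]
```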
For my last point, I'm trying to think of an algorithm that uses a hashmap with a maximum size, so that it compacts a certain number of objects at a time, but that would involve copying each object multiple times, which I've read is considered really inefficient.
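On the hashmap worry: classic mark-compact collectors avoid a side table entirely by storing each live object's forwarding (post-compaction) address in the object itself, typically reusing a header word, and fixing up references in a later pass. My understanding is that CoreCLR refines this further by tracking relocation per contiguous run of live objects (a "plug", located via a "brick table") rather than per object, so the bookkeeping doesn't grow with the object count. A minimal sketch of the in-place forwarding idea (a simulation, not CoreCLR's implementation):

```python
# Sliding mark-compact sketch: the forwarding address lives *in the object*,
# so no side hashmap proportional to the object count is needed, and each
# object is copied exactly once.
class Obj:
    def __init__(self, addr, size, live):
        self.addr, self.size, self.live = addr, size, live
        self.forward = None   # new address, filled in by pass 1
        self.refs = []        # fields pointing at other Objs

def compact(objs):
    # Pass 1: compute forwarding addresses by sliding live objects left.
    new_addr = 0
    for o in objs:
        if o.live:
            o.forward = new_addr
            new_addr += o.size
    # Pass 2: rewrite every reference field using the stored forwarding address.
    for o in objs:
        if o.live:
            o.refs = [r.forward for r in o.refs]  # simulate pointer fix-up
    # Pass 3: move the survivors (here, just assign the new address).
    survivors = [o for o in objs if o.live]
    for o in survivors:
        o.addr = o.forward
    return survivors

a = Obj(0, 24, live=True)
b = Obj(24, 24, live=False)   # dead object: its space gets reclaimed
c = Obj(48, 24, live=True)
a.refs = [c]                  # a has a field pointing at c
survivors = compact([a, b, c])
print([(o.addr, o.size) for o in survivors])  # [(0, 24), (24, 24)]
print(a.refs)                                 # [24]: a now points at c's new address
```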
Another question:
- Where does the GC get the memory from if it needs to allocate memory for hashmaps and other structures like that? Is there a guarantee / upper bound on the amount of memory it uses?
I've had a lot of trouble finding any kind of information online beyond "it's generational" or "it's compacting and corrects references".
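On where the GC's own memory comes from: my understanding is that it reserves its heap segments and auxiliary tables (card table, brick table, mark stack, etc.) directly from the OS, and that the big side tables are sized in proportion to the heap, not to the number of objects, which keeps the overhead a small bounded fraction. A back-of-the-envelope calculation (the per-entry coverage and entry sizes below are assumptions for illustration, not CoreCLR constants):

```python
# Back-of-the-envelope for GC side tables. Both tables scale with *heap
# size*, not object count, so overhead is a small constant fraction.
CARD_BYTES = 256    # assumed heap bytes covered per 1-byte card entry
BRICK_BYTES = 4096  # assumed heap bytes covered per 2-byte brick entry

def side_table_bytes(heap_bytes):
    cards = heap_bytes // CARD_BYTES           # 1 byte per card
    bricks = (heap_bytes // BRICK_BYTES) * 2   # 2 bytes per brick
    return cards + bricks

heap = 1 << 30                 # a 1 GiB heap...
print(side_table_bytes(heap))  # ...needs roughly 4.5 MiB of side tables
```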
For this, you'll get the best results reading the GC code https://github.com/dotnet/runtime/tree/main/src/coreclr/gc or opening a discussion on the dotnet/runtime repo
I tried looking at the code, but it's full of acronyms and jargon I don't understand (and to be honest, I'm not that great at reading other people's code yet, haha)
I should probably open a discussion there; I planned on doing that if no one here could answer
https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/type-system.md probably has more information you care about
I see some stuff in there that could be interesting; I'll read it
Thank you!