ABp16
Help me understand the GC in depth
I'm interested in the inner workings of the CLR GC, but the source code looks very complicated to me and uses a lot of jargon I'm not familiar with. I've read the "surface level" design doc (https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/garbage-collection.md) but it doesn't contain the information I'm looking for. I'm particularly interested in the data structures it uses to keep track of objects during the compaction phase. Here are some questions I'm trying to find the answers to (along with my own guesses, which could be completely wrong):
- How does it know the type of the object at a given pointer? Is the type stored in some metadata header before each object in memory, or does it work it out from where it got the pointer (i.e. it knows the type of the object at the root, so it knows the types of that object's fields, and then the types of those fields' fields, and so on)? I don't know how that second approach would fit with polymorphism though, so my guess is it's the first option. If it does use a header for each object, how big is it? If I allocate a class containing one byte of data, is it going to add hundreds of bytes of metadata to it?
- How does it know how to iterate through all objects on the heap? Again, one thing it could do is start at the roots and recursively explore the fields of those objects, but I have a feeling that's not the most efficient approach. My other idea is that every object is prefixed with metadata (in some sort of header again) that gives its size, so that it can start at the beginning of the heap and append
- When compacting, it needs to keep track of where each pointer will end up after compaction, so that it can then correct those pointers in every field of every object. Does it just use a hashmap from old pointer to new pointer for that? What if I have a huge number of very small objects on the heap? Wouldn't that hashmap use more memory than my objects?
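To make my guesses concrete, here is a toy Python sketch of the model I have in my head. It is not taken from the CLR source and I'm not claiming the real GC works this way: the 8-byte header layout, the `alloc`/`walk`/`compact` helpers and the use of a `dict` as the forwarding table are all made up for illustration. It only shows the three ideas above together: a per-object header carrying a type id and a size, a linear heap walk driven by that size, and a pointer-to-pointer map built during compaction.

```python
import struct

# Hypothetical 8-byte header: (type_id, payload_size). NOT the CLR's real layout.
HEADER = struct.Struct("<II")


def alloc(heap, type_id, payload):
    """Append an object to the heap and return its offset (its 'pointer')."""
    addr = len(heap)
    heap.extend(HEADER.pack(type_id, len(payload)))
    heap.extend(payload)
    return addr


def walk(heap):
    """Yield (addr, type_id, payload_size) for every object, front to back."""
    addr = 0
    while addr < len(heap):
        type_id, size = HEADER.unpack_from(heap, addr)
        yield addr, type_id, size
        # The size in the header tells us where the next object starts.
        addr += HEADER.size + size


def compact(heap, live):
    """Slide live objects to the front; return (new_heap, forwarding table)."""
    new_heap = bytearray()
    forwarding = {}  # old address -> new address, one entry per live object
    for addr, _type_id, size in walk(heap):
        if addr in live:
            forwarding[addr] = len(new_heap)
            new_heap += heap[addr:addr + HEADER.size + size]
    return new_heap, forwarding


heap = bytearray()
a = alloc(heap, 1, b"\x01")      # 8-byte header + 1 byte of data
b = alloc(heap, 2, b"\x02\x03")  # pretend this one is garbage
c = alloc(heap, 1, b"\x04")

new_heap, forwarding = compact(heap, live={a, c})
print(forwarding)  # {0: 0, 19: 9}
```

In this toy version a one-byte object already carries eight bytes of header, and the forwarding table holds one entry per surviving object, which is exactly the overhead I'm worried about with lots of tiny objects.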