❔ Benchmarking object size difference

I've got a small PR here to move a few bool fields into an existing "status" int. My understanding is that this would remove 16 bytes from the object size (1 byte padded to 8 bytes, twice). However when I create a small benchmark that just allocates a new Ping object, it retains the same size. Additionally, I'm not sure where the 184 bytes from the benchmark are coming from. I'm sure I'm missing something elementary, does someone know what I'm overlooking here? https://github.com/dotnet/runtime/pull/94151
GitHub
Re-use status flag inside Ping by Vannevelj · Pull Request #94151 ·...
Similar idea as #81251 and a couple of other examples described in the blog post. Instead of storing these flags in separate booleans, re-use the existing mechanism that stores them as separate val...
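For readers unfamiliar with the technique being discussed, here is a minimal sketch of packing two booleans into an existing int field with bit masks. The class name `PackedState`, the flag values, and the property names are hypothetical and chosen for illustration; this is not the code from the PR, and it ignores thread safety.

```csharp
using System;

// Hypothetical stand-in for the idea in the PR, not the actual Ping source.
var state = new PackedState();
state.Canceled = true;
Console.WriteLine($"Canceled={state.Canceled}, DisposeRequested={state.DisposeRequested}");

state.Canceled = false;
state.DisposeRequested = true;
Console.WriteLine($"Canceled={state.Canceled}, DisposeRequested={state.DisposeRequested}");

public sealed class PackedState
{
    // Flag bits are made-up values for this sketch.
    private const int CanceledFlag = 1 << 0;
    private const int DisposeRequestedFlag = 1 << 1;

    // One 4-byte field carries both flags instead of an int plus two bools.
    private int _status;

    public bool Canceled
    {
        get => (_status & CanceledFlag) != 0;
        set => SetFlag(CanceledFlag, value);
    }

    public bool DisposeRequested
    {
        get => (_status & DisposeRequestedFlag) != 0;
        set => SetFlag(DisposeRequestedFlag, value);
    }

    // Sets or clears a single bit. Not thread-safe; a real implementation
    // that shares the field across threads would need interlocked updates.
    private void SetFlag(int flag, bool on)
    {
        _status = on ? (_status | flag) : (_status & ~flag);
    }
}
```

Whether this saves any memory depends on how the runtime lays out and pads the object, which is what the rest of the thread investigates.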
JakenVeina
JakenVeina15mo ago
your best bet, I think, is to start looking at generated IL. if we trim down the Ping class to just instance state, we get....
using System.ComponentModel;
using System.Threading;

namespace System.Net.NetworkInformation
{
    public partial class Ping : Component
    {
        private readonly ManualResetEventSlim _lockObject = new ManualResetEventSlim(initialState: true); // doubles as the ability to wait on the current operation
        private SendOrPostCallback? _onPingCompletedDelegate;
        private bool _disposeRequested;
        private byte[]? _defaultSendBuffer;
        private CancellationTokenSource? _timeoutOrCancellationSource;
        private bool _canceled;
        private int _status = 0;
        public event PingCompletedEventHandler? PingCompleted;
    }
}
and after changes....
using System.ComponentModel;
using System.Threading;

namespace System.Net.NetworkInformation
{
    public partial class Ping : Component
    {
        private readonly ManualResetEventSlim _lockObject = new ManualResetEventSlim(initialState: true); // doubles as the ability to wait on the current operation
        private SendOrPostCallback? _onPingCompletedDelegate;
        private byte[]? _defaultSendBuffer;
        private CancellationTokenSource? _timeoutOrCancellationSource;
        private int _status = 0;
        public event PingCompletedEventHandler? PingCompleted;
    }
}
now, we can go plug those into sharplab.io, then you can find someone to help read that, cause I have no idea how to 😆
I CAN do a diff over the two sets of output IL, and see that they're identical except for the two .field definitions.
my guess would be that the optimizer is already doing effectively what you're doing, and byte-packing these fields
Jeroen (Speedzor)
Jeroen (Speedzor)OP15mo ago
Do you know of any way to confirm that? I was hoping there would be some sort of tool that allows me to tell what an object's structure looks like but I haven't come across something like that yet
mtreit
mtreit15mo ago
You can check in a debugger like windbg.
JakenVeina
JakenVeina15mo ago
I think the deal with byte alignment is that whole OBJECTs are 32- or 64-bit-aligned, not fields.
so, in the example you mentioned, if HttpRequestMessage previously had 33 bytes worth of state, then it would get bumped to 40 for alignment
mtreit
mtreit15mo ago
In this case both objects are 80 bytes
Total 1 objects, 80 bytes
0:000> !dumpobj /d 27b06513348
Name: Ping
MethodTable: 00007ff873a95248
EEClass: 00007ff873a1e748
Tracked Type: false
Size: 80(0x50) bytes
File: C:\temp\Scratch\bin\Release\net7.0\Scratch.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
0000000000000000  400001d        8 ...ponentModel.ISite  0 instance 0000000000000000 _site
0000000000000000  400001e       10 ....EventHandlerList  0 instance 0000000000000000 _events
00007ff8736d93b8  400001c       98        System.Object  0   static 0000000000000000 s_eventDisposed
00007ff873a974c8  4000004       18 ...ualResetEventSlim  0 instance 0000027b06513398 _lockObject
00007ff873824718  4000005       20          System.Void  0 instance 0000000000000000 _onPingCompletedDelegate
00007ff8737fb3d0  4000006       44       System.Boolean  1 instance                0 _disposeRequested
00007ff8739a8620  4000007       28        System.Byte[]  0 instance 0000000000000000 _defaultSendBuffer
00007ff873824718  4000008       30          System.Void  0 instance 0000000000000000 _timeoutOrCancellationSource
00007ff8737fb3d0  4000009       45       System.Boolean  1 instance                0 _canceled
00007ff8737fe8d0  400000a       40         System.Int32  1 instance                0 _status
00007ff873824718  400000b       38          System.Void  0 instance 0000000000000000 PingCompleted
0:000> !dumpheap -mt 7ff873a95060
Address MT Size
027b065133c0 7ff873a95060 80

Statistics:
MT Count TotalSize Class Name
7ff873a95060 1 80 Ping2
Total 1 objects, 80 bytes
0:000> !dumpobj /d 27b065133c0
Name: Ping2
MethodTable: 00007ff873a95060
EEClass: 00007ff873a1e6d0
Tracked Type: false
Size: 80(0x50) bytes
File: C:\temp\Scratch\bin\Release\net7.0\Scratch.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
0000000000000000  400001d        8 ...ponentModel.ISite  0 instance 0000000000000000 _site
0000000000000000  400001e       10 ....EventHandlerList  0 instance 0000000000000000 _events
00007ff8736d93b8  400001c       98        System.Object  0   static 0000000000000000 s_eventDisposed
00007ff873a974c8  400000c       18 ...ualResetEventSlim  0 instance 0000027b06513410 _lockObject
00007ff873824718  400000d       20          System.Void  0 instance 0000000000000000 _onPingCompletedDelegate
00007ff8739a8620  400000e       28        System.Byte[]  0 instance 0000000000000000 _defaultSendBuffer
00007ff873824718  400000f       30          System.Void  0 instance 0000000000000000 _timeoutOrCancellationSource
00007ff8737fe8d0  4000010       40         System.Int32  1 instance                0 _status
00007ff873824718  4000011       38          System.Void  0 instance 0000000000000000 PingCompleted
You can look at the actual layout in memory here
JakenVeina
JakenVeina15mo ago
oooooooooooooh, that was quick
so, clearly there are fewer instance fields
also weird that the addresses are out-of-order? shall we arrange that?
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff873a974c8  4000004       18 ...ualResetEventSlim  0 instance 0000027b06513398 _lockObject
00007ff873824718  4000005       20          System.Void  0 instance 0000000000000000 _onPingCompletedDelegate
00007ff8737fb3d0  4000006       44       System.Boolean  1 instance                0 _disposeRequested
00007ff8739a8620  4000007       28        System.Byte[]  0 instance 0000000000000000 _defaultSendBuffer
00007ff873824718  4000008       30          System.Void  0 instance 0000000000000000 _timeoutOrCancellationSource
00007ff8737fb3d0  4000009       45       System.Boolean  1 instance                0 _canceled
00007ff8737fe8d0  400000a       40         System.Int32  1 instance                0 _status
00007ff873824718  400000b       38          System.Void  0 instance 0000000000000000 PingCompleted
00007ff8736d93b8  400001c       98        System.Object  0   static 0000000000000000 s_eventDisposed
0000000000000000  400001d        8 ...ponentModel.ISite  0 instance 0000000000000000 _site
0000000000000000  400001e       10 ....EventHandlerList  0 instance 0000000000000000 _events
or am I completely misunderstanding?
mtreit
mtreit15mo ago
MT is the method table for the type
JakenVeina
JakenVeina15mo ago
sure, I was referring to the Field column
is that not addresses for those field values?
is there a set of docs for this?
mtreit
mtreit15mo ago
Address of the field should be object address + offset I think.
JakenVeina
JakenVeina15mo ago
curious what Field is then
alright, so.....
well
oh, okay, the offset 98 is a static field
Jeroen (Speedzor)
Jeroen (Speedzor)OP15mo ago
int32 is 4 bytes, I think default packing is 8 bytes. It looks like the two booleans (offsets 44 and 45) are placed right after the int32 (offset 40). Is the conclusion that int32 + bool + bool + 2byte padding is the same as int32 + 4 byte padding?
JakenVeina
JakenVeina15mo ago
that's also weird
alright, so if that's accurate, before.....
mtreit
mtreit15mo ago
If you want to know all the details about padding and alignment you can ask someone in #allow-unsafe-blocks.
As far as I recall it uses the size of the largest field to determine alignment and pads so the object aligns on a boundary of that size. But not really my area of expertise.
JakenVeina
JakenVeina15mo ago
Field                         Offset  Size
???                           0x00    0
_site                         0x08    8
_events                       0x10    8
_lockObject                   0x18    8
_onPingCompletedDelegate      0x20    8
_defaultSendBuffer            0x28    8
_timeoutOrCancellationSource  0x30    8
PingCompleted                 0x38    8
_status                       0x40    4
_disposeRequested             0x44    1
_canceled                     0x45    1
PADDING                       0x46    10
yeah, I think it's what I thought...
I think you're misunderstanding, thinking that it's FIELDS that are padded.
it's actually the whole object
if we look at the after....
Field                         Offset  Size
???                           0x00    0
_site                         0x08    8
_events                       0x10    8
_lockObject                   0x18    8
_onPingCompletedDelegate      0x20    8
_defaultSendBuffer            0x28    8
_timeoutOrCancellationSource  0x30    8
PingCompleted                 0x38    8
_status                       0x40    4
PADDING                       0x44    12
notice in the before version, the object has no issue referring to bool fields as single-byte values
_disposeRequested and _canceled are only offset by 1 byte
the object as a whole was previously allocating 80 bytes, for 70 bytes worth of fields
removing 2 bytes doesn't drop it down to the next-lower padding boundary
you'd have to drop a whole 6 bytes off the object to get it down to the next boundary of 64 bytes
Jeroen (Speedzor)
Jeroen (Speedzor)OP15mo ago
I was with you until you changed the padding to 10 and 12 respectively. Why isn't it 2 and 4 to reach the 48-byte mark?
JakenVeina
JakenVeina15mo ago
I'm basing that on the fact that the object reports as 80 bytes in size
that would be offset 0x50
to go from 0x46 to 0x50, there must be 10 bytes of padding
I could still be off here
mtreit
mtreit15mo ago
BTW if you want to know how i got that data it was:
using System;
using System.ComponentModel;
using System.Threading;
using System.Net.NetworkInformation;
using System.Diagnostics;

var a = new Ping();
var b = new Ping2();
Debugger.Break();
Console.WriteLine(a.ToString());
Console.WriteLine(b.ToString());

public partial class Ping : Component
{
    private readonly ManualResetEventSlim _lockObject = new ManualResetEventSlim(initialState: true); // doubles as the ability to wait on the current operation
    private SendOrPostCallback? _onPingCompletedDelegate;
    private bool _disposeRequested;
    private byte[]? _defaultSendBuffer;
    private CancellationTokenSource? _timeoutOrCancellationSource;
    private bool _canceled;
    private int _status = 0;
    public event PingCompletedEventHandler? PingCompleted;
}

public partial class Ping2 : Component
{
    private readonly ManualResetEventSlim _lockObject = new ManualResetEventSlim(initialState: true); // doubles as the ability to wait on the current operation
    private SendOrPostCallback? _onPingCompletedDelegate;
    private byte[]? _defaultSendBuffer;
    private CancellationTokenSource? _timeoutOrCancellationSource;
    private int _status = 0;
    public event PingCompletedEventHandler? PingCompleted;
}
Then build in Release mode (dotnet build --configuration Release). Then:
windbg .\bin\Release\net7.0\Scratch.exe
Then in windbg hit g so it runs and hits the breakpoint. Then:
0:000> !dumpheap -stat -type Ping
Statistics:
MT Count TotalSize Class Name
7ff873a95248 1 80 Ping
7ff873a95060 1 80 Ping2
Total 2 objects, 160 bytes
Then in windbg you can just click the addresses (they're hyperlinks) to dump out the objects for those types, and then from there you can dump out the object instances (click the address) to get the actual memory layout. Very useful for investigating this kind of thing.
JakenVeina
JakenVeina15mo ago
I was absolutely going to ask that, thank you
seems like I wasn't too far off in my approach
I'm surprised the private fields don't get optimized away
I'm guessing that's what the console prints are for, with regard to the local variables?
mtreit
mtreit15mo ago
Yes, that's just to ensure the optimizer doesn't throw away the instances entirely. Also to ensure they don't get GC'd before the debugger break.
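As an alternative to printing the instances, `GC.KeepAlive` can serve the same purpose of keeping a reference live past the break. A minimal sketch, assuming a plain `object` in place of the `Ping` types and a guard so it also runs without a debugger attached:

```csharp
using System;
using System.Diagnostics;

// Stand-in for the instance you want to inspect in the debugger.
var instance = new object();

// Only break when a debugger is attached so the sketch also runs standalone.
if (Debugger.IsAttached)
{
    Debugger.Break();
}

// The reference is still in use at this call, so the GC treats the object
// as reachable at least until here.
GC.KeepAlive(instance);
Console.WriteLine("done");
```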
JakenVeina
JakenVeina15mo ago
yeah
NOW I'm a little curious why BenchmarkDotNet is reporting 184 Bytes for these allocations
not building in release?
mtreit
mtreit15mo ago
BenchmarkDotNet only runs in release mode.
It won't let you use debug.
I don't think the memory diagnoser is byte-by-byte accurate. The documentation says this:
In order to get the number of allocated bytes in cross platform way we are using GC.GetAllocatedBytesForCurrentThread which recently got exposed for netcoreapp1.1. That's why BenchmarkDotNet does not support netcoreapp1.0 from version 0.10.1.
MemoryDiagnoser is 99.5% accurate about allocated memory when using default settings or Job.ShortRun (or any longer job than it).
So, it's supposedly 99.5% accurate. ¯\_(ツ)_/¯ https://benchmarkdotnet.org/articles/configs/diagnosers.html#:~:text=MemoryDiagnoser%20is%2099.5%25%20accurate%20about%20allocated%20memory%20when,or%20Job.ShortRun%20%28or%20any%20longer%20job%20than%20it%29.
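To get a feel for the API the documentation mentions, here is a rough sketch that measures allocations directly with `GC.GetAllocatedBytesForCurrentThread`. The `Sample` type and the iteration count are hypothetical, and this is a simplified illustration, not how BenchmarkDotNet itself is implemented.

```csharp
using System;

const int Iterations = 10_000;

// Allocate the holder array before measuring so it is not counted.
var keep = new object[Iterations];

// Warm up so one-time costs (JIT, type loading) stay out of the measurement.
keep[0] = new Sample();

long before = GC.GetAllocatedBytesForCurrentThread();
for (int i = 0; i < Iterations; i++)
{
    keep[i] = new Sample();
}
long after = GC.GetAllocatedBytesForCurrentThread();

Console.WriteLine($"~{(after - before) / (double)Iterations:F1} bytes per allocation");
GC.KeepAlive(keep);

// Hypothetical type used only for this sketch: an int plus two bools.
public sealed class Sample
{
    private int _status = 0;
    private bool _a = false;
    private bool _b = false;

    // Reads the fields so the compiler does not flag them as unused.
    public int Sum => _status + (_a ? 1 : 0) + (_b ? 1 : 0);
}
```

Averaging over many allocations smooths out per-call noise. Note that anything a constructor allocates (for example, a field initializer that creates another object) is counted too, so the figure can be larger than the size of the object itself.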
JakenVeina
JakenVeina15mo ago
¯\_(ツ)_/¯ so long as it's consistent
Jeroen (Speedzor)
Jeroen (Speedzor)OP15mo ago
Thanks both, it's been really interesting! Will play around with windbg a bit more; I appreciate your input.
Circling back here: I've written up my findings at https://vannevel.net/posts/exploring-object-layouts. The missing link was the Object Header and the Method Table -- both are 8 bytes each and make up the diff with the expected 80 bytes that we were seeing.
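The arithmetic behind that conclusion can be sketched as follows. The assumptions here are a 64-bit runtime, an 8-byte object header sitting in front of the object, the method table pointer at offset 0, and the total rounded up to a multiple of 8; the helper names are made up for this example, and the field offsets are taken from the `!dumpobj` output earlier in the thread.

```csharp
using System;

// End of the last instance field, relative to the method table pointer
// (offsets read from the !dumpobj output above).
int beforeEnd = 0x45 + 1; // _canceled: 1 byte at offset 0x45
int afterEnd = 0x40 + 4;  // _status: 4 bytes at offset 0x40

Console.WriteLine(LayoutMath.ObjectSize(beforeEnd)); // 80
Console.WriteLine(LayoutMath.ObjectSize(afterEnd));  // 80

public static class LayoutMath
{
    // Assumed values for a 64-bit runtime.
    public const int HeaderSize = 8;
    public const int Alignment = 8;

    // Rounds value up to the next multiple of alignment.
    public static int AlignUp(int value, int alignment) =>
        (value + alignment - 1) / alignment * alignment;

    // Header + (method table pointer and fields), rounded up to the alignment.
    public static int ObjectSize(int fieldsEnd) =>
        AlignUp(HeaderSize + fieldsEnd, Alignment);
}
```

Under these assumptions the unpadded totals are 78 and 76 bytes, so the trailing padding is 2 and 4 bytes respectively, and both versions land on the same 80-byte boundary.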
JakenVeina
JakenVeina15mo ago
I actually accounted for 8 bytes at the beginning of the object
and I would've thought the method table would be on the type definition, not repeated out per-instance
oh, I gotcha
a REFERENCE to the method table
that makes more sense
I guess that's stored at the END of an object?
nevermind
mtreit
mtreit15mo ago
Pretty sure it's stored at the beginning
mtreit
mtreit15mo ago
Sergey Tepliakov
Developer Support
Managed object internals, Part 1. The layout - Developer Support
The layout of a managed object is pretty simple: a managed object contains instance data, a pointer to a meta-data (a.k.a. method table pointer) and a bag of internal information also known as an object header. The first time I’ve read about it,
JakenVeina
JakenVeina15mo ago
yeah, that's why I said nevermind
