C#•10mo ago
Tony Wang

C# IEnumerable ToArray() Benchmark

I was benchmarking different kinds of IEnumerable methods out of curiosity and found something very strange: calling ToArray() on a temporary array is relatively slow. I am aware that array.ToArray() creates a copy of the array, but I would have thought that the compiler was smart enough to omit the copy if the array is a temporary object. Am I missing something?
39 Replies
Jimmacle
Jimmacle•10mo ago
no, ToArray() explicitly makes a copy of the collection as an array
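A minimal sketch of what "explicitly makes a copy" means in practice, assuming a non-empty int[] as the source. The class and method names (ToArrayCopyDemo, ReturnsSameInstance) are made up for illustration and are not from the thread.

```csharp
using System;
using System.Linq;

public static class ToArrayCopyDemo
{
    // Hypothetical helper: true only if ToArray() handed back the same instance.
    public static bool ReturnsSameInstance(int[] source)
    {
        int[] copy = source.ToArray();
        return ReferenceEquals(source, copy);
    }

    public static void Main()
    {
        int[] original = { 1, 2, 3 };
        int[] copy = original.ToArray();
        copy[0] = 99; // mutate the copy only

        Console.WriteLine(ReferenceEquals(original, copy)); // False
        Console.WriteLine(original[0]);                     // 1
    }
}
```

Callers are allowed to rely on this: mutating the result of ToArray() must never change the source.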
Tony Wang
Tony WangOP•10mo ago
So is the compiler really that bad even with the Release configuration? Is there any way to enable more optimization?
Jimmacle
Jimmacle•10mo ago
what do you mean bad? you're expecting it to do something it shouldn't do
Tony Wang
Tony WangOP•10mo ago
The same code in C++ generates identical assembly for both versions.
Jimmacle
Jimmacle•10mo ago
it's not the same code because it's a completely different language
Tony Wang
Tony WangOP•10mo ago
Just for completeness, this is the function that is called in between. As you can clearly see, a temporary object is created that is never accessed again.
Tony Wang
Tony WangOP•10mo ago
Ok my bad, I meant equivalent code
maxmahem
maxmahem•10mo ago
From the compiler's point of view, how is it supposed to know that callers of ConversionTest2 are not expecting a new copy?
Tony Wang
Tony WangOP•10mo ago
It is indeed expecting a new copy! But it gets a new copy either way since ConvertToPointEnumerable() already returns a copy
Jimmacle
Jimmacle•10mo ago
you're expecting more optimization from compiling to IL than makes sense
i think reflectronic is around so i'll wait for him to chime in :when:
maxmahem
maxmahem•10mo ago
how does it know that ConvertToPointEnumerable returns a copy? Like this is the sort of optimization the JIT might be able to make if everything gets inlined, but there isn't any guarantee that it happens
from an outside PoV you asked it to make two copies, so it shouldn't be surprising that it makes two copies.
Jimmacle
Jimmacle•10mo ago
the simple solution here is "don't make unnecessary copies by calling ToArray/List/etc"
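A rough sketch of that advice. The function from the original post was only shared as an image, so ConvertToArray below is a hypothetical stand-in that already allocates a fresh array; WithExtraCopy and WithoutExtraCopy are likewise invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AvoidExtraCopyDemo
{
    // Hypothetical stand-in for a conversion method that already returns a new array.
    public static int[] ConvertToArray(IEnumerable<int> input)
    {
        return input.Select(x => x * 2).ToArray();
    }

    // Copies the freshly created array a second time.
    public static int[] WithExtraCopy(IEnumerable<int> input)
    {
        return ConvertToArray(input).ToArray();
    }

    // Returns the fresh array directly; no second copy is requested.
    public static int[] WithoutExtraCopy(IEnumerable<int> input)
    {
        return ConvertToArray(input);
    }

    public static void Main()
    {
        int[] input = { 1, 2, 3 };
        // Same contents either way; only the amount of copying differs.
        Console.WriteLine(WithExtraCopy(input).SequenceEqual(WithoutExtraCopy(input))); // True
    }
}
```

Declaring the return type as int[] rather than IEnumerable<int> tells the caller it already has an array, so there is no reason to call ToArray() again.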
Tony Wang
Tony WangOP•10mo ago
I agree, it's not trivial, but a C++ compiler would have no problem seeing that the object is temporary and omitting the copy. On the other hand, C++ compilers are far slower, so I guess it's a reasonable tradeoff
maxmahem
maxmahem•10mo ago
the C++ compiler couldn't make this optimization unless the methods likewise got inlined.
Tony Wang
Tony WangOP•10mo ago
Yes, but it would just inline it, since it sees the static method
Jimmacle
Jimmacle•10mo ago
you could ask #allow-unsafe-blocks , i don't know enough about the JIT internals to be that useful other than to tell you not to write inefficient code and hope the compiler fixes it 😛
Tony Wang
Tony WangOP•10mo ago
Yeah... In this case it's easy, but sometimes doing things in place is a little bit more complicated. I would have hoped that the compiler does more heavy lifting, but I guess I have to do it myself...
So there is one thing I just found out: the compiler can omit copies in some cases. Chaining multiple ToArray() calls makes the code very slow. array.ToArray().ToArray().ToArray().ToArray() is much slower than array.ToArray(), but interestingly ToImmutableArray() doesn't get slower no matter how often you chain it. The first call does copy the data, but the chained calls won't.
Jimmacle
Jimmacle•10mo ago
because it checks if it's an immutable array first
Jimmacle
Jimmacle•10mo ago
it makes sense to omit copies for an immutable array because it can't be modified
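A small sketch of the behaviour described here, assuming the System.Collections.Immutable extension method ToImmutableArray and the == operator on ImmutableArray<T>, which compares the underlying array reference rather than the contents. The class and helper names are invented for illustration.

```csharp
using System;
using System.Collections.Immutable;

public static class ImmutableArrayChainDemo
{
    // Hypothetical helper: true if a chained ToImmutableArray() reuses the same storage.
    public static bool ChainedCallSharesStorage(ImmutableArray<int> first)
    {
        ImmutableArray<int> second = first.ToImmutableArray();
        return first == second; // reference comparison of the wrapped arrays
    }

    public static void Main()
    {
        ImmutableArray<int> first = ImmutableArray.Create(1, 2, 3);
        Console.WriteLine(ChainedCallSharesStorage(first)); // True

        // Two separately created arrays with equal contents do not share storage.
        ImmutableArray<int> other = ImmutableArray.Create(1, 2, 3);
        Console.WriteLine(first == other); // False
    }
}
```

Sharing is safe only because nobody can write to the storage afterwards, which is exactly what a plain T[] cannot promise.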
Tony Wang
Tony WangOP•10mo ago
Yes, just like it makes sense to omit a copy for a temporary object... But one is easier to check at compile time than the other
Jimmacle
Jimmacle•10mo ago
none of this is being done at compile time in either case
that's a runtime check in the code i shared
Tony Wang
Tony WangOP•10mo ago
Good point, I'll see if the JIT compiler removes the check
Metasyntactic
Metasyntactic•10mo ago
The JIT will not remove the check. It would have to know somehow that the array isn't being held onto anywhere (aliased).
There could be impls that, for example, stored the last few arrays into static variables somewhere. And this difference would be observable.
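A minimal sketch of why the difference would be observable, assuming a producer that keeps a reference to the array it returns. AliasingDemo, Produce and LastProduced are hypothetical names, not code from the thread.

```csharp
using System;
using System.Linq;

public static class AliasingDemo
{
    // The alias: a static field that still points at the last array handed out.
    public static int[] LastProduced = Array.Empty<int>();

    // Looks like it returns a throwaway temporary, but a reference escapes.
    public static int[] Produce()
    {
        var fresh = new[] { 1, 2, 3 };
        LastProduced = fresh;
        return fresh;
    }

    public static int FirstElementOfAliasAfterMutation()
    {
        int[] copy = Produce().ToArray();
        copy[0] = 99;           // the caller mutates what it believes is its own copy
        return LastProduced[0]; // would be 99 if the copy had been skipped
    }

    public static void Main()
    {
        Console.WriteLine(FirstElementOfAliasAfterMutation()); // 1
    }
}
```

To skip the copy, the runtime would have to prove that no such alias exists anywhere.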
maxmahem
maxmahem•10mo ago
I was thinking about this. But in this case, if say specifically all the ToArray calls were inlined, shouldn't it be able to see that the temporary copies never escape the scope and omit them?
Metasyntactic
Metasyntactic•10mo ago
No. Because it would have to know how the array type itself worked.
Jimmacle
Jimmacle•10mo ago
i was gonna say, arrays aren't just a chunk of memory like C-style arrays are
maxmahem
maxmahem•10mo ago
well... specifically for the special case of array here, couldn't it? This isn't any arbitrary object...
Jimmacle
Jimmacle•10mo ago
there is no special case, this method takes an IEnumerable<T>
maxmahem
maxmahem•10mo ago
ah yeah didn't consider that
Jimmacle
Jimmacle•10mo ago
it doesn't know it's an array at all
Metasyntactic
Metasyntactic•10mo ago
It's not just ToArray, it's everything involved. It would need to know that there was no aliasing at all and that this was a fresh copy itself, and that the copy being requested was exactly the same (including variance), etc. The runtime would need to cheaply be able to track aliasing somehow.
Tony Wang
Tony WangOP•10mo ago
It would be great if functions could specify that they return an unaliased object; then the JIT would know
maxmahem
maxmahem•10mo ago
well my thinking was if everything got inlined to essentially...
var t1 = new T[og.Length];
Array.Copy(og, t1, og.Length);
var t2 = new T[t1.Length];
Array.Copy(t1, t2, t1.Length);
var t3 = new T[t2.Length];
Array.Copy(t2, t3, t2.Length);
And so on. Those could be elided. But I guess ToArray taking an IEnumerable prevents that.
Jimmacle
Jimmacle•10mo ago
the implementation specializes for IIListProvider<T> and ICollection<T>, this hits the latter afaik which calls the collection's CopyTo method
maxmahem
maxmahem•10mo ago
yeah arrays obviously implement ICollection so I guess in theory something like this should be inlineable.
Tony Wang
Tony WangOP•10mo ago
I don't quite understand this, can you elaborate?
maxmahem
maxmahem•10mo ago
so... like Jimmacle pointed out, ToArray operates on an enumerable. What if my implementation of the enumerable did something like...
bool MoveNext() {
IncrementACounterInSomeOtherObject();
counter++;
return counter < dataCount;
}
that is, every time the object is enumerated, some other object gets modified.
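A runnable sketch of the same idea, using a C# iterator method instead of a hand-written MoveNext. SideEffectEnumerationDemo, CountingSource and EnumerationCount are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SideEffectEnumerationDemo
{
    // State outside the sequence that changes every time it is enumerated.
    public static int EnumerationCount;

    public static IEnumerable<int> CountingSource()
    {
        EnumerationCount++; // side effect, runs once per enumeration
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static int CountAfterTwoToArrayCalls()
    {
        EnumerationCount = 0;
        IEnumerable<int> source = CountingSource();
        _ = source.ToArray(); // first enumeration
        _ = source.ToArray(); // second enumeration
        return EnumerationCount;
    }

    public static void Main()
    {
        Console.WriteLine(CountAfterTwoToArrayCalls()); // 2
    }
}
```

If one of the two ToArray() calls were silently dropped, the counter would end at 1 instead of 2, so the program's behaviour would change.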
Tony Wang
Tony WangOP•10mo ago
In general this would be quite hard, but aren't there cases where this would be easy? Like for some return objects?
For simple functions, it's quite easy to guarantee that you return an unaliased object, and this could be inferred automatically for simple cases like my example. For cases where you create an object using another method, you would need to know that the other method creates an unaliased object and so on. That wouldn't be cheap to track. But couldn't you manually provide a keyword at compile time just like const functions in C++?
That.... looks scary. I wish there was a way to guarantee no side effects in C# like in C++
Metasyntactic
Metasyntactic•10mo ago
it's not easy, as you have to know precisely how every operation works. you can't make any assumptions here.
remember that you might run on any runtime, with any impl of any type that does whatever it wants.