LINQ: Duplicate key after GroupBy
Hello,
I have the following problem: I get an ArgumentException in the following code:
https://paste.mod.gg/jmzxdwctfzpw/0
The exception in the client system is: System.ArgumentException - An item with the same key has already been added. I don't understand how this could happen, because I
m already grouping the values by the userid and this is also the key for the dictionary. So "GroupBy" creates multiple groups with the "same key"?
I don't have access to the client system, so I
m looking for potential causes for the exception. I am working with EF Core.
Thanks for helping.BlazeBin - jmzxdwctfzpw
A tool for sharing your source code with the world!
10 Replies
Presumably, the database and the dictionary have different ideas on what makes two keys the same
Is
Userid
a string? What does it contain? Is it user-supplied text?Hey. Yes, UserId is a string...the database is MSSQL. I already tried to set up a query that produces two groups with the "same" key. But if I just change the case (e.g. "michael" and "MICHAEL" for the userid) it just creates one group. Empty spaces are also getting ignored in a normal query with GROUP BY userid.
But if I just change the case (e.g. "michael" and "MICHAEL" for the userid) it just creates one groupDo you mean, if you have two users in your table, one with a
Userid
or "michael" and one with "MICHAEL", that GroupBy
creates a single group containing both?Yes, both "michael" and "MICHAEL" where recognized as the same group on the database level and when using the IEnumerable GroupBy (which uses the OrdinalIgnoreCase-Comparer on default).
But the IEnumerable-GroupBy does not ignore whitespaces and 'michael ' and 'michael' where recognized as different groups, but not different keys for the dictionary. One working solution is to trim the userid. Another one would be to group on the sql server (which irgnores the case and trailing whitespaces (but not leading ones))...but I will chose to just trim the Userid in the groupby, because I dont know how different databases behave.
Thanks for pointing me in the right direction. 🙂
Your GroupBy should be being performed in the database, as the query hasn't been materialised at that point?
Yeah...I though so too. That why I first checked if I could create a query with the "same" keys...I looked at the sql query with the sql profiler and the .Select(...) does already materialize the query.
var userGroupDict = rechteDbModel.Um_Gruppenzuweis
.AsNoTracking()
.Select(gz => new { gz.Userid, gz.Um_Usergr_Id })
.GroupBy(gz => gz.Userid.Trim())
.ToDictionary(g => g.Key, g => g.Select(gz => gz.Um_Usergr_Id).ToList());
result in the following query:
SELECT [t].[Key], [t].[Userid], [t].[Um_Usergr_Id]
FROM (
SELECT [u].[USERID] AS [Userid], [u].[UM_USERGR_ID] AS [Um_Usergr_Id], LTRIM(RTRIM([u].[USERID])) AS [Key]
FROM [dbo].[UM_GRUPPENZUWEIS] AS [u]
) AS [t]
ORDER BY [t].[Key]
And without the trim, this query is performed:
SELECT [u].[USERID], [u].[UM_USERGR_ID] AS [Um_Usergr_Id]
FROM [dbo].[UM_GRUPPENZUWEIS] AS [u]
ORDER BY [u].[USERID]
Okay, I now though a little bit about the problem and want to update it:
The problem is the following: The group by of EF works like this: Select from the database and order by the group key. After that, the grouping is done in memory: EF expects the values to be properly ordered. Now comes the interesting part: the collation of the table. If this is a collation that ignores whitespaces (in my case it is 'Latin1_General_CI_AS'), the row with the whitespaces could be somewhere in the middle in the result set. (screenshot, note the ORDER BY) And when the row with the whitespace is hitted, a new group is created and after that "poisoned" row a new group with the original value is created...resulting in a duplicate key in the ToDictionary().
There is a UseCollation in the EF modelbuilder where you can specify the collation. But I didn't had any luck with changing it. I get the same error - but when I change it at the database level everything works fine.
Ah interesting. That makes sense as a problem description.
(Although I've no idea how to solve it I'm afraid!)
Yeah, I needed some time to figure this out. I had three options to solve it:
1. Change the collasion in the data base schema to something that doesent ignores whitespaces (e.g. SQL_Latin1_General_Pref_CP1_CI_AS)
2. The MS way would be something like this: (https://learn.microsoft.com/de-de/ef/core/miscellaneous/collations-and-case-sensitivity)
var userGroupDict = rechteDbModel.Um_Gruppenzuweis
.AsNoTracking()
.Select(gz => new
{
CollatedUserid = EF.Functions.Collate(gz.Userid, "SQL_Latin1_General_Pref_CP1_CI_AS"),
gz.Um_Usergr_Id
})
.GroupBy(x => x.CollatedUserid)
...
3. In my case I just materialized before the group by and used the group by of IEnumerable.
Sortierungen und Groß-/Kleinschreibung – EF Core
Konfigurieren von Sortierungen und Groß-/Kleinschreibung in der Datenbank und für Abfragen mit Entity Framework Core (EF Core)
Trimming the group key is dangerours...