C
C#6d ago
Mr. Roach

Microsoft DataFrame example not working

Hello there, I'm trying to understand Microsoft DataFrames for a c# project where I need to sum data of prices of items that have the same names inside a csv(sum costs for each entry of "apple", "banana" etc). In my python version I used pandas for that and pivoted after dropping not needed columns to achieve what I wanted with few lines of code. But now I'm stuck already trying to follow the examples provided by Microsoft for DataFrames. I tried to copy the code mentioned in the "Combine Data Sources" but I'm getting error that column "id" wouldn't exist. Does someone know how good the Microsoft Website is for getting into DataFrames or is there a better place or solution to achive what I want?
var ids = new List<Single>() { 1, 2, 3, 4, 5, 6 };
var bedrooms = new List<Single>() { 1, 2, 3, 2, 3, 1 };

var idColumn = new SingleDataFrameColumn("Id", ids);
var bedroomColumn = new SingleDataFrameColumn("BedroomNumber", bedrooms);
var dataFrame2 = new DataFrame(idColumn, bedroomColumn);

dataFrame = dataFrame.Merge(dataFrame2, new string[] { "Id" }, new string[] { "Id" });

DataFrame.SaveCsv(dataFrame, "path\\result.csv", ',');
var ids = new List<Single>() { 1, 2, 3, 4, 5, 6 };
var bedrooms = new List<Single>() { 1, 2, 3, 2, 3, 1 };

var idColumn = new SingleDataFrameColumn("Id", ids);
var bedroomColumn = new SingleDataFrameColumn("BedroomNumber", bedrooms);
var dataFrame2 = new DataFrame(idColumn, bedroomColumn);

dataFrame = dataFrame.Merge(dataFrame2, new string[] { "Id" }, new string[] { "Id" });

DataFrame.SaveCsv(dataFrame, "path\\result.csv", ',');
https://learn.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/getting-started-dataframe
Getting started with DataFrames - ML.NET
Learn how to use DataFrame to manipulate and prepare data.
12 Replies
Angius
Angius6d ago
Any particular reason you need to use machine learning for that, instead of a simple LINQ query?
Mr. Roach
Mr. Roach6d ago
Thanks for a quick response. On the Microsoft page it states that DataFrames can also be used for data manipulation and my google search when I looked for a panda equivalent pointed me towards that, so I thought that was alright too since I used panda DataFrames in Python aswell... I also stumbled upon LINQ as well and tried a bit but that didn't worked out as well, I sadly already deleted my LINQ tries so I can't provide that right now. I think I tried to use a group by statement using the itemName column of the csv but the code threw exception. The csv looks like this, just bigger and with more items and entries:
itemString,itemName,stackSize,quantity,price,otherPlayer,player,time,source
i:2835,Stone,16,16,100,player1,player2,1717498177,Auction
i:2835,Stone,9,9,374,player1,player2,1717499796,Auction
i:53010,Cloth,1,2,13536,player1,player2,1717511825,Auction
i:53010,Cloth,1,4,13536,player1,player2,1717512297,Auction
i:57194,Potion,1,1,349997,player1,player2,1717524867,Auction
i:57194,Potion,1,4,364646,player1,player2,1717524871,Auction
itemString,itemName,stackSize,quantity,price,otherPlayer,player,time,source
i:2835,Stone,16,16,100,player1,player2,1717498177,Auction
i:2835,Stone,9,9,374,player1,player2,1717499796,Auction
i:53010,Cloth,1,2,13536,player1,player2,1717511825,Auction
i:53010,Cloth,1,4,13536,player1,player2,1717512297,Auction
i:57194,Potion,1,1,349997,player1,player2,1717524867,Auction
i:57194,Potion,1,4,364646,player1,player2,1717524871,Auction
As I said, on panda i just made a dataFrame, dropped all rows besides itemString, itemName, quantity and price, made an pivot table with itemString, itemName as index and used sum operation for the quantity and price column, that was fairly simple for me as a beginner. But i dont ensist on having to use Microsoft DataFrame if a LINQ solution is easier.
MODiX
MODiX6d ago
Angius
REPL Result: Failure
var csv = """
i:2835,Stone,16,16,100,player1,player2,1717498177,Auction
i:2835,Stone,9,9,374,player1,player2,1717499796,Auction
i:53010,Cloth,1,2,13536,player1,player2,1717511825,Auction
i:53010,Cloth,1,4,13536,player1,player2,1717512297,Auction
i:57194,Potion,1,1,349997,player1,player2,1717524867,Auction
i:57194,Potion,1,4,364646,player1,player2,1717524871,Auction
""";
var itemPrices = csv.Split('\n')
.Select(r => r.Split(','))
.Select(r => (name: r[1], price: r[4]))
.GroupBy(r => r.name)
.Select(r => (name: r.Key, price: r.Sum()));

itemPrices
var csv = """
i:2835,Stone,16,16,100,player1,player2,1717498177,Auction
i:2835,Stone,9,9,374,player1,player2,1717499796,Auction
i:53010,Cloth,1,2,13536,player1,player2,1717511825,Auction
i:53010,Cloth,1,4,13536,player1,player2,1717512297,Auction
i:57194,Potion,1,1,349997,player1,player2,1717524867,Auction
i:57194,Potion,1,4,364646,player1,player2,1717524871,Auction
""";
var itemPrices = csv.Split('\n')
.Select(r => r.Split(','))
.Select(r => (name: r[1], price: r[4]))
.GroupBy(r => r.name)
.Select(r => (name: r.Key, price: r.Sum()));

itemPrices
Exception: CompilationErrorException
- 'int' does not contain a definition for 'Sum' and the best extension method overload 'Enumerable.Sum(IEnumerable<int>)' requires a receiver of type 'System.Collections.Generic.IEnumerable<int>'
- 'int' does not contain a definition for 'Sum' and the best extension method overload 'Enumerable.Sum(IEnumerable<int>)' requires a receiver of type 'System.Collections.Generic.IEnumerable<int>'
Compile: 608.568ms | Execution: 0.000ms | React with ❌ to remove this embed.
Angius
Angius6d ago
That's what I get for writing code on Discord lol
MODiX
MODiX6d ago
Angius
REPL Result: Success
using System.Linq;

var csv = """
i:2835,Stone,16,16,100,player1,player2,1717498177,Auction
i:2835,Stone,9,9,374,player1,player2,1717499796,Auction
i:53010,Cloth,1,2,13536,player1,player2,1717511825,Auction
i:53010,Cloth,1,4,13536,player1,player2,1717512297,Auction
i:57194,Potion,1,1,349997,player1,player2,1717524867,Auction
i:57194,Potion,1,4,364646,player1,player2,1717524871,Auction
""";
var prices = csv.Split('\n', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
.Select(r => r.Split(','))
.Select(r => (name: r[1], price: int.Parse(r[4])))
.GroupBy(r => r.name)
.Select(g => (name: g.Key, price: g.Sum(i => i.price)));

prices
using System.Linq;

var csv = """
i:2835,Stone,16,16,100,player1,player2,1717498177,Auction
i:2835,Stone,9,9,374,player1,player2,1717499796,Auction
i:53010,Cloth,1,2,13536,player1,player2,1717511825,Auction
i:53010,Cloth,1,4,13536,player1,player2,1717512297,Auction
i:57194,Potion,1,1,349997,player1,player2,1717524867,Auction
i:57194,Potion,1,4,364646,player1,player2,1717524871,Auction
""";
var prices = csv.Split('\n', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
.Select(r => r.Split(','))
.Select(r => (name: r[1], price: int.Parse(r[4])))
.GroupBy(r => r.name)
.Select(g => (name: g.Key, price: g.Sum(i => i.price)));

prices
Result: List<ValueTuple<string, int>>
[
{
"item1": "Stone",
"item2": 474
},
{
"item1": "Cloth",
"item2": 27072
},
{
"item1": "Potion",
"item2": 714643
}
]
[
{
"item1": "Stone",
"item2": 474
},
{
"item1": "Cloth",
"item2": 27072
},
{
"item1": "Potion",
"item2": 714643
}
]
Compile: 625.373ms | Execution: 77.272ms | React with ❌ to remove this embed.
Angius
Angius6d ago
'Ere Could probably make it shorter with .Aggregate()
Mr. Roach
Mr. Roach6d ago
oh my god that's working and I tried it yesterdays for hours. I thank you so much, now i need to see what's different here from what i tried and proper understand it. Appreciated!
Angius
Angius6d ago
If you have questions about anything in that query, feel free to ask
Mr. Roach
Mr. Roach6d ago
Ah okay if you offer that... so as far as i think to understand, we make a "selection" variable r that get's filled with the content of each row splitted by the seperator ",", so it basically becomes an array i guess. we then make a new selection of r, selecting only the 2nd [1] and the 5th [4] column and name them name and price, we also make sure that the price colmun gets parsed as an integer to make the sum method work. Then we group by the name column of the r select variable and make a new selection variable that gets called g, get's g value automatically filled by what ever GroupBy spit's out before? I also dont understand why we write "name: g.Key", how do i know whats .key? I hover it in visual studio and read "gets the key of the IGrouping<out TKey, out TElement> but it confuses instead of enlighten me. I think IGrouping is a type of object that gets created by the GroupBy statement? And <out TKey, out TElement> correspond to the name and price column. If i remember correctly, keys of tables/their entries are used i.E when we want to join tables. I also don't understand why we need to write g.Sum(i => i.price) instead of just g.Sum(price) or g.Sum(int.Parse(r[4])) like we did before in the select statement. What i think to understand too is that => is lambda and is used inside a linq query variable if we execute the statement later and not immediately. But maybe i missunderstand, I just try to remember stuff i read online when i tried some stuff yesterday Also why do we need to make a new select variable "g" in the last select?
Angius
Angius6d ago
selecting only the 2nd [1] and the 5th [4] column and name them name and price
We turn them into a named ValueTuple, to be more precise
get's g value automatically filled by what ever GroupBy spit's out before?
.GroupBy() spits out a grouping. Basically, a
new[]{ 1, 2, 3, 4, 5 }.GroupBy(x => x % 2 == 0 ? "even" : "odd");
new[]{ 1, 2, 3, 4, 5 }.GroupBy(x => x % 2 == 0 ? "even" : "odd");
would result in something like
[
{
Key: "even",
Values: [ 2, 4 ]
},
{
Key: "odd",
Values: [ 1, 3, 5 ]
}
]
[
{
Key: "even",
Values: [ 2, 4 ]
},
{
Key: "odd",
Values: [ 1, 3, 5 ]
}
]
with Values being available straight from that grouping thanks to it implementing IEnumerable interface. Key will be what you group on, in this case "even" or "odd". Values will be the items that fit that group
I also don't understand why we need to write g.Sum(i => i.price) instead of just g.Sum(price)
Because g at this point is an IEnumerable<(string name, int price)> and it's just the prices we need to sum Just price means nothing here. Sum() takes a lambda whose parameter is the consecutive item of the IEnumerable you're summing up
or g.Sum(int.Parse(r[4]))
r does not exist here
Also why do we need to make a new select variable "g" in the last select?
Otherwise you'd get an IEnumerable<IGrouping<string, (string name, int price)> We want just an IEnumerable<(string name, int price)> where price is the total
Mr. Roach
Mr. Roach6d ago
just to be clear, "we want just IEnumerable<[...]" or "just IGrouping<[...]"? because visual studio doesnt say g is IEnumerable, but IGrouping. So does the first Select(r => work with the return value provided by csv.Split? The second Select(r works with the returned value of the first Select(r the Group by works with that returned value and so on? Again a big thank you that you further explain that to me. I wonder how I would learn that stuff the best if I dont have somebody that can help/explain to me? Reading microsofts documentation for everything?
No description
Angius
Angius6d ago
g is an IGrouping, yes We select just the key and the sum of prices from it, into a tuple
var prices = csv.Split('\n')
// split each line on a comma, returns `IEnumerable<string[]>
.Select(r => r.Split(','))
// take each `string[]` and creates a tuple from it, returns `IEnumerable<(string name, int price)>
.Select(r => (name: r[1], price: int.Parse(r[4])))
// groups the `Ienumerable` of tuples into an `IEnumerable` of groupings
.GroupBy(r => r.name)
// Takes that `IEnumerable` of groupings, and based on each grouping creates a tuple
.Select(g => (name: g.Key, price: g.Sum(i => i.price)));
var prices = csv.Split('\n')
// split each line on a comma, returns `IEnumerable<string[]>
.Select(r => r.Split(','))
// take each `string[]` and creates a tuple from it, returns `IEnumerable<(string name, int price)>
.Select(r => (name: r[1], price: int.Parse(r[4])))
// groups the `Ienumerable` of tuples into an `IEnumerable` of groupings
.GroupBy(r => r.name)
// Takes that `IEnumerable` of groupings, and based on each grouping creates a tuple
.Select(g => (name: g.Key, price: g.Sum(i => i.price)));
how I would learn that stuff the best
Microsoft LINQ documentation: https://learn.microsoft.com/en-us/dotnet/csharp/linq/ Though it favours the less-used SQL-like query syntax for it