❔ Want to try to predict the Ballon d'Or but not really sure where to start
I would like to make a program that uses statistics to try to calculate the Ballon d'Or winner. I want to look at past winners, specifically at stats such as trophies won that year, goals scored, etc., and try to calculate how important each is to winning the Ballon d'Or, for example calculating from past winners how much winning the World Cup helps your chance of winning the Ballon d'Or. I then want to be able to give the program a list of 30 players and have it use the model to predict which one is most likely to win the award. Do you have any suggestions or models I could use to do something like this? I want to try using a regression model such as linear regression since I have learnt about it in school, but I am not sure how I could use it for this.
2 Replies
This is coming from the perspective of someone who knows nothing about the sport, so I'm looking at it purely from an ML angle, which may not give good results given the small amount of data and the fact that it's a popularity contest. Someone else on here might give you a better answer; this is just here to get you thinking.
Linear regression, or any regression, is a good way of modeling continuous numeric outcomes, but for past soccer players there are only 2 outcomes: they either won the trophy or they didn't. A regression would be appropriate if your data had some numeric column that you were trying to predict.
Classification models would be good here since you're trying to see whether your model predicts a player is a winner and how confident (probability) it is in that prediction. I'd suggest some basic classification models like KNN, logistic regression, and decision trees.
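To make the "probability per player" idea concrete, here's a rough sketch of logistic regression written from scratch with only the standard library. The two features, the training rows, and the player names are all made up for illustration; in practice you'd probably reach for a library implementation instead of hand-rolled gradient descent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias by plain batch gradient descent on log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability that a player with features x is a winner."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up toy rows: [goals scaled to 0-1, won a major trophy that year (0/1)]
X_train = [
    [0.9, 1], [0.8, 1], [0.7, 1], [0.9, 0],
    [0.4, 0], [0.5, 0], [0.3, 1], [0.2, 0],
]
y_train = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = won the award, 0 = did not

w, b = train_logistic(X_train, y_train)

# Hypothetical candidates for one year; pick the one with the highest probability.
candidates = {"Player A": [0.85, 1], "Player B": [0.6, 0], "Player C": [0.3, 1]}
probs = {name: predict_proba(w, b, x) for name, x in candidates.items()}
predicted_winner = max(probs, key=probs.get)
print(predicted_winner, probs)
```

The learned weights in `w` are also a rough answer to the "how much does each stat matter" question, as long as the features are on comparable scales.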
Start by doing some data collection. Going off the Wikipedia page, I'd collect the top 3 players for each year from 1956 to 2021 and label first place as winners and second and third as losers. You'd also want to compile the stats of said players. One issue I see you running into is sparse data, so you need to decide what to do if there's no data about certain players for a given year.
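As a sketch of what that labeling step might look like, assuming you've hand-collected rows of `(year, player, rank, goals, trophies)`: the names and numbers below are placeholders rather than real stats, and filling gaps with the column median is just one possible choice for the missing-data problem.

```python
from statistics import median

# Hypothetical hand-collected rows: (year, player, rank, goals, trophies).
# Names and numbers are placeholders, not real stats; None marks a missing value.
raw_rows = [
    (2001, "Player A", 1, 30, 2),
    (2001, "Player B", 2, 25, 1),
    (2001, "Player C", 3, None, 0),
    (2002, "Player D", 1, 28, 3),
    (2002, "Player E", 2, 35, 0),
    (2002, "Player F", 3, 22, None),
]

def build_dataset(rows):
    """Label rank 1 as winner (1), everyone else as 0, and fill gaps with the column median."""
    goal_median = median(r[3] for r in rows if r[3] is not None)
    trophy_median = median(r[4] for r in rows if r[4] is not None)
    dataset = []
    for year, player, rank, goals, trophies in rows:
        dataset.append({
            "year": year,
            "player": player,
            "goals": goals if goals is not None else goal_median,
            "trophies": trophies if trophies is not None else trophy_median,
            # Flag imputed rows so they can be told apart later.
            "had_missing": int(goals is None or trophies is None),
            "winner": 1 if rank == 1 else 0,
        })
    return dataset

dataset = build_dataset(raw_rows)
for row in dataset:
    print(row)
```

Keeping a `had_missing` flag lets you check later whether the imputed rows are skewing the model, or drop them entirely if there turn out to be only a few.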
You may also have to consider oddities in particular years. Maybe in 1960 all the goalkeepers just sucked, which resulted in more goals that year than in any other year. Or one year every candidate was amazing, and any of them could easily have won the award in a different year.
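One possible way to soften those year-to-year oddities (my suggestion, not the only option) is to rescale each stat relative to the other candidates from the same year, so the model sees "how far above that year's average" instead of raw counts. A minimal sketch with placeholder numbers:

```python
from collections import defaultdict
from statistics import mean, pstdev

def normalize_within_year(rows, feature):
    """Add a z-score of `feature` computed only against rows from the same year."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row[feature])
    out = []
    for row in rows:
        values = by_year[row["year"]]
        mu, sigma = mean(values), pstdev(values)
        # If everyone in a year has the same value, there is no spread to scale by.
        z = (row[feature] - mu) / sigma if sigma > 0 else 0.0
        out.append({**row, feature + "_z": z})
    return out

# Placeholder numbers: a high-scoring year and a low-scoring year.
rows = [
    {"year": 1960, "player": "Player A", "goals": 60},
    {"year": 1960, "player": "Player B", "goals": 50},
    {"year": 1960, "player": "Player C", "goals": 40},
    {"year": 1990, "player": "Player D", "goals": 30},
    {"year": 1990, "player": "Player E", "goals": 25},
    {"year": 1990, "player": "Player F", "goals": 20},
]

normalized = normalize_within_year(rows, "goals")
for row in normalized:
    print(row["year"], row["player"], round(row["goals_z"], 3))
```

Here the top scorer of each year ends up with the same `goals_z`, even though the raw goal counts differ by a factor of two.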
After that you'll just go through the motions of creating your models: train/validation/test sets, feature and hyperparameter selection, then building and evaluating the models.
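One detail worth thinking about for the split: players from the same year compete against each other, so it probably makes sense to hold out whole years rather than random rows. Here's a sketch of that idea plus a "did the top-scored candidate actually win" metric. The `split_by_year` and `top1_accuracy` helpers and the data are made up for illustration, and the `score` function just uses goals as a stand-in for a trained model's probability.

```python
import random

def split_by_year(rows, test_fraction=0.2, seed=0):
    """Hold out whole years so one year's candidates never straddle train and test."""
    years = sorted({row["year"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(years)
    n_test = max(1, round(len(years) * test_fraction))
    test_years = set(years[:n_test])
    train = [r for r in rows if r["year"] not in test_years]
    test = [r for r in rows if r["year"] in test_years]
    return train, test

def top1_accuracy(test_rows, score):
    """Fraction of held-out years where the highest-scored candidate is the real winner."""
    by_year = {}
    for r in test_rows:
        by_year.setdefault(r["year"], []).append(r)
    hits = sum(max(group, key=score)["winner"] for group in by_year.values())
    return hits / len(by_year)

# Placeholder data: 10 years, 3 candidates each, the top scorer always wins.
rows = []
for year in range(2000, 2010):
    for goals, winner in [(30, 1), (20, 0), (10, 0)]:
        rows.append({"year": year, "goals": goals, "winner": winner})

train, test = split_by_year(rows)
accuracy = top1_accuracy(test, score=lambda r: r["goals"])
print(len(train), len(test), accuracy)
```

Scoring per year matches how the model would actually be used: you hand it one year's list of candidates and ask which one it ranks first.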
Honestly, your dataset is kinda small (top 3 over roughly 65 award years is only about 195 rows, with one winner per year), so it'd be very easy to over/underfit. Even if you included all 30 nominees, you'd run into a problem of class imbalance.
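To put a number on the imbalance: with 1 winner among 30 nominees, a model that always says "not a winner" is right about 97% of the time. A common workaround is to weight the rare class more heavily during training. The sketch below computes "balanced" weights as `n_samples / (n_classes * class_count)`, which is the same heuristic scikit-learn uses for `class_weight="balanced"`; the label list is a made-up single year.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# One hypothetical year: 1 winner among 30 nominees.
labels = [1] + [0] * 29
weights = balanced_class_weights(labels)
print(weights)
```

Each winner row would then count about 29 times as much as a non-winner row in the loss, so the total weight of the two classes is equal.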
https://en.wikipedia.org/wiki/Ballon_d%27Or