Data Science: Hackdays Challenge – Data Science Meets Football
The painstaking process of coding (20 hours!) has finally come to an end. You press the “Enter” key, the whole team holds its breath… and it works! The algorithm accurately maps the present and predicts the future – hopefully, also precisely. Magic? Applied Data Science!
Author and Hackdays participant: Leonid Gavrilyuk
Data Analyst & Master of Science in Applied Information and Data Science Student
Challenge Owner: FC Servette
Sports Hackdays November 2021
The Sports Hackdays aimed to use Data Science for the benefit of the sports industry. Master's students of the Applied Information and Data Science programme came together with representatives of the sports industry to develop state-of-the-art solutions to current challenges.
Football Players Profiling Challenge
Selecting the right person for the right job is an important task in every field. But in team sports such as football, finding people with complementary skills is the most crucial ingredient for success. Research shows that coaches and players cannot notice or remember more than half of the relevant actions that happened during a game (Sport Performance Analytics, 2018). With the growing application of data analytics in sports, data-driven insights can help the coaching staff and players to achieve optimal results on the football field.
During the Sports Hackdays, the “Player profiling” challenge aimed to develop an algorithm for analysing footballers’ performance. With this knowledge, the challenge owner, the Servette Football Club from Geneva, will hopefully be able to win the Swiss Super League for the first time since 1999.
Warm-up: Data Preparation
The team consisted of FC Servette staff, FIFA-certified football scouts and three Master's students. First, the team prepared the data: performance statistics from five European leagues and the Swiss Super League for the year 2020. The information included 48 features, such as the number of shots, penalty wins, key passes, percentage of successful dribbles, etc. The data was scraped from fbref.com – a great source for football analytics.
Starting to play around: Data Exploration
To understand how the data is distributed, the challenge team explored and analysed it, looking for outliers and other irregularities. It turns out there are a couple of “problem children” – footballers who, for some reason, show very strange behaviour on the field. For example, according to the database, one football player – we will not mention his name – managed to receive two red cards in one match. This is virtually impossible, and the finding proved – once again! – that it is extremely important to prepare the data diligently.
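A check of this kind can be sketched in a few lines of Python. This is only an illustration, not the code the team used: the records, the field names and the plausibility ranges below are invented for the example.

```python
# Hypothetical per-match records; players, field names and values are invented.
records = [
    {"player": "Player A", "match_id": 1, "red_cards": 0, "dribble_success_pct": 62.5},
    {"player": "Player B", "match_id": 1, "red_cards": 2, "dribble_success_pct": 40.0},
    {"player": "Player C", "match_id": 2, "red_cards": 1, "dribble_success_pct": 120.0},
]

# Assumed plausibility rules: (min, max) allowed per field for a single match.
RULES = {
    "red_cards": (0, 1),              # a player is sent off at most once per match
    "dribble_success_pct": (0, 100),  # a percentage has to stay within 0..100
}

def find_implausible(rows, rules):
    """Return (player, field, value) for every value outside its allowed range."""
    problems = []
    for row in rows:
        for field, (low, high) in rules.items():
            value = row[field]
            if not low <= value <= high:
                problems.append((row["player"], field, value))
    return problems

flagged = find_implausible(records, RULES)
for player, field, value in flagged:
    print(f"{player}: {field}={value} is outside the plausible range")
```

Rows flagged this way still need a human decision: correct them from another source, or drop them before the analysis.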
Advancing: Insights extraction
Once the data is prepared, it’s time to start the analysis. The scout and the Servette FC representative defined the performance characteristics of the players’ roles. Based on these characteristics, the players can be grouped into performance clusters (the k-means clustering method was used for this). For example, the Striker role has six clusters: Scoring chance generator, Target Man, Finisher, Selfish & Risky, Dribblers, and Efficient attacking creator. The FC Servette strikers can then be assigned to these clusters (Figure 2).
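To give an idea of what k-means does, here is a minimal sketch in plain Python. It is not the team's implementation: the two features, the six players and the choice of two clusters are invented for the example, and in practice one would more likely rely on a library such as scikit-learn and standardise the features first.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct players picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each player goes to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels

# Invented strikers described by (shots per 90, aerial wins per 90).
strikers = [
    (4.0, 1.0), (4.2, 0.8), (3.8, 1.2),   # shoot a lot, few aerial duels
    (1.0, 5.0), (1.2, 5.2), (0.8, 4.8),   # few shots, many aerial wins
]

centroids, labels = kmeans(strikers, k=2)
print(labels)
```

The cluster numbers themselves carry no meaning. Names such as “Target Man” or “Finisher” have to come from people who know the game, which is where the scouts' input matters.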
The heatmap in Figure 3 shows the “importance” of each performance variable for the given cluster on a scale from 0 to 1. For example, the “Efficient attacking creator” usually has high numbers in many characteristics such as “Carries”, “Successful crosses”, “Assists”, etc. The “Target Man”, on the contrary, shows supreme performance in fewer characteristics – for example, in “Deep progressions” and “Aerial wins”. This information can help the coaching staff plan the players’ training routines by showing the areas to focus on for each player profile. At this point, the team focused on visualising the data, guided by the belief that a Data Scientist needs to present information in a way that the target group can easily understand.
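The article does not say how the 0 to 1 scale was computed. One simple possibility, shown here purely as an assumption, is to take the mean value of each variable per cluster and rescale it with min-max normalisation across the clusters. The cluster means below are invented.

```python
# Hypothetical cluster means (per-90 values); the numbers are invented.
cluster_means = {
    "Target Man":                  {"aerial_wins": 6.0, "assists": 0.10, "carries": 20.0},
    "Finisher":                    {"aerial_wins": 2.0, "assists": 0.10, "carries": 25.0},
    "Efficient attacking creator": {"aerial_wins": 1.0, "assists": 0.30, "carries": 40.0},
}

def min_max_by_feature(means):
    """Rescale every feature to [0, 1] across clusters (an assumed scoring scheme)."""
    features = list(next(iter(means.values())).keys())
    scaled = {cluster: {} for cluster in means}
    for feature in features:
        values = [row[feature] for row in means.values()]
        low, high = min(values), max(values)
        span = high - low
        for cluster, row in means.items():
            # A feature that is identical in every cluster shows no contrast.
            scaled[cluster][feature] = (row[feature] - low) / span if span else 0.0
    return scaled

importance = min_max_by_feature(cluster_means)
for cluster, row in importance.items():
    print(cluster, {f: round(v, 2) for f, v in row.items()})
```

With this scheme, a value of 1 means the cluster has the highest average for that variable and 0 the lowest, which is the kind of contrast a heatmap makes easy to read.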
Another data visualisation helps to compare the clusters in terms of their key characteristics (Figure 4). The player with the “Efficient attacking creator” profile has a higher number of OP assists than the Finisher.
Final: Delivering the result
Based on the developed algorithm, Servette FC can scout and acquire players with complementary skills. For example, if the performance data shows that the European leagues are won by teams whose strikers match all six aforementioned profiles, then the Servette FC coaching staff should either train its footballers accordingly or look for players with the respective skills. At the end of the Hackdays, the Challenge team presented this solution to the jury and the fellow Hackdays participants. Although the group did not win, it was a great experience nevertheless. Coming together with a group of people with different backgrounds, and trying to solve a problem in a very limited time, is a valuable opportunity to grow and to push your skills to another level.
Homework: Improvement plans
Naturally, in the limited time available, the team did not manage to realise all its plans. Some ideas are yet to be implemented. For example, only one clustering method – k-means clustering – was tested during the Hackdays. It would be interesting to see the results of other algorithms. Another important improvement is to integrate information about changes in the team strategy and coaching approach. This would help to distinguish between the player’s individual behaviour and the strategic directive he receives from the coach. Likewise, incorporating game event data would allow one to observe under which circumstances the player performed in a certain way: maybe it was caused by the style of the competing team?
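When several clustering algorithms are tried, their results need a common yardstick. One widely used option is the silhouette score, which rewards clusters that are compact and well separated. The sketch below is an illustration only, with invented data, and is not something the team reported using.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient (range -1..1, higher is better).

    Assumes at least two clusters and no cluster made only of identical points.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-member clusters
            continue
        # a: mean distance to the other members of the player's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Invented strikers described by (shots per 90, aerial wins per 90).
strikers = [
    (4.0, 1.0), (4.2, 0.8), (3.8, 1.2),
    (1.0, 5.0), (1.2, 5.2), (0.8, 4.8),
]

# Two candidate groupings, e.g. produced by two different algorithms.
grouping_a = [0, 0, 0, 1, 1, 1]
grouping_b = [0, 1, 0, 1, 0, 1]

score_a = silhouette_score(strikers, grouping_a)
score_b = silhouette_score(strikers, grouping_b)
print(f"grouping A: {score_a:.2f}, grouping B: {score_b:.2f}")
```

A higher score only says the groups are geometrically cleaner. Whether they make sense as player profiles is still a question for the scouts and coaches.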
The challenge aimed to assist Servette FC in defining the profiles of its players. In the future, the improved algorithm will allow the football club to have an overview of its athletes’ performance, plan its scouting activities and better develop the players’ potential based on the target profile. As a Data Scientist, it is important to have an opportunity to apply your technical skills in a new field and to test different ways of applying the algorithms. In this situation, the domain knowledge – that is, awareness of the field you analyse through machine learning – should not be underestimated. This is why Data Science can be seen not only as a separate job. It is also a set of skills that can be acquired by people with different backgrounds for the benefit of their domain, be it sport or medicine.
Sport Performance Analytics. 2018. The Role of a Performance Analyst in Sports. Retrieved from https://www.sportperformanceanalysis.com/article/what-is-a-performance-analyst-in-sport