Data Science: Hackdays Challenge – Data Science Meets Football


The painstaking process of coding (20 hours!) has finally come to an end. You press the “Enter” key, the whole team holds its breath… and it works! The algorithm maps the present accurately and predicts the future, hopefully just as precisely. Magic? Applied Data Science!



Author and Hackdays participant: Leonid Gavrilyuk
Data Analyst & Master of Science in Applied Information and Data Science Student

Challenge Owner: FC Servette

Sports Hackdays November 2021

The Sports Hackdays aimed to use data science for the benefit of the sports industry. Master's students of the Applied Information and Data Science programme came together with representatives of the sports industry to develop state-of-the-art solutions to current challenges.

Football Players Profiling Challenge

Selecting the right person for the right job is an important task in every field. But in team sports such as football, finding people with complementary skills is the most crucial ingredient for success. Research shows that coaches and players cannot notice or remember more than half of the relevant actions that happen during a game (Sport Performance Analytics, 2018). With the growing use of data analytics in sports, data-driven insights can help the coaching staff and players to achieve optimal results on the football field.

During the Sports Hackdays, the “Player profiling” challenge aimed to develop an algorithm for analysis of the footballers’ performance. With this knowledge, the challenge owner, the Servette Football Club from Geneva, will hopefully be able to win the Swiss Super League for the first time since 1999.


Warm-up: Data Preparation

The team consisted of FC Servette staff, FIFA-certified football scouts and three Master's students. First, the team prepared the data: performance statistics from five European leagues and the Swiss Super League for the year 2020. The data included 48 features, such as the number of shots, penalty wins, key passes and the percentage of successful dribbles. It was scraped from fbref.com, a great source for football analytics.

Starting to play around: Data Exploration

To understand how the data is distributed, the challenge team explored and analysed it, looking for outliers and other anomalies. It turns out there are a couple of “problem children”: footballers who, according to the data, show very strange behaviour on the field. For example, the database states that one football player (we will not mention his name) managed to receive two red cards in one match. That is virtually impossible, and the finding proved once again how important it is to prepare the data diligently.

Data exploration is always a thorough and time-consuming process (Figure 1)
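
A check like the one that exposed the double red card can be automated. The sketch below is only an illustration and not the team's actual code: it assumes the data is available as one row per player and match, and the field names (player, match_id, red_cards, shots) are hypothetical rather than the real fbref.com columns.

```python
# Hypothetical per-match records; field names are illustrative assumptions,
# not the actual fbref.com column names.
match_rows = [
    {"player": "Player A", "match_id": 1, "red_cards": 0, "shots": 3},
    {"player": "Player B", "match_id": 1, "red_cards": 2, "shots": 1},
    {"player": "Player C", "match_id": 2, "red_cards": 1, "shots": 0},
]


def find_impossible_rows(rows):
    """Return (player, match_id, reason) for rows that break simple domain rules."""
    problems = []
    for row in rows:
        # A player leaves the field after a red card, so more than one
        # per match points to a data error.
        if row["red_cards"] > 1:
            problems.append((row["player"], row["match_id"], "more than one red card"))
        # Counts can never be negative.
        if row["red_cards"] < 0 or row["shots"] < 0:
            problems.append((row["player"], row["match_id"], "negative count"))
    return problems


suspicious = find_impossible_rows(match_rows)
print(suspicious)
```

Rows flagged this way still need a human decision: correct the value, drop the row, or check the source.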

Advancing: Insights extraction

Once the data is prepared, it’s time to start the analysis. The scout and the Servette FC representative defined the performance characteristics of the players’ roles. Based on these characteristics, the players were grouped into performance clusters using the k-means clustering method. For example, the Striker player role has six clusters: Scoring chance generator, Target Man, Finisher, Selfish & Risky, Dribblers, and Efficient attacking Creator. The FC Servette strikers can then be assigned to these clusters (Figure 2).

Mapping the Servette players to the Striker clusters (Figure 2)
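
The two steps described above, clustering the players and then mapping a Servette striker to a cluster, can be sketched as follows. This is a minimal, standard-library version of k-means on invented toy data, not the team's implementation: the two features, their values and the choice of k=2 (instead of the six striker clusters) are assumptions made to keep the example small. In practice, a library implementation such as scikit-learn's KMeans would normally be used.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = []
    for _ in range(n_iter):
        # Assignment step: each player goes to the nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return labels, centroids


def nearest_cluster(player, centroids):
    """Index of the centroid closest to one player's feature vector."""
    return min(range(len(centroids)), key=lambda c: math.dist(player, centroids[c]))


# Toy data (assumption): two scaled features per striker,
# e.g. (finishing, aerial play). Not real player statistics.
strikers = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),
]
labels, centroids = kmeans(strikers, k=2)

# Map a hypothetical Servette striker to the learned clusters.
servette_striker = (0.70, 0.30)
servette_cluster = nearest_cluster(servette_striker, centroids)
print(labels, servette_cluster)
```

Because k-means relies on distances, the features should be brought to a comparable scale before clustering; otherwise variables with large numeric ranges dominate the result.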

Conclusion

The heatmap in Figure 3 shows the “importance” of each performance variable for the given cluster on a scale from 0 to 1. For example, the “Efficient attacking creator” usually has high values in many characteristics, such as “Carries”, “Successful crosses” and “Assists”. The “Target Man”, by contrast, stands out in fewer characteristics, for example in “Deep progressions” and “Aerial wins”. This information can help the coaching staff to plan the players’ training routines by showing the areas to focus on for each player profile. At this point, the team concentrated on visualising the data, guided by the belief that a Data Scientist needs to present information in a way that the target group can easily understand.

Heatmap of variable importance for each cluster (0-1) (Figure 3)
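
The article does not state how the 0 to 1 “importance” values were computed. One plausible approach, shown here purely as an assumption, is to min-max scale each variable across the cluster averages so that the cluster with the highest value gets 1 and the one with the lowest gets 0. The cluster averages below are invented for illustration.

```python
def min_max_by_variable(cluster_means):
    """Rescale each variable to the range [0, 1] across clusters."""
    variables = list(next(iter(cluster_means.values())))
    scaled = {cluster: {} for cluster in cluster_means}
    for var in variables:
        values = [cluster_means[c][var] for c in cluster_means]
        lo, hi = min(values), max(values)
        span = hi - lo
        for c in cluster_means:
            # If every cluster has the same value, the variable carries no
            # contrast, so it is set to 0.0 instead of dividing by zero.
            scaled[c][var] = (cluster_means[c][var] - lo) / span if span else 0.0
    return scaled


# Invented cluster averages (assumption), not the Hackdays results.
cluster_means = {
    "Target Man": {"Aerial wins": 6.0, "Assists": 0.1},
    "Efficient attacking creator": {"Aerial wins": 1.0, "Assists": 0.3},
    "Finisher": {"Aerial wins": 3.5, "Assists": 0.2},
}
scaled = min_max_by_variable(cluster_means)
for cluster, row in scaled.items():
    print(cluster, row)
```

The resulting nested dictionary has one row per cluster and one column per variable, which is the shape a heatmap plot expects.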


Another data visualisation helps to compare the clusters in terms of their key characteristics (Figure 4). For example, a player with the “Efficient attacking creator” profile has a higher number of OP assists than the Finisher.

Comparison of the Finisher and Efficient attacking Creator profiles (Figure 4)

Final: Delivering the result

Based on the developed algorithm, Servette FC can scout and acquire players with complementary skills. For example, if the performance data shows that the European leagues are won by teams whose strikers cover all six aforementioned profiles, then the Servette FC coaching staff should either train its footballers accordingly or look for players with the respective skills. At the end of the Hackdays, the challenge team presented this solution to the jury and the fellow Hackdays participants. Although the group did not win, it was a great experience nevertheless. Coming together with people from different backgrounds and trying to solve a problem in a very limited time is a valuable opportunity to grow and to push your skills to another level.

Homework: Improvement plans

Naturally, in the limited time available, the team did not manage to realise all of its plans. Some ideas are yet to be implemented. For example, only one clustering method, k-means clustering, was tested during the Hackdays. It would be interesting to see the results of other algorithms. Another important improvement would be to integrate information about changes in the team strategy and coaching approach. This would make it possible to distinguish between a player’s individual behaviour and the strategic directive he receives from the coach. Likewise, incorporating game event data would show under which circumstances a player performed in a certain way: maybe it was caused by the style of the opposing team?
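
To compare the output of different clustering algorithms, a common yardstick is needed. One widely used option, which was not part of the Hackdays work and is shown here only as a suggestion, is the silhouette coefficient: it rewards clusters that are tight and well separated. The sketch below implements it with the standard library and applies it to two invented labelings of the same toy data; libraries such as scikit-learn provide an equivalent silhouette_score function.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; needs at least two clusters.

    Values close to 1 indicate tight, well-separated clusters.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance of p to itself is 0, so it does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (assumption): the same kind of two-feature striker vectors.
players = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),
]
labels_a = [0, 0, 0, 1, 1, 1]  # hypothetical output of one algorithm
labels_b = [0, 1, 0, 1, 0, 1]  # hypothetical output of another

score_a = silhouette_score(players, labels_a)
score_b = silhouette_score(players, labels_b)
print(score_a, score_b)
```

A higher score is only one signal. Whether the clusters make football sense still has to be judged with the scouts and coaching staff.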

Conclusion

The challenge aimed to assist Servette FC in defining the profiles of its players. In the future, the improved algorithm will allow the football club to have an overview of its athletes’ performance, plan its scouting activities and better develop the players’ potential based on the target profile. For a Data Scientist, it is important to have the opportunity to apply technical skills in a new field and to test different ways of applying the algorithms. In this situation, domain knowledge, that is, awareness of the field you analyse through machine learning, should not be underestimated. This is why Data Science can be seen not only as a separate job. It is also a set of skills that people with different backgrounds can acquire for the benefit of their domain, be it sport or medicine.

References

Sport Performance Analytics. 2018. The Role of a Performance Analyst in Sports. Retrieved from https://www.sportperformanceanalysis.com/article/what-is-a-performance-analyst-in-sport 

Many thanks to the author Leonid Gavrilyuk and the challenge owner FC Servette for this very interesting article and for their commitment!
