“People are overlooked for a variety of biased reasons and perceived flaws: age, appearance, personality. Of the twenty thousand notable players for us to consider, I believe that there’s a championship team of twenty-five people that we can afford. Because everyone else in baseball undervalues them – like an island of misfit toys.” (Moneyball movie, 2011)

This is how Peter Brand in the movie “Moneyball” explains the reasoning behind using data analytics while scouting players rather than using traditional methods.

For years, scouts made predictions about players using subjective analysis and hoped for more right decisions than wrong ones. But such decisions are, in most cases, biased and subject to luck. Nowadays, data analysts have become an integral part of many sports teams that use data as a competitive advantage. Data-driven decisions depend on analysts who can extract hidden insights from the data and communicate the resulting statistics. Otherwise, the data is useless.

The idea of this blog is to give you some basic insights into how data analytics can help you pick a winning team in Fantasy Football. The game is slowly shifting from pure entertainment to a profitable business. Many successful startups have built advanced predictive tools that use complex data manipulation and machine learning algorithms to help users achieve better results. In the following text we will briefly walk you through the steps that you need to perform before creating your own predictive data model and, who knows, maybe winning the competition.

First things first, what is Fantasy Football?

The Fantasy Premier League (FPL) is the official fantasy football browser game for the English Premier League, with over 7.5 million users competing against each other every season to accumulate the highest number of points. Every user starts off with a budget of £100 million and chooses a squad of 15 players in 4 positions: 2 goalkeepers, 5 defenders, 5 midfielders and 3 forwards. A squad may contain at most 3 players from the same club, which means that it is impossible to load up only on players from top-notch clubs. Before each gameweek, users pick 11 players from their squad. These virtual players earn points based on their real-life performances in actual matches. Additionally, users pick one player as captain, whose points are doubled at the end of the gameweek. That is Fantasy Football in a nutshell.
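
The squad rules above can be written down as a small validity check. This is only a sketch: the `Player` record, the position codes and the function name are our own illustrative choices, not part of the game or its API.

```python
from collections import Counter
from typing import NamedTuple


class Player(NamedTuple):
    # Hypothetical record layout used only for this example.
    name: str
    club: str
    position: str  # "GK", "DEF", "MID" or "FWD"
    cost: float    # price in £ millions


BUDGET = 100.0                                           # £100 million
SQUAD_QUOTA = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}    # 15 players in total
MAX_PER_CLUB = 3


def squad_problems(squad: list[Player]) -> list[str]:
    """Return a list of rule violations; an empty list means the squad is valid."""
    problems = []

    # Exactly 2 goalkeepers, 5 defenders, 5 midfielders and 3 forwards.
    by_position = Counter(p.position for p in squad)
    for position, required in SQUAD_QUOTA.items():
        if by_position[position] != required:
            problems.append(f"need {required} {position}, got {by_position[position]}")

    # At most 3 players from the same club.
    for club, count in Counter(p.club for p in squad).items():
        if count > MAX_PER_CLUB:
            problems.append(f"{count} players from {club}, limit is {MAX_PER_CLUB}")

    # Total price must fit the budget.
    total = sum(p.cost for p in squad)
    if total > BUDGET:
        problems.append(f"squad costs {total:.1f}m, budget is {BUDGET:.1f}m")

    return problems
```

Returning a list of problems rather than a single True/False makes it easier to see which rule a candidate squad breaks while you are experimenting with a model.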

Putting together a winning team

Because the game's popularity and number of users keep rising every season, achieving a respectable rank is becoming increasingly difficult. To be competitive in the FPL, you essentially have to predict the future. How do you identify the players who will score the most points in the season? In the gameweek? These are some of the questions that you need to answer, and doing so is far from easy. Here are a few steps that can help you get ahead of the competition.

Processing the Raw Data

The first step in data analytics is to gather the relevant data. Current statistics on each individual player and their teams may be scraped from the Fantasy Football API. The data is typically returned in JSON format, so a JSON parser is needed to extract the fields of interest. Rather than collecting only the FPL data, you can improve the chance of an accurate prediction by combining it with other relevant sources, such as the official Premier League website, betting odds, news articles, blogs and even FPL forums, so that you capture all the recent news. Even though more data requires more work, it is worth adding all the valuable data points to your model.
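
As a rough illustration of the parsing step, the sketch below reads a JSON payload and keeps a handful of fields. The sample payload is made up, and the field names (`elements`, `web_name`, `now_cost` and so on) and their units are assumptions about what the API response looks like, so check them against a real response before relying on them.

```python
import json

# Made-up sample standing in for an API response; field names are assumptions.
SAMPLE_RESPONSE = """
{
  "elements": [
    {"web_name": "Player A", "team": 1, "element_type": 2,
     "now_cost": 55, "total_points": 30, "selected_by_percent": "2.7"},
    {"web_name": "Player B", "team": 2, "element_type": 3,
     "now_cost": 120, "total_points": 45, "selected_by_percent": "48.1"}
  ]
}
"""


def parse_players(raw_json: str) -> list[dict]:
    """Keep only the fields of interest and convert them to plain numbers."""
    payload = json.loads(raw_json)
    players = []
    for item in payload["elements"]:
        players.append({
            "name": item["web_name"],
            "team_id": item["team"],
            "position_id": item["element_type"],
            # Assumed unit: price stored in tenths of £1 million.
            "cost": item["now_cost"] / 10,
            "points": item["total_points"],
            # Assumed format: ownership percentage delivered as a string.
            "ownership": float(item["selected_by_percent"]),
        })
    return players


players = parse_players(SAMPLE_RESPONSE)
```

In a real pipeline the string would come from an HTTP request instead of a constant, and the parsed rows would be stored so that they can be refreshed regularly.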

The second step is to prepare and clean the obtained dataset for further use. It is crucial to understand all the obtained fields and to standardize them, especially when parsing data from multiple sources.
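
One common standardization problem is that the same player is spelled differently in different sources. A minimal sketch, assuming each source has already been parsed into a list of dictionaries with a `name` key (a layout we chose for illustration):

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace so names from different sources match."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def merge_sources(fpl_rows: list[dict], other_rows: list[dict]) -> list[dict]:
    """Attach fields from other_rows to fpl_rows, matching on the normalized name.

    Every FPL row is kept; when both sources define a field, the FPL value wins.
    """
    lookup = {normalize_name(row["name"]): row for row in other_rows}
    merged = []
    for row in fpl_rows:
        extra = lookup.get(normalize_name(row["name"]), {})
        merged.append({**extra, **row})
    return merged
```

Matching on names alone can still fail for players who share a surname, so a real pipeline would probably also compare the club before joining two records.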

Once the final dataset is ready, we can start with the actual analysis.

Let’s dive into data

We will cover a few data points that you may consider incorporating into your own model. All examples are based on data extracted during Gameweek 5 of the 2020/2021 season. It is a small sample for now, but parsing and updating all FPL statistics on a daily basis will let you collect more data and draw more accurate conclusions going forward.

Pick players who will actually make a difference

Many FPL users will own expensive, top-notch players who bring in the majority of points throughout the season. Hence, it is important to identify cheaper players, owned by fewer users, who will bring you differential points. One of the easiest ways to identify such players is to divide the total points a player has earned so far by that player's cost, and hope that the run of good games continues.
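
The value calculation can be sketched as follows. We assume each player is a dictionary with `cost` (in £ millions), `points` and `ownership` (in percent); the 10% ownership cut-off is an arbitrary choice for illustration.

```python
def differentials(players, max_ownership=10.0, top_n=20):
    """Take the top_n players by points per £1m, then keep those owned by few users."""
    ranked = sorted(
        (p for p in players if p["cost"] > 0),      # guard against division by zero
        key=lambda p: p["points"] / p["cost"],      # value = points per £1m
        reverse=True,
    )[:top_n]
    return [p for p in ranked if p["ownership"] <= max_ownership]
```

Filtering on ownership after cutting to the top 20 mirrors the idea of looking for hidden options inside the list of most valuable players, rather than ranking all low-ownership players.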

Top 20 most valuable players and ownership statistics

There seem to be a few great hidden options among the top 20 most valuable players that can make a difference, and it may be worth adding them to your team. One such example is Creswell, a West Ham defender, who is the 6th most valuable player yet is owned by only 2.7% of FPL users. Bear in mind that many high performers generate “easy” points by taking most of their team's penalties, free kicks and corners. You just need to identify such players and wait for the goals or assists.

Pick as many starters as possible

It is important to own players that have a chance of scoring points in every gameweek. Owning players who come on late in the match, or who play only every other match, is usually a bad strategy in the long run. One example from last season is Manchester City under Pep Guardiola, who rarely fielded the same players two matches in a row. To avoid this trap, pick more players from teams that tend to field a similar squad each gameweek.

Number of players that played all games so far

Looking only at the top teams in the table above, we can see that Leicester has made few changes to its line-up since the start of the season, while Chelsea has only a single player who has played the full match in every gameweek so far.
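
A count like this can be derived from the minutes each player has accumulated. The sketch below assumes a `minutes` and a `club` field per player and that every club has played exactly one 90-minute match per gameweek, which does not hold when matches are postponed.

```python
from collections import Counter


def ever_present_per_club(players, gameweeks_played, minutes_per_match=90):
    """Count, per club, the players who have played every minute so far."""
    full_minutes = gameweeks_played * minutes_per_match
    counts = Counter()
    for p in players:
        if p["minutes"] >= full_minutes:
            counts[p["club"]] += 1
    return counts
```

Calling `.most_common()` on the result lists the clubs with the most settled line-ups first.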

Stop wasting budget on starters that do not bring many points

The best players in FPL are the ones that get points regardless of their influence on the pitch. Focus on the players who are most likely to earn points through goals, assists and clean sheets. That means avoiding defensive midfielders such as Kante (Chelsea) or Matic (Man Utd). Even though they are great players, their roles on the pitch are more defensive and they will not earn you many points. The FPL API provides two factors that can help you filter out such players: Creativity and Threat. Creativity is calculated from the number of crosses and assists a player creates, while Threat indicates the players that are most likely to score goals. These two factors may also help you find defenders with attacking ambitions. We are all familiar with the expensive options in defence such as Alexander-Arnold and Robertson (Liverpool), but the same factors also reveal affordable players such as Justin (Leicester) or Dallas (Leeds).
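
As a sketch of how the two factors could be used to find affordable attacking defenders, the function below filters by position and price and ranks by the plain sum of Creativity and Threat. The field names, the `"DEF"` position code, the price cap and the idea of simply adding the two numbers are all assumptions made for illustration.

```python
def attacking_defenders(players, max_cost=5.5, min_total=0.0):
    """Affordable defenders ranked by creativity + threat (a deliberately naive score)."""
    picks = [
        p for p in players
        if p["position"] == "DEF"
        and p["cost"] <= max_cost
        and p["creativity"] + p["threat"] >= min_total
    ]
    return sorted(picks, key=lambda p: p["creativity"] + p["threat"], reverse=True)
```

The same filter with a different position code and a low threshold can be turned around to spot the defensive midfielders you want to avoid.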

A single extraordinary performance is no guarantee of future great results

It is important to be patient and to avoid making a transfer every time a player who is not in your team earns a lot of points in a single gameweek. One great match does not mean a similar performance will happen again. Conversely, FPL users will usually drop a player after a couple of bad gameweeks. If the data shows that the player is still getting quality goal opportunities and was simply unlucky in previous matches, it may be worth holding on and earning the points while the competition no longer owns him. It is a long season, so patience can be key.

Use future fixture schedule while making transfers

Make sure to take into account the difficulty of at least the next 3 matches when choosing your team. Any player may score against any team, but players who face a weak defence have a higher chance of earning points. Furthermore, match-ups between similar opponents may also be taken into account. Creating a pair-wise variable, for example one that flags matches between two teams that do not score many goals, may also be a good solution. The probability of a clean sheet is high in such matches, so you can benefit from having defenders from those teams.
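
Both ideas can be sketched in a few lines. We assume a chronologically ordered list of upcoming fixtures with a numeric `difficulty` per club (lower meaning easier) and a dictionary of average goals scored per match; the data layout and the threshold of one goal per match are illustrative assumptions.

```python
def upcoming_difficulty(fixtures, club, horizon=3):
    """Mean difficulty of a club's next `horizon` fixtures (lower is easier)."""
    ratings = [f["difficulty"] for f in fixtures if f["club"] == club][:horizon]
    if not ratings:
        raise ValueError(f"no upcoming fixtures for {club}")
    return sum(ratings) / len(ratings)


def low_scoring_pairing(goals_per_match, home, away, threshold=1.0):
    """Pair-wise flag: True when both clubs average fewer goals than `threshold`."""
    return goals_per_match[home] < threshold and goals_per_match[away] < threshold
```

The flag can be added as an extra column for every fixture, so that defenders from both clubs get a boost in your model when it is set.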

Your captain choice may be crucial in making a difference

This is one of the most interesting aspects of the game, but it may also be one of the most frustrating decisions that you make throughout the season. Even when a gameweek is not very productive for your team overall, a single player who scores a lot of points can rescue it if you made him captain. His doubled score may significantly boost your total and save you from a total flop. So how do you pick a good captain? The best solution is to combine all of the above tips and calculate who would be the best choice for the specific gameweek. From a statistical point of view, you’ll be better off choosing captains based on fixture difficulty rather than player form. Or if you are not sure what to do, just pick Mohamed Salah as captain :).
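
One very simple way to encode “fixture difficulty before form” is to sort on difficulty first and use form only as a tie-breaker. The `next_difficulty` and `form` fields below are hypothetical inputs that you would compute yourself from the earlier steps.

```python
def pick_captain(candidates):
    """Choose the starter with the easiest next fixture; break ties by better recent form."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return min(candidates, key=lambda p: (p["next_difficulty"], -p["form"]))
```

A fuller model would weigh the two inputs instead of ordering them strictly, but this version makes the priority explicit and is easy to compare against your own gut picks.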

To sum up

Once you have collected all the relevant information, you can blend it together into a model that generates a team for you. The model may not be perfect, but you can always tweak it, and over time it should start to earn you more points. After all, it is a great exercise for exploring the world of data analytics. Remember, if you torture the data long enough, it will confess eventually.
