Data Science | AI | DataOps | Engineering
backgroundGrey.png

Blog

Data Science & Data Engineering blogs

Decision Algorithms in Sport

Written by Luke Menzies and Gavita Regunath

Introduction

Data science is the banner, under which many techniques and algorithms are applied to real-world business problems and services. All too often within the business world, discussion of such techniques is often scoped to descriptive methods. These descriptive methods are considered to be comprised of regression, clustering and classification. All three are very powerful methods that can be used to drive profit and growth within a company. These are the trendy techniques that have been adopted by many services and products, where a great effort has been made to automate the process, to allow even the most modest tech savvy employee to deliver accurate analytical insights. There is an element of sadness in this since data science has more to offer than just this for the business world.

An often-neglected part of data science is the second part of the puzzle, namely, prescriptive methods. Descriptive and prescriptive methods go hand-in-hand, where descriptive techniques can be used to fuel the prescriptive ones. Once there is a well-known understanding of variables and trends, the final step is being able to know what to do with these variables. This is where prescriptive methods come in, where the goal is to optimise actions or decisions the business makes to give the best possible outcome. Prescriptive methods in data science are often referred to as optimisation. A simple explanation is the desire to either maximise or minimise a chosen metric within a system, with a set of constraints set by whoever is trying to search for the ultimate maximum or minimum value. This type of optimisation problem can be broken down further into linear and non-linear optimisation. Although the desired outcome for the best values through prescribed actions is the same, non-linear optimisation problems are the more challenging of the two, where only a limited number of problems can be solved. Data science has definitely allowed this field to come on leaps and bounds with the use of AI. However, it is certainly one of the least mature parts of data science. Luckily, many optimisation problems in the business world can be approximated to linear optimisation problems.

Linear optimisation

Linear optimisation (often referred to as linear programming) is not cutting edge or new. It has been around for a very long time. It was first introduced within the field of operational research during World War II, where it was used to help minimise costings. The method proposed for solving these problems is known as the simplex method, and it hasn’t changed much today. Although this method hasn’t changed significantly, what has changed significantly is the computing power and accessibility of this technique, allowing these methods to be used on very complex scenarios with almost a click of a button. Convenient libraries have allowed the intricate complexities of setting these problems up on a computer to be simplified.

This article is going to give a simple example for such a scenario which, although it may seem mundane, can be quite a laborious task that is hard to get perfect. This example is the scheduling of a football teams rota. This prescriptive problem is not necessarily an optimisation problem since it isn’t trying to maximise or minimise anything as such. The goal is to run it through the algorithm to ensure the outcome hits the desired set of constraints.

This problem is not fictitious. This was actually used to solve a real-world scheduling problem that was faced by a local football team in England, United Kingdom. To maintain the confidentiality of the players, all names have been changed.

The football team scheduling task

Consider the situation where you are a football manager or coach for an under 8s football team. The coaching session takes place every Saturday, where there is always a 40 minute 5-a-side game. During the length of the game, all players have the opportunity to participate in various positions. If you had a team that consisted of 5 players that had to be distributed across 5 playing positions, this would be a fairly straightforward scheduling problem.

After examining your team, you realise that there are more players than positions available for each team. Although this appears to be excellent and gives the impression of having the freedom to switch between players, it then becomes a scheduling challenge of how to cycle players to guarantee a balanced allocation of time and positions. When managing a young football team, the goal is to ensure that all players get an equal amount of play and have the freedom to experience different play positions within a season. A manager would want to avoid a situation in which the rota scheduling results in a player being left on the bench for the whole game. Similarly, you would want to avoid a situation in which a player is forced to play the whole 40-minute period without any pauses or breaks. Essentially, reducing bias towards a particular player and ensuring fairness across all players are important factors to consider when dealing with a schedule.

In this team, there are 8 players, 5 positions and a 40-minute playtime with the option of substitution every 10 minutes.

This means that for a 40-minute session, you will be able to carry out four substitutions. Additionally, you need to ensure that the sum of time played for every player is equal across a season. For example, if Charlie had a total of 250 minutes of playtime for a season, then Jim, Elsie, Ruby, Tim, Mo, Raz and Juan all would have 250 minutes of playtime. You might be tempted to sit down in front of a spreadsheet and work it all out. You will soon realise that to achieve all the constraints and maintain fairness is challenging! However, knowing that you need to produce a schedule, through trial and error, you come up with the following schedule that suits the constraints closely.

Table 1: Results showing the schedule produced manually for a game.

The table shows the positions that each player would be allocated for every time slot. The abbreviations GK, LM, RM, D and F represent Goal Keeper, Left Midfield, Right Midfield, Defender and Front/Striker, whilst SUB denotes substitute.

You know there is likely to be a schedule out there that suits everyone. You’re just not sure how to get there. Lucky for you, data science can come to the rescue. A data scientist can produce a Python algorithm that generates a schedule that satisfies the players and team’s needs. The algorithm uses a library known as PuLP.

PuLP

PuLP is a linear programming library that interfaces into many other optimisation engines such as GNU Linear Programming Kit (GLPK- a set of routines written in C). It is a powerful library that can be used in a number of scenarios. The library relies on the setup of a user-defined objective function to be maximised or minimised whilst bound by a chosen set of constraints.

It allows for integer, float and binary decision variables to be considered within a linear programming problem. Traditionally, linear optimisation only considered float decision variables. The introduction of integer decision variables gave rise to a branch of linear programming problems known as mixed integer problems (MIP). PuLP is able to facilitate such problems, making it a very diverse tool.

The algorithm

The algorithm consists of three main areas, the input variables, the constraints and the objective function. The goal of the algorithms is to find an optimal solution using the input variables and the objective function. The end results could either be an optimal or infeasible solution.

Input variables

The first part of the script is to define the input variables and parameters used within the schedule.

slots = 4- The number of slots within a game. Since each game is a 40-minute game with 10 minutes each slot, the number of slots is 4.

days = 10 -The number of days the schedule goes up to. This means that the schedule will make a plan for 10 Saturdays games. This can be altered to suit the number of games in a season.

players = ["Charlie", "Jim", "Elsie", "Ruby", "Tim", "Mo", "Raz", "Juan"]- This is a list of the players for the team of interest.

roles = ["GK", "LM", "RM", "D", "F"]- This is the list of positions that can be played in the game:

  • GK = Goal Keeper

  • LM = Left Midfield

  • RM = Right Midfield

  • D = Defender

  • F = Front/Striker

match_slots = slots * len(roles)- the match slots is defined as the number of defined slots multiplied by the number of roles to be filled in a game.

max_assignments_per_person = math.ceil(match_slots/len(players))- is a function of the number of players available and the match slots. This is a dynamic variable that will be automatically computed using the number of players available per game, the number of roles defined and the number of slots.

For example, consider a game that has 4 slots and 5 positions to be filled. This would imply the match slots are 4 x 5 = 20 slots to be filled. If two players fell ill, so the team now has only 6 players, and we further constrained the maximum positions per player to be 3, this would then yield 6 x 3 =18 player slots. This is less than the match slots being 20, which would yield a solution that is infeasible. To overcome this, we need to ensure that the player slots is equal to or greater than the match slots. This ensures that if a player is unable to participate in a game, the algorithm will optimise the best schedule given the limitations.

min_slots_per_person = (max_assignments_per_person-1)- This ensures that the difference of maximum and minimum assignments allocated to a particular player is one slot. For example, if Ruby is allocated 3 slots out of the for available slots, then Juan will either be allocated 3 slots or a minimum of 2 slots. This eliminates the possibility of a situation in which one person is given 4 slots while another is given just 2.

Objective function

The next step is to initialise the problem object in the script:

problem = LpProblem("rota_generator", LpMaximize)

This class LpProblem object will contain the objective function and constraints that need to be added to it separately. The LpProblem class takes two arguments. The first is the title of the problem, the second is whether it’s a maximum or minimum problem. These are routines passed in from the PuLP library. Although it doesn’t matter in this scenario if it’s a maximise or minimise problem, an improvement to this scenario (which will be discussed later) will require it to become a maximise problem.

The next steps are to define variables you are looking to optimise.

We have called them assignments since we want to assign a player, for a certain position, in a certain slot, on a certain day. The category has been set to binary since there can only be a 1 or 0 assignment for each slot. There are 4 dimensions:

  • Slots = going 1 to 4

  • Person = the list of players

  • Role = the list of available positions

  • Day = the day the per game is on (going 1 to 10)

After this, we want to define the objective function. Below is the routine used for such a case. The objective function has been given the title the ‘Sum_of_everything’. Though it can be anything the user wants. The assignments variable is basically a dictionary. Since no player, position, slot or day is favoured over another, there is no cost associated with each element. Therefore, it is simply the sum of all the elements.

Constraints

Now we are ready to define the constraints, which is the main focal point of the problem. This set of rules make sure all players are happy with the arrangement. A total of 7 constraints were found for this scheduling problem, which will be detailed in full below.

Constraint 1

Firstly, we want to ensure that each position is assigned to only one player for any given game for every time slot. For example, if the striker position is assigned to Mo for the first slot, then that particular position cannot be assigned to Charlie as well. This is programmatically done using the following constraint.

This is an important constraint because it also ensures that every position is filled in every slot, as well as eliminating the situation of having two or more players being allocated the same position on a particular slot.

Constraint 2

Next, we want to ensure that each player is allocated to one position per slot. If for instance, Mo is assigned to be a striker in the first slot, he can’t then be allocated a position of a goalkeeper in the exact time slot, which is physically infeasible.

Constraint 3

Now we have setup constraints that make sure the scenario matches the real world, we are going to start considering conditions that are used to meet the needs of the players. When considering the requirements of the players, we do not want a player to play more than 3 slots in one game so that each player gets a 10-minute rest in a 40-minute game. This value is set in the variable max_assignments_per_person.

Constraint 4

Following on from that, we must assign a minimum number of slots in which a player will participate in a certain game. This is because we do not want to end up in a position where a player turns up for a 40 minute game, but is only scheduled to play for 10 minutes and is sat on the bench for the remaining 30 minutes. To overcome this, we can define a constraint that ensures that a player is allocated at least 20 minutes per 40 minute game. This value is set in the variable min_slots_per_person.

Constraint 5

The next set of constraints are shuffle constraints. This is not necessarily needed but it is an excellent addition to incorporate which ensures that each player has the opportunity to play in different positions in every game. For example, in a 40 minute game, Elsie will have the chance to play as goalkeeper, striker and defender. This gives Elsie the opportunity to experience various positions rather than pigeon-holing her into one particular position.

Constraint 6

Now we want to make sure every player gets to play every position at least once.

This will encourage fair distribution of positions amongst all players.

Constraint 7

Finally, we add an equal number of plays per season constraint. This can be a tricky constraint since it isn’t always wholly possible to have an equal number of plays for every player, depending on the number of days chosen. E.g. if the number of days is set to 1, it’s clear that one day isn’t enough to ensure an equal number of plays.

Now the problem and constraints are setup, we are ready to let the solver do the magic!

Using the command problem.solve() initiates an attempt at solving for a solution. You can see if it managed to find an optimal solution using problem.status . If this returns a 1, you know it has succeeded. Anything else means the problem may not be feasible or you need to re-examine how you’ve setup the constraints or objective function.

The way to get the schedule that has been selected as the ‘optimal schedule’ is to loop through the grid and use the pulp.value(assignments[slot, person, role, day]) commend. Each element will either be 0 or 1. E.g. if Jon plays position ‘G’ in slot 1 of day 1, this will be a value ‘1’.

Results

The script has now generated a schedule that meets both the demands of the players and also the requirements of the game. Everyone is happy!

Table 2: Results showing the schedule produced automatically for several games.

The table shows the generated schedule. This schedule took just under a second to be generated. Doing this by hand would have taken over 10 hours. This translates to an around 36000x improvement in time spent. Besides the substantial time savings, this algorithm also eliminates the effect of human bias whereby a player is allocated to a position to fit the schedule. The algorithm will ensure that players are scheduled optimally to meet the constraints.

If the football coach wanted to extend this further, they could weight certain positions for certain players. This would mean if certain players were better in, (or favoured) certain positions, the script would put them in those positions more often. This is because it is a maximisation problem and the algorithm will look the optimise the total value of the schedule, utilising the weightings. This is one use case for linear optimisation, however, there are many more!

Conclusion

A whole host of scheduling problems can be solved using linear programming techniques. Hopefully, this article will give you inspiration and motivation to use linear programming techniques and Pulp to solve scheduling problems. Tremendous success has been had with implementing linear programming to solve a football team scheduling problem. The algorithm has not only been able to produce a schedule almost 36000x quicker, but it has also been able to allocate players in a fair and consistent manner.

There are many techniques used in data science that is often overlooked by companies. Quite often this is because the techniques are harder to automate. This is why a good data scientist will be able to maximise revenue for a business, better than automated software will be able to. At least not yet anyway.

The code used to solve this problem has been uploaded to a GitHub repo, so feel free to fork and utilise it!