Monday 21 August 2023

Generate teams of similar strengths with contiguous areas

Idea

It's quite sad that so many great countries (e.g. India) and players (e.g. Haaland) may never play at the FIFA (football/soccer) World Cup.


It would be neat to create a more balanced event that still keeps the location-based element: each team would be of (roughly) similar strength, with all of its players drawn from one contiguous area. Ideally each area would lie on a single landmass, but that is obviously not possible in all circumstances and is beyond the scope of this question.

For example, in the case of Norway, maybe it would work out to roughly be a Scandinavian team, or for France, maybe there would be multiple teams with a team of just the players from the Banlieues of Paris.

Afterwards, I could plot the areas, maybe using Voronoi diagrams.

Background

I managed to gather the relevant information, through scraping (of Football Manager and Transfermarkt), but I'm stuck on how to design the algorithm to select the teams.

Problem

There is a large number of coordinates, each corresponding to a player's place of birth. Every player has a rating (from 1 to 100) and a position.

The task is, given a certain team size (11), and a certain number of teams (in the example below, 10), and a certain number of required players in each position (1, though substitutes would be handy), divide the area up into contiguous areas where the best team you can form from the players of the given area has roughly equal skill to the teams of other areas.
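To make "the best team you can form from the players of the given area" concrete, here is a minimal sketch in Python. It assumes, as in the sample data, that each player has exactly one position, so the best team is just the top-rated player(s) in each position. The function name `best_team_strength` and the use of summed skill as the team's strength are my own assumptions for illustration, not a fixed part of the problem.

```python
from collections import defaultdict

def best_team_strength(players, positions, per_position=1):
    """Total skill of the best team from one area, or None if a position can't be filled.

    players: iterable of (position, skill) tuples for the candidate area.
    positions: the positions every team must fill.
    per_position: how many players are needed in each position.
    """
    by_position = defaultdict(list)
    for position, skill in players:
        by_position[position].append(skill)

    total = 0
    for position in positions:
        # each player has one position, so the best picks are simply the top skills
        best = sorted(by_position[position], reverse=True)[:per_position]
        if len(best) < per_position:
            return None  # this area cannot field a full team
        total += sum(best)
    return total

# toy area with three positions instead of eleven
positions = ["a", "b", "c"]
area = [("a", 70), ("a", 90), ("b", 55), ("c", 80), ("c", 60)]
print(best_team_strength(area, positions))  # 90 + 55 + 80 = 225
```

A partition could then be scored by how far apart these values are across areas, for example the gap between the strongest and weakest team.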

Question

I've been reading a bit about graph theory things, but I'm unsure how to create an algorithm for this kind of problem, with all of its nuances. Any advice you could provide would be much appreciated!

If you can create something with the toy example, that would be amazing!!

Also, if you can find a way to narrow the problem, and can create something which addresses that smaller problem, which can then be generalised to the larger problem, that would be great too.

Sample code (in R, but I've included Python and Julia equivalent code further below)

set.seed(0)

library(tidyverse)

df <- tibble(
    # generate random latitudes and longitudes, and the number of players for that location
    lat = runif(100, 0, 90),
    lon = runif(100, 0, 180),
    players = rnorm(100, 0, 5) |> abs() |> ceiling() |> as.integer()
)

num_positions <- 11

position <- letters[1:num_positions]

df <- df |>
  # generate position and skill data, and unnest
  mutate(position = map(players, ~sample(position, .x, replace = TRUE)),
         skill = map(players, ~sample(1:100, .x, replace = TRUE))) |>
  unnest_longer(c(position, skill))

# plot the data
df |> 
  summarise(skill = mean(skill), players = first(players), .by = c(lat, lon)) |>
  ggplot(aes(x = lon, y = lat)) +
    geom_point(aes(size = players, color = skill)) +
    scale_size_continuous(range = c(1, 10)) +
    scale_color_viridis_c(option = "magma") + # similar to the Colombian shirt colours
    theme_minimal() +
    theme(legend.position = "none",
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

n_teams <- 10

[Figure: scatter plot of the sample data]

Notes

  1. The real task would involve 64 teams, so at least 704 players (64 × 11), and probably around 3x the size of the sample dataset. The real dataset has a lot of rows, but by filtering it, I should be able to get it down to a few thousand.

  2. In real life, some players can play well in more than one position, but in the example code I gave, each player only has one position. Adding multiple positions per player would likely make this a lot more difficult to solve, so it's outside the scope of this question.

  3. If you can do it across a sphere (like the globe), that would be amazing, but a rectangle would be okay too.
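On the sphere-versus-rectangle point, one possible way to pin down "contiguous" (an assumption on my part, not a requirement) is to link each location to its nearest neighbours by great-circle distance and then call an area contiguous if its locations form one connected piece of that graph. A rough sketch in Python, where `haversine_km`, `knn_graph`, `is_contiguous` and the choice of `k` are all hypothetical names and parameters:

```python
import math
from collections import deque

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))

def knn_graph(points, k=3):
    """Undirected graph linking each point to its k nearest neighbours."""
    graph = {i: set() for i in range(len(points))}
    for i, (lat_i, lon_i) in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: haversine_km(lat_i, lon_i, *points[j]),
        )
        for j in others[:k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def is_contiguous(region, graph):
    """True if the locations in `region` form one connected piece of `graph`."""
    region = set(region)
    if not region:
        return False
    start = next(iter(region))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search restricted to the region
        node = queue.popleft()
        for nxt in graph[node] & region:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == region

# two small clusters of (lat, lon) locations, far apart
points = [(0, 0), (0, 1), (0, 2), (50, 100), (50, 101)]
graph = knn_graph(points, k=1)
print(is_contiguous({0, 1, 2}, graph))  # True
print(is_contiguous({2, 3}, graph))     # False
```

For a flat rectangle, the same graph and connectivity check would work with plain Euclidean distance in place of `haversine_km`.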

Update:

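Python code:

import math
import random

import pandas

random.seed(0)

# generate random latitudes and longitudes, and the number of players for that location
df = pandas.DataFrame({'lat': [random.uniform(0, 90) for i in range(100)],
                'lon': [random.uniform(0, 180) for i in range(100)],
                'players':[math.ceil(abs(random.normalvariate(0, 5))) for i in range(100)]})

num_positions = 11

positions = list(map(chr, range(97, 97+num_positions)))

# generate position and skill data, one list per location
df['position'] = df['players'].apply(lambda x: random.choices(positions, k=x))
df['skill'] = df['players'].apply(lambda x: random.choices(range(1, 101), k=x))

# one row per player, matching the R and Julia versions (needs pandas >= 1.3)
df = df.explode(['position', 'skill'], ignore_index=True)

Julia code:
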
using Random, DataFrames
Random.seed!(0)

df = DataFrame(lat = rand(100) * 90, 
               lon = rand(100) * 180, 
               players = Int64.(ceil.(abs.(5randn(100)))))

num_positions = 11

position = ['a'+ i for i in 0:num_positions-1]

df[!, :position] = [rand(position, players) for players in df.players]
df[!, :skill] = [rand(1:100, players) for players in df.players]

df = flatten(df, [:position, :skill])


