
Data vs. Narrative: WC 2022

Author: Jonathan Kudsk
Datamatiker student · Backend, data & web

This project investigates whether Lionel Messi’s Golden Ball (Player of the Tournament) award at the 2022 World Cup can be explained by offensive performance data alone. It compares Messi with Kylian Mbappé using per-90 normalization, engineered features, weighted scoring, clustering, and model validation.
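To make the per-90 normalization and weighted scoring concrete, here is a minimal sketch. It is not the repository's code: the `PlayerTotals` fields, the `WEIGHTS` values, and the two placeholder players with their numbers are all assumptions chosen for illustration, not real tournament statistics.

```python
from dataclasses import dataclass


@dataclass
class PlayerTotals:
    """Tournament totals for one player (hypothetical schema)."""
    name: str
    minutes: int
    goals: int
    assists: int
    shots_on_target: int


def per_90(total: float, minutes: int) -> float:
    """Scale a tournament total to a rate per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90 / minutes


# Hypothetical weights; the project's actual weights are not stated here.
WEIGHTS = {"goals": 0.5, "assists": 0.3, "shots_on_target": 0.2}


def weighted_score(player: PlayerTotals) -> float:
    """Weighted sum of the player's per-90 rates."""
    return sum(
        weight * per_90(getattr(player, stat), player.minutes)
        for stat, weight in WEIGHTS.items()
    )


# Placeholder players with made-up totals, for illustration only.
players = [
    PlayerTotals("Player A", minutes=630, goals=7, assists=3, shots_on_target=14),
    PlayerTotals("Player B", minutes=540, goals=6, assists=2, shots_on_target=12),
]

# Rank players from highest to lowest weighted score.
ranking = sorted(players, key=weighted_score, reverse=True)
for player in ranking:
    print(f"{player.name}: {weighted_score(player):.3f}")
```

Normalizing before weighting keeps players with different minutes on the same scale, so the score reflects rate of output rather than time on the pitch.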

There is no public deployment for this work; the full notebook, Streamlit app, pipeline, and data live in the GitHub repository.

What the project includes

  • A reproducible Python pipeline that filters players, merges World Cup CSV datasets, engineers V4 offensive features, and calculates weighted player scores.
  • Analytical notebooks covering exploratory data analysis (EDA), KMeans clustering, Random Forest regression and classification, feature importance, and robustness checks based on the Adjusted Rand Index (ARI) and Monte Carlo weight perturbation.
  • A Streamlit dashboard for exploring top players, score decomposition, cluster membership, and direct Messi vs. Mbappé comparisons.
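The Monte Carlo weight perturbation mentioned in the list above can be sketched as follows. This is one plausible reading of the technique, not the repository's implementation: the feature values, `BASE_WEIGHTS`, the uniform noise model, and the function names are all assumptions. The idea is to jitter each weight at random, renormalize the weights to sum to 1, and measure how often the top-ranked player stays the same.

```python
import random

# Hypothetical per-90 features for placeholder players (not real data).
FEATURES = {
    "Player A": {"goals": 1.0, "assists": 0.4, "shots_on_target": 2.0},
    "Player B": {"goals": 0.9, "assists": 0.6, "shots_on_target": 1.7},
    "Player C": {"goals": 0.5, "assists": 0.3, "shots_on_target": 1.1},
}

# Hypothetical baseline weights.
BASE_WEIGHTS = {"goals": 0.5, "assists": 0.3, "shots_on_target": 0.2}


def score(features: dict, weights: dict) -> float:
    """Weighted sum of a player's features."""
    return sum(weights[key] * features[key] for key in weights)


def top_player(weights: dict) -> str:
    """Name of the highest-scoring player under the given weights."""
    return max(FEATURES, key=lambda name: score(FEATURES[name], weights))


def perturb(weights: dict, noise: float, rng: random.Random) -> dict:
    """Scale each weight by a random factor in [1 - noise, 1 + noise],
    then renormalize so the weights sum to 1. Assumes 0 <= noise < 1."""
    raw = {key: w * (1 + rng.uniform(-noise, noise)) for key, w in weights.items()}
    total = sum(raw.values())
    return {key: value / total for key, value in raw.items()}


def top_rank_stability(n_trials: int = 1000, noise: float = 0.3, seed: int = 0) -> float:
    """Share of perturbed weightings that keep the baseline top player."""
    rng = random.Random(seed)
    baseline = top_player(BASE_WEIGHTS)
    hits = sum(
        top_player(perturb(BASE_WEIGHTS, noise, rng)) == baseline
        for _ in range(n_trials)
    )
    return hits / n_trials


print(f"Top-rank stability: {top_rank_stability():.2%}")
```

A stability close to 100% would suggest the ranking does not hinge on the exact weights chosen; a low value would indicate that the conclusion is sensitive to the weighting scheme.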

Result

The analysis produced a structured BI workflow that shows how offensive data can support or challenge football narratives. Mbappé ranks highly on the project’s offensive metrics, while the clustering analysis places Messi and Mbappé in comparable elite profiles.