Tony ElHabr tonyelhabr

@tonyelhabr
tonyelhabr / how-to-get-fotmob-stats.md
Created January 1, 2025 20:07
How to get fotmob league stats with R

General Guide

  1. In your browser, go to the URL where you can find the data that you want. For example, https://www.fotmob.com/leagues/126/stats/premier-division?season=2024
  2. Use the browser's "Inspect" feature and "Network" tab to find the right GET request. You can tell which request is the right one by checking whether it shows a detailed JSON response in the "Response" tab.
  3. Right click on the request and copy it as a cURL command.
  4. Go to https://curlconverter.com and paste the cURL command into the "curl command" box at the top of the page, with "R > httr2" (or "R > httr") selected.
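The code that curlconverter produces for the steps above will look roughly like the following minimal sketch. Everything specific here is an assumption for illustration: `build_stats_request()` is a hypothetical helper, `https://example.com/api/leagues` is a placeholder rather than the real endpoint, and the `id` and `season` query values are simply borrowed from the example page URL. Substitute the URL, query parameters, and headers from the request you actually copied.

```r
library(httr2)

# Hypothetical helper: builds the GET request without sending it.
# `base_url` is a placeholder; use the URL from the copied cURL command.
build_stats_request <- function(base_url, league_id, season) {
  request(base_url) |>
    req_url_query(id = league_id, season = season) |>
    req_headers(Accept = "application/json") |>
    req_user_agent("Mozilla/5.0")
}

# Values taken from the example page URL (league 126, season 2024).
req <- build_stats_request("https://example.com/api/leagues", 126, 2024)

# Inspect the final URL before sending anything.
print(req$url)

# Once the real URL and headers are in place, perform the request
# and parse the JSON body into a nested list:
# resp <- req_perform(req)
# stats <- resp_body_json(resp)
# str(stats, max.level = 1)
```

Building the request separately from performing it makes it easy to check the URL and headers against what the browser sent before hitting the site.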
library(tidyverse)
library(rvest)
library(janitor)
page <- read_html('https://mychiptime.com/searchevent.php?id=15902#%206')
raw_table <- page |> html_table() |> pluck(1)
melted_table <- raw_table |>
mutate(
row = row_number()

Looking at the biggest differences in 2024/25 xG between data scraped right after the match and data updated later to account for ball height and defender positioning.

library(dplyr)
library(tibble)

joined_data <- readRDS('joined_fb_match_shooting_big5_20241201.rds')
@tonyelhabr
tonyelhabr / gamestate-xgd.md
Last active June 16, 2024 04:04
The correct way to calculate xG difference by gamestate

Here's some fake data showing how shot logs might look for a soccer match.

library(tibble)
library(dplyr)

df <- tibble::tibble(
  shot_id = seq.int(1, 8),
  minute = c(7, 13, 25, 30, 41, 44, 58, 78), ## doesn't matter, just for illustrative purposes
@tonyelhabr
tonyelhabr / print-for-chunks.md
Last active April 6, 2024 14:34
Replicate reprex printing for arbitrary variables

This is useful if you simply want to copy-paste some code output without having to run code (or render/knit a whole document).

Method 1

## Based on internals of reprex:::reprex_impl
print_df_for_chunk <- function(code_string) {
  reprex_document_options <- list(
    venue = 'gh', 
@tonyelhabr
tonyelhabr / scrape-places.md
Last active March 11, 2024 13:14
Scraping Google Places API

Setup.

library(googleway)
library(dplyr)
library(tibble)

DATA_DIR <- "path/to/dir"
dir.create(DATA_DIR, showWarnings = FALSE, recursive = TRUE)
@tonyelhabr
tonyelhabr / scrape-big5-team-logos.md
Created January 26, 2024 13:59
Get Big 5 team logos from FBref
library(rvest)
library(tibble)

url <- 'https://fbref.com/en/comps/Big5/Big-5-European-Leagues-Stats'
page <- read_html(url)

team_elements <- page |> 
  html_elements('table') |> 
@tonyelhabr
tonyelhabr / delete-gha-runs.md
Last active July 9, 2024 00:14
Delete GitHub action run logs
library(gh)
library(purrr)
library(dplyr)

token <- Sys.getenv("GITHUB_PAT")
REPO <- "my-repo"
OWNER <- "me"

runs <- gh::gh(
@tonyelhabr
tonyelhabr / shots.md
Created September 14, 2023 11:30
shot x-y data

Full set of fotmob shot releases here: https://github.com/JaseZiv/worldfootballR_data/releases/tag/fotmob_match_details

raw_shots <- readr::read_csv('https://github.com/JaseZiv/worldfootballR_data/releases/download/fotmob_match_details/47_match_details.csv')
#> Rows: 28667 Columns: 43
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (18): league_name, league_round_name, parent_league_season, match_time_u...
#> dbl (22): match_id, match_round, league_id, parent_league_id, home_team_id, ...
@tonyelhabr
tonyelhabr / pull-2023-fifa-womens-world-cup-xg.md
Created August 27, 2023 22:20
Pull xG data from StatsBomb and Opta (via FBRef) for the 2023 FIFA Women's World Cup

Raw data pull

library(StatsBombR)
library(worldfootballR)
library(dplyr)
library(janitor)
library(tibble)