@keynmol
Last active June 9, 2017 16:13
Scrape Tax Calculator for Gross vs. Net income information

Requirements:

  • Scraping: curl, jq, ack, python 3
  • Visualisation: R, tidyverse packages, scales

Scrape

echo "Gross;Net" > data.csv
python -c 'RANGE=(20000, 200000); STEPS=40; [print(int(RANGE[0] + i*(RANGE[1] - RANGE[0])/STEPS)) for i in range(1, STEPS)]' | \
xargs -P4 -I{} ./scrape_tax_calculator.sh {} >> data.csv

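This writes data.csv with one "Gross; Net" row for each of the 39 gross salaries from £24,500 to £195,500 in £4,500 steps. Because xargs runs four requests in parallel, rows arrive in completion order, which is why the R code below sorts by Gross first.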
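
The Python one-liner in the scrape command is dense, so here is a minimal sketch of the same salary grid as a small function. The name `salary_steps` is illustrative and not part of the gist.

```python
def salary_steps(low, high, steps):
    """Return evenly spaced gross salaries strictly between low and high."""
    width = (high - low) / steps
    # i runs from 1 to steps - 1, so both endpoints are skipped.
    return [int(low + i * width) for i in range(1, steps)]


if __name__ == "__main__":
    # Same range and step count as the one-liner: 20000 to 200000 in 40 steps.
    for salary in salary_steps(20000, 200000, 40):
        print(salary)
```

Piping this script's output into the same xargs command should behave like the one-liner.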

Visualise

library(tidyverse)
library(scales)
read_csv2("data.csv") %>% 
  arrange(Gross) %>% 
  mutate(GrossIncrease=Gross/lag(Gross, 1) - 1, NetIncrease=Net/lag(Net, 1) - 1) %>% 
  select(-Net) %>% 
  gather(Type, SalaryIncrease, -Gross, GrossIncrease, NetIncrease) %>% 
  filter(complete.cases(.)) %>% 
  ggplot(aes(Gross, SalaryIncrease, colour=Type)) + 
      geom_line() + 
      scale_y_continuous("Increase compared to previous salary", labels=percent) + 
      scale_x_continuous(labels=dollar_format(prefix="£"), breaks=seq(24500,195500,9000)) + 
      xlab("Gross Income") + 
      theme(axis.text.x = element_text(angle=45)) + 
      geom_point(size=1.2) + 
      scale_color_discrete("", labels=c("Gross", "Net")) + 
      ggtitle("Gross vs. Net relative salary increase\nIf your gross salary increases by 10%, how will net salary increase?") + 
      theme(text=element_text(size=15))
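
The `mutate` step above computes each row's relative increase over the previous row, for both gross and net salary. As a rough illustration of that lag-based calculation outside R, a minimal Python sketch follows. The function name and the sample figures are made up for the example and are not real calculator output.

```python
def relative_increase(values):
    """Return value/previous - 1 for each element; the first has no predecessor."""
    result = [None]
    for previous, current in zip(values, values[1:]):
        result.append(current / previous - 1)
    return result


if __name__ == "__main__":
    # Hypothetical (gross, net) pairs, already sorted by gross.
    rows = [(20000, 17000), (25000, 20000), (30000, 23000)]
    gross = [g for g, _ in rows]
    net = [n for _, n in rows]
    increases = zip(gross, relative_increase(gross), relative_increase(net))
    for g, gross_inc, net_inc in increases:
        if gross_inc is not None:  # mirrors filter(complete.cases(.))
            print(f"{g}: gross {gross_inc:+.1%}, net {net_inc:+.1%}")
```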

scrape_tax_calculator.sh

#!/usr/bin/env bash
# Usage: ./scrape_tax_calculator.sh GROSS   (prints "GROSS; NET")
curl -s -X POST http://www.moneysavingexpert.com/tax-calculator/request/ -F "taxman_grosswage=$1" -F 'taxman_payperiod=1' -F 'taxman_year=2017' -F 'taxman_age=0' -F 'taxman_extra=' -F 'email=' -F 'taxman_vw_yr=on' -F 'taxman_vw_mth=on' -F 'taxman_vw_1wk=on' | jq '.mainOutput' | ack 'This means <strong>&pound;(.*?)</strong> in your' --output "$1; \$1"
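
The `ack` pattern pulls the net figure out of the HTML fragment in `.mainOutput`. A hedged Python equivalent of that capture is sketched below. `extract_net` and the sample HTML string are assumptions for illustration, since the real response text is not shown in this gist.

```python
import re

# Same pattern the ack command uses; (.*?) lazily captures the amount.
NET_PATTERN = re.compile(r"This means <strong>&pound;(.*?)</strong> in your")


def extract_net(main_output):
    """Return the captured net amount as text, or None if the pattern is absent."""
    match = NET_PATTERN.search(main_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical fragment shaped like the text the pattern expects.
    sample = "This means <strong>&pound;1,234.56</strong> in your pocket"
    print(extract_net(sample))
```

Returning `None` on a miss makes it easy to skip salaries for which the page layout changed or the request failed, instead of writing an empty row.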