Last active
April 7, 2023 17:17
Scraping academic jobs on wikia.com
library(tidyverse)
base_url <- "http://academicjobs.wikia.com/wiki/Archaeology_Jobs_"
# starts at 2010-2011
years <- map_chr(2010:2019, ~ str_glue('{.x}-{.x + 1}'))
# though it seems to start at 2007-8: https://academicjobs.fandom.com/wiki/Archaeology_07-08
urls_for_each_year <- str_glue('{base_url}{years}')
library(rvest)
#------------------------------------------
# 2010-2011 has no table
urls_for_each_year[1] %>%
  read_html() %>%
  # html_nodes('.mw-content-text') %>%
  html_nodes('.mw-headline') %>%
  html_text()
#------------------------------------------
# table first appears in 2011-2012
urls_for_each_year[2] %>%
  read_html() %>%
  html_node('table , td') %>%
  html_table()
# but headings are not systematic
urls_for_each_year[2] %>%
  read_html() %>%
  # html_nodes('.mw-content-text') %>%
  html_nodes('.mw-headline') %>%
  html_text()
#------------------------------------------
# 2012-2013 also has a table
urls_for_each_year[3] %>%
  read_html() %>%
  html_node('table , td') %>%
  html_table()
# headings for 2012-2013
urls_for_each_year[3] %>%
  read_html() %>%
  # html_nodes('.mw-content-text') %>%
  html_nodes('.mw-headline') %>%
  html_text()
#-------------------------------------
# all years: collect the section headings from every yearly page
urls_for_each_year_headers <-
  map(urls_for_each_year,
      ~ .x %>%
        read_html() %>%
        html_nodes('.mw-headline') %>%
        html_text())
# what are the different sections?
# "TENURE-TRACK POSITIONS"
# "TENURE-TRACK OR TENURED / FULL-TIME POSITIONS"
# "Tenure-Track or Tenured / Full-time Position "
# "ASSISTANT PROFESSOR OR OPEN RANK"
# "TENURE TRACK ASSISTANT PROFESSOR OR OPEN RANK"
# "TENURED ASSOCIATE OR FULL PROFESSOR"
# "ASSOCIATE OR FULL PROFESSOR"
# "NON-TENURE-TRACK POSITIONS"
# "VISITING POSITIONS / Limited-Term Appointments / Postdocs"
# "VISITING POSITIONS / LIMITED TERM APPOINTMENTS / POSTDOCS"
# "VISITING POSITIONS / LIMITED-TERM APPOINTMENTS / POSTDOCS / PART-TIME POSITIONS"
# "VISITING POSITIONS"
# "COMPLETED SEARCHES"
# "DISCUSSION, RUMORS AND SPECULATION"
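
Because the section headings vary from year to year, one possible next step is to collapse them into a few consistent categories before comparing years. The sketch below is not part of the original script. The helper name `classify_heading`, the category labels, and the matching rules are all assumptions made for illustration. In particular, treating "ASSISTANT PROFESSOR OR OPEN RANK" as tenure-track is a guess. It runs offline on a hand-copied sample of the headings listed above.

```r
library(dplyr)
library(stringr)

# Hypothetical helper (an assumption, not from the original gist):
# map a free-form wiki section heading to one of a few coarse categories.
classify_heading <- function(heading) {
  # normalise case and whitespace so "Tenure-Track ... Position " still matches
  h <- str_to_upper(str_squish(heading))
  case_when(
    # check NON-TENURE first, since it also contains "TENURE-TRACK"
    str_detect(h, "^NON-?\\s?TENURE")                ~ "non_tenure_track",
    str_detect(h, "VISITING")                        ~ "visiting_or_limited_term",
    str_detect(h, "ASSOCIATE OR FULL")               ~ "tenured_senior",
    str_detect(h, "TENURE[- ]TRACK|ASSISTANT PROFESSOR") ~ "tenure_track",
    TRUE                                             ~ "other"
  )
}

# a sample of the headings observed across years
headings <- c(
  "TENURE-TRACK POSITIONS",
  "Tenure-Track or Tenured / Full-time Position ",
  "TENURE TRACK ASSISTANT PROFESSOR OR OPEN RANK",
  "TENURED ASSOCIATE OR FULL PROFESSOR",
  "NON-TENURE-TRACK POSITIONS",
  "VISITING POSITIONS / LIMITED TERM APPOINTMENTS / POSTDOCS",
  "COMPLETED SEARCHES"
)

categories <- classify_heading(headings)
tibble(heading = headings, category = categories)
```

The same function could be mapped over `urls_for_each_year_headers` to see which categories each year's page contains. Any heading that falls into "other" is worth checking by hand before trusting the grouping.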
Relevant literature:
Karl, R., Möller, K. and Krierer, K. 2012. Ain’t got no job. The archaeology labour market in Austria, Germany and the UK, 2007-2012. Vienna: http://www.archaeologieforum.at https://www.academia.edu/9367180/Ain_t_got_no_job_The_archaeology_labour_market_in_Austria_Germany_and_the_UK_2007_2012_Vienna_I%C3%96AF_2012
Boothby, C., Milojević, S. An exploratory full-text analysis of Science Careers in a changing academic job market. Scientometrics 126, 4055–4071 (2021). https://doi.org/10.1007/s11192-021-03905-2
Rachael Pitt & Inger Mewburn (2016) Academic superheroes? A critical analysis of academic job descriptions, Journal of Higher Education Policy and Management, 38:1, 88-101, DOI: 10.1080/1360080X.2015.1126896
Leon, L. A., Seal, K. C., Przasnyski, Z. H., & Wiedenman, I. (2018). Skills and competencies required for jobs in business analytics: A content analysis of job advertisements using text mining. In Operations and Service Management: Concepts, Methodologies, Tools, and Applications (pp. 880-904). IGI Global.
Kortum, H., Rebstadt, J., & Thomas, O. (2022, January). Dissection of AI Job Advertisements: A Text Mining-based Analysis of Employee Skills in the Disciplines Computer Vision and Natural Language Processing. In Proceedings of the 55th Hawaii International Conference on System Sciences.