The data sets result from three crawls of the mobile web performed in May 2018, as documented in our ACM CCS paper. Two of the crawls were performed from the University of Illinois (US1 and US2) and the third from a data center in Frankfurt (EU1). The crawls visited 100,000 websites taken from the Alexa top sites list. The list of sites used, with their corresponding ranks, is included as site-list.csv.

The files <crawl>-crawl-data.sqlite.xz and <crawl>-javascript.ldb.tar.xz, where <crawl> is US1, US2, or EU1 (e.g., US1-crawl-data.sqlite.xz), contain the raw data generated by OpenWPM, as described in https://github.com/citp/OpenWPM#output-format. The crawl data file is an sqlite3 database, compressed with xz, that holds instrumentation data from each web page load. The JavaScript file contains all of the scripts fetched while loading each site, stored in a LevelDB instance that was then archived with tar and compressed with xz.
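As a minimal sketch using only the Python standard library, the two archive formats could be unpacked as follows. The helper names (decompress_xz, list_tables, extract_tar_xz) are hypothetical and not part of OpenWPM. The .sqlite.xz file is stream-decompressed with lzma and its tables are listed from sqlite_master, and the .ldb.tar.xz archive is unpacked with tarfile. Actually reading the extracted LevelDB directory needs a third-party LevelDB binding (plyvel is one option), which this sketch does not cover.

```python
from __future__ import annotations

import lzma
import shutil
import sqlite3
import tarfile
from pathlib import Path


# Hypothetical helper (not part of OpenWPM): unpack e.g. US1-crawl-data.sqlite.xz.
def decompress_xz(src: str, dst: str) -> Path:
    """Stream-decompress an .xz file to dst and return the output path."""
    with lzma.open(src, "rb") as fin, open(dst, "wb") as fout:
        # copyfileobj works in chunks, so the whole database is never held in memory
        shutil.copyfileobj(fin, fout)
    return Path(dst)


# Hypothetical helper: list whatever tables OpenWPM wrote into the database.
def list_tables(db_path: str) -> list[str]:
    """Return the sorted names of all tables in an SQLite database."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]


# Hypothetical helper: unpack e.g. US1-javascript.ldb.tar.xz into a directory.
def extract_tar_xz(archive: str, dest: str) -> list[str]:
    """Extract a .tar.xz archive into dest and return the member names."""
    with tarfile.open(archive, mode="r:xz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter on Python versions that provide it
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

For example, decompress_xz("US1-crawl-data.sqlite.xz", "US1-crawl-data.sqlite") followed by list_tables("US1-crawl-data.sqlite") shows which OpenWPM tables are present; their columns are described in the OpenWPM output format documentation linked above.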