GitHub repository: https://github.com/rOpenGov/psData
Social scientists have access to many electronically available panel series datasets. However, downloading, cleaning, and merging them is time-consuming and error-prone. For example, using Reinhart and Rogoff's data on the fiscal costs of the financial crisis involves downloading, cleaning, and merging four Excel files with over 70 individual sheets, one for each country’s data. Furthermore, because such datasets are not bundled in a format that is easy to manipulate, many of them are not updated regularly.
In this talk, we introduce the psData package for the R statistical software. This package is being developed under the rOpenGov framework to solve two problems:
- Time wasted by social scientists downloading, cleaning, and transforming commonly used data sets for their own research
- Errors introduced by data import and transformation scripts that are written individually and never shared across researchers
The psData package aims to address these problems by distributing easy-to-use R functions for downloading, cleaning, and merging datasets used by social scientists. The package focuses on panel series data, which are frequently found in political science and macroeconomics. It is hosted on GitHub, where the community can easily extend and modify it. Fixes and patches to the distributed datasets therefore reach all users simultaneously, improving overall data quality.
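To make the clean-and-merge step concrete, here is a minimal base R sketch of the kind of work the package is meant to centralize. It does not use psData's own functions: the helper names `clean_panel` and `merge_panels`, the column names, the country codes, and all values are hypothetical and chosen only for illustration.

```r
# Toy country-year panels with made-up values (not real data).
# The first source has messy identifiers: lower case, stray whitespace,
# and years stored as text.
debt <- data.frame(
  iso3 = c("aaa ", "aaa", "bbb"),
  year = c("2008", "2009", "2008"),
  debt = c(10, 12, 20),
  stringsAsFactors = FALSE
)
crisis <- data.frame(
  iso3   = c("AAA", "AAA", "BBB", "BBB"),
  year   = c(2008, 2009, 2008, 2009),
  crisis = c(1, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: harmonize the unit and time identifiers and
# check that each unit-year combination appears only once.
clean_panel <- function(df) {
  df$iso3 <- toupper(trimws(df$iso3))
  df$year <- as.integer(df$year)
  stopifnot(!anyDuplicated(df[, c("iso3", "year")]))
  df
}

# Hypothetical helper: clean both panels, then keep every unit-year
# found in either source (missing values become NA).
merge_panels <- function(x, y) {
  merged <- merge(clean_panel(x), clean_panel(y),
                  by = c("iso3", "year"), all = TRUE)
  merged[order(merged$iso3, merged$year), ]
}

panel <- merge_panels(debt, crisis)
print(panel)
```

Writing such helpers once and sharing them, rather than re-implementing them in each project, is the design choice the abstract argues for: a fix to the cleaning step then applies to every analysis that relies on it.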
The team behind this project currently includes NN members from universities in NN countries. NN members will be in Berlin at the time of the conference.