Goal: Convert .sql file ("sql dump") into CSV files.
The data lives in a zip file here:
https://api.whitehouse.gov/v1/downloads/data.sql.zip
If you download it and extract the sql file, you can look at its contents (only first few lines shown)
-- MySQL dump 10.13 Distrib 5.5.24, for Linux (x86_64)
--
-- Host: dbmaster-42 Database: petitions
-- ------------------------------------------------------
-- Server version 5.5.24-55-log
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
--
-- Table structure for table wtp_data_issues
--
DROP TABLE IF EXISTS wtp_data_issues;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE wtp_data_issues (
id int(11) NOT NULL DEFAULT '0' COMMENT 'TODO: please describe this field!',
name varchar(64) NOT NULL DEFAULT '' COMMENT 'TODO: please describe this field!',
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='TODO: please describe this table!';
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Dumping data for table wtp_data_issues
--
LOCK TABLES wtp_data_issues WRITE;
/*!40000 ALTER TABLE wtp_data_issues DISABLE KEYS */;
set autocommit=0;
INSERT INTO wtp_data_issues VALUES (1,'Agriculture'),(2,'Arts and Humanities'),(3,'Budget and Taxes'),(4,'Civil Rights and Liberties'),(8,'Climate Change'),(9,'Consumer Protections'),(10,'Criminal Justice and Law Enforcement'),(12,'Defense'),(13,'Disabilities'),(16,'Economy'),(18,'Education'),(19,'Energy'),(20,'Environment'),(21,'Family'),(22,'Firearms'),(24,'Foreign Policy'),(25,'Health Care'),(26,'Homeland Security and Disaster Relief'),(27,'Housing'),(28,'Human Rights'),(29,'Immigration'),(30,'Innovation'),(97,'Job Creation'),(103,'Labor'),(109,'Natural Resources'),(115,'Postal Service'),(121,'Poverty'),(127,'Regulatory Reform'),(133,'Rural Policy'),(139,'Science and Space Policy'),(145,'Social Security'),(151,'Small Business'),(157,'Technology and Telecommunications'),(163,'Trade'),(169,'Transportation and Infrastructure'),(175,'Urban Policy'),(181,'Veterans and Military Families'),(187,'Women\'s Issues'),(193,'Government Reform');
/*!40000 ALTER TABLE wtp_data_issues ENABLE KEYS */;
UNLOCK TABLES;
commit;
...
These are all SQL commands which can be used to recreate the original database. Unfortunately, not every version of SQL is interchangeable - these commands will work in mysql.
To install MySQL on a Mac, it's easiest to use homebrew. Install the software, then start the MySQL server, then start the MySQL client as follows:
brew install mysql
mysql.server start
mysql -uroot
Once in MySQL, you must enter the commands to make a new database and use it:
mysql> CREATE DATABASE pres;
mysql> USE PRES;
And finally execute the code in the sql dump:
mysql> \. data.sql
We can see what tables are present by using the following command inside MySQL:
mysql> SHOW TABLES;
+-----------------------------+
| Tables_in_pres |
+-----------------------------+
| wtp_data_issues |
| wtp_data_petition_issues |
| wtp_data_petition_responses |
| wtp_data_petitions |
| wtp_data_responses |
| wtp_data_signatures |
+-----------------------------+
6 rows in set (0.00 sec)
For each of the tables, we can export the tables as following:
mysql> SELECT * FROM wtp_data_signatures INTO OUTFILE '/tmp/wtp_data_signatures.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
This saves the data in the directory /tmp, and this command must be repeated for each table with a different name for each.
I zipped these csv files, and they come out to 200mb or so.