Some quick work to make it easier to read the Census 2020 PL94-171 data release into Pandas dataframes.
Sample data for Providence County, RI can be downloaded from https://www.census.gov/programs-surveys/decennial-census/about/rdo/summary-files.html, as can auxiliary materials.
The file headers.py was created by parsing the SAS import scripts from the link above.
It seems as though the Census Bureau removed the sample data for Providence County, RI, against which this code was tested. You can get a copy of it from http://files.censusreporter.org/ri2018_2020Style.pl.zip
Note: The full data release is now available at https://www2.census.gov/programs-surveys/decennial/2020/data/01-Redistricting_File--PL_94-171/
To run this:
- Download the two .py files here and put them in a directory.
- Retrieve the ZIP above, or the real 2020 data for your state. Unzip it into the same directory and, if necessary, update the filenames in FILES (in create_dataframes.py).
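
The unzip step can also be done from Python. The following is a minimal sketch using only the standard library; `extract_pl_files` is a hypothetical helper name, not something defined in create_dataframes.py, and the assumption that the data files end in `.pl` is taken from the name of the sample ZIP.

```python
import zipfile
from pathlib import Path


def extract_pl_files(zip_path, dest_dir="."):
    """Unzip an archive into dest_dir and return the sorted names of its .pl members.

    Hypothetical helper for illustration; not part of create_dataframes.py.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        members = archive.namelist()
    # Assumption: the data files carry a .pl extension (matched case-insensitively).
    return sorted(name for name in members if name.lower().endswith(".pl"))
```

Calling `extract_pl_files("ri2018_2020Style.pl.zip")` from the directory holding the scripts returns the extracted file names, which can then be compared against the entries in FILES.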
Then, in ipython or jupyter, enter
%run create_dataframes.py
After it's done, you should be able to use p1, p2, p3, p4, p5, and h1 in your notebook/runtime.
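
For a rough idea of what a load like this involves, here is a minimal sketch. It is not the code in create_dataframes.py. It assumes the data files are pipe-delimited with no header row, so the column names have to be supplied separately. The helper `read_pl_segment`, the short column list, and the sample rows are all illustrative; the rows are invented and are not real Census values.

```python
import io

import pandas as pd

# Invented rows for illustration only; these are not real Census values.
SAMPLE = (
    "PLST|RI|000|01|0000001|100|90\n"
    "PLST|RI|000|01|0000002|250|240\n"
)

# Illustrative subset of column names; the real list is much longer (see headers.py).
COLUMNS = ["FILEID", "STUSAB", "CHARITER", "CIFSN", "LOGRECNO", "P0010001", "P0010002"]


def read_pl_segment(source, columns):
    """Read a pipe-delimited, header-less file into a DataFrame.

    Hypothetical helper. Everything is read as a string so that identifiers
    such as LOGRECNO keep their leading zeros.
    """
    return pd.read_csv(source, sep="|", header=None, names=columns, dtype=str)


df = read_pl_segment(io.StringIO(SAMPLE), COLUMNS)
# Convert a count column to integers once the identifiers are safely loaded as text.
df["P0010001"] = df["P0010001"].astype(int)
print(df[["LOGRECNO", "P0010001"]])
```

Reading everything as strings first and converting only the count columns is one way to avoid silently stripping leading zeros from record identifiers.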
See headers.py for comments explaining the meaning of the different column names, or read the technical docs from Census.