Fantastic data journalism by Christine Zhang at the LA Times: https://github.com/datadesk/homeless-arrests-analysis
First I grabbed the data - a zipped feather file:
wget https://github.com/datadesk/homeless-arrests-analysis/blob/master/arrests.zip?raw=true
mv arrests.zip\?raw\=true arrests.zip
unzip arrests.zip
This produced an arrests.feather file.
Next I made a Python virtual environment and installed the dependencies needed to access that file:
virtualenv --python=python3 venv
source venv/bin/activate
pip install feather-format pandas
Then in Python I used pandas to turn the .feather file into a CSV:
import feather
df = feather.read_dataframe('arrests.feather')
df.head()
df.to_csv('arrests.csv')
Quick inspection...
$ head arrests.csv
,booking_num,homeless,arrest_year,arrest_ymd,booking_ymd,gender,race,age,occupation,charge_code,charge_desc
...
I used vi to add an id column at the start of that line:
id,booking_num,homeless,arrest_year,arrest_ymd,booking_ymd,gender,race,age,occupation,charge_code,charge_desc
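As an aside, that leading comma in the header is pandas writing the DataFrame index as an unnamed first column. Passing index_label to to_csv would name it up front and skip the manual vi edit. A minimal sketch with toy data (column names illustrative, not the full arrests schema):

```python
import pandas as pd

# Toy stand-in for the real DataFrame.
df = pd.DataFrame({"booking_num": [1, 2], "homeless": [True, False]})

# By default to_csv emits the index as an unnamed first column,
# which produces a header starting with a bare comma.
# Naming the index column avoids the manual header fix:
df.to_csv("arrests.csv", index_label="id")

print(open("arrests.csv").read().splitlines()[0])
# → id,booking_num,homeless
```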
Then I used csvs-to-sqlite to build a database, extracting some of the columns into foreign key tables:
csvs-to-sqlite arrests.csv -c gender -c race -c occupation -c charge_code -c charge_desc arrests.db
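To illustrate what that extraction does, here is a toy sketch of the idea using only the stdlib sqlite3 module: repeated string values (a hypothetical race column here) move into a lookup table, and the main table stores integer foreign keys instead. Table and column names are illustrative, not the exact schema csvs-to-sqlite generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE race (id INTEGER PRIMARY KEY, value TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE arrests (id INTEGER PRIMARY KEY, "
    "race INTEGER REFERENCES race(id))"
)

# Insert each row, replacing the string with a reference to the lookup table.
for arrest_id, race in [(1, "H"), (2, "B"), (3, "H")]:
    conn.execute("INSERT OR IGNORE INTO race (value) VALUES (?)", (race,))
    race_id = conn.execute(
        "SELECT id FROM race WHERE value = ?", (race,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO arrests (id, race) VALUES (?, ?)", (arrest_id, race_id)
    )

# Joining back recovers the original readable values.
print(conn.execute(
    "SELECT arrests.id, race.value FROM arrests "
    "JOIN race ON arrests.race = race.id ORDER BY arrests.id"
).fetchall())
# → [(1, 'H'), (2, 'B'), (3, 'H')]
```

This is also why Datasette can render those columns as labels: it follows the foreign key back to the lookup table.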
I previewed the database using datasette arrests.db, then published it to now.sh using datasette publish now:
datasette publish now arrests.db \
--title="LA Times homeless arrests data" \
--source="LA Times" \
--source_url="https://github.com/datadesk/homeless-arrests-analysis"
> Deploying /private/var/folders/jj/fngnv0810tn2lt_kd3911pdc0000gp/T/tmpdq7q__9e/datasette under simonw
> Ready! https://datasette-qjopjrpscl.now.sh (copied to clipboard) [36s]
> Synced 3 files (76.57MB) [0ms]
> Initializing…
> Building
...
Finally I set up a nicer URL using now alias:
now alias https://datasette-qjopjrpscl.now.sh la-times-homeless-arrests-analysis.now.sh