
@tingletech
Created August 20, 2016 23:08
Does this sound like a good or a bad idea? Let me know in the comments.

SQLite in Production

Okay, I can see how at first glance this might seem super hacky, but I just detached production Calisphere from Postgres, and now it is running from SQLite.

Basically, Postgres is still in use -- as the database where edits are done. We have an on-demand EC2 machine running Django, to which we have just migrated all the editorial content we publish. It is all controlled in the Django admin interface.

For launch, I just hooked production to the same Postgres database we did the migration to (mostly from static HTML!), but that left us with no place to work on new content without publishing it. Now, when a batch of new editorial content is ready for publication, the script below will be run from the instance. (I think this will eventually be moved to the calisphere-test Beanstalk environment.)

Django has a command to dump data from the database to JSON fixture files. When we are ready to publish new editorial content, the script that reads from Postgres and uploads the JSON to S3 is basically this:

# deploy-content.sh (simplified)
DUMP="exhibits-$1.json"
python manage.py dumpdata exhibits -o "$DUMP"
gzip "$DUMP"
aws s3 cp "$DUMP.gz" "s3://$BUCKET/$DIR/$DUMP.gz"

In production, we use Amazon Elastic Beanstalk. The environment gets an environment variable, UCLDC_EXHIBITIONS_DATA=s3://$BUCKET/$DIR/$DUMP.gz, that tells it where to look for the current data.
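(One way to set that variable -- hypothetical here, since how we actually set it isn't shown above -- is with the EB CLI from the project directory:)

```shell
# Hypothetical sketch: pointing the Beanstalk environment at a new fixture
# file with the EB CLI. $BUCKET/$DIR/$DUMP are placeholders as above.
eb setenv UCLDC_EXHIBITIONS_DATA="s3://$BUCKET/$DIR/$DUMP.gz"
```

Changing an environment variable this way also restarts the app servers, which re-runs the container_commands below.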

And in container_commands -- which get run whenever the environment or the application version changes:

# .ebextensions/01syncdb.config (simplified)
container_commands:
  01syncdb:
    command: "python manage.py migrate --noinput"
    # leader_only: true
  02load:
    command: "./load-content.sh"
  03collectstatic:
    command: "python manage.py collectstatic --noinput"

./load-content.sh will download the content JSON and load it into SQLite.

# load-content.sh (simplified)
filename="${UCLDC_EXHIBITIONS_DATA##*/}"          # strip the s3:// path; http://unix.stackexchange.com/a/64435/40198
name=$(echo "$filename" | rev | cut -c 4- | rev)  # strip the trailing ".gz"; http://stackoverflow.com/a/5863742/1763984

if [[ ! -e $name ]]; then  # have we seen this one before?
    aws s3 cp "$UCLDC_EXHIBITIONS_DATA" .
    gunzip "$filename"
    python manage.py loaddata "$name"  # un-load any old data first?
fi
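(For what it's worth, the rev | cut | rev trick is equivalent to plain bash parameter expansion -- a sketch with a made-up S3 URL:)

```shell
# Demonstration of the filename handling above, using parameter expansion
# instead of rev|cut|rev. The S3 URL is a made-up example value.
UCLDC_EXHIBITIONS_DATA="s3://a-bucket/a-dir/exhibits-2016-08-20.json.gz"
filename="${UCLDC_EXHIBITIONS_DATA##*/}"  # drop everything through the last '/'
name="${filename%.gz}"                    # drop the trailing '.gz'
echo "$filename"  # exhibits-2016-08-20.json.gz
echo "$name"      # exhibits-2016-08-20.json
```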

This scales horizontally, because each Django server has its own local copy of the data.

The JSON fixture is 1.4M uncompressed, and the SQLite db is 888K. Here are the web pages it drives: https://calisphere.org/exhibitions/

Will have to make sure this works when the fixtures change! (Might need to drop the old data first?)
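(One hedged way to handle that -- untested in this setup, but a standard Django sequence -- would be to reset the app's tables before loading, so rows deleted upstream don't linger in the SQLite copy:)

```shell
# Hypothetical sketch, not what load-content.sh does today:
python manage.py migrate exhibits zero --noinput  # roll the app back, dropping its tables
python manage.py migrate exhibits --noinput       # recreate them empty
python manage.py loaddata "$name"                 # load the fresh fixture
```

This only works while the schema is stable, which matches the caveat below.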

Also, this will not work if the database schema is migrating, but the schema is stable right now.

I don't think this is a long-term solution, but it seems to work pretty well for a read-only database.

