$ docker pull postgres
$ docker run --name ss-pg -e POSTGRES_PASSWORD=secret -d -p 5432:5432 postgres
This pulls the Postgres image and runs it under the name ss-pg. You'll probably want to pick a better password than secret; this is the password for the postgres admin user. The option -p 5432:5432 makes the Postgres instance available on localhost on port 5432, which will be convenient later when we fill it with data.
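Since the port is published, you can sanity-check that Postgres is reachable from the host with a few lines of Python (just a quick sketch; the host and port are the ones from the docker run command above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# After `docker run`, this should print True:
# print(port_open("localhost", 5432))
```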
Let's open up a bash shell from within the container and create databases.
$ docker exec -it ss-pg bash
$ su postgres
$ psql
postgres=# CREATE DATABASE superset;
postgres=# CREATE DATABASE example;
The database superset will contain the tables that Superset uses internally; our data will go into example.
We now need to configure Superset. I used the example from the documentation and changed only the database URI, placing the file at ~/pg-ss-tutorial/superset_config.py. Use:
SQLALCHEMY_DATABASE_URI = 'postgresql://postgres:secret@ss-pg:5432/superset'
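For reference, a minimal superset_config.py might look something like this (a sketch based on the example in the Superset docs; the SECRET_KEY value is a placeholder you should replace):

```python
# superset_config.py -- minimal configuration sketch.
# The hostname 'ss-pg' resolves inside the Superset container via the
# Docker link between the two containers; 'secret' is the password we
# set when starting Postgres.
SQLALCHEMY_DATABASE_URI = 'postgresql://postgres:secret@ss-pg:5432/superset'

# Used to sign session cookies -- replace with your own random value.
SECRET_KEY = 'change-me'
```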
I don't think Airbnb supplies an official Docker image for Superset, but luckily someone else has gone through the trouble to make it available. And at the time of writing, it was updated very recently.
We're back on the host machine now, but keep the Postgres shell open in a different window.
And now run the container:
$ docker pull amancevice/superset
$ docker run --name ss -d -p 8088:8088 -v ~/pg-ss-tutorial:/etc/superset --link ss-pg:postgres amancevice/superset
It should show up when you run docker ps; if it doesn't, check docker logs ss.
Now tell Superset to create its internal database:
docker exec -it ss superset-init
You'll be asked to configure an admin user during this step.
Now browse to http://localhost:8088 and login with the admin user you just created.
Go to Sources, Add database.
- Name:
example
- SQLAlchemy URI:
postgresql://postgres:secret@ss-pg:5432/example
Ok, we can now fill the database with tables.
There are all sorts of ways to load data.
import pandas as pd
from sqlalchemy import create_engine

# Connect to the example database through the published port
engine = create_engine('postgresql://postgres:secret@localhost:5432/example')
# df is the DataFrame holding your data
df.to_sql('vector_matched', engine)
The DataFrame .to_sql(...) call doesn't give any feedback when it completes, so if you're paranoid (and still have the Postgres shell open) you can try the following:
postgres=# \c example
You are now connected to database "example" as user "postgres".
example=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+----------------+-------+----------+--------+-------------
public | vector_matched | table | postgres | 392 kB |
(1 row)
Yep, it worked!
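If you'd rather check from Python than from psql, SQLAlchemy's inspector can list the tables for you. A small sketch, demonstrated against an in-memory SQLite database so it runs without a server; swap the URI for postgresql://postgres:secret@localhost:5432/example to inspect the real thing:

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

# SQLite stands in for Postgres here only so the snippet is self-contained.
engine = create_engine('sqlite://')
pd.DataFrame({'n': [1, 2, 3]}).to_sql('vector_matched', engine, index=False)

# List the table names the database now contains
print(inspect(engine).get_table_names())  # ['vector_matched']
```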
- In Superset, go to Sources -> Tables -> Add
- Choose database: example and the table you just added.
- Schema can be left empty.
You can see all running containers with:
docker ps
To stop something, do:
docker stop <id or name>
E.g.:
docker stop ss-pg
docker stop ss
To also list the containers that aren't running, do:
docker ps -a
To start them again later, just do:
docker start ss-pg
docker start ss
Should you want to remove the containers, do:
docker rm ss-pg
docker rm ss
You actually need to go a bit further than this, because all the pros strongly advise against storing data inside a Docker container, and with good reason: containers are supposed to be easy to destroy and re-create, so you don't want your data stuck in there. I wanted to keep things simple, so I didn't include it here, but there's plenty of documentation on the subject.