$ docker pull postgres
$ docker run --name ss-pg -e POSTGRES_PASSWORD=secret -d -p 5432:5432 postgres
This pulls the Postgres image and runs it under the name ss-pg. You'll probably want to pick a better password than secret; this is the password for the postgres admin user. The option -p 5432:5432 makes the Postgres instance available on localhost on port 5432, which will be convenient later when we fill it with data.
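Since the port is published, you can sanity-check that Postgres is reachable from the host with a few lines of Python (just a quick sketch; the host and port are the ones from the docker run command above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# After `docker run`, this should print True:
# print(port_open("localhost", 5432))
```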
Let's open up a bash shell from within the container and create databases.
$ docker exec -it ss-pg bash
$ su postgres
$ psql
postgres=# CREATE DATABASE superset;
postgres=# CREATE DATABASE example;
The database superset will contain the tables that Superset uses internally; our data will go into example.
We now need to configure Superset. I used the example from the documentation and changed only the database URI, placing the file at ~/pg-ss-tutorial/superset_config.py. Use:
SQLALCHEMY_DATABASE_URI = 'postgresql://postgres:secret@ss-pg:5432/superset'
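For reference, a minimal superset_config.py might look something like this (a sketch based on the example in the Superset docs; the SECRET_KEY value is a placeholder you should replace):

```python
# superset_config.py -- minimal configuration sketch.
# The hostname 'ss-pg' resolves inside the Superset container via the
# Docker link between the two containers; 'secret' is the password we
# set when starting Postgres.
SQLALCHEMY_DATABASE_URI = 'postgresql://postgres:secret@ss-pg:5432/superset'

# Used to sign session cookies -- replace with your own random value.
SECRET_KEY = 'change-me'
```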
I don't think Airbnb supplies an official Docker image for Superset, but luckily someone else has gone through the trouble to make it available. And at the time of writing, it was updated very recently.
We're back on the host machine now, but keep the Postgres shell open in a different window.
And now run the container:
$ docker pull amancevice/superset
$ docker run --name ss -d -p 8088:8088 -v ~/pg-ss-tutorial:/etc/superset --link ss-pg:postgres amancevice/superset
It should show up when you run docker ps; if it doesn't, check docker logs ss.
Now tell Superset to create its internal database:
docker exec -it ss superset-init
You'll be asked to configure an admin user during this step.
Now browse to http://localhost:8088 and login with the admin user you just created.
Go to Sources, Add database.
- Name:
example
- SQLAlchemy URI:
postgresql://postgres:secret@ss-pg:5432/example
Ok, we can now fill the database with tables.
There are all sorts of ways to load data.
import pandas as pd
from sqlalchemy import create_engine

# Connect to the example database through the published port
engine = create_engine('postgresql://postgres:secret@localhost:5432/example')
# df is the DataFrame holding your data
df.to_sql('vector_matched', engine)
The DataFrame .to_sql(...) call doesn't give any feedback when it completes, so if you're paranoid (and still have the Postgres shell open) you can try the following:
postgres=# \c example
You are now connected to database "example" as user "postgres".
example=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+----------------+-------+----------+--------+-------------
public | vector_matched | table | postgres | 392 kB |
(1 row)
Yep, it worked!
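If you'd rather check from Python than from psql, SQLAlchemy's inspector can list the tables for you. A small sketch, demonstrated against an in-memory SQLite database so it runs without a server; swap the URI for postgresql://postgres:secret@localhost:5432/example to inspect the real thing:

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

# SQLite stands in for Postgres here only so the snippet is self-contained.
engine = create_engine('sqlite://')
pd.DataFrame({'n': [1, 2, 3]}).to_sql('vector_matched', engine, index=False)

# List the table names the database now contains
print(inspect(engine).get_table_names())  # ['vector_matched']
```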
- In Superset, go to Sources -> Tables -> Add
- Choose database: example and the table you just added.
- Schema can be left empty.
You can see all running containers with:
docker ps
To stop something, do:
docker stop <id or name>
E.g.:
docker stop ss-pg
docker stop ss
To also list the containers that aren't running, do:
docker ps -a
To start them again later, just do:
docker start ss-pg
docker start ss
Should you want to remove the containers, do:
docker rm ss-pg
docker rm ss
You actually need to go a bit further than this, because all the pros strongly advise against storing data inside a Docker container, and with good reason: containers are supposed to be easy to destroy and re-create, so you don't want your data stuck in there. I wanted to keep things simple, so I didn't include it here, but there's plenty of documentation on the subject.