What would you need:
- Postgres 9.3, 9.4, 9.5, 9.6 or 10 with cstore_fdw extention (https://github.com/citusdata/cstore_fdw)
- Docker 1.12.6 or higher
- Docker Compose
- Linux machine
Hardware requirements
The following configuration ensures normal operation of the database
requirements | |
---|---|
CPU count | 8 |
Memory | 32G |
Disk storage space | > 75G |
Postgres 9.6.6 Docker container creation
Dockerfile for Postgres 9.6.6 (postgres/Dockerfile
)
FROM postgres:9.6.6
USER root
# Install cstore_fdw dependencies
RUN apt-get update && apt-get install -y \
apt-utils \
build-essential \
git \
protobuf-c-compiler \
libprotobuf-c0-dev \
postgresql-server-dev-9.6
# Install cstore_fdw
WORKDIR /opt/
RUN git clone https://github.com/citusdata/cstore_fdw
WORKDIR /opt/cstore_fdw
RUN PATH=/usr/local/pgsql/bin/:$PATH make ; \
PATH=/usr/local/pgsql/bin/:$PATH make install
Configuration file for Postgres 9.6.6 (postgres/postgresql.conf
)
listen_addresses = '0.0.0.0'
shared_preload_libraries = 'cstore_fdw'
shared_buffers = 8GB
effective_cache_size = 24GB
work_mem = 4GB
maintenance_work_mem = 2GB
min_wal_size = 4GB
max_wal_size = 8GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 4
Docker Compose file (docker-compose.yml
)
version: "2.1"
services:
postgres:
build: ./postgres/
restart: always
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=seacret_password
volumes:
- ./postgres/docker-persistence:/var/lib/postgresql/data
- ./postgres/postgresql.conf:/var/lib/postgresql/data/postgresql.conf
- ../accounts-leaks/BreachCompilation/data:/opt/BreachCompilation
ports:
- 127.0.0.1:5432:5432
networks:
default:
Now you can run Postgres 9.6.6 Docker container by command:
docker-compose up
Database creation
Create database accounts
by command in Postgres 9.6.6 Docker container console from user postgres
:
createdb accounts
Then you should activate cstore_fdw extention by command:
echo "CREATE EXTENSION cstore_fdw ; CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;" | psql accounts
Now create table breachcompilation
in database accounts
and activate cstore_fdw extention by command:
echo "CREATE FOREIGN TABLE breachcompilation (
email text,
password text
)
SERVER cstore_server
OPTIONS(compression 'pglz');" | psql accounts
Import Breach Compilation files from container's filesystem to table breachcompilation
by command:
for file in $(find /opt/BreachCompilation -type f)
do
echo "COPY breachcompilation(email, password) FROM PROGRAM 'cat ${file} | grep -E -o \"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}+[\:\;][A-Za-z0-9!@#$%^&*()_+]+$\" | sed \"s/\;\|\:/,/\" | sed \"s/.*,/\L&/\" ' WITH (format csv, delimiter E',');" | psql accounts
done
Breach Compilation is a collection of several data sources, some of which have invalid data. Therefore, during import, you must filter the input stream with some regular expression.
Done!
Enjoy, but remember some limitations caused by cstore_fdw extension: "don't support updating table using DELETE, and UPDATE commands. We also don't support single row inserts."
Useful links:
- https://github.com/citusdata/cstore_fdw - cstore_fdw repository
- http://tech.marksblogg.com/billion-nyc-taxi-rides-postgresql.html - "A Billion Taxi Rides in PostgreSQL"
how can i get the dataset