russ garrett talking about postgres

russ garrett on postgres

I like it as a default database when I don’t know what database I want to use

schemas

I like schemas and I hate schemaless databases
it’s useful to be able to enforce your schema before the data gets written to disk
- rather than in your ORM!
- CHECK constraints: ensure balance >= 0

schema changes often aren’t great

this is why people don’t like schemas
in mysql, adding a column to a table locks the table while it adds the column to every row

postgres makes this better

zero-downtime schema changes
all schema changes are transactional
some schema changes are O(1)
- adding a nullable column without a default
- dropping column
- some trivial column type changes (varchar to text, small int to big int)
CREATE INDEX CONCURRENTLY
- not strictly concurrent
some dancing to add a non-nullable column
- add nullable nondefaulted column
- set default on column
- update all rows (takes a while, but doesn’t block writes)
- remove nullable from column

schemaless

can use hstore, xml, or json for column types
- hstore since time immemorial
- xml since 8.3
- json since 9.3

querying json

SELECT name FROM articles WHERE data->>’type’ = ‘post’;
SELECT data->’keywords’->>0 FROM articles;
this is quite slow because it has to parse the json in every row

indexing json fields

CREATE INDEX articles_data ON articles USING GIN(data);
or
CREATE INDEX articles_data ON articles(data->>’type’);

code in the database

0:12:07

postgres has loads of stored procedure languages

0:12:38: pgSQL, tcl, perl, python (included)
0:12:47: extensions: js, ruby, scheme, java, php

0:13:34

backups

0:15:05: log replication can be used for automated incremental backups
0:15:53: always less than 5 minutes old (though can be configured to be even younger)

WAL (write ahead log) archiving using WAL-E

0:17:41

automates base backups

0:17:58

saves log segments to S3 or other block stores

0:18:13

compresses (LZO) and encrypts (GPG)

0:18:25

automates segment retrieval

0:19:12

having incremental WAL backups allows point-in-time recovery

0:19:36: eg to right before someone ran an UPDATE without a WHERE

slaves

0:20:34: “cold slave” reads log incremental backups
0:21:01: separate setting to allow reads from slaves
0:21:12: cold slave has same guarantees – less than 5 minutes old
0:21:22: can do streaming replication instead
0:21:30: synchronous replication: postgres waits for at least one of the slaves to write to disk before completing a commit (slow unless they’re on the same network)

sharding and clustering

0:23:38: all the good clustering solutions for postgres are commercial
0:24:28: if you have have a dataset which is huge, postgres is not the right database
0:24:42: if it’s merely large, postgres on a beefy server might be easier in the long run than a distributed solution

GIS and such

0:25:10

geographic querying

0:25:18

PostGIS is great, but it’s a bit weird

0:25:24

SELECT AddGeometryColumn('tube_stations', 'geom', 4326, 'POINT', 2);

0:26:02: ^ wtf. this is how you add a geometry-aware column

0:27:05

INSERT INTO tube_stations(name, geom) VALUES ('Holborn', ST_GeomFromTest('POINT....))

0:27:48

supports indexes:

0:27:48

CREATE INDEX tube_stations_geom_gist ON tube_stations USING GIST (geom);

0:28:51

index enables queries of “what’s in this box”?

0:29:23

PostGIS is also supported by other GIS applications

credits

0:30:00: twitter: @russss
0:30:08: http://russ.garrett.co.uk/talks/postgres-gds

q&a

0:32:21

what’s the failover process for synchronous replication?

0:32:33: automating it is hard
0:32:52: once something comes out of replication mode, it enters a different timeline

0:33:43

where does postgres need resources? where would you spend money on postgres?

0:34:05: a lot of OSS contributors are from companies with money to throw at postgres
0:34:30: I would say clustering, but it’s such a huge thing that I don’t know how you’d go about running that

0:38:42

user model & authentication model: is there an easy way to explain it? what do you end up doing in a prod system?

0:39:07: once you have the knack for it, it’s not that hard to comprehend but it is a bit weird
0:39:22: why? it’s at heart a fairly old school unix database
0:39:39: you have to configure the db user and then configure their access
0:40:48: the default configuration in most distros is a bit of a faff

0:41:03

is vacuuming still painful?

0:41:10: don’t let txns run for a long time
0:41:17: they’re especially bad in postgres because it keeps the state of the db as it was at the start of txn
0:42:09: other than that, autovacuuming should take care of everything.

philandstuff/russ-garrett.org

Select an option

No results found