- I like it as a default database when I don’t know what database I want to use
- I like schemas and I hate schemaless databases
- it’s useful to be able to enforce your schema before the data
gets written to disk
- rather than in your ORM!
- CHECK constraints: ensure balance >= 0
- this is why people don’t like schemas
- in mysql, adding a column to a table locks the table while it adds the column to every row
- zero-downtime schema changes
- all schema changes are transactional
- some schema changes are O(1)
- adding a nullable column without a default
- dropping column
- some trivial column type changes (varchar to text, small int to big int)
- CREATE INDEX CONCURRENTLY
- not strictly concurrent
- some dancing to add a non-nullable column
- add nullable nondefaulted column
- set default on column
- update all rows (takes a while, but doesn’t block writes)
- remove nullable from column
- can use hstore, xml, or json for column types
- hstore since time immemorial
- xml since 8.3
- json since 9.3
- SELECT name FROM articles WHERE data->>’type’ = ‘post’;
- SELECT data->’keywords’->>0 FROM articles;
- this is quite slow because it has to parse the json in every row
- CREATE INDEX articles_data ON articles USING GIN(data);
- or
- CREATE INDEX articles_data ON articles(data->>’type’);
- 0:12:07
- postgres has loads of stored procedure languages
- 0:12:38
- pgSQL, tcl, perl, python (included)
- 0:12:47
- extensions: js, ruby, scheme, java, php
- 0:13:34
- <example javascript stored procedure>
- 0:15:05
- log replication can be used for automated incremental backups
- 0:15:53
- always less than 5 minutes old (though can be configured to be even younger)
- 0:17:41
- automates base backups
- 0:17:58
- saves log segments to S3 or other block stores
- 0:18:13
- compresses (LZO) and encrypts (GPG)
- 0:18:25
- automates segment retrieval
- 0:19:12
- having incremental WAL backups allows point-in-time recovery
- 0:19:36
- eg to right before someone ran an UPDATE without a WHERE
- 0:20:34
- “cold slave” reads log incremental backups
- 0:21:01
- separate setting to allow reads from slaves
- 0:21:12
- cold slave has same guarantees – less than 5 minutes old
- 0:21:22
- can do streaming replication instead
- 0:21:30
- synchronous replication: postgres waits for at least one of the slaves to write to disk before completing a commit (slow unless they’re on the same network)
- 0:23:38
- all the good clustering solutions for postgres are commercial
- 0:24:28
- if you have have a dataset which is huge, postgres is not the right database
- 0:24:42
- if it’s merely large, postgres on a beefy server might be easier in the long run than a distributed solution
- 0:25:10
- geographic querying
- 0:25:18
- PostGIS is great, but it’s a bit weird
- 0:25:24
SELECT AddGeometryColumn('tube_stations', 'geom', 4326, 'POINT', 2);
- 0:26:02
- ^ wtf. this is how you add a geometry-aware column
- 0:27:05
INSERT INTO tube_stations(name, geom) VALUES ('Holborn', ST_GeomFromTest('POINT....))
- 0:27:48
- supports indexes:
- 0:27:48
CREATE INDEX tube_stations_geom_gist ON tube_stations USING GIST (geom);
- 0:28:51
- index enables queries of “what’s in this box”?
- 0:29:23
- PostGIS is also supported by other GIS applications
- 0:30:00
- twitter: @russss
- 0:30:08
- http://russ.garrett.co.uk/talks/postgres-gds
- 0:32:21
- what’s the failover process for synchronous replication?
- 0:32:33
- automating it is hard
- 0:32:52
- once something comes out of replication mode, it enters a different timeline
- 0:33:43
- where does postgres need resources? where would you
spend money on postgres?
- 0:34:05
- a lot of OSS contributors are from companies with money to throw at postgres
- 0:34:30
- I would say clustering, but it’s such a huge thing that I don’t know how you’d go about running that
- 0:38:42
- user model & authentication model: is there an easy
way to explain it? what do you end up doing in a prod
system?
- 0:39:07
- once you have the knack for it, it’s not that hard to comprehend but it is a bit weird
- 0:39:22
- why? it’s at heart a fairly old school unix database
- 0:39:39
- you have to configure the db user and then configure their access
- 0:40:48
- the default configuration in most distros is a bit of a faff
- 0:41:03
- is vacuuming still painful?
- 0:41:10
- don’t let txns run for a long time
- 0:41:17
- they’re especially bad in postgres because it keeps the state of the db as it was at the start of txn
- 0:42:09
- other than that, autovacuuming should take care of everything.