Hello all, I'm in the process of learning Postgres and I find it very hard to find a resource where postgres is explained in terms of developer. Since I'm in the process of reading and applying it in my side project; I treat this gist as my knowledge sharing place on Postgres and its features. Each sub-heading tries to explain the concepts of Postgres with simple example. I may be wrong at times here, if so please free to comment.
All the codes are written and run on psql
I would be adding many stuffs as I learn them. There is no particular order in which I add the contents.
I'm using the following postgres version:
SELECT version();
version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.2 on x86_64-apple-darwin16.4.0, compiled by Apple LLVM version 8.0.0 (clang-800.0.42.1), 64-bit
(1 row)
In postgres, we can do typecasting using the shorthand syntax like:
column_name :: <your constant>
the above will type cast your constant
to column_name data type, for example:
date : 'start'
here start
contains 2010-02-02
(a string) and date
data type is date
. Hence start
would be typecasted to date
. We
can think this as an postgres infering system.
In postgres, one could use \d+
or \dt+
for getting information on database or tables respectively. To print all the tables in a schema, you can use \dt <schema_name>.
note that .
is important here.
export the editor of your choice like:
export EDITOR='sublime -w'
where sublime
is in PATH. Once there, you can open up the psql command prompt. Then enter \e
. This will open up the file for
you, write the query and save the file. While you exit the file, the query here runs on psql. If you want to see what you have
written on the editor, then run \p
on your command prompt. That will show the latest file.
Postgres has a very special type of join called lateral join
. Lets see what it can do. Before we do that, lets define us a problem statement:
Imagine we have the following tables users
and message
:
Table "tweet.users"
Column | Type | Modifiers | Storage | Stats target | Description
----------+--------+--------------------------------------------------------+----------+--------------+-------------
userid | bigint | not null default nextval('users_userid_seq'::regclass) | plain | |
uname | text | not null | extended | |
nickname | text | | extended | |
bio | text | | extended | |
picture | text | | extended | |
Table "tweet.message"
Column | Type | Modifiers |
-----------+--------------------------+-------------------------------------------------------------+-
messageid | bigint | not null default nextval | plain | |
userid | bigint | not null | plain | |
datetime | timestamp with time zone | not null default now() | plain | |
message | text | not null | extended | |
favs | bigint | | plain | |
rts | bigint | | plain | |
location | point | | plain | |
lang | text | | extended | |
url | text | | extended | |
Our job is to get the last two messages for all the users. Let's see a quick solution without lateral join:
SELECT m.datetime,m.message,u.uname
FROM (
SELECT * FROM (
SELECT row_number() OVER (
PARTITION BY userid ORDER BY datetime DESC
) as pos,
datetime,message,userid
FROM tweet.message order by tweet.message.datetime DESC
) AS m1
WHERE m1.pos <= 2
) AS m
INNER JOIN tweet.users as u
using(userid)
order by u.uname;
The reason we have created a sub-query with pos
column is that using joins in postgres, we can't refer the outside table column names! However we got our answer. Now, lets see the same using lateral
joins:
select m.datetime,m.message,u.uname from tweet.users u
cross join lateral
(
select * from tweet.message
where tweet.message.userid = u.userid
order by tweet.message.datetime DESC
limit 2
) m
order by u.uname;
There are couple of benefits using lateral joins here:
- We can able to refer the another tables column in our joining process.
- The SQL code is much more elegant and simple to read.
- Performance benefits.
Now in the previous section, we saw how to achieve good results with lateral join. If you are like me, who likes to treat sql as a code, lets put out the sql code in a stored procedure like:
create or replace function get_last_2_messages
(
out datetime timestamp with time zone,
out message text,
out uname text
)
returns setof record
language sql
as $$
select m.datetime,m.message,u.uname from tweet.users u
cross join lateral
(
select * from tweet.message
where tweet.message.userid = u.userid
order by tweet.message.datetime DESC
limit 2
) m
order by u.uname;
$$;
Now with get_last_2_messages
in place one can find out the last 2 message for a given user, something like this:
select * from get_last_2_messages() where uname = 'Helena';
Wow, thats good. Treating sql as code and putting it in a stored procedure or function is really good practice. Because we are heavy lifting the process of data processing into the database. Imagine, we are achieving the same in application code, by getting each user, then their messages, then calculate their time etc, whoff! That's lot of work, isn't? Pushing those works to database is a right choice to make. This will also reduces the network latency required for your application to send the queries over the network.
Psql seems to be really a great tool, I find the below settings to be super cool when you are playing with it:
\pset null '¤'
the above command will display ¤
in case of null
columns. This would be really helpful, if your dealing with empty and null columns in a single table, a must in your config.
\set ON_ERROR_STOP on
\set ON_ERROR_ROLLBACK interactive
Allows you to stop on the error and inspect it. For example using ON_ERROR_ROLLBACK
to interactive
we can do the following:
select 1/0
ERROR: division by zero
now if you are running this inside a transaction, then there would be problem, however since ON_ERROR_ROLLBACK
is set to interactive
, we can now run SQL commands and even commit your work. The last one I want to show is the following:
\set PROMPT1 '%~%x%# '
this will be helpful when running transactions. For example, when you run BEGIN
the command prompt would like:
~*>
which clearly indicates you are on a transaction. Once you do the commit, it will turn to ~
.
Also, there are few commands like \d
etc in psql. The interesting part is, these commands are also a SQL query. You can think these commands as alias in our linux system. In order to see those sql for each commands, we can turn on the following:
\set ECHO_HIDDEN true
now run \d
:
********* QUERY **********
SELECT n.nspname as "Schema",
c.relname as "Name",
CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'table' END as "Type",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','p','v','m','S','f','')
AND n.nspname <> 'pg_catalog'
AND n.nspname <> 'information_schema'
AND n.nspname !~ '^pg_toast'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
**************************
Its a good exercise to see and understand these sql queries behind our little commands.