-
Create two Postgres clusters, one on 5432 port (
node1) and one on 5433 (node2). Compile and installpostgres_fdwextension on both. -
Set
enable_partitionwise_joinandenable_partitionwise_aggregatetoon. -
Set
postgres_fdw.use_remote_estimatetotrue. -
On
node2:
\i init2.sql- On
node1:
\i init1.sql
\i load.sql- Run
ANALYSEon bothnode1andnode2.
Or simply run setup.sh script if all Postgres binaries are present in the PATH.
Then check some queries from the queries.sql.
At the time of 30.06.2020 PostgreSQL cannot efectively push-down queries to the specific node, when postgres_fdw + native partitioning and a filter by sharding key are used, e.g.:
postgres_node1=# EXPLAIN (
ANALYZE,
VERBOSE
)
SELECT
*
FROM
documents
INNER JOIN users ON documents.user_id = users.id
WHERE
documents.company_id = 5
AND users.company_id = 5;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=200.00..248.36 rows=1 width=100) (actual time=3.024..688.854 rows=20235 loops=1)
Output: documents.company_id, documents.id, documents.user_id, documents.created_at, documents.text, users.company_id, users.id, users.created_at, users.name
Join Filter: (documents.user_id = users.id)
Rows Removed by Join Filter: 2003265
-> Foreign Scan on public.users_node2 users (cost=100.00..124.33 rows=6 width=48) (actual time=1.280..1.697 rows=100 loops=1)
Output: users.company_id, users.id, users.created_at, users.name
Remote SQL: SELECT company_id, id, created_at, name FROM public.users_node2 WHERE ((company_id = 5))
-> Materialize (cost=100.00..123.59 rows=5 width=52) (actual time=0.017..3.366 rows=20235 loops=100)
Output: documents.company_id, documents.id, documents.user_id, documents.created_at, documents.text
-> Foreign Scan on public.documents_node2 documents (cost=100.00..123.56 rows=5 width=52) (actual time=1.668..131.552 rows=20235 loops=1)
Output: documents.company_id, documents.id, documents.user_id, documents.created_at, documents.text
Remote SQL: SELECT company_id, id, user_id, created_at, text FROM public.documents_node2 WHERE ((company_id = 5))
Planning Time: 0.378 ms
Execution Time: 692.072 msIt uses two foreign scans and performs a final join locally. However, once one specifies a required partition in the query it is immediately pushed-down to the required node:
postgres_node1=# EXPLAIN (
ANALYZE,
VERBOSE
)
SELECT
*
FROM
documents_node2
INNER JOIN users_node2 ON documents_node2.user_id = users_node2.id
WHERE
documents_node2.company_id = 5
AND users_node2.company_id = 5;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=100.00..147.92 rows=1 width=100) (actual time=2.973..204.586 rows=20235 loops=1)
Output: documents_node2.company_id, documents_node2.id, documents_node2.user_id, documents_node2.created_at, documents_node2.text, users_node2.company_id, users_node2.id, users_node2.created_at, users_node2.name
Relations: (public.documents_node2) INNER JOIN (public.users_node2)
Remote SQL: SELECT r1.company_id, r1.id, r1.user_id, r1.created_at, r1.text, r2.company_id, r2.id, r2.created_at, r2.name FROM (public.documents_node2 r1 INNER JOIN public.users_node2 r2 ON (((r1.user_id = r2.id)) AND ((r2.company_id = 5)) AND ((r1.company_id = 5))))
Planning Time: 0.217 ms
Execution Time: 206.470 msYou can see that execution time has dropped significantly — by more than 3 times.