Take this query for example.
select c1
from table_main
where c2 in (select c2_inner from table_subquery where c1_inner > 300)
and c3 = 20;Citus does not support it as written, but there are three workarounds.
- Pros
- Able to distribute the query across shards
- Straightforward
- Cons
- Constructed query suitable only for subqueries returning few elements
- Requires building custom SQL
In this technique the client application first runs the subquery directly and collects the results. It serializes them into an explicit list within an IN clause of a new query.
-- First run the subquery
select c2_inner from table_subquery where c1_inner > 300;
-- Expand its results to construct a new query
select c1
from table_main
where c2 in (2,3,5,7,11,13,17,19)
and c3 = 20;- Pros
- Works for subqueries returning any amount of results
- Cons
- Must transmit full rows from the queries back to the master
- Requires extra storage space in the database
In this workaround the client will run the outer- and sub-query independently, save their results, and join the two.
-- Capture the outer query results
create temp table main_temp as
select c1, c2
from table_main
where c3 = 20;
-- Capture the subquery results
create temp table subquery_temp as
select c2_inner
from table_subquery
where c1_inner > 300;
-- Join them
select c1
from main_temp, subquery_temp
where c2 = c2_inner;- Pros
- Temp table proportional in size to results, not full query
- Able to distribute the query across shards
- Constructed query executed by function can get long, but it is built in database
- Cons
- More complicated with the quoting, construction, and array manipulation
-- create temp_table with results
do language plpgsql $$
declare c2_array text[];
begin
execute 'select c2_inner from table_subquery'
into c2_array;
execute format(
'create temp table results_temp as '
'select c1'
' from table_main'
' where c2 = any(array[%s])'
' and c3 = 20',
array_to_string(c2_array, ','));
end;
$$;
-- read results, remove temp table
select * from results_temp;
drop table results_temp;