Skip to content

Instantly share code, notes, and snippets.

@begriffs
Last active July 19, 2016 23:41
Show Gist options
  • Select an option

  • Save begriffs/f75c8e2f8fa9b52121f3a9d8dff7beec to your computer and use it in GitHub Desktop.

Select an option

Save begriffs/f75c8e2f8fa9b52121f3a9d8dff7beec to your computer and use it in GitHub Desktop.
Workarounds for subquery in WHERE clause

Problem: WHERE clause containing subquery

Take this query for example.

select c1
  from table_main
 where c2 in (select c2_inner from table_subquery where c1_inner > 300)
   and c3 = 20;

Citus does not support it as written, but there are three workarounds.

Workaround 1. Collect explicit values, use IN query

  • Pros
    • Able to distribute the query across shards
    • Straightforward
  • Cons
    • Constructed query suitable only for subqueries returning few elements
    • Requires building custom SQL

In this technique the client application first runs the subquery directly and collects the results. It serializes them into an explicit list within an IN clause of a new query.

-- First run the subquery
select c2_inner from table_subquery where c1_inner > 300;

-- Expand its results to construct a new query
select c1
  from table_main
 where c2 in (2,3,5,7,11,13,17,19)
   and c3 = 20;

Workaround 2. Create and join temporary tables

  • Pros
    • Works for subqueries returning any amount of results
  • Cons
    • Must transmit full rows from the queries back to the master
    • Requires extra storage space in the database

In this workaround the client will run the outer- and sub-query independently, save their results, and join the two.

-- Capture the outer query results
create temp table main_temp as
select c1, c2
  from table_main
 where c3 = 20;

-- Capture the subquery results
create temp table subquery_temp as
select c2_inner
  from table_subquery
 where c1_inner > 300;

-- Join them
select c1
  from main_temp, subquery_temp
 where c2 = c2_inner;

Workaround 3. PL/pgSQL

  • Pros
    • Temp table proportional in size to results, not full query
    • Able to distribute the query across shards
    • Constructed query executed by function can get long, but it is built in database
  • Cons
    • More complicated with the quoting, construction, and array manipulation
-- create temp_table with results
do language plpgsql $$
  declare c2_array text[];
begin 
  execute 'select c2_inner from table_subquery'
     into c2_array;
  execute format(
    'create temp table results_temp as '
    'select c1'
    '  from table_main'
    ' where c2 = any(array[%s])'
    '   and c3 = 20',
    array_to_string(c2_array, ','));
end;
$$;

-- read results, remove temp table
select * from results_temp;
drop table results_temp;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment