Two tables share a user_id
, a tstzrange
called period
, then each have another column called category
and place_id
respectively.
Both have constraints allowing a user to only have one 'other column' value at any time:
user_id | integer | not null
period | tstzrange | not null
category | character varying(16) |
"category_events_user_id_period_excl" EXCLUDE USING gist (user_id WITH =, period WITH &&)
user_id | integer | not null
period | tstzrange | not null
place_id | integer |
"dwell_events_user_id_period_excl" EXCLUDE USING gist (user_id WITH =, period WITH &&)
I can use union to join them to merge both tables for a user, thus:
SELECT period, place_id AS place , '' AS category
FROM dwell_events WHERE user_id = 14845
UNION SELECT period, null, category
FROM category_events WHERE user_id = 14845;
period | place | category
-----------------------------------------------------+-------+----------
["2014-06-10 22:43:26+00","2014-06-10 23:04:43+00"] | 1022 |
["2014-06-04 18:14:37+00","2014-06-04 18:32:02+00"] | 1022 |
["2014-06-10 22:43:26+00","2014-06-10 23:04:43+00"] | | work
["2014-06-07 23:44:41+00","2014-06-08 00:00:14+00"] | 3381 |
["2014-06-04 18:14:37+00","2014-06-04 18:32:02+00"] | | work
["2014-06-07 23:44:41+00","2014-06-08 00:00:14+00"] | | social
Next I added group and a dubious use of MAX
to merge and sort them, thus:
SELECT period, MAX(place) AS place, MAX(category) AS category FROM (
SELECT user_id, period, place_id AS place , '' AS category
FROM dwell_events
UNION SELECT user_id, period, null, category
FROM category_events
) t
WHERE user_id = 14845
GROUP BY period
ORDER BY period;
-----------------------------------------------------+-------+----------
["2014-06-04 18:14:37+00","2014-06-04 18:32:02+00"] | 1022 | work
["2014-06-07 23:44:41+00","2014-06-08 00:00:14+00"] | 3381 | social
["2014-06-10 22:43:26+00","2014-06-10 23:04:43+00"] | 1022 | work
Unfortunately it's not always the case that the periods in the two tables will line up so well.
In this next example the later work
category row has been deleted, and the preceding social
one has been expanded in to its range.
You can see that my current naive implementation starts to fall over:
-----------------------------------------------------+------+--------
["2014-06-04 18:14:37+00","2014-06-04 18:32:02+00"] | 1022 | work
["2014-06-07 23:44:41+00","2014-06-08 00:00:14+00"] | 3381 |
["2014-06-07 23:44:41+00","2014-06-10 23:04:43+00"] | | social
["2014-06-10 22:43:26+00","2014-06-10 23:04:43+00"] | 1022 |
In this situation I'd like the nulls to be filled in with the corresponding overlapping range value, thus:
-----------------------------------------------------+------+--------
["2014-06-04 18:14:37+00","2014-06-04 18:32:02+00"] | 1022 | work
["2014-06-07 23:44:41+00","2014-06-08 00:00:14+00"] | 3381 | social
["2014-06-10 22:43:26+00","2014-06-10 23:04:43+00"] | 1022 | social
And this is where I'm stuck!
In some cases we may also not have a complete matching range available in either table.
For example here I have reduced the length of the range for the latter place 1022
row by 4 minutes, producing this output with my current naive query:
-----------------------------------------------------+------+--------
["2014-06-04 18:14:37+00","2014-06-04 18:32:02+00"] | 1022 | work
["2014-06-07 23:44:41+00","2014-06-08 00:00:14+00"] | 3381 |
["2014-06-07 23:44:41+00","2014-06-10 23:04:43+00"] | | social
["2014-06-10 22:43:26+00","2014-06-10 23:00:00+00"] | 1022 |
In this case I'd like to minimize the duration of the range which contain a null.
So we see that we have a consolidated range during which 1022
and social
is true, then we have a row with a null for the place for the remaining 4 minutes (but see social
).
-----------------------------------------------------+------+--------
["2014-06-04 18:14:37+00","2014-06-04 18:32:02+00"] | 1022 | work
["2014-06-07 23:44:41+00","2014-06-08 00:00:14+00"] | 3381 | social
["2014-06-10 22:43:26+00","2014-06-10 23:00:00+00"] | 1022 | social
["2014-06-07 23:00:00+00","2014-06-10 23:04:43+00"] | | social
There should never be a case when both place_id
and category
are null.