Skip to content

Instantly share code, notes, and snippets.

@bruth
Created November 14, 2013 14:30
Show Gist options
  • Save bruth/7467784 to your computer and use it in GitHub Desktop.
Save bruth/7467784 to your computer and use it in GitHub Desktop.
UNION ALL + semantic graph idea

Notes

  • UNION removes duplicates from the set while UNION ALL does not which makes it much faster
    • Since this technique is being used for data integration, the UNION ALL is the
  • The number of columns must be the same for each select statement
    • A trick to fill in columns is to select a constant value such as null or ''

Example

table1.sex, table2.gender, and table3.gender are semantically similar.

Table 1

1 | male
2 | female
3 | male
4 | male

Table2

1 | Male
2 | Male
3 | Female

Table3

1 | f
2 | m
3 | f
4 | f

Query

SELECT id, sex FROM table1 WHERE lower(sex) like 'm%'
UNION ALL
SELECT id, gender FROM table2 WHERE lower(gender) like 'm%'
UNION ALL
SELECT id, gender FROM table3 WHERE lower(gender) like 'm%'

Result

1 | male
3 | male
4 | male
1 | Male
2 | Male
2 | m

Discussion

  • The first SELECT can be thought of as the canonical one. It's column names are used to label the resulting table, i.e. id and sex.
  • The queries can be independently construct and concatentated prior to the query being executed.

Implementation

  • A graph database can be used to keep the relationship between fields
    • table1.sex relto table2.gender
    • table1.sex relto table3.gender
  • Query graph database to find all field realted to canonical
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment