Skip to content

Instantly share code, notes, and snippets.

@michellemho
Last active October 6, 2017 14:49
Show Gist options
  • Save michellemho/f65009e292dea4574500a3360da5be00 to your computer and use it in GitHub Desktop.
Save michellemho/f65009e292dea4574500a3360da5be00 to your computer and use it in GitHub Desktop.
Get Competitors Analysis

This analysis will allow a user to find competitors of a selected company and style a map based on the number of competitors found in each area. Two datasets are required:

  1. A dataset (without geometries) of a list of retailers and their industry. Retailers in the same industry are considered competitors in this analysis.
  2. A dataset (with geometries) with a string column containing JSONs with key-value pairs for all the retailers and number of their stores found in that geometry.

Two custom SQL functions are needed:

DEP_EXT_sum_competitors and DEP_EXT_get_competitors

Steps:

  1. In a new CARTO map, add a layer with the retailer industry dataset. Nothing should appear on the map since there are no geometries.
  2. Add a category widget, aggregated by the retailer name. The widget should not behave dynamically.
  3. Add a custom SQL analysis node to this layer, using the DEP_EXT_get_competitors function
  4. Fill in the parameter fields according to the function documentation
  5. Drag out the custom SQL analysis to a separate layer and name it "competitors"
  6. Add another category widget, again aggregated by retailer name. The widget should also not behave dynamically. Name the widget "competitors".
  7. In the original layer, add another custom SQL using the DEP_EXT_sum_competitors function
  8. Fill in the parameter fields according to the function documentation
  9. When the analysis has finished running, style the analysis by selecting a color ramp on the newly added "competitors" column
  10. Add pop-ups, suffixes, change names, titles, add legends as necessary

Map

Demo

/*
DEP_EXT_get_competitors
Return a table of competitors in the same industry of a selected retailer in the underlying source query
Example: A user has a dataset of retail brands and the industry. In the underlying source query (via a widget),
a single retailer is selected (ie. Dunkin Donuts). Then, this function will return a table of the competing retailers
in the coffee industry (ie. Starbucks, Peets, etc.).
If more than one retailer is selected, then all competitors will be returned.
So if Dunkin Donuts AND Stop&Shop are selected in the widget, then Starbucks, Peets, Wegmans, Whole Foods, etc. will be
returned in the table. If two retailers in the same industry are selected, the competitors are returned in the table
without the two retailers.
Inputs managed by CARTO, common to all DEP_EXT functions:
operation text: 'create' or 'populate'
table_name text: the name of the previous node table
primary_source_query text: the query on the previous node table
primary_source_columns text: the columns of the previous node table
User input:
retailer_tablename text: the name of the table containing retailer and industry information
industry_column text: the name of the column containing industry (which indicates competitors)
dummy integer: input any integer and change in order to override cached table
*/
CREATE OR REPLACE FUNCTION DEP_EXT_get_competitors(
operation text,
table_name text,
primary_source_query text,
primary_source_columns text[],
retailer_tablename text,
industry_column text,
dummy integer
)
RETURNS VOID AS $$
DECLARE
selected_columns text;
BEGIN
-- grab the source columns
selected_columns := array_to_string(primary_source_columns, ',');
IF operation = 'create' THEN
EXECUTE 'DROP TABLE IF EXISTS ' || table_name;
-- the competitor table will have exactly the same schema as the retailer table
EXECUTE
'CREATE TABLE '
|| table_name
|| ' AS SELECT ' || selected_columns
|| ' FROM (' || primary_source_query || ') _source'
|| ' LIMIT 0';
ELSEIF operation = 'populate' THEN
-- let's get the competitors in the same industry
EXECUTE format( $QUERY$
WITH selected AS (%s),
result AS (
SELECT * FROM %I
WHERE %s in (SELECT %s FROM selected)
)
INSERT INTO %I SELECT * FROM result
EXCEPT (SELECT * from selected)
$QUERY$,
primary_source_query,
retailer_tablename,
industry_column,
industry_column,
table_name
);
END IF;
END;
$$ LANGUAGE plpgsql;
-- tests ----------
-- try with two selections
select * from DEP_EXT_get_competitors(
'create',
'temptable_1',
'select * from retailers_sheet1 where company in (''Costco'', ''The Home Depot'')',
ARRAY['cartodb_id', 'the_geom', 'the_geom_webmercator', 'rank', 'company', '_2016_retail_sales_000', 'headquarters_u_s_headquarters', '_2016_stores', 'industry'],
'retailers_sheet1',
'industry',
2
);
select * from DEP_EXT_get_competitors(
'populate',
'temptable_1',
'select * from retailers_sheet1 where company in (''Costco'', ''The Home Depot'')',
ARRAY['cartodb_id', 'the_geom', 'the_geom_webmercator', 'rank', 'company', '_2016_retail_sales_000', 'headquarters_u_s_headquarters', '_2016_stores', 'industry'],
'retailers_sheet1',
'industry',
2
);
/*
DEP_EXT_sum_competitors
Return a table with two additional columns: one for the selected retailer and one for the sum of competitors
Example: A user has two datasets: a dataset of retailers and their industry (indicated competitors),
and a geometry dataset with a JSON column of retailers.
In the underlying source query (via a widget),
a single retailer is selected from the simple retailer table dataset (ie. Dunkin Donuts).
This function will return the geometry table with two additional columns of data pulled from the JSON column:
one column of the total competing retailers in the same industry (ie. Starbucks, Peets, etc.)
and one column of the number of Dunkin Donuts found in each count.
If more than one retailer is selected, this function will only reflect the first retailer.
Inputs managed by CARTO, common to all DEP_EXT functions:
operation text: 'create' or 'populate'
table_name text: the name of the previous node table
primary_source_query text: the query on the previous node table
primary_source_columns text: the columns of the previous node table
User input:
retailer_tablename text: the name of the table containing retailer and industry information
industry_column text: the name of the column containing industry type in the retailer dataset
json_table text: the name of the geometry dataset containing retailer information in JSON column
json_column text: the name of the column containing the JSON objects
key_field text: name of the column containing the key field in the underlying node (ie. 'company_name')
dummy integer: input any integer and change in order to override cached table
*/
CREATE OR REPLACE FUNCTION DEP_EXT_sum_competitors(
operation text,
table_name text,
primary_source_query text,
primary_source_columns text[],
retailer_tablename text,
industry_column text,
json_table text,
json_column text,
key_field text,
dummy integer
)
RETURNS VOID AS $$
DECLARE
selected_retailer text;
selected_columns text;
selected_keys text[];
company text;
query_string text;
initial_query_string text;
BEGIN
-- grab all the columns from the json table
EXECUTE 'SELECT array_to_string(ARRAY('
'SELECT column_name::text from INFORMATION_SCHEMA.COLUMNS '
'WHERE table_name = '''||json_table||'''),'','')'
INTO selected_columns;
-- grab the FIRST selected retailer...
EXECUTE
'WITH CTE as ('||primary_source_query||
') SELECT array_to_string(ARRAY(SELECT '||key_field||
' from CTE LIMIT 1), '','')' INTO selected_retailer;
IF operation = 'create' THEN
EXECUTE 'DROP TABLE IF EXISTS ' || table_name;
-- the returned table will have the same schema as the geometry table plus ONE column
EXECUTE
'CREATE TABLE '
|| table_name
|| ' AS SELECT ' || selected_columns
|| ', null::numeric AS competitors'
|| ', null::numeric AS selected_retailer'
|| ' FROM (SELECT * FROM ' || json_table || ') _source'
|| ' LIMIT 0';
ELSEIF operation = 'populate' THEN
-- grab the keys (retailers)
EXECUTE format( $QUERY$
WITH selected AS (%s),
competitors AS (
SELECT * FROM %I
WHERE %s in (SELECT %s FROM selected))
SELECT ARRAY(SELECT %s from competitors
EXCEPT (SELECT %s from selected))
$QUERY$,
primary_source_query,
retailer_tablename,
industry_column,
industry_column,
key_field,
key_field)
INTO selected_keys;
-- build the table
-- initialize first query_string
query_string := '('||json_column||'::json->>$a$'||selected_retailer||'$a$)::numeric';
initial_query_string := query_string;
-- for competitor in selected_keys:
-- continue to build query string
FOREACH company IN ARRAY selected_keys
LOOP
query_string := query_string || ' + (' || json_column || '::json->>$a$'||company||'$a$)::numeric';
END LOOP;
EXECUTE format( $QUERY$
WITH result AS (
SELECT %s, (%s - %s)::numeric as competitors, %s as selected FROM %I
)
INSERT INTO %I SELECT * FROM result
$QUERY$,
selected_columns,
query_string,
initial_query_string,
initial_query_string,
json_table,
table_name
);
END IF;
END;
$$ LANGUAGE plpgsql;
-- tests ----------
-- try with two selections
select * from DEP_EXT_sum_competitors(
'create',
'temptable_1',
'select * from retailers_sheet1 where company in (''Dollar General'')',
ARRAY['cartodb_id', 'the_geom', 'the_geom_webmercator', 'rank', 'company', '_2016_retail_sales_000', 'headquarters_u_s_headquarters', '_2016_stores', 'industry'],
'retailers_sheet1',
'industry',
'county_company_counts',
'company_count',
'company',
2
);
select * from DEP_EXT_sum_competitors(
'populate',
'temptable_1',
'select * from retailers_sheet1 where company in (''Target'')',
ARRAY['cartodb_id', 'the_geom', 'the_geom_webmercator', 'rank', 'company', '_2016_retail_sales_000', 'headquarters_u_s_headquarters', '_2016_stores', 'industry'],
'retailers_sheet1',
'industry',
'county_company_counts',
'company_count',
'company',
2
);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment