Skip to content

Instantly share code, notes, and snippets.

View jmindek's full-sized avatar

Jerry Mindek jmindek

View GitHub Profile
@jmindek
jmindek / fixing-gazetteer-formatting-with-clojure.clj
Last active August 29, 2015 13:57
Fixing US Census Gazetteer file formatting with Clojure
;; The US Census data provides a lot of interesting data.
;; Currently, I am interested in latitudes and longitudes for cities.
;; The Gazetteer file provides that for cities in the US.
;; You can grab the 2010 data here - http://www.census.gov/geo/maps-data/data/gazetteer2010.html
;; At first glance the contents of the file looks well formatted.
;; However, there are some minor formatting issues that will cause problems when importing this data into a data store.
;; Here are two small Clojure functions to fix those issues and output a well formatted file.
@jmindek
jmindek / gist:62c50dd766556b7b16d6
Last active January 31, 2024 15:48
DISTINCT ON like functionality for Redshift

distinct column -> For each row returned, return only the unique members of a set. Think of it as for each row in a projection, concatenate all the column values and return only the strings that are unique.

test_db=# SELECT DISTINCT parent_id, child_id, id FROM test.foo_table ORDER BY parent_id, child_id, id LIMIT 10;
parent_id | child_id | id
-----------+------------+-----------------------------
1000040 | 103 | 1000040|2645405726|0001|103
@jmindek
jmindek / generate_sql_with_col_names.sql
Created March 28, 2017 18:25
Using Redshift table column names for use in generated SQL
/*
* Have tried to generate SQL statements by first retrieving table column names
* from Redshift pg_* tables?
*
* You probably give up due to errors regarding invalid types.
*
* Here is my suggestion for accomplishing this.
*
* First, use the create table statement below to create a temp table with the col_names for our table of interest.
* We must create a table otherwise we get an error that listagg must be used on a user-created table.