At this link <omitted> is a tab-delimited text file containing a list of members of the 114th Congress. Use these names for the following exercise. Import the list of names into a database: Oracle, SQL Server, or HANA. (PostgreSQL was used here instead because it is similar to the listed options, none of which was readily available.)
Create a list of all possible three-letter combinations (“trigrams”), from ‘AAA’ to ‘ZZZ’, in order, with an ID column. ‘AAA’ gets ID = 1.
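As a rough cross-check outside the database, the trigram list can be sketched in Python. This is only an illustration, not the submitted solution; the function names `build_trigram_table` and `trigram_id` are hypothetical. It assumes "in order" means plain alphabetical order over the 26 uppercase letters A to Z, so the list has 26^3 = 17,576 rows.

```python
from itertools import product
from string import ascii_uppercase


def build_trigram_table():
    """Return (id, trigram) pairs from (1, 'AAA') to (17576, 'ZZZ').

    Assumption: alphabetical order over A-Z only, IDs starting at 1.
    """
    # product() yields tuples in lexicographic order: AAA, AAB, ..., ZZZ.
    return [
        (i, "".join(letters))
        for i, letters in enumerate(product(ascii_uppercase, repeat=3), start=1)
    ]


def trigram_id(trigram):
    """Compute the same ID arithmetically, treating the trigram as base 26."""
    a, b, c = (ord(ch) - ord("A") for ch in trigram.upper())
    return a * 26 * 26 + b * 26 + c + 1


table = build_trigram_table()
```

The closed-form `trigram_id` avoids a lookup join when only the ID is needed; the full table is still useful if the exercise requires the list to exist as a database table.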
Parse each last name into a set of trigrams; e.g., “Brandt” contains BRA, RAN, AND, and NDT. Determine the five most common trigrams in the Congressional roster. For each last name, sum the ID values of its trigrams and determine the last names with the five highest scores.
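The parsing and scoring steps can be sketched in Python as a sanity check on the database results. This is a minimal sketch under stated assumptions, not the submitted solution: all function names are hypothetical; non-letter characters (apostrophes, hyphens, spaces, accented letters) are simply dropped before windowing; a trigram repeated within a name is counted each time it occurs; and each distinct last name is scored once. Other readings of the exercise are possible.

```python
from collections import Counter


def trigram_id(trigram):
    """ID of a trigram when 'AAA' = 1 and 'ZZZ' = 17576 (base-26 position)."""
    a, b, c = (ord(ch) - ord("A") for ch in trigram)
    return a * 26 * 26 + b * 26 + c + 1


def trigrams(last_name):
    """Return the overlapping three-letter windows of a name, uppercased.

    Assumption: characters outside A-Z are removed first.
    """
    letters = [ch for ch in last_name.upper() if "A" <= ch <= "Z"]
    return ["".join(letters[i:i + 3]) for i in range(len(letters) - 2)]


def most_common_trigrams(last_names, n=5):
    """Count trigrams across all names and return the n most frequent."""
    counts = Counter(t for name in last_names for t in trigrams(name))
    return counts.most_common(n)


def name_score(last_name):
    """Sum the trigram IDs for one last name."""
    return sum(trigram_id(t) for t in trigrams(last_name))


def top_scores(last_names, n=5):
    """Return the n distinct last names with the highest scores."""
    scored = {name: name_score(name) for name in set(last_names)}
    # Sort by score descending, then by name to make ties deterministic.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

For the example in the text, `trigrams("Brandt")` returns `['BRA', 'RAN', 'AND', 'NDT']`, and `name_score("Brandt")` is 1119 + 11506 + 342 + 8886 = 21853. Names shorter than three letters produce no trigrams and score 0.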