Skip to content

Instantly share code, notes, and snippets.

@rainbowdash12345
Last active November 17, 2016 13:45
Show Gist options
  • Save rainbowdash12345/c188043f82b9123781400fb42026f307 to your computer and use it in GitHub Desktop.
Save rainbowdash12345/c188043f82b9123781400fb42026f307 to your computer and use it in GitHub Desktop.
Process for cleaning a db of comments and separating out the email addresses

This process is for creating a list of leads in order to seed a new Facebook audience. Our goal was to identify a subset of leads that would be more likely to purchase our books.

We ran a contest using SumoMe, and once individuals signed up for our email list, they were asked a simple question "What is your favorite author?" Those who answered with a non-sequitor or who just said "can't wait to win your contest" should be filtered out.

We received over 900 entries and around 150 emails, plus we plan to run these contests every month, so it's imperative to have a great process to do this quickly!

How To:

  1. Copy the "feedback" out of word press and paste it into a Google Docs spreadsheet, page by page (feedback items, like comments are listed 10 per page.

  2. Skim through the comments and delete rows that don't meet our criteria (answering the question asked)

  3. Remove duplicates // In Google Sheets, highlight the column B // Go to Format --> Conditional Formatting // Apply a "countif" formula (ref: http://stackoverflow.com/questions/21899516/highlight-cell-if-value-duplicate-in-same-column-for-google-spreadsheet) // In Excel, go to Data --> Remove Duplicates

  4. Add a new sheet and use an array formula to split the cells in column B by line break (ref: https://productforums.google.com/forum/#!topic/docs/6MO1M585WQs)

  5. Copy all the data and paste values. Delete extraneous column.

  6. Filter remaining column for all cells containing @.

  7. Capture the name next to the email by applying a formula to the right of the email address. If email is in B3, forumla in C3 = B2

  8. Unfilter, copy and paste all values. Filter column C on empty cells (or column B on cells that Do Not Contain @), delete rows that show up after filtering.

  9. Unfilter and sort data A-z.

  10. Select column C and convert Text to Columns, using space as the deliminator. This will give you a column of First names and one of middle initials or last names. Clean up the remaining data.

VOILA!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment