Goal: Side-by-side comparison of the same jobs, but two different views

We're introducing a new UI that will be feature-flagged. To determine that the new UI doesn't depart from the old UI in the wrong ways (e.g. not showing documents, or leaving off steps), we need to compare the new and old UIs when they present the same jobs. We'll select the three most-used recurring jobs from the six most active customers for this purpose.

Proposal: Two teams with identical jobs in a non-production environment

In environment E, we have two teams, T1 and T2. T1 has the new-UI feature flag on, but T2 doesn't. If T1 and T2 both have all the jobs that we care about, then we can easily compare their views by logging into both teams from two different clients.

In order to accomplish this, we need to manipulate data in two ways:

  1. We need to grab production data from the teams we care about and restore it to an environment we can mess around in

  2. We need to ensure all the jobs we care about are on two different teams with divergent feature flags

Restoring production data to a development environment

The easiest thing would be to refresh preprod and do all the work there. However, we can't guarantee that a P0 won't occur that requires the free use of preprod to address.

So perhaps a pg_dump/pg_restore of the entire mothership database is the best option. It would be nice to only dump/restore the rows that belong to teams we care about. But this would be very complex.
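For concreteness, the full dump/restore could be driven by something like the sketch below (shown in Python for consistency with the other sketches here; the equivalent two shell commands work just as well). The connection strings, database name, and file path are placeholders, not our real infrastructure.

```python
# Hedged sketch: dump the whole mothership DB and restore it into the
# target environment. All connection details below are placeholders.
import subprocess

PROD_DB_URL = "postgres://readonly_user@prod-host:5432/mothership"  # assumption
TARGET_DB_URL = "postgres://admin@env-e-host:5432/mothership"       # assumption
DUMP_FILE = "/tmp/mothership.dump"

# Custom-format dump so pg_restore can parallelize or skip objects later.
subprocess.run(
    ["pg_dump", "--format=custom", "--file", DUMP_FILE, PROD_DB_URL],
    check=True,
)

# --clean drops existing objects in the target before recreating them.
subprocess.run(
    ["pg_restore", "--clean", "--if-exists", "--no-owner",
     "--dbname", TARGET_DB_URL, DUMP_FILE],
    check=True,
)
```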

NOTE: the following are reasons for not wanting to COPY within environments, but some of them still hold here.

Many team-specific tables (like data_sheet_definitions or document_owners) don't have team_id columns. Instead, they're related to teams through other tables (like data_sheet_sets or documents) that do have team_ids. Following the links is non-trivial. But it's even harder to duplicate the trees of relations, because doing so requires remembering the row ids of newly created rows and using them in the proper places. (For example, I might create a new data_sheet_sets row, and for every "child" of the original, I'd need to create a new copy that uses the new "parent" id; see the sketch after the list below.) This is made more difficult by the fact that:

  1. There are many tables for which this is true
  2. In the case of steps, the table will get quite large
  3. Also in the case of steps, the entanglements are many-to-many and tricky for some of us to remember ;)
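To make the parent/child bookkeeping concrete, here's a minimal sketch of duplicating one data_sheet_sets row and its data_sheet_definitions children. The table names come from above; the column names (name, payload, data_sheet_set_id), the ids, and the DSN are invented for illustration.

```python
# Illustrative sketch of why per-team copying is painful: duplicating a
# parent row and re-pointing its children at the freshly generated id.
import psycopg2

conn = psycopg2.connect("dbname=mothership")  # placeholder DSN
cur = conn.cursor()

old_set_id, new_team_id = 123, 456  # placeholder ids

# 1. Copy the parent row, capturing the new id via RETURNING.
cur.execute(
    """
    INSERT INTO data_sheet_sets (team_id, name)
    SELECT %s, name FROM data_sheet_sets WHERE id = %s
    RETURNING id
    """,
    (new_team_id, old_set_id),
)
new_set_id = cur.fetchone()[0]

# 2. Copy every child row, substituting the new parent id. This remapping
#    has to be repeated for each linked table, and the many-to-many tables
#    around steps need it on both sides.
cur.execute(
    """
    INSERT INTO data_sheet_definitions (data_sheet_set_id, payload)
    SELECT %s, payload FROM data_sheet_definitions
    WHERE data_sheet_set_id = %s
    """,
    (new_set_id, old_set_id),
)
conn.commit()
```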

Once the data has been restored to its new environment, we should take steps to sanitize it and make it more helpful to us. First, we should add an extra _parsable to the local part and to the domain of every email address. That way, we won't wind up emailing any real users. We should also set every password to the hash of something like Pass1234! so that every account is accessible.
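A sanitization pass along those lines might look like the following sketch. The users table and its email and encrypted_password columns are assumptions about our schema, as is the choice of bcrypt; adjust to however the app actually stores credentials.

```python
# Hedged sketch of the sanitization pass; schema details are assumptions.
import bcrypt
import psycopg2

conn = psycopg2.connect("dbname=mothership")  # placeholder DSN
cur = conn.cursor()

# alice@example.com -> alice_parsable@example.com_parsable
# Tagging both the local part and the domain keeps addresses unique and
# well-formed but guarantees they can't be delivered to real users.
cur.execute(
    """
    UPDATE users
    SET email = split_part(email, '@', 1) || '_parsable@'
             || split_part(email, '@', 2) || '_parsable'
    """
)

# One shared bcrypt hash so every account can be logged into with Pass1234!.
shared_hash = bcrypt.hashpw(b"Pass1234!", bcrypt.gensalt()).decode()
cur.execute("UPDATE users SET encrypted_password = %s", (shared_hash,))
conn.commit()
```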

We'd also need to either copy over all the documents in S3 (about 56K), or give the development environment access to production S3.
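If we go the copy route, a straightforward (if slow, at ~56K objects) approach is to walk the production bucket and copy each object under the same key. The bucket names below are placeholders; an aws s3 sync between buckets would accomplish the same thing.

```python
# Sketch of copying documents from the production bucket to the dev bucket.
import boto3

s3 = boto3.client("s3")
SRC_BUCKET = "mothership-documents-prod"   # assumption
DST_BUCKET = "mothership-documents-env-e"  # assumption

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SRC_BUCKET):
    for obj in page.get("Contents", []):
        # copy_object keeps the key unchanged, which matters because
        # documents are keyed on their mothership db id (see below).
        s3.copy_object(
            Bucket=DST_BUCKET,
            Key=obj["Key"],
            CopySource={"Bucket": SRC_BUCKET, "Key": obj["Key"]},
        )
```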

Adding jobs to two teams with divergent feature flags

I recommend we accomplish this by creating another environment and re-dumping (and sanitizing, etc.) the data into it as well. In one of the two environments with duplicate data, we can set the feature flag on for all teams and leave it off in the other.
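Assuming flags live in a database table, flipping them per environment could be as simple as the sketch below. The feature_flags table, its columns, and the flag name new_ui are all assumptions; the real flag storage may differ.

```python
# Hypothetical sketch: new-UI flag on for every team in one environment,
# off in the other. Table, columns, and flag name are assumptions.
import psycopg2

def set_new_ui_flag(dsn, enabled):
    """Enable or disable the hypothetical 'new_ui' flag for all teams."""
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    cur.execute(
        "UPDATE feature_flags SET enabled = %s WHERE name = 'new_ui'",
        (enabled,),
    )
    conn.commit()
    conn.close()

set_new_ui_flag("dbname=mothership_env1", True)   # new UI on everywhere
set_new_ui_flag("dbname=mothership_env2", False)  # old UI everywhere
```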

It might seem easiest to work in a single environment and create new teams for the jobs we wish to check on. But for the same reasons that finding only job-related rows for the pg_dump/pg_restore above was difficult, copying jobs between teams would be even harder.

(Also, don't forget that documents in S3 are keyed on their mothership db id, so rows copied under new ids would no longer line up with their documents....)
