We're introducing a new UI that will be feature-flagged. To verify that the new UI doesn't depart from the old UI in the wrong ways (e.g. not showing documents, or leaving off steps), we need to compare the new and old UIs when they present the same jobs. We'll select the three most-used recurring jobs from the six most active customers for this purpose.
In environment E, we have two teams, T1 and T2. T1 has the new-UI feature flag on, but T2 doesn't. If T1 and T2 both have all the jobs we care about, then we can easily compare their views by logging into both teams from different clients.
In order to accomplish this, we need to manipulate data in two ways:
- We need to grab production data from the teams we care about and restore it to an environment we can mess around in.
- We need to ensure all the jobs we care about are on two different teams with divergent feature flags.
The easiest thing would be to refresh preprod and do all the work there. However, we can't guarantee that a P0 won't occur that would require the free use of preprod to address.
So perhaps a pg_dump/pg_restore of the entire mothership database is the best option.
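For concreteness, a minimal sketch of that full dump/restore, driven from Python; the connection strings, host names, and roles below are placeholders, not our real ones:

```python
import subprocess

# Placeholder connection strings -- the real hosts, roles, and db names differ.
SOURCE = "postgresql://readonly@prod-host/mothership"
TARGET = "postgresql://admin@scratch-host/mothership"
DUMP_FILE = "mothership.dump"

# Take a custom-format dump of the whole mothership database.
subprocess.run(
    ["pg_dump", "--format=custom", "--file", DUMP_FILE, "--dbname", SOURCE],
    check=True,
)

# Restore into the scratch environment, dropping ownership/privilege
# statements so the restore works under a different role.
subprocess.run(
    ["pg_restore", "--no-owner", "--no-privileges", "--dbname", TARGET, DUMP_FILE],
    check=True,
)
```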
It would be nice to only dump/restore the rows that belong to teams we care about.
But this would be very complex.
NOTE: the points below were originally reasons for not wanting to COPY data within a single environment, but several of them still apply here.
Many team-specific tables (like data_sheet_definitions or document_owners) don't have team_id columns. Instead, they're related to teams through other tables (like data_sheet_sets or documents) that do have team_ids.
Following the links is non-trivial.
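To give a feel for the indirection, selecting one team's document_owners rows looks roughly like the sketch below; the join columns (document_owners.document_id, documents.team_id) are assumptions about the schema, and every indirectly-linked table needs its own variant of this query:

```python
import psycopg2

TEAM_ID = 42  # example team id

conn = psycopg2.connect("dbname=mothership")
with conn, conn.cursor() as cur:
    # document_owners has no team_id, so we reach the team through documents.
    cur.execute(
        """
        SELECT o.*
        FROM document_owners AS o
        JOIN documents AS d ON d.id = o.document_id
        WHERE d.team_id = %s
        """,
        (TEAM_ID,),
    )
    team_rows = cur.fetchall()
```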
But it's even harder to duplicate the trees of relations, because it requires remembering the row ids of newly-created rows and using them in the proper places. (For example, I might create a new data_sheet_sets row, and for every "child" of the original, I'd need to create a new copy that uses the new "parent" id; there's a sketch of this after the list below.)
This is made more difficult by the fact that:
- There are many tables for which this is true
- In the case of steps, the table will get quite large
- Also in the case of steps, the entanglements are many-to-many and tricky for some of us to remember ;)
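Here's a rough sketch of the parent/child id bookkeeping for just one pair of tables; the column names (data_sheet_sets.name, data_sheet_definitions.data_sheet_set_id) are assumptions, and the many-to-many steps relations would need join-table handling on top of this:

```python
import psycopg2

TARGET_TEAM_ID = 2      # example: team that should receive the copy
ORIGINAL_SET_ID = 101   # example: data_sheet_sets row being duplicated

conn = psycopg2.connect("dbname=mothership")
with conn, conn.cursor() as cur:
    # Copy the parent row to the target team and remember its new id.
    cur.execute(
        """
        INSERT INTO data_sheet_sets (team_id, name)
        SELECT %s, name FROM data_sheet_sets WHERE id = %s
        RETURNING id
        """,
        (TARGET_TEAM_ID, ORIGINAL_SET_ID),
    )
    new_set_id = cur.fetchone()[0]

    # Every child row must be copied with the *new* parent id substituted in.
    cur.execute(
        """
        INSERT INTO data_sheet_definitions (data_sheet_set_id, name)
        SELECT %s, name FROM data_sheet_definitions
        WHERE data_sheet_set_id = %s
        """,
        (new_set_id, ORIGINAL_SET_ID),
    )
```

Multiply that by every parent/child pair in the schema and it's clear why the selective approach gets unwieldy.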
Once the data has been restored to its new environment, we should take steps to sanitize it and make it more helpful to us.
First, we should add an extra _parsable suffix to the local part and to the domain of every email address.
That way, we won't wind up emailing any users.
We should also set every password to the hash of something like Pass1234! so that we can log into any account.
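Something like the following would cover both sanitization steps; the users table, the encrypted_password column, and the bcrypt scheme are assumptions about our auth layer, so substitute whatever it actually uses:

```python
import bcrypt
import psycopg2

conn = psycopg2.connect("dbname=mothership")

# One bcrypt hash of the shared throwaway password (hash scheme is an assumption).
shared_hash = bcrypt.hashpw(b"Pass1234!", bcrypt.gensalt()).decode()

with conn, conn.cursor() as cur:
    # Mangle every address so no mail can reach a real user, e.g.
    # alice@example.com -> alice_parsable@example_parsable.com
    cur.execute(
        """
        UPDATE users
        SET email = split_part(email, '@', 1) || '_parsable@'
                    || regexp_replace(split_part(email, '@', 2),
                                      '\\.([^.]+)$', '_parsable.\\1')
        """
    )
    # Give every account the same known password hash.
    cur.execute("UPDATE users SET encrypted_password = %s", (shared_hash,))
```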
We'd also need to either copy over all the documents in S3 (about 56K), or give the development environment access to production S3.
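If we go the copy route, a boto3 loop along these lines would do it; the bucket names are placeholders, and it preserves keys unchanged, which matters because the keys encode mothership db ids (see the note at the end):

```python
import boto3

SOURCE_BUCKET = "mothership-documents-prod"  # placeholder bucket names
TARGET_BUCKET = "mothership-documents-dev"

s3 = boto3.client("s3")

# Walk every object in the prod documents bucket (~56K keys) and copy it
# into the dev bucket under the same key.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SOURCE_BUCKET):
    for obj in page.get("Contents", []):
        s3.copy_object(
            Bucket=TARGET_BUCKET,
            Key=obj["Key"],
            CopySource={"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
        )
```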
I recommend we satisfy the divergent-feature-flag requirement by creating another environment and re-dumping (and sanitizing, etc.) the data into that environment as well. In one of the two environments with duplicate data, we can turn the feature flag on for all teams, and leave it off in the other.
It might seem easiest to work in a single environment and create new teams for the jobs we wish to check on. But for the same reasons that finding only job-related rows was difficult in the pg_dump/pg_restore discussion above, copying those rows between teams would be even harder.
(Also, don't forget that documents in S3 are keyed on their mothership db id, so copied rows with new ids would no longer line up with their S3 objects.)