Often we need to change the way a piece of data is modelled on our backend. Recently I was faced with refactoring a large model into component pieces and needed a cheat sheet for the best practices to use when migrating live data on our production server to a new schema.
This is an explanation of how to perform an extractClass refactoring from within the context of a Django models.Model. For this tutorial, I'll assume you are familiar with schemamigration using South.
The basic purpose of this type of refactoring is usually one of the following:
- to DRY up your data model, such that a similar schema can be reused in multiple places
- to implement a One to Many relationship on data that is currently One to One, and tightly coupled to an existing Model
Let's say we have a model:
from django.db import models


class FootballMatch(models.Model):
    home_team_name = models.CharField(max_length=128)
    home_team_coach = models.CharField(max_length=128)
    home_team_city = models.CharField(max_length=128)
    away_team_name = models.CharField(max_length=128)
    away_team_coach = models.CharField(max_length=128)
    away_team_city = models.CharField(max_length=128)
But we decide to change the schema for our team and add a team mascot. Thus we want to convert our code to:
class FootballMatch(models.Model):
    home_team_name = models.CharField(max_length=128)
    home_team_coach = models.CharField(max_length=128)
    home_team_city = models.CharField(max_length=128)
    home_team_mascot = models.CharField(max_length=128)
    away_team_name = models.CharField(max_length=128)
    away_team_coach = models.CharField(max_length=128)
    away_team_city = models.CharField(max_length=128)
    away_team_mascot = models.CharField(max_length=128)
But now we're starting to notice a lot of redundancy, so we decide to perform an extractClass refactoring. We hope to end up with this:
class FootballTeam(models.Model):
    name = models.CharField(unique=True, max_length=128)
    coach = models.CharField(max_length=128)
    city = models.CharField(max_length=128)
    mascot = models.CharField(max_length=128)


class FootballMatch(models.Model):
    home_team = models.ForeignKey(FootballTeam, related_name='home_matches')
    away_team = models.ForeignKey(FootballTeam, related_name='away_matches')
We punch this into our models.py and, because we're familiar with South, we run python manage.py schemamigration app_name --auto but, alas, we are greeted by the following message:
(env)[~/src/football]$ python manage.py schemamigration football --auto
+ Added model football.FootballTeam
? The field 'FootballMatch.home_team_city' does not have a default specified, yet is NOT NULL.
? Since you are removing this field, you MUST specify a default
? value to use for existing rows. Would you like to:
? 1. Quit now, and add a default to the field in models.py
? 2. Specify a one-off value to use for existing columns now
? 3. Disable the backwards migration by raising an exception.
? Please select a choice:
None of these choices is adequate for our purpose. What we really want is a data migration, because we don't want to lose our data while migrating the schema. The schema migration does not understand that we'd like to move our team information from the FootballMatch model into the FootballTeam model. So, we need to teach it.
The procedure I'm about to describe follows a pattern you can apply to other data migrations. It is a three-step process:
- Create an expanded schema which encompasses the new model
- Migrate the existing data to the new model
- Eliminate the old data model elements
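The three steps can be sketched on plain Python dictionaries, with no Django or South involved. This is a toy illustration only: the row layout and the helper names (expand, migrate, contract) are made up for this sketch, and it covers just the home team columns.

```python
# Toy model of the expand / migrate / contract pattern using plain dicts.
# Nothing here is South or Django API; all names are illustrative.

OLD_COLUMNS = ("home_team_name", "home_team_coach", "home_team_city")


def expand(matches):
    """Step 1: add the new, nullable column next to the old ones."""
    for match in matches:
        match.setdefault("home_team", None)


def migrate(matches, teams):
    """Step 2: copy the old columns into team records and link them."""
    for match in matches:
        name = match["home_team_name"]
        team = teams.setdefault(name, {"name": name, "mascot": ""})
        team["coach"] = match["home_team_coach"]
        team["city"] = match["home_team_city"]
        match["home_team"] = name  # stands in for a foreign key


def contract(matches):
    """Step 3: drop the old columns once the data has moved."""
    for match in matches:
        for column in OLD_COLUMNS:
            del match[column]


matches = [
    {"home_team_name": "Lions", "home_team_coach": "Ann", "home_team_city": "Leeds"},
    {"home_team_name": "Lions", "home_team_coach": "Ann", "home_team_city": "Leeds"},
]
teams = {}

expand(matches)
migrate(matches, teams)
contract(matches)

print(matches)
print(teams)
```

Because the old columns survive until the last step, the data is never in a state where it exists in neither place.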
Let's back up to our original model, and create our expanded schema.
class FootballTeam(models.Model):
    name = models.CharField(unique=True, max_length=128)
    coach = models.CharField(max_length=128)
    city = models.CharField(max_length=128)
    mascot = models.CharField(max_length=128)


class FootballMatch(models.Model):
    home_team_name = models.CharField(max_length=128)
    home_team_coach = models.CharField(max_length=128)
    home_team_city = models.CharField(max_length=128)
    away_team_name = models.CharField(max_length=128)
    away_team_coach = models.CharField(max_length=128)
    away_team_city = models.CharField(max_length=128)
    home_team = models.ForeignKey(FootballTeam, related_name='home_matches', blank=True, null=True)
    away_team = models.ForeignKey(FootballTeam, related_name='away_matches', blank=True, null=True)
And run our first schema migration.
(env)[~/src/football]$ python manage.py schemamigration football --auto
...
(env)[~/src/football]$ python manage.py migrate
...
If you look closely at the new intermediate schema, you'll notice that I made 'name' a unique key in the new FootballTeam model. This was a choice based on the semantic meaning of this refactoring: I don't want duplicate teams in my FootballTeam table. For this demonstration, I'm assuming that all home_team_* tuples for a given team name will be identical, and the same for the away team.
Let's write a data migration that will migrate the home_team_* and away_team_* data into associated FootballTeams.
(env)[~/src/football]$ python manage.py datamigration football make_teams
Created 0003_make_teams.py.
Now, in order to construct our forward migration, let's alter this new migration file.
Replace the forwards and backwards routines in your data-migration script (mine is called 0003_make_teams.py) as follows:
from south.v2 import DataMigration


class Migration(DataMigration):

    def forwards(self, orm):
        # Create one FootballTeam per distinct name and link each match to it.
        for match in orm.FootballMatch.objects.all():
            team, created = orm.FootballTeam.objects.get_or_create(name=match.home_team_name)
            if created:
                print("Added team '{}'".format(team.name))
            # Coach and city are overwritten even if the team already existed.
            team.coach = match.home_team_coach
            team.city = match.home_team_city
            team.save()
            match.home_team = team

            team, created = orm.FootballTeam.objects.get_or_create(name=match.away_team_name)
            if created:
                print("Added team '{}'".format(team.name))
            team.coach = match.away_team_coach
            team.city = match.away_team_city
            team.save()
            match.away_team = team

            match.save()

    def backwards(self, orm):
        # Copy the team data back into the flat columns, then unlink the teams.
        for match in orm.FootballMatch.objects.all():
            home_team = match.home_team
            if home_team:
                match.home_team_name = home_team.name
                match.home_team_coach = home_team.coach
                match.home_team_city = home_team.city
            away_team = match.away_team
            if away_team:
                match.away_team_name = away_team.name
                match.away_team_coach = away_team.coach
                match.away_team_city = away_team.city
            match.home_team = None
            match.away_team = None
            match.save()

    models = {
        ...
    }
    ...
It's worth noting here that if the incoming data is inconsistent (i.e., different matches list a different coach or city for the same team), you'll get somewhat arbitrary behavior: the values from whichever match is processed last overwrite the earlier ones.
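One way to find out whether the data is inconsistent before migrating is to group the old column values by team name and look for names with more than one variant. The following is a minimal sketch on plain tuples; the function name find_conflicts and the sample rows are invented for illustration and are not part of South or Django.

```python
from collections import defaultdict


def find_conflicts(rows):
    """Return {team_name: set of (coach, city)} for names with more than one variant.

    `rows` is an iterable of (name, coach, city) tuples.
    """
    seen = defaultdict(set)
    for name, coach, city in rows:
        seen[name].add((coach, city))
    return {name: variants for name, variants in seen.items() if len(variants) > 1}


rows = [
    ("Lions", "Ann", "Leeds"),
    ("Tigers", "Bob", "Hull"),
    ("Lions", "Cat", "Leeds"),  # same team name, different coach
]

conflicts = find_conflicts(rows)
for name in sorted(conflicts):
    print(name, sorted(conflicts[name]))
```

Inside the migration, the rows could presumably be built from the home_team_* and away_team_* values of each match, and the migration could stop or log a warning when conflicts is not empty.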
At this point, we have crafted our forwards and backwards data-migration routines, so we should be able to run python manage.py migrate in order to run the forward migration. To go backward, we specify the app and our target revision. In my example, the migration before my data migration was 0002, so python manage.py migrate football 0002 would get us back there.
The last step in this refactoring exercise gets us to the clean version of the data model we originally wanted. That is, we finalize the removal of the redundant fields from FootballMatch.
class FootballTeam(models.Model):
    name = models.CharField(unique=True, max_length=128)
    coach = models.CharField(max_length=128)
    city = models.CharField(max_length=128)
    mascot = models.CharField(max_length=128)


class FootballMatch(models.Model):
    home_team = models.ForeignKey(FootballTeam, related_name='home_matches', blank=True, null=True)
    away_team = models.ForeignKey(FootballTeam, related_name='away_matches', blank=True, null=True)
And now create our final schema migrations:
(env)[~/src/football]$ python manage.py schemamigration football --auto
? The field 'FootballMatch.home_team_city' does not have a default specified, yet is NOT NULL.
? Since you are removing this field, you MUST specify a default
? value to use for existing rows. Would you like to:
? 1. Quit now, and add a default to the field in models.py
? 2. Specify a one-off value to use for existing columns now
? 3. Disable the backwards migration by raising an exception.
? Please select a choice:
This doesn't look good, but it's OK. Just select option #2 and enter a temporary string such as 'temp' for each of these default field values. The defaults only matter when migrating backwards, where they will be replaced by data from the FootballTeam models.
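To see why a throwaway default is harmless, here is a toy sketch of the backward path on plain dictionaries. It assumes the backward migrations run in reverse order: the schema rollback first re-adds the columns filled with the placeholder, then the data migration's backwards routine overwrites them. The helper names and sample data are invented for illustration, and only the home team columns are shown.

```python
# Toy model of migrating backwards; not South or Django API.

PLACEHOLDER = "temp"
OLD_COLUMNS = ("home_team_name", "home_team_coach", "home_team_city")

teams = {"Lions": {"name": "Lions", "coach": "Ann", "city": "Leeds"}}
matches = [{"home_team": "Lions"}]


def readd_columns(matches):
    """Undo the final schema migration: columns return holding the default."""
    for match in matches:
        for column in OLD_COLUMNS:
            match[column] = PLACEHOLDER


def restore_data(matches, teams):
    """Undo the data migration: overwrite placeholders from the linked team."""
    for match in matches:
        team = teams.get(match["home_team"])
        if team:
            match["home_team_name"] = team["name"]
            match["home_team_coach"] = team["coach"]
            match["home_team_city"] = team["city"]
        match["home_team"] = None


readd_columns(matches)
restore_data(matches, teams)
print(matches)
```

Note that in this sketch a match with no linked team would keep the placeholder values, which mirrors the `if home_team:` guard in the backwards routine.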
Understanding this three-step process for data migration is necessary in order to adapt your code to changing feature requirements. Please let me know about any errors or mistakes I have made!
Will Bradley
@wbbradley