Django Database Migrations: A Comprehensive Overview
Note: This article originally appeared on Kite
Introduction to Django databases
The Django web framework is designed to work with an SQL-based relational database backend, most commonly PostgreSQL or MySQL. If you’ve never worked directly with a relational database before, managing how your data is stored/accessed and keeping it consistent with your application code is an important skill to master.
You’ll need a contract between your database schema (how your data is laid out in your database) and your application code, so that when your application tries to access data, the data is where your application expects it to be. Django provides an abstraction for managing this contract in its ORM (Object-Relational Mapping).
Over your application’s lifetime, it’s very likely that your data needs will change. When this happens, your database schema will probably need to change as well. Effectively, your contract (in Django’s case, your Models) will need to change to reflect the new agreement, and before you can run the application, the database will need to be migrated to the new schema.
Django’s ORM comes with a system for managing these migrations to simplify the process of keeping your application code and your database schema in sync.
Django’s database migration solution
Django’s migration tool reduces the manual work of migrating a database (the manual process is described in the addendum at the end of this article) while taking care of tracking your migrations and the state of your database. Let’s take a look at the three-step migration process with Django’s migration tool.
1. Change the contract: Django’s ORM
In Django, the contract between your database schema and your application code is defined using the Django ORM. You define a data model using Django ORM’s models and your application code interfaces with that data model.
When you need to add data to the database or change the way the data is structured, you simply create a new model or modify an existing model in some way. Then you can make the required changes to your application code and update your unit tests, which should verify your new contract (if given enough testing coverage).
2. Plan for change: generate migrations
Django maintains the contract largely through its migration tool. Once you make changes to your models, Django has a simple command that will detect those changes and generate migration files for you.
3. Execute: apply migrations
Finally, Django has another simple command that will apply any unapplied migrations to the database. Run this command any time you are deploying your code to the production environment. Ideally, you’ll have deploy scripts that would run the migration command right before pushing your new code live.
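As a rough illustration of such a deploy script, here is a minimal Python sketch. The release_step callable is hypothetical and stands in for whatever actually pushes your code live; the only Django-specific part is the migrate command with its --noinput flag, which suppresses interactive prompts.

```python
import subprocess
import sys


def build_migrate_command(manage_py="manage.py"):
    """Return the command that applies all unapplied migrations."""
    # --noinput suppresses interactive prompts, which suits unattended deploys.
    return [sys.executable, manage_py, "migrate", "--noinput"]


def deploy(release_step, run=subprocess.run):
    """Apply migrations, then release the new code.

    `release_step` is a hypothetical callable that pushes the code live.
    """
    # check=True raises CalledProcessError if migrate fails, so the
    # release step is skipped when the migration does not succeed.
    run(build_migrate_command(), check=True)
    release_step()
```

Stopping the deploy when the migration fails is a design choice in this sketch: it avoids releasing code that expects a schema the database does not have yet.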
Tracking changes with Django
Django takes care of tracking migrations for you. Each generated migration file has a unique name that serves as an identifier. Django maintains a database table that records every migration it has applied, which is how it makes sure that only unapplied migrations are run.
The migration files that Django generates should be included in the same commit as their corresponding application code so that the code is never out of sync with your database schema.
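The bookkeeping idea can be pictured with a small sketch using Python’s built-in sqlite3 module. The migration_log table and the migration name 0002_add_feedback are hypothetical, and Django’s real tracking logic differs in detail, so treat this as an illustration of the concept rather than Django’s implementation.

```python
import sqlite3


def unapplied(conn, app, available):
    """Return the migration names in `available` not yet recorded as applied."""
    rows = conn.execute("SELECT name FROM migration_log WHERE app = ?", (app,))
    applied = {name for (name,) in rows}
    return [name for name in available if name not in applied]


def record_applied(conn, app, name):
    """Record a migration as applied (bookkeeping only, no schema change)."""
    conn.execute(
        "INSERT INTO migration_log (app, name, applied) "
        "VALUES (?, ?, datetime('now'))",
        (app, name),
    )


conn = sqlite3.connect(":memory:")
# Hypothetical tracking table for this sketch.
conn.execute(
    "CREATE TABLE migration_log ("
    "id INTEGER PRIMARY KEY, app TEXT, name TEXT, applied TEXT)"
)
record_applied(conn, "blog", "0001_initial")
pending = unapplied(conn, "blog", ["0001_initial", "0002_add_feedback"])
print(pending)  # ['0002_add_feedback']
```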
Rolling back with Django
Django has the ability to roll back to a previous migration. The auto-generated operations have built-in support for being reversed. In the case of a custom operation, it’s on you to make sure the operation can be reversed so that this functionality is always available.
A simple Django database migrations example
Now that we have a basic understanding of how migrations are handled in Django, let’s look at a simple example of migrating an application from one state to the next. Let’s assume we have a Django project for our blog and we want to make some changes.
First, we want to allow for our posts to be edited before publishing to the blog. Second, we want to allow people to give feedback on each post, but we want to give them a curated list of options for that feedback. In anticipation of those options changing, we want to define them in our database rather than in the application code.
The initial Django application
For the purposes of demonstration, we’ll set up a very basic Django project called foo:
django-admin startproject foo
Within that project, we’ll set up our blogging application. From inside the project’s base directory:
./manage.py startapp blog
Register our new application with our project in foo/settings.py by adding `blog` to INSTALLED_APPS:
INSTALLED_APPS = [
    ...
    'blog',
]
In blog/models.py we can define our initial data model:
from django.db import models

class Post(models.Model):
    slug = models.SlugField(max_length=50, unique=True)
    title = models.CharField(max_length=50)
    body = models.TextField()
In our simple application, the only model we have represents a blog post. It has a slug for uniquely identifying the post, a title, and the body of the post.
Now that we have our initial data model defined, we can generate the migrations that will set up our database:
./manage.py makemigrations
Notice that the output of this command indicates that a new migration file was created at blog/migrations/0001_initial.py containing a CreateModel operation with name='Post'.
If we open the migration file, it will look something like this:
# Generated by Django 2.2 on 2019-04-21 18:04
from django.db import migrations, models

class Migration(migrations.Migration):
    initial = True
    dependencies = [
    ]
    operations = [
        migrations.CreateModel(
            name='Post',
            fields=[
                ('id', models.AutoField(
                    auto_created=True,
                    primary_key=True,
                    serialize=False,
                    verbose_name='ID'
                )),
                ('slug', models.SlugField(unique=True)),
                ('title', models.CharField(max_length=50)),
                ('body', models.TextField()),
            ],
        ),
    ]
Most of the migration’s contents are pretty easy to make sense of. This initial migration was auto-generated, has no dependencies, and has a single operation: create the Post model.
Now let’s set up an initial SQLite database with our data model:
./manage.py migrate
The default Django configuration uses SQLite3, so the above command generates a file called db.sqlite3 in your project’s root directory. Using the SQLite3 command line interface, you can inspect the contents of the database and of certain tables.
To enter the SQLite3 command line tool run:
sqlite3 db.sqlite3
Once in the tool, list all tables generated by your initial migration:
sqlite> .tables
Django comes with a number of initial models that will result in database tables, but the two that we care about right now are blog_post, the table corresponding to our Post model, and django_migrations, the table Django uses to track migrations.
Still in the SQLite3 command line tool, you can print the contents of the django_migrations table:
sqlite> select * from django_migrations;
This will show all migrations that have run for your application. If you look through the list, you’ll find a record indicating that the 0001_initial migration was run for the blog application. This is how Django knows that your migration has been applied.
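The same check can be done from Python with the standard sqlite3 module. The helper below is a small sketch, not part of Django; applied_migrations is a name chosen here for illustration, and it relies only on the app and name columns of the django_migrations table.

```python
import sqlite3


def applied_migrations(conn, app_label):
    """Return the names of applied migrations for one app, oldest first."""
    rows = conn.execute(
        "SELECT name FROM django_migrations WHERE app = ? ORDER BY id",
        (app_label,),
    )
    return [name for (name,) in rows]
```

Calling applied_migrations(sqlite3.connect("db.sqlite3"), "blog") from the project root should, at this point in the walkthrough, return a list containing 0001_initial.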
Changing the Django data model
Now that the initial application is set up, let’s make changes to the data model. First, we’ll add a field called published_on to our Post model. This field will be nullable. When we want to publish something, we can simply indicate when it was published.
Our new Post model will now be:
from django.db import models

class Post(models.Model):
    slug = models.SlugField(max_length=50, unique=True)
    title = models.CharField(max_length=50)
    body = models.TextField()
    published_on = models.DateTimeField(null=True, blank=True)
Next, we want to add support for accepting feedback on our posts. We want two models here: one for tracking the options we display to people, and one for tracking the actual responses:
from django.conf import settings
from django.db import models

class FeedbackOption(models.Model):
    slug = models.SlugField(max_length=50, unique=True)
    option = models.CharField(max_length=50)

class PostFeedback(models.Model):
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL, related_name='feedback',
        on_delete=models.CASCADE
    )
    post = models.ForeignKey(
        'Post', related_name='feedback', on_delete=models.CASCADE
    )
    option = models.ForeignKey(
        'FeedbackOption', related_name='feedback', on_delete=models.CASCADE
    )
Generate the Django database migration
With our model changes done, let’s generate our new migrations:
./manage.py makemigrations
Notice that this time, the output indicates a new migration file, blog/migrations/0002_auto_<YYYYMMDD>_<...>.py, with the following changes:
- Create model FeedbackOption
- Add field published_on to Post
- Create model PostFeedback
These are the three changes that we introduced to our data model.
Now, if we go ahead and open the generated file, it will look something like this:
# Generated by Django 2.2 on 2019-04-21 19:31
from django.conf import settings
from django.db import migrations, models
import django.db.models.deletion

class Migration(migrations.Migration):
    dependencies = [
        migrations.swappable_dependency(settings.AUTH_USER_MODEL),
        ('blog', '0001_initial'),
    ]
    operations = [
        migrations.CreateModel(
            name='FeedbackOption',
            fields=[
                ('id', models.AutoField(
                    auto_created=True,
                    primary_key=True,
                    serialize=False,
                    verbose_name='ID'
                )),
                ('slug', models.SlugField(unique=True)),
                ('option', models.CharField(max_length=50)),
            ],
        ),
        migrations.AddField(
            model_name='post',
            name='published_on',
            field=models.DateTimeField(blank=True, null=True),
        ),
        migrations.CreateModel(
            name='PostFeedback',
            fields=[
                ('id', models.AutoField(
                    auto_created=True,
                    primary_key=True,
                    serialize=False,
                    verbose_name='ID'
                )),
                ('option', models.ForeignKey(
                    on_delete=django.db.models.deletion.CASCADE,
                    related_name='feedback',
                    to='blog.FeedbackOption'
                )),
                ('post', models.ForeignKey(
                    on_delete=django.db.models.deletion.CASCADE,
                    related_name='feedback',
                    to='blog.Post'
                )),
                ('user', models.ForeignKey(
                    on_delete=django.db.models.deletion.CASCADE,
                    related_name='feedback',
                    to=settings.AUTH_USER_MODEL
                )),
            ],
        ),
    ]
Similar to our first migration file, each operation maps to changes that we made to the data model. The main differences to note are the dependencies. Django has detected that our change relies on the first migration in the blog application and, since we depend on the auth user model, that is marked as a dependency as well.
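To see why dependencies matter, here is a minimal sketch of how a dependency list implies an order in which migrations can run. It uses Python’s standard graphlib module (Python 3.9+) rather than Django’s own machinery, and the ("auth", "__first__") entry is an assumption standing in for the swappable user-model dependency.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each migration maps to the set of migrations it depends on.
# ("auth", "__first__") is an illustrative stand-in for the
# swappable user-model dependency, not a value read from Django.
dependencies = {
    ("blog", "0001_initial"): set(),
    ("auth", "__first__"): set(),
    ("blog", "0002_auto_20190421_1931"): {
        ("blog", "0001_initial"),
        ("auth", "__first__"),
    },
}

# static_order() yields every node only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Whatever order the two independent migrations come out in, the second blog migration always lands after both of them.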
Applying the Django database migration
Now that we have our migrations generated, we can apply the migrations:
./manage.py migrate
The output tells us that the latest generated migration was applied. If we inspect our modified SQLite database, we’ll see that a record of our new migration is in the django_migrations table, the new tables are present, and our new field on the Post model is reflected in the blog_post table.
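One way to confirm the new column from Python is SQLite’s table_info pragma. This is a small sketch using the standard sqlite3 module; column_names is a helper name introduced here for illustration.

```python
import sqlite3


def column_names(conn, table):
    """Return the column names of `table` using SQLite's table_info pragma."""
    # PRAGMA arguments cannot be bound as query parameters, so only pass
    # trusted, hard-coded table names to this helper.
    rows = conn.execute(f'PRAGMA table_info("{table}")')
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in rows]
```

Run against db.sqlite3 after the migration, column_names(conn, "blog_post") should include published_on.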
Now, if we were to deploy our changes to production, the application code and database would be updated, and we would be running the new version of our application.
Bonus: data migrations
In this particular example, the blog_feedbackoption table (generated by our migration) will be empty when we push our code change. If our interface has been updated to surface these options, there is a chance that we forget to populate them when we push. Even if we don’t forget, there is still a timing problem: the new objects are created in the database around the time the new application code is deploying, so there may be a short window in which the interface shows a blank list of options.
To help in scenarios where the required data is somewhat tied to the application code or to changes in the data model, Django provides a utility for making data migrations. These are migration operations that simply change the data in the database rather than the table structure.
Let’s say we want to have the following feedback options: Interesting, Mildly Interesting, Not Interesting and Boring. We could put our data migration in the same migration file that we generated previously, but let’s create another migration file specifically for this data migration:
./manage.py makemigrations blog --empty
This time when we run the makemigrations command, we need to specify the application we want to make migrations for, because there are no changes for Django to detect. In fact, if you remove the --empty flag, Django will indicate that it detected no changes.
With the --empty flag, it will create an empty migration file that looks like this:
# Generated by Django 2.2 on 2019-04-22 02:07
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ('blog', '0002_auto_20190421_1931'),
    ]
    operations = [
    ]
We’ll now use the RunPython operation to execute a function that allows us to populate the table.
Our migration file should look like this:
# Generated by Django 2.2 on 2019-04-22 02:07
from django.db import migrations

initial_options = (
    ('interesting', 'Interesting'),
    ('mildly-interesting', 'Mildly Interesting'),
    ('not-interesting', 'Not Interesting'),
    ('boring', 'Boring'),
)

def populate_feedback_options(apps, schema_editor):
    FeedbackOption = apps.get_model('blog', 'FeedbackOption')
    FeedbackOption.objects.bulk_create(
        FeedbackOption(slug=slug, option=option) for slug, option in initial_options
    )

def remove_feedback_options(apps, schema_editor):
    FeedbackOption = apps.get_model('blog', 'FeedbackOption')
    slugs = {slug for slug, _ in initial_options}
    FeedbackOption.objects.filter(slug__in=slugs).delete()

class Migration(migrations.Migration):
    dependencies = [
        ('blog', '0002_auto_20190421_1931'),
    ]
    operations = [
        migrations.RunPython(
            populate_feedback_options, remove_feedback_options
        )
    ]
As you can see, we pass the RunPython operation two parameters: a function that applies the changes we want to make, and a second function that reverts those changes. The second function is not technically required, but in order for you to support rolling back your data migration, you need to supply one. If nothing needs to be done to undo your changes, Django provides RunPython.noop.
Next, let’s apply our migration:
./manage.py migrate
If you inspect the database now, the blog_feedbackoption table will be populated with the initial options we specified in the data migration.
Rolling back the Django database example
All of our migrations, the ones we generated and the one we created, support being reversed. We took care in creating our data migration to make sure we can still roll back if necessary. Let’s go ahead and roll back all our changes.
To do this in Django, use the migrate command and specify a migration to roll back to. This will roll back all migrations that have been applied past that migration (not including it).
To roll back to the initial state, run:
./manage.py migrate blog 0001_initial
The output of this command should indicate that the two migrations we have created were both unapplied.
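The rule "roll back everything applied after the target, newest first" can be sketched in a few lines of plain Python. This assumes a simple linear history for a single app, which is not how Django computes its plan in general, and 0003_feedback_options is a hypothetical name for the data migration created above.

```python
def migrations_to_unapply(applied, target):
    """Return the migrations applied after `target`, newest first."""
    if target not in applied:
        raise ValueError(f"{target!r} has not been applied")
    later = applied[applied.index(target) + 1:]
    # Reverse so the most recent migration is undone first.
    return list(reversed(later))


# Hypothetical linear history for the blog app.
applied = ["0001_initial", "0002_auto_20190421_1931", "0003_feedback_options"]
print(migrations_to_unapply(applied, "0001_initial"))
# ['0003_feedback_options', '0002_auto_20190421_1931']
```

The target itself stays applied, which matches the "not including it" behavior described above.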
What next?
This article is a very quick introduction to Django’s migrations and contains probably 90% of what you need to know for using migrations day-to-day. However, migrations are a complex topic and fully understanding how Django deals with migrations and what pitfalls come with the approach is important. The Django documentation is worth reading, particularly when you have specific questions related to managing your migrations.
Addendum: Life without a migration tool
To understand what Django provides with its migration tool, let’s look at life without such a tool. We’ll do so by exploring the process of manually making changes that impact the database contract in an existing application. Typically, when you want to make changes to your application that require a change to your database schema, you’ll need to take the following steps.
1. Change the contract: database schema and application code changes
First, you’ll want to change your database schema. Starting with a copy of your production schema, you can either write out raw SQL commands to modify the structure of your database tables, or you can use a tool with a graphical interface to help you visualize the layout of your data.
Once you’re happy with the new database schema, you’ll need to make changes to your application code to access the new structure. In some cases, this may be completely new code, and in others, it may just be a change to existing access methods. Finally, you’ll want to test your code against the new schema to make sure the new contract is valid.
2. Plan for change: write a migration script in SQL
Once you’ve made changes to the schema and application code, you need to have a strategy for getting those changes into your production environment. Ideally, you’ll migrate your database to the new schema at the same time that you deploy your new application code to production.
To minimize the amount of time your application code and database schema are out-of-sync, have a script that migrates your schema from the old state to the new state. If in the first step you modified your database schema by manually writing SQL commands, you can simply put those commands in a .sql file that can then be used to apply changes directly to your database at the time of migration. If you used a tool to modify your database, you need to go back and write a series of commands that will take your database from the old schema to the new.
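As a minimal sketch of what running such a script might look like, the example below applies a hand-written SQL migration with Python’s standard sqlite3 module. The post table and the contents of MIGRATION_SQL are hypothetical; in practice the SQL would be read from your .sql file and run against your real database.

```python
import sqlite3

# Hypothetical contents of a hand-written migration file.
MIGRATION_SQL = """
ALTER TABLE post ADD COLUMN published_on TEXT NULL;
"""


def run_migration(conn, sql):
    """Execute every statement in a migration script against `conn`."""
    conn.executescript(sql)


conn = sqlite3.connect(":memory:")
# Stand-in for the old production schema.
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
run_migration(conn, MIGRATION_SQL)
columns = [row[1] for row in conn.execute("PRAGMA table_info(post)")]
print(columns)  # ['id', 'title', 'body', 'published_on']
```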
3. Execute: deploy code and run migration in unison
Now that you have your application changes and migration script, you’re ready to deploy your application code. You can run the migration script on your database, and the new application code should have a valid contract with your new database schema if done properly.
Tracking changes without Django
A single migration by itself is not a big deal. This process is a bit manual, but ultimately it can work. As you make more and more changes, however, particularly if you are working on a team with other developers, keeping track of your database can be complicated.
When multiple people are making changes to a code base, each of which requires a migration, it’s confusing to track which migrations have been applied to the database and which have not. It’s also important to tie specific changes in your application code to specific migrations. This way, the migration is applied at the same time the code change goes live in production.
Rolling back without Django
Inevitably there are scenarios where you will want the ability to go back to a previous application state. For example, if a bug gets into production that you cannot address quickly, sometimes the best solution is to simply back out your changes until the bug can be addressed.
In this scenario, your migrations need the ability to roll back as well, otherwise you are locked into the most recent changes. Ideally this comes in the form of a script that undoes your migration so that if the need arises, the migration can be reverted quickly.
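A hand-rolled version of this usually means keeping an "up" script and a matching "down" script side by side. The sketch below shows the idea with Python’s standard sqlite3 module; the feedback_option table and both SQL strings are hypothetical.

```python
import sqlite3

# Hypothetical up/down pair; the table name is illustrative only.
UP_SQL = "CREATE TABLE feedback_option (id INTEGER PRIMARY KEY, slug TEXT UNIQUE);"
DOWN_SQL = "DROP TABLE feedback_option;"


def table_exists(conn, name):
    """Return True if a table called `name` exists in the SQLite database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.executescript(UP_SQL)    # migrate forward
after_up = table_exists(conn, "feedback_option")
conn.executescript(DOWN_SQL)  # roll back
after_down = table_exists(conn, "feedback_option")
print(after_up, after_down)  # True False
```

Note that a down script like this one discards any data written to the table in the meantime, which is one reason rollbacks deserve as much care as the forward migration.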
source:
https://kite.com/blog/python/django-database-migrations-overview