- Abstract
- Motivation
- Proposed solution and specs
- Timeline
- About me
This proposal aims to bring flexibility in Django's operations related to database indexes - their creation, modification. Another goal is to allow Django to support creation of more complex indexes using different methods (like RTree
, Hash
) on relevant backends (most notably PostgreSQL), such as functional indexes and indexes with different algorithms (e.g. GiST
).
Right now, Django isn't able to offer much customisability to it's users when it comes to creating indexes. In fact, db_index
and index_together
only give the ability to create some specific indexes. But much more can be done with indexes, like declaring indexing in ascending/descending order, for example. Indexes if used properly can speed up a database scan significantly by giving sub-linear time lookup to improve performance.
There are new fields being introduced in contrib.postgres
which have similar issues to the rationale for spatial_index
, a BTree
index is either insufficent, inefficient or does not work and an alternative index type is needed for that field. A field should be able to declare the type of index which is appropriate to it.
PostgreSQL also supports partial indexes using the WHERE clause.
In addition, both postgres and oracle support powerful functional indexes (and mysql supports indexing on the first few characters of a string), which allow significant performance benefits for certain queries. You can create indexes on expressions like lower(col1)
to speed up queries like SELECT * FROM test1 WHERE lower(col1) = 'value';
. Of course, it is possible to create these indexes by hand if you know what you are doing using RunSQL
based migrations but this is a place for improvement.
Introduce django.db.models.indexes
with deconstructable Index
class.
This class would allow us to have the power over creation of indexes depending on the various arguments passed to it.
Index()
classes will be able to define their own SQL using the as_sql
pattern similar to that used by Lookup()
, but will be passed the
SchemaEditor
class instead of a compiler. By default, this will defer to
the SchemaEditor
class, but it allows non-standard index creation such as
that for spatialite (using a SELECT
statement and a function) without
having to subclass SchemaEditor
.
Introduce Meta.indexes
for other indexes. This will hold a list of
Index()
instances and will keep a track of all indexes for a given model. index_together
will continue to be supported and would also be added to Meta.indexes
internally.
The general workflow for index addition/deletion would be - Migrations framework (autodetector) detects that Meta.indexes has changed and adds an Operation => The operations.models layer tells migrations that due to the change a new index needs to be added/deleted/altered/renamed => migrations framework implements this change.
In order to have a robust way to detect name changes in indexes, the index name will be explicitly passed from the operations to migrations, whether they are auto-generated or have been defined by user. It would be a part of the deconstructed form of the index.
db_index
of the Field
class will support index classes
(e.g. IntegerField(db_index=indexes.Hash)
) other than the values
True
and False
.
Field
will gain a method get_default_index
which will be called when
db_index=True
is passed to the __init__
. The default value would be
the SpatialIndex
in gis, which will mark the start of deprecation of
spatial_index
which would be supported with deprecation warnings.
db_index
, index_together
would actually be shortcuts for
editing Meta.indexes
.
The FunctionalIndex
class will be introduced to allow use of functional indexing. It would let expressions to be used in index definitions. It would take in expressions (e.g. F
, lower
) based on which indexes would be created.
The as_sql
of this class would use the power of django expressions and their already existing methods (like resolve_expression
, as_sql
, etc.) to create the sql query. For example, an expression like F('col1') + F('col2')
creates a CombinedExpression
which can be used to get the corresponding sql statement of this expresion. The connection would be used from the passed SchemaEditor
class object.
Index
classes will take the fields
, name of the index, method
to be used while creating the index, etc. as its arguments.
name
would allow the user to pick an explicit name for their index if they want. Also, passing the name as an argument to the class would be easier to implement and would be helpful while checking for duplicate index names (as compared to an index_name = Index('field1')
approach).
The idea is to create an index for each value in the Meta.indexes list. This way an index like Index(['field1', 'field2'])
would basically imply a multi-column index. Also, names of the index would be passed as an argument to the Index class.
# models.py
class Town(models.Model):
name = models.CharField(max_length=255)
slug = models.SlugField(unique=True)
region = gis.PolygonField(db_index=True)
area = models.FloatField()
population_density = models.FloatField()
class Meta:
indexes = [
models.Index('slug', 'name'), # multi-column indexing.
models.Index('name', name='user_given_index_name'),
models.FunctionalIndex(F('area') * F('population_density'), type=models.BTree),
]
spatial_index
will be deprecated as setting db_index
to True
on a spatial field
would be the new way.
To avoid the whole 12-week work getting merged as a single jumbo patch, the workflow has been designed in such a way that different parts of the work can be merged independently as individual patches. To indicate such areas the timeline contains certain checkpoints meaning that work between two checkpoints can be reviewed and merged separately (although checkpoint 1 is necessary to get any other patch merged). This will facilitate the review process and also allow a better understanding of the changes taking place in an incremental way.
- Creating the
django.db.models.indexes
module and a base class likeBaseIndex
. - Creating the deconstructible
Index
class, defining its various attributes and methods -deconstruct
,as_sql
,remove_as_sql
and other private methods. - Modify the schema(s) in
django.db.backends
to incorporate a more flexible index creation/deletion definitions and introduce new method for the same.
- Creating classes for various index methods like
BTree
,Hash
. - Add support for various opclasses and ensure backward compatibility.
- Add support for using index ordering - #20888.
- Create documentation for indexes - introduction part and types of indexes available.
- Quarterly rebasing against master.
- Add indexes in models.options i.e. Create Meta.indexes to store
indexes = [IndexType('field')]
. - Add function to detect changes in
django.db.migrations.autodetector
. - Create
AlterIndexes
operation in migration layer. - Other changes in migrations layer (
state
,auto_detector
etc.). Stitching different parts together.
- Write tests for for autodetector.
- Write tests for migrations.
- Write tests for schema.
- Add tests for #23577 check if everything works fine.
- Mentions in release notes and other areas of documentation as per requirement.
Checkpoint 1 - Add basic class based index.
This commit would add the basic Index
class, it's functionality and would allow it's usage through Meta.indexes
. This part is the skeleton of custom indexes and hence is a prerequisite for the rest of the work (checkpoints).
- Make
index_together
use the newindexes
's functions. - Remove/Deprecate
AlterIndexTogether
operation,alter_index_together
and other related functions. - Quarterly rebasing against master.
- Allow support for custom index classes to
db_index
. - Get this change get detected in autodetector as AlterIndexes and not as AlterField.
- Addition of
get_default_index
toField
(and it's subclasses, as required). - Set
db_index
to True by default forunique=True
andForeignKey
fields. - Updating docs.
- Writing tests.
Checkpoint 2 - Integrate older ways of index creation to use class based indexes internally.
This merge would update the existing ways of modifying indexes i.e. index_together
and db_index
so that they use the new class based indexes. It can be further broken into separate commits for index_together
and db_index
if required.
- Creating class for SpatialIndex type in
gis.db.models.indexes
. - Take into account definitions of various backends.
- Tuning it up to be used with db_index in
BaseSpatialField
. - Writing test for
db_index=SpatialIndex()
. - Updating docs of both indexes and gis.
- Removing any non-required code. Deprecation of of
spatial_index
. - Quarterly rebasing against master.
Checkpoint 3 - Integration and deprecation of spatial_index
.
- Allow expressions as arguments.
- Integrating the expressions api.
- Resolving the expression and using it's sql for index creation.
- Write various methods
- Writing the FunctionalIndex.
- Write extensive/rigorous tests for functional indexes.
- Writing In-detail documentation for indexes - usage, specs, etc.
- Mentioning in release notes and other areas of documentation that reqiure updation.
- Final rebase against master.
Checkpoint 4 - Add functional indexes.
This merge would allow Django to let it's users create functional indexes.
If I finish early, I would try to resolve the tickets which are dependent on the work of this proposal, namely #24530 and #23577 by writing their tests and seeing if anything else is required to resolve them. I would also complete anything else that is required to resolve and close #26167.
My name is Akshesh Doshi and I am an undergraduate student at Indian Institute of Technology Roorkee (IITR), India. My time zone is UTC+05:30. I have been programming for 4 years now and have also worked in C++, PHP, Java, C# other than Python.
I am the coordinator of Information Management Group (IMG), a student group responsible for the development of the main website and the intranet portal of the institute (which is obviously made using Django ;)). I had joined the group 3 years ago and that is when I had my first experience with Django and fell in love with Python. You can find more about the work over here. I have used Django for many other projects and have used Python for things other than web development like image processing, data analysis and machine learning as well. I have successfully completed many big projects in the past and take responsibilities assigned to me very seriously.
It's been some months since I started contributing to Django. While there have been some merged patches, there are other works which are under development. My experience till now has been really great and I have found the community very welcoming.
I can communicate in English. You can find me hanging out on #django-dev
IRC channel as akki
. My email id is [email protected].