Proposal for Django in Google Summer of Code 2019 by Sage M. Abdullah.
Table of Contents
- 1 Abstract
- 2 Implementation
- 3 Schedule and milestones
- 4 Notes
- 5 About me
Django currently has limited support for JSONField
. The support is
provided in contrib.postgres.fields module, meaning it can only be
used for PostgreSQL database backend. The current implementation of
JSONField
for PostgreSQL uses the jsonb data type.
Django officially supports PostgreSQL, MySQL, SQLite, and Oracle Database
Server. With the JSON1 extension available for SQLite, all of the
supported backends now have native JSON handling. Aside from JSONField
,
Django also supports related fields such as ArrayField
and
HStoreField
for PostgreSQL.
The current JSONField
implementation supports key, index, path, null
value, and missing key lookups. It also shares lookups relating to
containment and keys with HStoreField
, such as contains
(overridden),
contained_by
, has_key
, has_any_keys
, and has_key
.
ArrayField
currently supports contains
(overridden), contained_by
,
overlap
, and len
lookups along with index and slice transformations.
Native implementation of JSONField
is available for MySQL through
the use of django-mysql package. For Oracle Database, there is the
oracle-json-field package. There is no native implementation for SQLite
yet, though there is jsonfield and django-jsonfield that implement
the field as a CharField
or TextField
.
The main goal is to bring the fields from contrib.postgres
into core
(db
), with JSONField
as the first priority. Other fields include
ArrayField
and HStoreField
, but the latter does not seem to be
worth to implement. Achieving this will enable developers to use the
fields with any supported database backends.
Since there are examples for all other database backends except SQLite, the first step is to create a POC field for SQLite. The next step is to create a unified field that works for all database backends. Tests will also be created to make sure the implementation works as expected.
Once each field is completed, the next step is to implement lookups that work across all database backends. Lookups that are only supported for certain database backends will be implemented with checks that make sure they are only available if supported.
To avoid conflicts for users of the existing PostgreSQL fields, the
implementation will not immediately replace the implementation in
contrib.postgres
. Instead, a migration path will be defined in the
documentation. A warning may also be raised if developers are still using
those fields.
The documentation will not only cover the implementation and migration path, but also some kind of a how-to guide on compiling SQLite with the JSON1 extension so that it can be used with Python and Django (though it seems to be enabled by default on some instances).
A database-agnostic JSONField
has been discussed since 2010 in #12990.
It has also been discussed in django-developers since 2016. The
accomplishment of this task would benefit greatly for the community, since
JSON is a common data type to be used in web development and the community
seems to be very interested in a database-agnostic JSONField
. Robust ORM
lookups for JSON data supported natively by the backends are also very
desirable.
Implementing a field that works on all supported database backends will require a lot of analysis beforehand. This is due to different implementations of JSON data type in each database backend. Even though there are some common lookups, the implementations differ between one another.
After creating a POC field for SQLite, there probably needs to be a table of some kind to show the differences (available lookups, syntax, etc.) between implementations of JSON data type in each database backend. Once it's all done, we can figure out which features we want to include in the cross-DB field.
There are existing implementations of native JSONField
for all database
backends except SQLite, so we are not starting from scratch. I will try to
reuse those existing implementations to make sure they're as compatible as
possible.
The documentation for JSON data type in PostgreSQL, SQLite, MySQL, and Oracle Database are pretty thorough, so it looks like they'll be helpful in the future.
As already said, the implementations of JSON data type in each database
backend are varyingly different. Maintaining compatibility across all
backends will be quite tricky. We may need to do something like
as_sqlite()
, as_oracle()
, etc. to separate the different
implementations.
Before starting to work on this project, I would like to do some preparation:
- Researching how Django models work with the database backends.
- Researching how JSON data type is implemented in all database backends.
- Familiarizing with all database backends and the best way to use JSON in each one.
On June 5-6, I will be on holiday (Eid al-Fitr), so I won't be able to work on those dates.
I will need to do academic registration sometime in early August (no certain date yet). This will probably take a whole day, depending on my university. Best case, this will take less than an hour. Worst case, this will take two days (if there are technical difficulties with the server).
On August 20-22 (tentative), I need to help my faculty in guiding the freshmen for their first days in college. I won't be very busy, so I'll still be able to work (part-time) on those dates.
Beside those dates, I won't have anything else to do. So, that means more time for me to work on the project.
(From May 7 until May 27 -- 3 weeks)
During the community bonding period, I will try to:
- Do more research about Django codebase, especially the database layer.
- Review existing tickets, make some new ones, and submit patches.
- Hang out more in Django IRC channels and mailing lists to get to know the community and help them.
(From May 27 until June 24 -- 4 weeks)
I will create a POC JSONField
for SQLite (3.2.1), write documentation
and tests for it (3.2.2), and create an analysis about lookups for JSON
data type in all database backends (3.2.3).
(Holidays on June 5 - 6)
During the first few days, I may need to do research to find the best way
to create a POC JSONField
for SQLite. I'll consult to the mentors as I
create it to make sure I'm on the right track. I will try to reuse the
implementation of JSONField
in contrib.postgres
as much as possible.
Once the POC field is done, I will spend the next week writing tests and documentation for it. This work may be merged if desired.
In the last week of the first milestone, I will analyze all of the available JSON lookups in every database backend. From the analysis, I will create a table of some kind to compare which lookups are available across all database backends. This will help me in implementing the unified field that works for all database backends. If I have some free time, I will also dig deeper about how JSON data type is implemented in all database backends (especially those other than SQLite and PostgreSQL).
The first evaluation will take place on June 25 - 29. By this evaluation, I hope I will have completed:
- POC
JSONField
for SQLite, along with tests and docs. - JSON lookups analysis.
(From June 30 until July 22 -- ~3.5 weeks)
The first evaluation may finish early, so I may be able to start working
for this second milestone earlier. In this second milestone, I will
implement a unified JSONField
and its lookups (3.3.1) and write tests
and docs for them (3.3.2).
This is the most important step of this project. This probably can be divided
into parts: making sure the unified JSONField
works for both SQLite and
PostgreSQL, extending the field to work for MySQL (and MariaDB) and Oracle
Database, and implementing the lookups. However, I'm still not sure how long
each part would take, so I combine them into a single step that takes 2.5
weeks to complete.
The existing implementation for MySQL and Oracle Database will be very helpful in this step.
I probably will have written some of the tests and docs while implementing, but I'm sure there will be a room for improvements. During the rest of this milestone, I will write additional tests and docs. I may also make general improvements to the implementation as needed.
Again, this work may be merged if desired.
The second evaluation will take place on July 23 - 27. By this evaluation, I hope I will have completed:
- Unified
JSONField
and its lookups, along with tests and docs.
(From July 28 until August 20 -- 3 weeks)
(Academic registration in early August, a bit busy on August 20 - 22)
As before, I may start earlier if the evaluation didn't take as long
(< 4 days). In this third and final milestone, I aim to implement
ArrayField
(3.4.1). I will also write documentation, especially
a migration path from contrib.postgres
to the unified field and
a how-to on compiling SQLite with JSON1 extension to make it work
with Django (3.4.2). The rest of the milestone will be spent in merging
the whole project into master
(3.4.3).
SQLite and
MySQL
have the json_array()
function which can be used to store arrays.
Meanwhile, Oracle Database
stores JSON as a VARCHAR2
, BLOB
, or CLOB
with IS JSON
constraint, and arrays are a valid JSON value.
If I'm correct, this shouldn't be too hard to implement, since some lookups
in JSONField
can also be used for ArrayField
. Also, I'll already be
familiar with JSONField
and its implementation. I aim to finish this in
1 week, including writing the tests.
During this week, I will write documentation for ArrayField
. After that,
I will write about the migration path from contrib.postgres
to the
unified fields.
I will also write a how-to on compiling SQLite with JSON1 extension. One
option is to write about how to compile the JSON1 extension and
loading it with the .load
command in SQLite. Another option is to
write about how to compile SQLite amalgamation with the
-DSQLITE_ENABLE_JSON1
option.
In the rest of the week, I aim to iron out any minor issues that may be left.
Then, the whole project should be ready to be merged into master
. If the
first two milestones are already merged, this should be a smaller patch.
The final evaluation will take place on August 20 - 27. By this evaluation, I hope I will have completed:
ArrayField
implementation, tests, and docs.- Documentation about the migration path and SQLite+JSON1 guide.
- The whole project, merged into
master
.
If I'm able to complete the project earlier than expected, I aim to work on existing tickets on Trac, especially those for the database layer. I have yet to decide which tickets I want to work on, since I can't tell which ones will still be open once I finish this project.
I will make sure to communicate with the mentors regularly during the whole project (not just for evaluations). I also plan to write weekly blog posts about my progress on this project. I will also try to be a regular contributor to Django after this project ends.
My name is Sage M. Abdullah and I'm in my second year as a Computer Science student at Universitas Indonesia. I am 19 years old (I'll be 20 this June). I live in Jakarta, Indonesia (UTC+07:00). I've been coding in Python for the past 2 years. I also code in Java and a little bit of C# and C.
I started contributing to Django in March this year. So far, I have submitted two patches (#30294 and #30295) and they both have been merged. I have also reported a ticket (#30326). They're easy pickings, but I look forward to contribute more to Django's codebase in the future. I have also contributed to Django's localization for Indonesian language on transifex since 6 months ago.
During my first few months in using Python, I made a simple IRC-style chatbot for LINE using Flask called AidenBot. I also made a few web scrapers, one of them being kbbi-python, a simple module that scraps the online Indonesian dictionary and returns the serialized result as a JSON object.
My email is laymonage (at) gmail.com
. You can find me in #django-dev
,
#django
, and #gsoc
IRC channels as laymonage
.