GSOC Proposal for Django Secrets Manager by Muhammad Ashlah Shinfain
settings.py: the source of configs
Django's users usually create a settings.py file to define the configuration needed to run their application. These values can be accessed easily through the django.conf.settings module-level variable. Unfortunately, this kind of setting is hard to maintain once we bring in multiple deployments. Each deploy can have its own configuration to handle, for example, backing-service resource handles, external-service credentials, and other per-deploy values, as stated in 12factor. That's why some Django projects define multiple settings files (like djangoproject.com and readthedocs.org), one for each environment. But we still have to maintain each deploy's settings, and whenever a deploy's configuration changes we also need to change the codebase. There's also a potential security issue when we (accidentally) expose secret credentials.
Various outsourced config solutions
To add more flexibility, we can use resources outside the codebase. The most basic source is environment variables, which can easily be read through Python's os.environ. It is very simple, but the format is very limited (every value is a plain string) and kind of hard to maintain, as there's no single fixed source of configuration (the variables are mixed with the system's other environment variables).
Another alternative is to use a file and parse it inside the code. There are various formats to choose from: .env (this one is pretty popular in several programming languages), json (like what djangoproject.com uses), yaml, cfg, ini, etc. A more advanced option is a secret manager (such as Google Secret Manager, Ansible Vault, and HashiCorp Vault), which is often used remotely over the internet so that applications can share some of their settings. These various options complicate what should be one simple thing: getting a config. Unfortunately, Django doesn't have any solution for this yet.
What's this project about
Fortunately, Django has taken the first step toward a solution by putting this on its Google Summer of Code 2020 ideas list. As mentioned on that page, this project aims to "design and add an abstraction interface over secrets managers that allows users to easily map to an external secret in a settings file". In addition to Django's defined goal, I want to make this really simple for developers to use, just like using django.conf.settings or os.environ anywhere they need it. Developers should also be able to easily extend the predefined logic used to retrieve the secrets.
For convenience, in this proposal we will call configuration that does not vary between deploys "config", and configuration that varies between deploys "secret".
This section explains the basic usage and implementation of the proposed secrets manager, how developers can extend the predefined logic, and some possible additional features.
Since we want to give Django's users the best possible API design, as mentioned in one of the Django Forum topics, I think we can give them something they're already familiar with: django.conf.settings and os.environ. I propose giving access to the secrets through a module-level variable, which gives us singleton behavior without requiring users to instantiate it (this pattern appears many times in Django: django.apps.apps, django.contrib.admin.sites.site, etc.). Once imported, this module-level variable should be usable as a mapping object, just like os.environ.
Thus, assuming this secrets module-level variable lives in the django.conf.secrets module, developers can do something like this anywhere the secrets are needed:
from django.conf.secrets import secrets
secrets['KEY']
Django's users can have multiple sources of secrets, and we need to support this. As a Django component, the secrets manager can take advantage of settings.py to configure its behavior. I propose a SECRET_BACKENDS setting that lets users choose the secret sources they need. Its simplest form would be just a list of strings, each being the dotted Python path to a secrets backend class. But to add more flexibility in configuring the secrets, I think it's better to define it as a list of dictionaries of backend configuration, just like Django's TEMPLATES and AUTH_PASSWORD_VALIDATORS.
# settings.py
from django.conf.secrets import secrets  # proposed location of the module-level variable

SECRET_BACKENDS = [
    {
        'BACKEND': 'django.conf.secrets.backends.DotEnvSecretBackend',
        'OPTIONS': {
            'PATH': secrets['DOTENV_PATH'],
        },
    },
    {
        'BACKEND': 'custom_secret_backends.GoogleSecretManager',
        'OPTIONS': {
            'URL': secrets['GOOGLE_SECRETS_URL'],
            'CREDENTIALS': secrets['GOOGLE_SECRETS_CREDENTIALS'],
        },
    },
]
The secret backends will be evaluated in order, from top to bottom. This helps when the setup of a later secret backend depends on earlier secrets. For example, GoogleSecretManager needs the GOOGLE_SECRETS_URL and GOOGLE_SECRETS_CREDENTIALS secrets, which can be retrieved from the previous backend. System environment variables can also be used by default, unless the Django community decides it's better to exclude them by default and provide an environment variable backend that users can opt into. When multiple secret backends define the same variable, the later backend overwrites it. The reasoning behind this is that each Django project uses only one value for each variable in each deployment.
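To make the ordering rule concrete, here is a minimal sketch of how secrets from several sources could be merged, assuming each backend has already produced a plain dict. The helper name merge_secrets is hypothetical; the point is only that environment variables come first and each later source overwrites clashing keys from earlier ones.

```python
import os
from typing import Iterable, Mapping


def merge_secrets(sources: Iterable[Mapping[str, str]], include_environ: bool = True) -> dict:
    """Merge secret sources in SECRET_BACKENDS order (hypothetical helper).

    Environment variables form the base layer; dict.update() overwrites
    existing keys, so the last source that defines a key wins.
    """
    merged = dict(os.environ) if include_environ else {}
    for source in sources:
        merged.update(source)
    return merged


dotenv = {'DATABASE_URL': 'sqlite:///dev.db', 'DEBUG': 'true'}
remote = {'DATABASE_URL': 'postgres://db.internal/prod'}
merged = merge_secrets([dotenv, remote], include_environ=False)
print(merged['DATABASE_URL'])  # postgres://db.internal/prod
print(merged['DEBUG'])         # true
```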
The secrets variable will be the container of all secrets loaded from the configured backends. This variable should have the capabilities of a mapping object (similar to os.environ). This includes implementing all mapping methods, such as __getitem__, __iter__, __len__, etc. We can also add some additional functionality to this container, such as retrieving secrets from a specific source, reloading secrets, etc.
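One low-effort way to get the full mapping interface is to subclass collections.abc.Mapping, which derives get(), keys(), items(), __contains__ and equality from the three abstract methods. The sketch below is a hypothetical read-only container, not the proposed implementation; the class name SecretsContainer is illustrative.

```python
from collections.abc import Mapping
from typing import Dict, Iterator


class SecretsContainer(Mapping):
    """Read-only mapping over loaded secrets (hypothetical sketch).

    Only __getitem__, __iter__ and __len__ are written by hand; the
    Mapping ABC supplies get(), keys(), items(), values(), __contains__
    and __eq__ as mixin methods.
    """

    def __init__(self, data: Dict[str, str]):
        self._data = dict(data)

    def __getitem__(self, key: str) -> str:
        return self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def __repr__(self) -> str:
        # Show only the size so secret values don't leak into logs or tracebacks.
        return f'<SecretsContainer: {len(self)} keys>'


secrets = SecretsContainer({'SECRET_KEY': 'abc', 'DEBUG': 'false'})
print(secrets['SECRET_KEY'])        # abc
print(secrets.get('MISSING', '-'))  # -
print('DEBUG' in secrets)           # True
```

Unlike os.environ, which is mutable, this sketch is deliberately read-only; whether the real container should allow assignment is a design question for the community.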
As you may have noticed, the previous settings.py example uses the secrets variable to define SECRET_BACKENDS, which is in turn used to configure secrets. This looks like a circular import, which can cause serious problems. Fortunately, Django has tackled this kind of problem before and provides lazy utilities in the django.utils.functional module. The snippet below roughly describes how secrets handles a lookup when it's not configured yet:
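import os

from django.conf import settings
from django.utils.functional import empty, SimpleLazyObject
from django.utils.module_loading import import_string


class Secrets:
    # `empty` marks the container as "not loaded yet".
    _secrets = empty

    def __init__(self):
        # Try to load eagerly; this is a no-op while settings.py is still
        # being imported, because settings.configured is False at that point.
        self._setup()

    def _setup(self):
        if settings.configured and not self.configured:
            # Environment variables are included by default. The container is
            # assigned before the loop so lazy secrets evaluated while building
            # a backend can already see what earlier backends loaded.
            self._secrets = dict(os.environ)
            for config in settings.SECRET_BACKENDS:
                backend_cls = import_string(config['BACKEND'])
                # Assumption: each backend accepts its OPTIONS as keyword arguments.
                backend = backend_cls(**config.get('OPTIONS', {}))
                # Later backends overwrite keys loaded by earlier ones.
                self._secrets.update(backend.get_secrets())

    def __getitem__(self, item):
        self._setup()
        if self.configured:
            return self._secrets[item]

        # Settings aren't configured yet (e.g. we are inside settings.py),
        # so defer the lookup until the value is first used.
        def proxy_getitem():
            if not self.configured:
                self._setup()
            return self._secrets[item]

        return SimpleLazyObject(proxy_getitem)

    @property
    def configured(self):
        return self._secrets is not empty


secrets = Secrets()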
The only time this secrets variable returns a SimpleLazyObject is when django.conf.settings is not configured yet. After that, the first interaction with the lazy object evaluates it, and from then on it behaves like the underlying object.
The backend class is the key to flexibility in the proposed secrets management. Each concrete backend class represents how the system retrieves the secrets. For now, there will be 3 base backend classes: one as the root base, one for secrets retrieved from the filesystem, and another for secrets retrieved from the internet.
- BaseSecretBackend: This backend class will be the root base for all secrets backend classes. Even though the current implementation will only have one method, it's better to provide a base so that anyone subclassing it gets future updates from the base. The most important method of this class (and of all its subclasses) is the one that gets secrets from the corresponding source, which we called get_secrets() in the previous snippet.
- BaseFileSecretBackend: This backend class (and its derivatives) will provide flexibility in parsing various file formats such as .env, json, yaml, etc. In addition to what BaseSecretBackend defines, we need some other attributes: the path to the file and the format parser.
- BaseHttpSecretBackend: This kind of backend will be used to retrieve secrets from the internet over HTTP. As with all HTTP requests, there should be an HTTP method, a request URI, headers, and an optional payload. The request URI may be given as a format string, so users can inject parameters into it. After getting the HTTP response, we still have to parse it into a Python mapping object before we can use it. These parsers may be shared with the filesystem secret parsers.
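The three base classes above could be sketched roughly as follows, assuming backends receive their OPTIONS as keyword arguments and that parsers are plain functions from text to a dict (so HTTP backends can reuse the file parsers). Everything here is hypothetical: the option names PATH, URL_PARAMS, HEADERS and PAYLOAD, the default file names, and the deliberately tiny .env parser, which ignores features such as `export` prefixes or variable interpolation that real .env tools support.

```python
import json
import urllib.request
from pathlib import Path
from typing import Dict


def parse_dotenv(text: str) -> Dict[str, str]:
    """Minimal .env parser: KEY=VALUE lines, '#' comments, optional quotes."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in '"\'':
            value = value[1:-1]
        result[key.strip()] = value
    return result


def parse_json(text: str) -> Dict[str, str]:
    """Parse a flat JSON object, converting every value to a string."""
    return {str(key): str(value) for key, value in json.loads(text).items()}


class BaseSecretBackend:
    """Root base: a backend only has to return a mapping of secrets."""

    def __init__(self, **options):
        # Assumption: OPTIONS from SECRET_BACKENDS arrive as keyword arguments.
        self.options = options

    def get_secrets(self) -> Dict[str, str]:
        raise NotImplementedError('Subclasses must implement get_secrets().')


class BaseFileSecretBackend(BaseSecretBackend):
    """Reads a file and hands its text to a format-specific parser."""

    default_path = ''
    parser = None  # subclasses plug in a parser function

    def get_secrets(self) -> Dict[str, str]:
        if self.parser is None:
            raise NotImplementedError('Subclasses must set a parser.')
        path = Path(self.options.get('PATH', self.default_path))
        return self.parser(path.read_text(encoding='utf-8'))


class DotEnvSecretBackend(BaseFileSecretBackend):
    default_path = '.env'
    parser = staticmethod(parse_dotenv)


class JsonSecretBackend(BaseFileSecretBackend):
    default_path = 'secrets.json'
    parser = staticmethod(parse_json)


class BaseHttpSecretBackend(BaseSecretBackend):
    """Fetches secrets over HTTP and reuses the file parsers for the body."""

    method = 'GET'
    url_template = ''  # subclasses set e.g. 'https://host/v1/secret/{name}'
    parser = staticmethod(parse_json)

    def build_request(self) -> urllib.request.Request:
        # URL_PARAMS fill the placeholders of the format-string URI.
        url = self.url_template.format(**self.options.get('URL_PARAMS', {}))
        payload = self.options.get('PAYLOAD')
        return urllib.request.Request(
            url,
            data=payload.encode('utf-8') if payload is not None else None,
            headers=self.options.get('HEADERS', {}),
            method=self.method,
        )

    def get_secrets(self) -> Dict[str, str]:
        with urllib.request.urlopen(self.build_request()) as response:
            return self.parser(response.read().decode('utf-8'))
```

Keeping build_request() separate from the network call makes an HTTP backend testable without a live secret manager.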
For convenience, I would like to propose the following for all secret backends:
- Each secrets backend should provide sensible default values whenever possible, so users can use the backend with minimal setup. The default values can be discussed with the Django community.
- Each secrets backend should be able to retrieve its required parameters from environment variables (or previously loaded secrets) without users having to state them explicitly in SECRET_BACKENDS (through the 'OPTIONS' key, as in the SECRET_BACKENDS example above). If a parameter is defined in SECRET_BACKENDS, the backend uses that value instead of the one defined in the environment.
- Both items above should be well documented, so users can easily refer to the docs when using the secrets.
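The precedence described in this list could look like the following sketch. The function name resolve_option and the exact ordering between previously loaded secrets and os.environ are assumptions; since loaded secrets already include the environment by default, checking them first lets a later backend's value win.

```python
import os
from typing import Any, Mapping, Optional


def resolve_option(
    name: str,
    options: Mapping[str, Any],
    loaded_secrets: Mapping[str, str],
    env_var: str,
    default: Optional[Any] = None,
) -> Any:
    """Pick a backend parameter (hypothetical precedence order).

    1. An explicit value in the backend's SECRET_BACKENDS 'OPTIONS'.
    2. A secret loaded by an earlier backend.
    3. The process environment.
    4. The backend's documented default.
    """
    if name in options:
        return options[name]
    if env_var in loaded_secrets:
        return loaded_secrets[env_var]
    if env_var in os.environ:
        return os.environ[env_var]
    return default


print(resolve_option('PATH', {'PATH': '/etc/app/.env'}, {}, 'DOTENV_PATH', '.env'))  # /etc/app/.env
print(resolve_option('PATH', {}, {'DOTENV_PATH': 'custom.env'}, 'DOTENV_PATH', '.env'))  # custom.env
```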
When developers need to customize the predefined backends, they can simply inherit from the most appropriate backend class and override the functionality they need. The custom backend can then be used by including its dotted Python path in the SECRET_BACKENDS setting.
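As an illustration of a custom backend, the sketch below reads a directory where each file holds one secret and the file name is the key, the layout Docker and Kubernetes use for mounted secrets. The class name DirectorySecretBackend and the /run/secrets default are assumptions, and BaseSecretBackend is a minimal stand-in for the proposed base class so the snippet runs on its own.

```python
from pathlib import Path
from typing import Dict


class BaseSecretBackend:
    """Minimal stand-in for the proposed root base class."""

    def __init__(self, **options):
        self.options = options

    def get_secrets(self) -> Dict[str, str]:
        raise NotImplementedError


class DirectorySecretBackend(BaseSecretBackend):
    """Hypothetical custom backend: one file per secret, file name = key."""

    def get_secrets(self) -> Dict[str, str]:
        path = Path(self.options.get('PATH', '/run/secrets'))
        if not path.is_dir():
            return {}
        return {
            # Strip the trailing newline most editors and tools add.
            entry.name: entry.read_text(encoding='utf-8').strip()
            for entry in sorted(path.iterdir())
            # Skip hidden entries such as Kubernetes' internal ..data link.
            if entry.is_file() and not entry.name.startswith('.')
        }
```

It would then be enabled with an entry such as {'BACKEND': 'myproject.secret_backends.DirectorySecretBackend', 'OPTIONS': {'PATH': '/run/secrets'}} in SECRET_BACKENDS, where the module path is illustrative.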
Below are some possible additional features that could be implemented on top of the proposed secrets management.
Per-source secret management
Sometimes, developers might need to use secrets from a specific source. We can support this by giving the secrets variable a method for retrieving secrets from a specific source. We may also provide a method for checking which source a key comes from. This would also be useful for debugging which variables come from which source.
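A minimal sketch of per-source bookkeeping, assuming the container receives an ordered list of (backend name, secrets) pairs. The class SourcedSecrets and its method names from_source() and source_of() are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class SourcedSecrets:
    """Hypothetical container that remembers where each secret came from."""

    def __init__(self, sources: List[Tuple[str, Dict[str, str]]]):
        # sources: ordered (backend_name, secrets) pairs, as in SECRET_BACKENDS.
        self._by_source: Dict[str, Dict[str, str]] = {}
        self._merged: Dict[str, str] = {}
        self._origin: Dict[str, str] = {}
        for name, values in sources:
            self._by_source[name] = dict(values)
            for key, value in values.items():
                self._merged[key] = value  # later sources win
                self._origin[key] = name

    def __getitem__(self, key: str) -> str:
        return self._merged[key]

    def from_source(self, name: str) -> Dict[str, str]:
        """Everything one backend provided, including overridden keys."""
        return dict(self._by_source[name])

    def source_of(self, key: str) -> Optional[str]:
        """Name of the backend whose value is currently in effect."""
        return self._origin.get(key)


s = SourcedSecrets([('env', {'A': '1', 'B': '2'}), ('dotenv', {'A': '3'})])
print(s['A'], s.source_of('A'))  # 3 dotenv
```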
Subset variables from each source
Using 'OPTIONS' in SECRET_BACKENDS, we can specify which variables we expect to retrieve from each source, thus ignoring unnecessary secrets.
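The filtering step could be as small as the sketch below. The 'KEYS' option name is hypothetical, and failing loudly on a missing expected key is a design choice: it surfaces a misconfigured source at startup rather than at the first lookup.

```python
from typing import Any, Dict, Mapping


def select_subset(secrets: Mapping[str, str], options: Mapping[str, Any]) -> Dict[str, str]:
    """Keep only the keys listed in a hypothetical 'KEYS' option.

    Without 'KEYS', everything the backend returned is kept.
    """
    wanted = options.get('KEYS')
    if wanted is None:
        return dict(secrets)
    missing = [key for key in wanted if key not in secrets]
    if missing:
        raise KeyError(f'Expected secrets not found: {", ".join(missing)}')
    return {key: secrets[key] for key in wanted}


print(select_subset({'API_TOKEN': 't', 'UNUSED': 'x'}, {'KEYS': ['API_TOKEN']}))  # {'API_TOKEN': 't'}
```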
Generate source on startproject
We can create a Django management command that creates a secrets source file based on the first secret backend configuration in SECRET_BACKENDS. Its content would be values that shouldn't show up explicitly in settings.py, such as SECRET_KEY. This would also help introduce and promote the new feature to developers.
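The core of such a command could be the two helpers sketched below; wiring them into startproject or a management command is left out. The function names are hypothetical, and the key is generated with the standard library's secrets module (aliased so it isn't confused with the proposed secrets variable) rather than with Django's own key generator.

```python
import secrets as pysecrets  # stdlib module, aliased to avoid clashing with the proposed `secrets`
from pathlib import Path
from typing import Dict


def generate_secret_key() -> str:
    # token_urlsafe() only emits [A-Za-z0-9_-], so the value needs no escaping
    # in a .env file; 40 random bytes become a 54-character string.
    return pysecrets.token_urlsafe(40)


def write_dotenv(path: Path, values: Dict[str, str]) -> None:
    """Write KEY="value" lines, refusing to overwrite an existing file."""
    if path.exists():
        raise FileExistsError(f'{path} already exists')
    lines = [f'{key}="{value}"' for key, value in values.items()]
    path.write_text('\n'.join(lines) + '\n', encoding='utf-8')
```

A generator would call something like write_dotenv(Path('.env'), {'SECRET_KEY': generate_secret_key()}) and then emit a matching SECRET_BACKENDS entry; refusing to overwrite avoids silently rotating an existing project's key.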
Runtime reload
We might need the secrets variable to reload from its sources at runtime. But I think we should first check whether reloading without a restart works well with all Django components.
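A reload could be as simple as rebuilding the merged snapshot and swapping it in, as in this hypothetical sketch (ReloadableSecrets and its loader argument are illustrative). Note that it only refreshes lookups made through the container: any value already copied into django.conf.settings at startup keeps its old value, which is exactly the compatibility question raised above.

```python
import threading
from typing import Callable, Dict, Optional


class ReloadableSecrets:
    """Hypothetical sketch: caches secrets and can reload them on demand."""

    def __init__(self, loader: Callable[[], Dict[str, str]]):
        self._loader = loader  # e.g. a function that merges all SECRET_BACKENDS
        self._data: Optional[Dict[str, str]] = None
        self._lock = threading.Lock()

    def _ensure_loaded(self) -> Dict[str, str]:
        with self._lock:
            if self._data is None:
                self._data = self._loader()
            return self._data

    def __getitem__(self, key: str) -> str:
        return self._ensure_loaded()[key]

    def reload(self) -> None:
        # Build the new snapshot first, then swap it in under the lock,
        # so readers never see a half-loaded container.
        fresh = self._loader()
        with self._lock:
            self._data = fresh
```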
Auto decode base64
Some secrets might need binary data that can't be represented with printable characters. That's where base64 comes to the rescue. I was inspired by how Kubernetes requires the values in a Secret's data to be base64 encoded.
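One possible convention is to decode only values carrying an explicit marker, so ordinary strings that merely look like base64 are left alone. The 'base64:' prefix below is a hypothetical choice, not something the proposal has settled on.

```python
import base64
import binascii
from typing import Union

PREFIX = 'base64:'  # hypothetical marker for encoded values


def decode_secret(value: str) -> Union[str, bytes]:
    """Return bytes for values marked 'base64:...', otherwise the value unchanged."""
    if not value.startswith(PREFIX):
        return value
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(value[len(PREFIX):], validate=True)
    except binascii.Error as exc:
        raise ValueError(f'Invalid base64 secret value: {exc}') from None


print(decode_secret('base64:aGVsbG8='))  # b'hello'
print(decode_secret('plain-value'))      # plain-value
```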
The list below roughly describes what I plan to work on during the GSoC coding period by default. This list might change if needed, to conform to what the community wants. I've ordered the list by importance and noted the weight (difficulty level) of each task.
- Module-level secrets variable
  - Loading from backends (relatively easy)
  - Overriding the same variables based on SECRET_BACKENDS (relatively easy)
  - Lazy resolution of secrets when django.conf.settings is not configured yet (relatively hard)
  - Gradual resolution of secrets, using previously loaded secrets to load the next ones (relatively hard)
  - Per-source secrets management (intermediate)
  - Retrieving only a subset of secrets from each source (intermediate)
- Secret backends
  - BaseSecretBackend (relatively easy)
  - Filesystem secrets
    - BaseFileSecretBackend (relatively easy)
    - DotEnvSecretBackend (intermediate)
    - JsonSecretBackend (intermediate)
    - Other filesystem backends
  - Remote secrets
    - BaseHttpSecretBackend (intermediate)
    - AnsibleVaultSecretBackend (relatively hard)
    - HashicorpVaultSecretBackend (relatively hard)
- Miscellaneous
  - Generate .env (or other preferred formats) and SECRET_BACKENDS settings on startproject (intermediate)
I picked DotEnvSecretBackend and JsonSecretBackend for filesystem secrets, and either AnsibleVaultSecretBackend or HashicorpVaultSecretBackend for remote secrets, because I think those are the most popular secret sources in their categories. If time allows, I will implement other secrets backends as well.
This timeline was designed based on the Google Summer of Code 2020 Program Rules.
May 5, 2020 - June 2, 2020
I will use this time to settle the following things (among others) by consulting the Django community:
- Where to place the secrets module
- Secrets backends' default values
- Naming of the environment variables for secrets backends' required parameters
- Implementation priority (secrets backends, additional features, etc.)
June 2, 2020 - August 25, 2020
The first phase will be about the basic implementation of secrets and some of the secrets backends.
Due to the COVID-19 outbreak, my university's calendar was pushed back by 2 weeks, and I will have final exams from June 2 to June 10. During this period, it will be hard for me to do the work. But I will make sure that by the end of June 10, I have finished BaseSecretBackend, BaseFileSecretBackend, and DotEnvSecretBackend. The rest of the week will be used for implementing JsonSecretBackend and the tests for these backends. Those two concrete backends (DotEnvSecretBackend and JsonSecretBackend) will be used to simulate secrets retrieval by the secrets module-level variable.
This week will be used for implementing the basic functionality of secrets. This includes loading secrets from the backend classes, overriding shared variables based on the backend order in SECRET_BACKENDS, and selecting a subset of secrets for each secret source. The implementation will include tests to make sure each piece of functionality works as expected.
The first part of this week will be used for implementing (and testing) the remaining basic functionality, which is per-source secret management. The later part of the week will be used for evaluation and documentation.
The second phase's goal is to make the secrets variable usable anywhere, anytime, using Django's lazy functionality.
During this period, I will implement the functionality that allows secrets to be used inside settings.py without running into circular imports. Since I've already worked on part of this (shown in one of the snippets above), I can take the time to make sure the implementation works as well as possible, as laziness is still a pretty tricky concept for me.
During this period, I will implement the functionality that lets secrets use previously loaded secrets to configure the next secrets backend. This will require more advanced use of laziness.
I will use this period flexibly depending on the situation. If there was a problem in the previous week, I'll make sure to finish that task this week. I can also implement generating the .env file on startproject if time permits, or simply start the next task early. Whatever the situation, I will make time for the evaluation and documentation of this phase.
In the final phase, I will work on HTTP API-based secrets managers. This requires some exploration first, as I have limited experience with this kind of secrets manager.
This week will be used for exploring HTTP API-based secret managers. The main information I will look for: the request schema users need to retrieve their secrets from the secret manager, the response format of the secrets, and, if there's a CLI tool, whether the tool reads credentials from environment variables. While exploring, I will start implementing BaseHttpSecretBackend. I will also set up one of the secret managers to use while implementing the secrets backend.
This week will be used for implementing the secrets backend for one of the previously explored HTTP API-based secret managers. The most likely candidates are HashicorpVaultSecretBackend and AnsibleVaultSecretBackend. Considering my limited experience with HTTP API-based secret managers, I estimate this work will take two weeks.
During this period, I will also write the documentation for this phase.
During this period, I aim to iron out any remaining issues and do the final evaluation of all the previous implementations. Once the issues are solved, the whole work will be ready for Django's core developers to merge into the master branch.
Would this replace the use of os.environ?
It's not meant to replace os.environ. If developers only need environment variables, they can still use os.environ instead of the secrets variable. The secrets variable does include the os.environ variables by default, but they are mixed with variables from the other configured secret sources. So it is still easier to use os.environ when developers want environment variables only.
Can we remove the need for multiple settings.py?
The multiple settings.py approach has its own purpose; it even has its own wiki page. Some logic that lives in Python settings.py files can't be represented by secrets. But with secrets, developers can tune their settings.py files more conveniently.
Thanks to its laziness, the secrets module-level variable can easily be used anywhere, anytime, like django.conf.settings and os.environ. Developers can also easily tweak how secrets are retrieved through the SECRET_BACKENDS setting. This approach should also make future development of this secrets management easy.
And as a bonus, this change will not break any of your current projects when you update your Django version.
Hi! My name is Muhammad Ashlah Shinfain; my friends call me Ashlah. I'm currently in my final (4th) year as a Computer Science student at the University of Indonesia. I live in Depok, a city next to Jakarta, Indonesia (UTC+7).
This is my 4th year with Python, and I have been using Django for 3 years now. I've done several projects with Django during these 3 years. The most notable is when I led an organization's dev team for one year; this is when I learned so much about Django. The dev team was responsible for developing and maintaining the organization's systems, such as the recruitment system, the publication request system, and the book lending system. Another project I'm proud of is an API service for a health tracker mobile app that I developed single-handedly. This is when I got a feel for how to code properly: because I was the only one managing the API service's development, I could consistently maintain the best practices used in the project.
Recently, when I code, I often look into the source code of the packages I use to learn the patterns and best practices they follow. You could say I learn to code by example. My favorite packages, which usually serve as my code style references, are django-cas-ng, django-allauth, django-rest-framework, and of course django itself. For Django projects, I often refer to djangoproject.com and readthedocs.org, which I found to be two of the most popular Django projects on djangopackages.org.
Currently, my contributions to the Django codebase are these two PRs: django/django#12596 and django/django#12591. Although they have been reviewed, they have not been merged yet. Even after finishing this GSoC milestone, I think Django will remain my favorite open source project to contribute to.
If there's anything to discuss, I can be reached by email: [email protected]. If you want to know more about me, you can see my GitHub profile and my StackOverflow profile. I also share my developer journey on Twitter.