GSOC 2020: Django Secrets Manager

GSOC Proposal for Django Secrets Manager by Muhammad Ashlah Shinfain

Table of Contents:

Abstract
Usage and Implementations
Milestones
Timeline
- Community Bonding
- Coding
Q&A
Conclusion
About Me

Abstract

settings.py: the source of configs

Django users usually create a settings.py file to define the configuration needed to run their application. These values can be read easily through the django.conf.settings module-level variable. Unfortunately, this kind of setting is hard to maintain once multiple deployments are involved. Each deploy can have its own configuration, for example backing-service resource handles, external-service credentials, and other per-deploy values, as stated in 12factor. That's why some Django projects (like djangoproject.com and readthedocs.org) define separate settings for each environment. Even so, every deploy's settings still live in the codebase: if any deploy's config changes, the codebase has to change too. There's also a potential security issue when we (accidentally) expose secret credentials.

Various external config solutions

To add more flexibility, we can use resources outside the codebase. The most basic source is environment variables, which can be read easily through Python's os.environ. This is very simple, but the format is very limited (every value is a plain string) and it is hard to maintain because there's no single source of configuration (it's mixed with the system's other environment variables). Another alternative is to use a file and parse it inside the code. There are various formats to choose from: .env (popular in several programming languages), json (which djangoproject.com uses), yaml, cfg, ini, etc. A more advanced option is a secret manager (such as Google Secret Manager, Ansible Vault, and HashiCorp Vault), which is usually accessed remotely over the internet so that applications can share some of their settings. This variety of options complicates one simple task: getting a config value. Unfortunately, Django doesn't have a solution for this yet.

What's this project about

Fortunately, Django has taken a first step toward a solution by putting this on its Google Summer of Code 2020 ideas list. As mentioned on the page, this project aims to "design and add an abstraction interface over secrets managers that allows users to easily map to an external secret in a settings file". In addition to Django's defined goal, I want to make this really simple for developers to use, just like using django.conf.settings or os.environ anywhere they need it. Developers should also be able to easily extend the predefined logic for retrieving secrets.

For convenience, this proposal calls configuration that does not vary between deploys config, and configuration that does vary between deploys a secret.

Usage and Implementations

This section explains the basic usage and implementation of the proposed secrets manager, how developers can extend the predefined logic, and some possible additional features.

Developer (Django's Users)

Retrieving secrets

Since we need to give Django's users the best possible API design, as mentioned in one of the Django Forum topics, I think we can give them something they're already familiar with: django.conf.settings and os.environ. I propose giving access to the secrets through a module-level variable, which provides singleton behavior without requiring the user to instantiate anything (this pattern appears many times in Django: django.apps.apps, django.contrib.admin.sites.site, etc). Once imported, this module-level variable should be usable as a mapping object, just like os.environ.

Thus, assuming this secrets module-level variable lives in the django.conf.secrets module, developers can do something like this anywhere the secrets are needed:

from django.conf.secrets import secrets

secrets['KEY']

Choosing Backend

Django users can have multiple sources of secrets, and we need to support this. As a Django component, the secrets manager can take advantage of settings.py to configure its behavior. I propose a SECRET_BACKENDS setting that lets users choose the secret sources they need. Its simplest form would be a list of strings, each the dotted Python path to a secrets backend class. But to allow more flexibility in configuring the secrets, I think it's better to define it as a list of dictionaries of backend configuration, just like Django's TEMPLATES and AUTH_PASSWORD_VALIDATORS.

SECRET_BACKENDS = [
    {
        'BACKEND': 'django.conf.secrets.backends.DotEnvSecret',
        'OPTIONS': {
            'PATH': secrets['DOTENV_PATH'],
        },
    },
    {
        'BACKEND': 'custom_secret_backends.GoogleSecretManager',
        'OPTIONS': {
            'URL': secrets['GOOGLE_SECRETS_URL'],
            'CREDENTIALS': secrets['GOOGLE_SECRETS_CREDENTIALS'],
        },
    },
]

The secret backends are evaluated in order from top to bottom. This is useful when the setup of a later backend depends on secrets loaded earlier. For example, GoogleSecretManager needs the GOOGLE_SECRETS_URL and GOOGLE_SECRETS_CREDENTIALS secrets, which can be retrieved from the previous backend. The system environment variables can also be included by default -- unless the Django Community decides it's better to exclude them by default and offer an environment variable backend as an opt-in. When several backends define the same variable, the later backend overwrites the earlier value. The reasoning is that each Django project uses only one value for each variable in each deployment.
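
The ordering rule can be illustrated with a minimal sketch. The two loader functions and their values are hypothetical stand-ins for real backends, not part of the proposed API; the point is only that merging in list order lets a later source overwrite an earlier one.

```python
# Hypothetical stand-ins for two secret sources (illustrative values only).
def load_dotenv_secrets():
    return {
        "DATABASE_URL": "sqlite:///local.db",
        "GOOGLE_SECRETS_URL": "https://secrets.example.invalid",
    }


def load_remote_secrets():
    return {
        "DATABASE_URL": "postgres://db.internal/app",
        "API_TOKEN": "token-from-remote",
    }


def merge_secrets(loaders):
    """Merge sources in order; a later source overwrites an earlier one."""
    merged = {}
    for load in loaders:
        merged.update(load())
    return merged


merged = merge_secrets([load_dotenv_secrets, load_remote_secrets])
print(merged["DATABASE_URL"])  # the value from the later (remote) source
```

Keys that only one source defines survive untouched, so the earlier backend can still supply the values needed to set up the later one.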

Implementation

Module-level `secrets` variable

The secrets variable will be the container for all secrets loaded from the defined backends. It should have the capabilities of a mapping object (similar to os.environ). This includes implementing the mapping methods, such as __getitem__, __iter__, __len__, etc. We can also add extra functionality to this container, such as retrieving the secrets of a specific source, reloading secrets, etc.
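
One possible way to get the mapping behavior cheaply is to subclass collections.abc.Mapping, which derives get(), keys(), items(), and the in operator from the three methods named above. The class name SecretsContainer is hypothetical, and this sketch leaves out the backend loading and laziness discussed elsewhere.

```python
from collections.abc import Mapping


class SecretsContainer(Mapping):
    """Read-only mapping over already-loaded secrets (hypothetical name)."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


container = SecretsContainer({"SECRET_KEY": "not-a-real-key"})
print(container["SECRET_KEY"])
print(container.get("MISSING", "fallback"))  # get() is inherited from Mapping
```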

As you may have noticed, the previous settings.py example uses the secrets variable to define SECRET_BACKENDS, which in turn is used to configure the secrets. This looks like a circular dependency, which can cause serious problems. Fortunately, Django has already tackled this kind of problem and provides lazy-evaluation utilities in the django.utils.functional module. The snippet below roughly describes how secrets handles a lookup when it's not configured yet:

from django.conf import settings
from django.utils.functional import SimpleLazyObject, empty
from django.utils.module_loading import import_string


class Secrets:
    # Sentinel meaning "the backends have not been loaded yet".
    _secrets = empty

    def __init__(self):
        self._setup()

    def _setup(self):
        # Load only once, and only after Django's settings are available.
        if settings.configured and not self.configured:
            self._secrets = {}
            for config in settings.SECRET_BACKENDS:
                # Each entry is a dict with 'BACKEND' and optional 'OPTIONS'.
                backend_cls = import_string(config['BACKEND'])
                # Assumption: OPTIONS are passed as keyword arguments.
                backend = backend_cls(**config.get('OPTIONS', {}))
                # Later backends overwrite keys set by earlier ones.
                self._secrets.update(backend.get_secrets())

    def __getitem__(self, item):
        self._setup()
        if self.configured:
            return self._secrets[item]

        # Settings are not ready yet (e.g. we are inside settings.py):
        # defer the lookup until the value is actually used.
        def proxy_getitem():
            self._setup()
            return self._secrets[item]

        return SimpleLazyObject(proxy_getitem)

    @property
    def configured(self):
        return self._secrets is not empty


secrets = Secrets()

The secrets variable returns a SimpleLazyObject only when django.conf.settings is not configured yet. Once settings are configured, any interaction with the lazy object evaluates it, and from then on it behaves like a normal object.
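
To make the deferred lookup concrete without depending on Django, here is a small pure-Python stand-in. LazyValue is a hypothetical class and is not Django's SimpleLazyObject (which proxies far more operations); it only shows the idea of resolving a value on first use and caching it afterwards.

```python
class LazyValue:
    """Minimal stand-in for a deferred lookup (not Django's SimpleLazyObject)."""

    _UNSET = object()

    def __init__(self, factory):
        self._factory = factory
        self._value = self._UNSET

    def resolve(self):
        # Call the factory on first use only, then reuse the cached result.
        if self._value is self._UNSET:
            self._value = self._factory()
        return self._value

    def __str__(self):
        return str(self.resolve())

    def __eq__(self, other):
        return self.resolve() == other


store = {}
lazy_path = LazyValue(lambda: store["DOTENV_PATH"])  # nothing is read yet
store["DOTENV_PATH"] = "/srv/app/.env"  # the value only becomes available later
first_read = str(lazy_path)  # the lookup happens here
store["DOTENV_PATH"] = "/tmp/other.env"  # later changes do not affect the cache
second_read = lazy_path.resolve()
```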

Secret Backends

The backend class is the key to flexibility in this proposed secret management. Each concrete backend class will represent how the system retrieves the secrets. For now, there will be 3 base backend classes, one for the root base, one for secrets retrieved from some filesystem, and another for secrets retrieved from the internet.

BaseSecretBackend

This backend class will be the root base for all secrets backend classes. Even though the current implementation has only one method, it's better to provide a base class so that every backend built on it receives future updates to the base. The most important method for this class (and for all of its subclasses) is the one that gets secrets from the corresponding source, called get_secrets() in the previous snippet.
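
A minimal sketch of what such a root class might look like, using abc so that subclasses must provide get_secrets(). Accepting options as keyword arguments and the StaticSecretBackend subclass are assumptions made for illustration; the final interface is open to discussion.

```python
from abc import ABC, abstractmethod


class BaseSecretBackend(ABC):
    """Root class: every backend must implement get_secrets()."""

    def __init__(self, **options):
        # Assumption: backend options arrive as keyword arguments.
        self.options = options

    @abstractmethod
    def get_secrets(self):
        """Return a dict mapping secret names to values."""


class StaticSecretBackend(BaseSecretBackend):
    """Hypothetical backend that serves values given directly in its options."""

    def get_secrets(self):
        return dict(self.options.get("values", {}))


static_secrets = StaticSecretBackend(values={"DEBUG": "1"}).get_secrets()
```
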

BaseFileSecretBackend

This backend class (and its derivatives) will provide flexibility in parsing various file formats such as .env, json, yaml, etc. In addition to what BaseSecretBackend defines, it needs some other attributes: the path to the file and the format parser.
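
The path-plus-parser idea could be sketched as follows. FileSecretBackend and parse_dotenv are hypothetical names, and the .env parser is deliberately simplistic (KEY=VALUE lines, # comments, optional surrounding quotes); a real implementation would need to handle escaping, multi-line values, and so on.

```python
import json
import tempfile
from pathlib import Path


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip().strip("'\"")
    return result


class FileSecretBackend:
    """Reads a file and hands its text to a pluggable parser."""

    def __init__(self, path, parser):
        self.path = Path(path)
        self.parser = parser

    def get_secrets(self):
        return self.parser(self.path.read_text(encoding="utf-8"))


# Demonstration with temporary files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("# local settings\nDEBUG=1\nSECRET_KEY='not-a-real-key'\n", encoding="utf-8")
    json_file = Path(tmp) / "secrets.json"
    json_file.write_text('{"DB_PASSWORD": "example"}', encoding="utf-8")

    dotenv_secrets = FileSecretBackend(env_file, parse_dotenv).get_secrets()
    json_secrets = FileSecretBackend(json_file, json.loads).get_secrets()
```

Only the parser differs between the two formats, which is what would allow the parsers to be reused by other backends.
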

BaseHttpSecretBackend

This kind of backend will be used to retrieve secrets from the internet over HTTP. As with any HTTP request, there should be an HTTP method, a request URI, headers, and an optional payload. The request URI may be given as a format string so that users can inject parameters into it. After receiving the HTTP response, we still have to parse it into a Python mapping object before we can use it. These parsers may be shared with the file-based secret backends.
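
A rough sketch of these pieces, assuming a format-string URI, a JSON response, and an injectable transport function. HttpSecretBackend, urllib_fetch, the example URL, and the header value are all hypothetical; the demonstration uses a fake transport so that it needs no network access.

```python
import json
from urllib.request import Request, urlopen


def urllib_fetch(method, url, headers):
    """Default transport: perform the request with the standard library."""
    request = Request(url, headers=headers, method=method)
    with urlopen(request) as response:
        return response.read().decode("utf-8")


class HttpSecretBackend:
    def __init__(self, url_template, params=None, headers=None,
                 method="GET", parser=json.loads, fetch=urllib_fetch):
        self.url_template = url_template
        self.params = params or {}
        self.headers = headers or {}
        self.method = method
        self.parser = parser
        self.fetch = fetch

    def build_url(self):
        # Inject user-supplied parameters into the URI template.
        return self.url_template.format(**self.params)

    def get_secrets(self):
        body = self.fetch(self.method, self.build_url(), self.headers)
        return self.parser(body)


# Demonstration with a fake transport instead of a real HTTP call.
calls = []


def fake_fetch(method, url, headers):
    calls.append((method, url))
    return '{"API_TOKEN": "example-token"}'


backend = HttpSecretBackend(
    "https://secrets.example.invalid/v1/{project}/secrets",
    params={"project": "demo"},
    headers={"Authorization": "Bearer example"},
    fetch=fake_fetch,
)
http_secrets = backend.get_secrets()
```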

For convenience, I would like to propose two things for all secret backends:

- Each secrets backend should provide sensible default values whenever possible, so users can use the backend with minimal setup. The default values can be discussed with the Django Community.
- Each secrets backend should be able to retrieve its required parameters from environment variables (or previously loaded secrets) without the user having to state them explicitly in SECRET_BACKENDS (using the 'OPTIONS' key in the first snippet). If a parameter is defined in SECRET_BACKENDS, the backend will use that value instead of the one defined in the environment.

Both items above should be well documented, so users can easily refer to the docs when using the secrets.
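
The precedence described above might be sketched like this. resolve_option is a hypothetical helper, and checking previously loaded secrets before the environment is an assumption; the proposal only fixes that explicit OPTIONS win over the environment.

```python
def resolve_option(name, options, fallbacks, default=None):
    """Return the explicit option if given, else the first fallback that defines it."""
    if name in options:
        return options[name]
    for source in fallbacks:
        if name in source:
            return source[name]
    return default


# Illustrative data only; a real backend would pass os.environ as a fallback.
fake_environ = {"DOTENV_PATH": "/etc/app/.env"}
loaded_secrets = {}

from_env = resolve_option("DOTENV_PATH", {}, [loaded_secrets, fake_environ])
explicit = resolve_option(
    "DOTENV_PATH", {"DOTENV_PATH": "/srv/app/.env"}, [loaded_secrets, fake_environ]
)
missing = resolve_option("DOTENV_PATH", {}, [], default=".env")
```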

Extension

When developers need to customize the provided backends, they can inherit from the most appropriate backend class and override the functionality they need. The custom backend can then be used by including its dotted Python path in the SECRET_BACKENDS config.

Additional Features

Below are some possible additional features that can be implemented using the proposed secrets management.

Per-source secret management

Sometimes, developers might need to use secrets from one specific source. We can support this by giving the secrets variable a method for retrieving secrets from a specific source. We may also provide a method for checking a key. This would also help when debugging which variables come from which source.
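
One way this could work is to keep the secrets grouped by source and resolve lookups from the most recently loaded source backwards. The class and method names (SourcedSecrets, from_source, source_of) are hypothetical, chosen only for this sketch.

```python
class SourcedSecrets:
    """Keeps secrets per source; later sources win on plain lookups."""

    def __init__(self):
        self._by_source = {}  # source name -> dict, kept in load order

    def load(self, source, values):
        self._by_source[source] = dict(values)

    def from_source(self, source):
        """Return a copy of the secrets that came from one source."""
        return dict(self._by_source[source])

    def source_of(self, key):
        """Return the name of the last-loaded source defining key, or None."""
        for source in reversed(list(self._by_source)):
            if key in self._by_source[source]:
                return source
        return None

    def __getitem__(self, key):
        source = self.source_of(key)
        if source is None:
            raise KeyError(key)
        return self._by_source[source][key]


store = SourcedSecrets()
store.load("dotenv", {"DEBUG": "1", "DATABASE_URL": "sqlite:///local.db"})
store.load("remote", {"DATABASE_URL": "postgres://db.internal/app"})
print(store.source_of("DATABASE_URL"))
```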

Subset variables from each source

Using 'OPTIONS' in SECRET_BACKENDS, we can specify what variables we expect to retrieve from each source, thus ignoring unnecessary secrets.
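
As a small illustration, the filtering itself can be a single helper. The function select_keys and the idea of passing the wanted names through a keys option are assumptions for this sketch.

```python
def select_keys(secrets, keys=None):
    """Keep only the requested keys; keep everything when keys is None."""
    if keys is None:
        return dict(secrets)
    wanted = set(keys)
    return {key: value for key, value in secrets.items() if key in wanted}


all_secrets = {"SECRET_KEY": "not-a-real-key", "PATH": "/usr/bin", "HOME": "/root"}
subset = select_keys(all_secrets, keys=["SECRET_KEY", "MISSING"])
```

Names that are requested but absent are silently skipped here; whether that should instead raise an error is a design question for the community.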

Generate source on startproject

We can create a Django management command that creates a secrets source file based on the first secret backend configuration in SECRET_BACKENDS. The content might be something that shouldn't explicitly show up in settings.py, such as SECRET_KEY. With this, we can also introduce and promote this new feature to developers.

Runtime reload

We might need the secrets variable to reload from its sources at runtime. But I think we should first check whether reloading without a restart works well with all Django components.
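
A minimal sketch of a reload, assuming the container keeps its loader callables around. ReloadableSecrets is a hypothetical name, and the sketch does not address how other Django components that have already read a value would learn about the change.

```python
class ReloadableSecrets:
    """Holds loader callables so the secrets can be rebuilt on demand."""

    def __init__(self, loaders):
        self._loaders = list(loaders)
        self._data = {}
        self.reload()

    def reload(self):
        fresh = {}
        for load in self._loaders:
            fresh.update(load())
        # Swap in one assignment so readers never see a half-loaded dict.
        self._data = fresh

    def __getitem__(self, key):
        return self._data[key]


source = {"API_TOKEN": "old-token"}
live = ReloadableSecrets([lambda: dict(source)])
before = live["API_TOKEN"]
source["API_TOKEN"] = "new-token"
stale = live["API_TOKEN"]  # unchanged until reload() is called
live.reload()
after = live["API_TOKEN"]
```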

Auto decode base64

Some secrets might contain binary data that can't be represented using normal characters. That's when base64 comes to the rescue. I was inspired by how Kubernetes requires the values of its secrets to be base64 encoded.
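
A small sketch of the decoding step using the standard library. The decode_secret helper and its is_base64 flag are hypothetical; the sketch uses an explicit flag rather than guessing, since plain text can also happen to be valid base64.

```python
import base64


def decode_secret(value, is_base64=False):
    """Return the raw bytes of a base64-encoded secret, or the value unchanged."""
    if not is_base64:
        return value
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(value, validate=True)


raw_key = b"\x00\xffbinary-key"
encoded = base64.b64encode(raw_key).decode("ascii")
decoded = decode_secret(encoded, is_base64=True)
untouched = decode_secret("plain-text-value")
```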

Milestones

The list below roughly describes what I plan to work on during the GSOC coding period by default. This list might change if needed, according to what the community wants. I've ordered the list by importance and noted the weight (difficulty level) of each task.

Module-level secrets variable
- Loading from backends (relatively easy)
- Overriding the same variables based on SECRET_BACKENDS (relatively easy)
- Lazy resolution of secrets, when django.conf.settings is not configured yet (relatively hard)
- Gradual resolution of secrets, using previously loaded secrets for loading the next one (relatively hard)
- Per-source secrets management (intermediate)
- Retrieving only the subset of secrets from each source (intermediate)
Secret backends
- BaseSecretBackend (relatively easy)
- Filesystem secrets
  - BaseFileSecretBackend (relatively easy)
  - DotEnvSecretBackend (intermediate)
  - JsonSecretBackend (intermediate)
  - Other filesystem backends
- Remote secrets
  - BaseHttpSecretBackend (intermediate)
  - AnsibleVaultSecretBackend (relatively hard)
  - HashicorpVaultSecretBackend (relatively hard)
Miscellaneous
- Generate .env (or other preferred formats) and SECRET_BACKENDS settings on startproject (intermediate)

I picked DotEnvSecretBackend and JsonSecretBackend for filesystem secrets, and either AnsibleVaultSecretBackend or HashicorpVaultSecretBackend for remote secrets, because I think those are the most popular sources of secrets in their categories. If time allows, I will implement other secrets backends as well.

Timeline

This timeline was designed based on the Google Summer of Code 2020 Program Rules.

Community Bonding

May 5, 2020 - June 2, 2020

I will use this time to settle the items listed below (among others) by consulting the Django Community.

- Where to place the secrets module
- Default values for the secrets backends
- Naming of the environment variables for the secrets backends' required parameters
- Implementation priority (secrets backends, additional features, etc)

Coding

June 2, 2020 - August 25, 2020

First Phase

The first phase will be about the basic implementation of secrets and some of the secrets backends.

Week 1-2: June 2, 2020 - June 14, 2020

Due to the COVID-19 outbreak, my university's calendar was pushed back by 2 weeks, so the final exams will run from June 2 to June 10. It will be hard for me to work during this period, but I will make sure that by the end of June 10 the work on BaseSecretBackend, BaseFileSecretBackend, and DotEnvSecretBackend is done. The rest of the period will be used for implementing JsonSecretBackend and the tests for these backends. The two concrete backends (DotEnvSecretBackend and JsonSecretBackend) will be used to simulate secrets retrieval by the secrets module-level variable.

Week 3: June 15, 2020 - June 21, 2020

This week will be used for implementing some basic functionality of secrets. This includes loading secrets from the backend classes, overriding shared variables based on the backend order in SECRET_BACKENDS, and selecting the subset of secrets for each secret source. Each piece will be implemented together with its tests to make sure it works as expected.

Week 4: June 22, 2020 - June 28, 2020

The first part of this week will be used for implementing (and testing) the remaining basic functionality, which is per-source secret management. The latter part of the week will be used for evaluation and documentation.

Second Phase

The goal of the second phase is to make the secrets variable usable anywhere, anytime, using Django's lazy functionalities.

Week 5 and Half of Week 6: June 29, 2020 - July 8, 2020

In this period, I will implement the functionality that lets secrets be used inside settings.py without running into circular imports. Since I've already worked on part of this (shown in one of the snippets above), I can take the time to make sure the implementation works at its best, as laziness is still a pretty tricky concept for me.

Half of Week 6 and Week 7: July 9, 2020 - July 19, 2020

In this period, I will implement the functionality that lets secrets use previously loaded secrets to configure the next secrets backend. This implementation will require more advanced use of the laziness concept.

Week 8: July 20, 2020 - July 26, 2020

I will use this period flexibly, depending on the situation. If there was a problem in the previous weeks, I'll make sure to finish the task this week. I can also implement generating the .env file on startproject if time permits, or I may just start the next task early. Whatever the situation, I will set aside time for the evaluation and documentation of this phase.

Final Phase

In the final phase, I will work on the HTTP API based secrets managers. This requires some exploration first, as I have limited experience with this kind of secrets manager.

Week 9: July 27, 2020 - August 2, 2020

This week will be used for exploring HTTP API based secret managers. The main information I will look for: the request schema users need to retrieve their secrets from the secret manager, the response format of the secrets, and, if there's a CLI tool, whether it uses environment variables to retrieve the credentials. While exploring, I will start implementing BaseHttpSecretBackend. I will also set up one of the secret managers to use while implementing the secrets backend.

Week 10-11: August 3, 2020 - August 16, 2020

These two weeks will be used for implementing the secrets backend for one of the HTTP API based secret managers explored previously. The leading candidates are HashicorpVaultSecretBackend and AnsibleVaultSecretBackend. Considering my limited experience with HTTP API based secret managers, I estimate this work will take two weeks.

Half of Week 12: August 17, 2020 - August 19, 2020

I will do some documentation for this phase in this period of time.

End of Final Phase: August 20, 2020 - August 25, 2020

In this period, I aim to iron out any remaining issues and do the final evaluation of all the previous implementations. Once the issues are solved, the whole work will be ready for Django's core developers to merge into the master codebase.

Q&A

Would this replace the use of os.environ?

It's not meant to replace os.environ. If developers only need environment variables, they can still use os.environ instead of the secrets variable. The secrets variable does include the os.environ variables by default, but they are mixed with variables from the other configured secret sources. So os.environ remains the easier choice when developers want environment variables only.

Can we remove the need for multiple settings.py?

The multiple settings.py approach has its own purpose -- it has its own wiki page. Some logic can reside inside the Python settings.py files that can't be represented by secrets. But with secrets, developers can tune their settings.py files more conveniently.

Conclusion

Thanks to its laziness, the secrets module-level variable can be used easily anywhere, anytime, like django.conf.settings and os.environ. Developers can also tweak secrets retrieval easily through the SECRET_BACKENDS setting. This approach should make future development of secrets management easy. And as a bonus, this change will not break any of your current projects when you update your Django version.

About Me

Hi! My name is Muhammad Ashlah Shinfain; my friends call me Ashlah. I'm currently in my final (4th) year as a Computer Science student at the University of Indonesia. I live in Depok -- a city next to Jakarta -- Indonesia (UTC+7).

This is my 4th year using Python, and I have been using Django for 3 years now. I have done several projects with Django during these 3 years. Most notably, I led an organization's dev team for one year, which is when I learned a great deal about Django. The dev team was responsible for developing and maintaining the organization's systems, such as the recruitment system, publication request system, and book lending system. Another project I'm proud of is an API service for a health tracker mobile app, which I developed single-handedly. That is when I got a feel for how to code properly: because I was the only one managing the API service's development, I could easily maintain the best practices used in the project.

Recently, when I code, I often look into the source code of the packages I use to learn the patterns they use and the best practices they conform to. It can be said that I learn to code by example. My favorite packages, which usually become my code style reference, are django-cas-ng, django-allauth, django-rest-framework, and of course django itself. For Django projects, I often refer to djangoproject.com and readthedocs.org, which I found to be two of the most popular Django projects on djangopackages.org.

Currently, my contributions to the Django codebase are these two PRs: django/django#12596 and django/django#12591. Although they have been reviewed, they are not merged yet. Even after finishing this GSOC milestone, I think Django will remain my favorite open source project to contribute to.

If there's something to discuss, I can be reached through my email: [email protected]. If you want to know more about me, you can see my GitHub profile and my StackOverflow profile. I also share my developer journey on Twitter.
