@crodjer
Created March 26, 2012 05:28
django-security-enhancement-proposal

# About Me

Hi, I am Rohan Jain, a 4th (final) year B.Tech undergraduate student at the Indian Institute of Technology Kharagpur. I have been using Django for over a year and generally look into the code base to find out about various implementations. I have made some minor contributions, and if selected, this would be my first major one.

More about me: http://www.rohanjain.in/about/ GitHub, IRC: crodjer

Current state

Django currently uses two methods for protection against CSRF:

  • For plain HTTP (non-SSL) requests, a conventional token system is used.
  • For HTTPS requests, strict Referer checking is done as well.

These two systems behave differently. The cookie-token system has a side-effect feature: it lets you access the CSRF token across multiple subdomains. Referer checking, on the other hand, requires the request to come from the same host and hence doesn't allow cookie-like cross-subdomain behaviour.

The existence of the CSRF_COOKIE_DOMAIN setting shows that Django supports cross-domain capabilities in the CSRF system, so users might rely on this while building multiple sites. When they switch to HTTPS, they will see surprising new behaviour.
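A simplified model of the current behaviour makes the difference concrete. This is a sketch, not Django's middleware: `current_csrf_check` and its arguments are hypothetical, safe methods and exemptions are ignored, and `same_origin` only mirrors the scheme/host/port comparison done by Django's helper of the same name. The scenario assumed is a form on blog.example.com posting to shop.example.com with CSRF_COOKIE_DOMAIN = '.example.com', so both hosts see the same token cookie.

```python
import hmac
from urllib.parse import urlparse


def same_origin(url1, url2):
    """Compare scheme, host and port, like Django's same_origin helper."""
    p1, p2 = urlparse(url1), urlparse(url2)
    return (p1.scheme, p1.hostname, p1.port) == (p2.scheme, p2.hostname, p2.port)


def current_csrf_check(scheme, host, referer, posted_token, cookie_token):
    """Hypothetical simplified model of the current flow for an unsafe request."""
    if scheme == "https":
        # Strict Referer check: only the exact same scheme/host/port is trusted
        if not referer or not same_origin(referer, "https://%s/" % host):
            return False
    # Token check: the posted token must match the CSRF cookie, which may be
    # shared across subdomains via CSRF_COOKIE_DOMAIN
    if not posted_token or not cookie_token:
        return False
    return hmac.compare_digest(posted_token.encode(), cookie_token.encode())
```

With the shared cookie, the cross-subdomain POST passes over HTTP but is rejected over HTTPS purely because of the Referer check, which is the surprising behaviour described above.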

The modified system

The working tree is at crodjer/django on GitHub.

### Origin and Referer Checking, Permitted Domains

  • The Origin header, a fairly recent addition, has CSRF protection as one of its targets. Unlike the Referer header, it sends only the scheme, host and port (no path), so people have less reason to disable it through privacy plugins the way they do with the Referer header.

  • When the Origin header is absent, the Referer header can be used as a fallback.

  • Strict Referer checking is already done for HTTPS, which is used in all major deployments. We can bring it to the non-secure scheme as well.

  • This method makes a "permitted domains" feature possible: an explicit list of trusted domains whose requests can bypass the CSRF check. A prototype implementation is in crodjer/django, csrf-enhancements branch.

  • Cons:

    • Since Origin checking is still a new concept, it is not yet implemented in all major browsers.
    • Referer header spoofing has also been reported with some old versions of Flash.
    • Plugins that strip the Referer header are popular. A client that doesn't send the Origin header and has one such plugin installed will fail this kind of check.
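The Origin-first, Referer-fallback check with permitted domains could look roughly like the sketch below. It works on a plain META-style dict so it runs without Django; `PERMITTED_DOMAINS`, `request_source_host` and `origin_permitted` are hypothetical names, not settings or functions that Django provides, and `host` is assumed to carry no port.

```python
from urllib.parse import urlparse

# Hypothetical setting: hosts allowed to make unsafe requests besides our own
PERMITTED_DOMAINS = ("api.example.com", "static.example.com")


def request_source_host(meta):
    """Host the request claims to come from: Origin first, Referer as fallback."""
    source = meta.get("HTTP_ORIGIN") or meta.get("HTTP_REFERER")
    if not source:
        return None
    # Browsers may send the literal Origin "null"; it parses to no hostname
    return urlparse(source).hostname


def origin_permitted(meta, host, permitted=PERMITTED_DOMAINS):
    """True if the request comes from our own host or an explicitly permitted one."""
    source = request_source_host(meta)
    if source is None:
        # No usable header, e.g. an old browser plus a Referer-stripping plugin
        return False
    return source == host.lower() or source in permitted
```

The `source is None` branch is exactly the third con above: such clients are rejected rather than silently trusted.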

### CSRF Cookies (Time-Signed)

  • The server generates a random token and stores it in a browser cookie. For verification, every non-GET request must provide a signed version of the same token, which the server then verifies.

  • This can be implemented by adding signing to the existing CSRF token system, using Django's signing framework (django.core.signing).

  • This is the conventional method of CSRF checking; all the major frameworks have similar systems.

  • Signing takes care of the side effects of the cross-domain behaviour of cookies.

  • Cons:

    • Relies on the browser cookie system, which introduces its own insecurities.
    • Can easily be broken by opening a parallel legitimate session, which yields a valid (token, signature) pair. That pair can then be replayed in MITM attacks.
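To make the time-signed cookie idea concrete, here is a minimal sketch of signing and verifying a CSRF token. It uses only the standard library (`hmac`, `hashlib`) as a stand-in for `django.core.signing.TimestampSigner`, so it runs without a configured Django project. `SECRET_KEY`, `sign_token` and `verify_signed_token` are hypothetical names, and the `token:timestamp:mac` layout is an assumption, not Django's actual format.

```python
import hashlib
import hmac
import time

# Hypothetical stand-in for settings.SECRET_KEY
SECRET_KEY = b"not-a-real-secret"


def _mac(token, timestamp):
    msg = ("%s:%s" % (token, timestamp)).encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def sign_token(token, timestamp=None):
    """Return 'token:timestamp:mac' for storing in the CSRF cookie."""
    ts = str(int(time.time() if timestamp is None else timestamp))
    return "%s:%s:%s" % (token, ts, _mac(token, ts))


def verify_signed_token(value, max_age, now=None):
    """Return the raw token if the MAC is valid and not older than max_age, else None."""
    try:
        token, ts, mac = value.rsplit(":", 2)
    except ValueError:
        return None
    # Constant-time comparison, in the spirit of django.utils.crypto.constant_time_compare
    if not hmac.compare_digest(mac.encode(), _mac(token, ts).encode()):
        return None
    now = time.time() if now is None else now
    if now - int(ts) > max_age:
        return None
    return token
```

Note that verification says nothing about who obtained the value: a pair fetched from a legitimate session verifies just as well when replayed, which is the weakness listed in the second con.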

Of these two methods, I am slightly inclined towards Origin/Referer checking, which is already relied upon completely for secure requests.

Alternatively, we could give up the permitted-domains functionality and implement both kinds of checks.

# Abstract

Django is a reasonably secure framework. It provides an API and development patterns that transparently take care of common web security issues, but there are still security features that need attention. I propose to work on improving CSRF checking without any compromises, and on integrating the existing work on a centralized token system. If time permits, I will also attempt to integrate django-secure.

# Description

## CSRF Improvements

Cross-Origin Resource Sharing (CORS):
The W3C has a working draft on CORS, which opens up the possibility of client-side cross-origin requests. This immediately suggests developing APIs that can be exposed directly to the web browser, which would let us get rid of the proxies and other hacks currently used to achieve this. Currently all the major browsers support it: Chrome (all versions), Firefox (> 3.0), IE (> 7.0), Safari (> 3.2), Opera (> 12.0). Firefox and Chrome send the Origin header for both AJAX and standard form POST requests. I introduce it here because later parts of this proposal refer to it.

### Origin Checking

With CORS around, the need for the CSRF token can be dropped, at least in some browsers. Ticket #16010 is an attempt at that, but it was rejected because it neglected the case where CSRF_COOKIE_DOMAIN is set (refer to the closing comment on the ticket for details). To handle this, we need to simulate the CSRF cookie-domain check the way web browsers do it. Maybe:

```python
# Assumes CSRF_COOKIE_DOMAIN is set (its default is None)
request.META.get('HTTP_ORIGIN', '').endswith(settings.CSRF_COOKIE_DOMAIN)
```

If the server receives an Origin header in the request, it will be used for an initial check, after which all the conventional checks will be done. Overall security will then improve automatically as newer browsers that support the Origin header gain market share.
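The `endswith` check above is loose: with CSRF_COOKIE_DOMAIN = 'example.com' it accepts `http://evilexample.com`, and with '.example.com' it rejects the bare `http://example.com`. A browser-style domain match, sketched below as one reading of "simulate the check the way browsers do it", parses the host out of the Origin and accepts only the cookie domain itself or its subdomains. The function name is hypothetical.

```python
from urllib.parse import urlparse


def origin_matches_cookie_domain(origin, cookie_domain):
    """Browser-style match: host equals the cookie domain or is a subdomain of it."""
    host = urlparse(origin).hostname  # already lowercased; None for "" or "null"
    if not host or not cookie_domain:
        return False
    domain = cookie_domain.lstrip(".").lower()
    return host == domain or host.endswith("." + domain)
```

Matching on a "." boundary is what keeps look-alike registrable domains out while still covering every real subdomain.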

As the closing comment points out, we can't do this for secure requests. They must still be checked against the Referer or Origin, at least for now, because we cannot be sure that some untrusted or insecure subdomain has not already set the cookie. To deal with this, HTTPS has to be handled separately, as it is now. So it will be something like:

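```python
from django.conf import settings
from django.utils.http import same_origin


def process_view(self, request, callback, callback_args, callback_kwargs):

    # Same initial setup as the current CsrfViewMiddleware

    if request.method not in ('GET', 'HEAD', 'OPTIONS', 'TRACE'):

        host = request.get_host()
        origin = request.META.get('HTTP_ORIGIN', '')
        cookie_domain = settings.CSRF_COOKIE_DOMAIN

        if request.is_secure():
            good_referer = 'https://%s/' % host
            # Prefer Origin; fall back to Referer when Origin is absent
            referer = origin or request.META.get('HTTP_REFERER')
            if not referer or not same_origin(referer, good_referer):
                return self._reject(request, 'Origin/Referer check failed.')

        # We are insecure, so care less.
        # A better (browser-style) domain match can be used if needed.
        elif origin and cookie_domain and not origin.endswith(cookie_domain):
            return self._reject(request, 'Origin does not match CSRF_COOKIE_DOMAIN.')

        # Origin is fine or absent: do the conventional token checks here
```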

If the above were implemented, the CSRF_COOKIE_DOMAIN setting should be deprecated in favour of something like CSRF_ALLOWED_DOMAIN, which is a more meaningful name.

### Multiple Allowed Domains (was Better CORS Support)

Since we are already introducing Origin checking, we can go one step further and provide better CORS support for the browsers that implement it. A tuple/list setting specifying the allowed domains will be provided. Using it, the various Access-Control-Allow-* response headers will be set when the request's origin is one of the allowed domains. For the CSRF check, we just check whether the HTTP Origin is an allowed domain.

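```python
from django.conf import settings
from django.utils.cache import patch_vary_headers


def set_cors_headers(response, origin):
    response['Access-Control-Allow-Origin'] = origin
    # The header depends on the request's Origin, so caches must vary on it
    patch_vary_headers(response, ('Origin',))

def process_response(self, request, response):

    origin = request.META.get('HTTP_ORIGIN', '')

    if origin in settings.CSRF_ALLOWED_DOMAINS:
        set_cors_headers(response, origin)
    return response

def process_view(self, request, callback, callback_args, callback_kwargs):

    # Use `origin in settings.CSRF_ALLOWED_DOMAINS` here instead of
    # origin.endswith(...)
    pass
```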

Something similar to the above will probably be needed to incorporate CORS support.
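As a framework-independent sketch, the proposed CSRF_ALLOWED_DOMAINS setting could drive both the CORS response headers (including the answer to a preflight OPTIONS request) and the Origin-based CSRF decision. The helper names, the allowed methods/headers and the max-age value are illustrative assumptions; only the Access-Control-* header names come from the CORS draft. The site's own origin is assumed to be listed too.

```python
# Hypothetical value for the proposed CSRF_ALLOWED_DOMAINS setting. Browsers send
# Origin as "scheme://host[:port]", so entries are full origins, not bare hosts.
CSRF_ALLOWED_DOMAINS = ("https://example.com", "https://api.example.com")

SAFE_METHODS = ("GET", "HEAD", "OPTIONS", "TRACE")


def cors_response_headers(origin, method, allowed=CSRF_ALLOWED_DOMAINS):
    """Headers to add to the response; empty when the origin is not allowed."""
    if origin not in allowed:
        return {}
    headers = {
        "Access-Control-Allow-Origin": origin,
        # The response differs per Origin, so shared caches must key on it
        "Vary": "Origin",
    }
    if method == "OPTIONS":
        # Preflight answer: which methods/headers the real request may use
        headers["Access-Control-Allow-Methods"] = "GET, POST, PUT, DELETE"
        headers["Access-Control-Allow-Headers"] = "X-CSRFToken, Content-Type"
        headers["Access-Control-Max-Age"] = "3600"
    return headers


def passes_origin_csrf_check(origin, method, allowed=CSRF_ALLOWED_DOMAINS):
    """Unsafe methods pass only when the Origin is explicitly allowed."""
    if method in SAFE_METHODS:
        return True
    return origin in allowed
```

Echoing the matched origin back (rather than "*") keeps the response usable for credentialed requests, which CORS does not allow with a wildcard.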

### Less Restrictive Secure Requests

The current CSRF system is pretty secure as it is, but CSRF protection is too restrictive over HTTPS: it rejects every request that does not come from the same host, without honouring any token. It more or less has to, because of the way browsers allow cookie access: a cookie accessible across subdomains means that any subdomain, secure or insecure, can set the CSRF token, which could seriously compromise the site's security. To get around this, one currently has to exempt views from CSRF protection entirely and may or may not handle CSRF attacks oneself, which can be dangerous. Also, if someone has a set of sites that talk to each other through clients and decides to run them over HTTPS, the sites will need some modifications.

Django should behave the same way under HTTPS as it does under HTTP, without compromising any security. So we need to make sure that the CSRF token is always set by a trusted site. Signing the data with the same key across the sites (probably settings.SECRET_KEY), using django.core.signing, looks apt for this. We can have get_token and set_token methods that abstract the signing process. This can be done in two ways:

  • Store the CSRF data in the session when contrib.sessions is installed. The data will then either be signed automatically with the secret key or not be stored on the client as a cookie at all.

  • If sessions is absent from INSTALLED_APPS, fall back to custom signing.

  • Encryption?

```python
from django.conf import settings
from django.core.signing import BadSignature, TimestampSigner
from django.utils.crypto import constant_time_compare

# Pass the namespace as salt: the first positional argument of Signer is the key
signer = TimestampSigner(salt="csrf-token")
CSRF_COOKIE_MAX_AGE = 60 * 60 * 24 * 7 * 52


def get_unsigned_token(request):
    signed = request.COOKIES.get(settings.CSRF_COOKIE_NAME)
    if signed is None:
        return None
    try:
        return signer.unsign(signed, max_age=CSRF_COOKIE_MAX_AGE)
    except BadSignature:  # also covers SignatureExpired
        return None

def set_signed_token(response, token):
    response.set_cookie(settings.CSRF_COOKIE_NAME,
                        signer.sign(token),
                        max_age=CSRF_COOKIE_MAX_AGE,
                        domain=settings.CSRF_COOKIE_DOMAIN,
                        path=settings.CSRF_COOKIE_PATH,
                        secure=settings.CSRF_COOKIE_SECURE)


def get_token(request):
    if 'django.contrib.sessions' in settings.INSTALLED_APPS:
        return request.session.get('csrf_token')
    else:
        return get_unsigned_token(request)

def set_token(request, response, token):
    if 'django.contrib.sessions' in settings.INSTALLED_APPS:
        request.session['csrf_token'] = token
    else:
        set_signed_token(response, token)

# Comparing to the token submitted with the request
# (request_csrf_token: taken from the POST data or the X-CSRFToken header)
expected = get_token(request)
is_valid = expected is not None and constant_time_compare(request_csrf_token, expected)
```

Doing this is not as simple as the code above makes it look. A lot can, and probably will, go wrong with this approach:

  • Even when the token is signed, other domains can still replace the CSRF token cookie completely. A replaced cookie alone won't get them through the CSRF check, but signing doesn't stop them either: they just need to replay an existing good token/cookie pair, which they can get directly from the server any time they want.

  • This couples CSRF with sessions, a contrib app. Currently nothing except some of the other contrib apps is tied to sessions. It will break if sessions were removed in the future or its API changed. It also means that if one website uses session-based CSRF, all of the others must too. Because of this coupling, it would actually be something of a step back.

  • Even if this were implemented successfully, would it expose any other critical security flaws? Would it cause compatibility issues?

  • Encryption comes with its own issues and would need careful consideration.

As Paul McMillan said, "This is a hard problem", so I'll leave figuring this out to future me. I will look into The Tangled Web and Google's Browser Security Handbook for ideas, both also suggested by Paul on IRC.

## Centralized Tokenization

There are multiple places in Django that use one kind of token or another:

  • contrib.auth (random password, password reset)
  • formtools
  • session (backends)
  • cache
  • csrf
  • etags

Token generation is pretty common across the framework. So instead of each application having its own token system that needs to be maintained separately, there should be a centralized token system providing an abstract API for everyone to use. In fact, I have seen some apps use User.objects.make_random_password from contrib.auth for random generation, since they can be sure it will be maintained in the future. To me this looks kind of weird. At the last DjangoCon, a lot of work on this was done in Yarko's fork.

I had a discussion with Yarko Tymciurak regarding this. The work is nearly ready for merging, with only a few tasks left. I can work on these to ensure that the significant work already done gets into Django and is updated for 1.5:

  • Porting more stuff to the new system (README.sec in Yarko's fork)
  • Testing - See if the current coverage of the tests is enough, write them if not.
  • Compatibility issues
  • API Documentation

I will study the changes done at djangocon and then attempt the tasks mentioned above.

## Integrating django-secure

A really useful app for catching security-related configuration mistakes is carljm's django-secure. It is especially useful for finding issues that might have been introduced by quick changes to settings during development. This project is popular and useful enough that it could be shipped with Django. I haven't been able to give it enough time yet. I can think of two ways of integrating it:

  • Dropping it in as a contrib app
    This seems pretty straightforward and would require minimal changes.

  • Distributing it around the framework:
    Like CSRF, it can be distributed framework-wide, so it would no longer be optional. Apps could still define custom checks the same way they do when django-secure is installed as a pluggable application.
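Either way, the core idea is a set of small check functions run against the settings, with apps able to register their own. The sketch below shows that pattern with a plain dict standing in for django.conf.settings; the registry, `register`, `run_checks` and the two example checks are hypothetical and do not reproduce django-secure's actual API.

```python
# Hypothetical check registry in the spirit of django-secure; not its real API.
CHECKS = []


def register(check):
    """Decorator: add a check function to the registry and return it unchanged."""
    CHECKS.append(check)
    return check


@register
def check_debug(settings):
    # Each check returns a warning string, or None when everything is fine
    if settings.get("DEBUG"):
        return "DEBUG is True; never run a production site this way."


@register
def check_csrf_cookie_secure(settings):
    if settings.get("SESSION_COOKIE_SECURE") and not settings.get("CSRF_COOKIE_SECURE"):
        return "SESSION_COOKIE_SECURE is set but CSRF_COOKIE_SECURE is not."


def run_checks(settings):
    """Run every registered check; return the list of warning messages."""
    return [msg for msg in (check(settings) for check in CHECKS) if msg]
```

A third-party app would add its own checks simply by decorating them with `register`, which is what keeps custom checks possible in both integration options.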

The app might also need some changes whilst being integrated:

  • More security checks, if required
  • Adjustments for the changes introduced above.

# Plan

I think the CSRF enhancements and centralized tokenization tasks will be enough to span the SoC period. If, after thoroughly implementing and testing these, I still have time, django-secure integration can be looked into.

Roughly, this proposal spans at most 5 tasks. Each task will generally have the following steps:

a. Initial research and design decisions.
b. Implementation with minor parallel tests.
c. Thorough and regression testing to achieve security quality.
d. Configuration/settings changes and handling of compatibility issues.
e. Documentation.

Tasks (with the most effort-intensive steps in parentheses):

  1. Origin Checking (b, c)
  2. Multiple Allowed Domains (b, c)
  3. Less restrictive CSRF checking over HTTPS / CORS for HTTPS (a, b)
  4. Unified Tokenization (a, c, e)
  5. Integration of django-secure (d, e)

I'll be using my fork of Django on GitHub, probably with the following branch names:

  • csrf-enhancements (origin checking, multiple allowed domains, etc.)
  • centralized-tokenization (djangocon2011-sec)

## Timeline

Week 1: Task 1.a, 1.b
Week 2: Task 1.c, 1.d
Week 3: Task 2.a, 2.b. Start task 3.a
Week 4: Task 2.c, 2.d
Week 5: Task 1.e, 2.e (Doing these together might be beneficial)
Week 6-7: Complete 3.a. Task 3.b
Week 7-9: Task 3.c, 3.d
Week 10: Task 3.e
Week 11-12: Task 4.a to 4.e (as much as possible)
Week 13: Complete Task 4 and as much of Task 5 as possible

I apologize for the bot-like style of this schedule; the deadline was so close that I had to adopt this method.

## Resources

### Tasks

  • Ticket for Origin header checking support: #16010
    The most recent patch in this ticket forces the origin check without considering the CSRF_COOKIE_DOMAIN setting, which seems to be why the patch was not accepted.

  • CSRF Improvement ticket: #16859

    • Adding CSRF to sessions:
      If a project has the sessions app installed, using the session cookie can be beneficial, as we won't need an extra cookie for CSRF. This also automatically protects the token against tampering in whichever session store is used, database or cookie.

    • Signing of CSRF token:
      Complicating CSRF cookies would mean maintaining two places that rely on browsers. If the sessions app is present and sessions are used for CSRF as above, there won't be any need for separate signing. Otherwise, the cookie can be signed, probably using Django's signing utility, as is done in the signed-cookie session store.

  • Integrating django-secure
    I looked into its source code and was able to understand most of the app. Should it be integrated as a contrib app, maybe contrib.security?

  • Centralizing randomized token issuance and validation:
    Some work was done on this at DjangoCon 2011, at https://github.com/yarko/django. TODOs: https://github.com/yarko/django/blob/djangocon2011-sec/README.sec. Remaining: merging, testing and documentation.

### External

Python Security Django page: some posts about Django security.

Origin header: the Origin request header, usable for CSRF checking; not implemented in all browsers. Mozilla page, W3C spec

### Mailing List

Post about info leakage (I am unsure about it, and it's old).

### Problem Definition

Django has developed many security features over time. The existing set of security features is pretty good, but there's lots of room for improvement. Much of the work in this project will be related to cleaning up existing code to make it more obviously secure, eliminate edge cases, and improve fallback handling.

Some potential areas of work include:

  • Enhancing CSRF protection: #16859
  • Centralizing randomized token issuance and validation
  • Integrating carljm's django-secure project
  • Building an interactive admin dashboard to display and check installation security parameters
  • Targeted Code audit for a specific list of security errors

While an interest in security will make these tasks more interesting, most of them don't require you to be a security expert already. Your mentor will make sure your plan is correct before you code, and carefully review your work before it is committed to trunk. Most of these tasks will be significantly easier if you already have some familiarity with Django's codebase. A successful application will have a plan which selects related areas of work, provides details, and has a good estimation of complexity for the proposed tasks. Remember that (especially for security work) a good patch often has more lines of tests than code changes. An ideal applicant will be able to demonstrate the skill with Python and attention to detail necessary to make fundamental changes to Django without breaking existing code.

Ideas that will probably not be accepted:

  • Adding database or cookie encryption support (unless you can provide a secondary mentor who is a crypto expert)
  • Proposals that strongly couple sessions with CSRF or Auth
  • Proposals to include external libraries in Django

If you are interested in working on this project, please talk to us sooner rather than later! PaulM is usually available on IRC, and wants to help you write a really good application.

CSRF Protection page

Centralized Tokenization

Token strings are common in web projects and applications. The tokenization library provides an abstraction over hashlib and random.

It provides two kinds of tokens, RandomToken for random strings and HashToken for hashed strings:

  • RandomToken([length]) Class for generating randomized tokens. Has an optional argument length, which defaults to 32.

  • HashToken([value, algorithm]) Class for generating hashed tokens based on an initial string. Has optional arguments:

    • value (default ''): The string input, a hashed version of which is required.
    • algorithm (default 'sha256'): The hashing algorithm to be used. Options are md5, sha1 and sha256.

A token object provides methods to retrieve the token string in multiple ways:

  • token.hex() Token in hex character set

  • token.digits() Token consisting of only digits.

  • token.alphanumeric() Token consisting of alphanumeric characters (digits, uppercase letters, lowercase letters).

  • token.lower_alphanumeric() Token consisting of lowercase alphanumeric characters (digits and lowercase letters only).

  • token.readable_alphabet() String drawn from a subset of alphanumeric characters that humans can easily tell apart.

  • token.custom_chars(chars) Takes in a string of custom character set in which the token string should be returned.

Apart from these, a HashToken object also has two more methods/properties:

  • hash_token.update(value) Update the value associated with the hash object.

  • hash_token.digestmod The hashlib constructor that returns a new hash object. Useful to feed into functions like hmac.new(key[, msg[, digestmod]]).
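For illustration, here is a minimal sketch of the API described above. It is not the code in Yarko's fork: the shared base class, rendering the character-set methods by base conversion of the raw bytes, the exact readable alphabet, and treating `length` as a number of random bytes are all assumptions.

```python
import hashlib
import secrets
import string

# Assumed "readable" alphabet: drops easily confused 0/O, 1/l/I, i, o
READABLE = "23456789abcdefghjkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"


class _Token:
    def _bytes(self):
        raise NotImplementedError

    def custom_chars(self, chars):
        """Re-encode the raw bytes as a number written in base len(chars)."""
        if len(chars) < 2:
            raise ValueError("need at least two characters")
        n = int.from_bytes(self._bytes(), "big")
        out = []
        while n:
            n, rem = divmod(n, len(chars))
            out.append(chars[rem])
        return "".join(reversed(out)) or chars[0]

    def hex(self):
        return self._bytes().hex()

    def digits(self):
        return self.custom_chars(string.digits)

    def alphanumeric(self):
        return self.custom_chars(string.digits + string.ascii_letters)

    def lower_alphanumeric(self):
        return self.custom_chars(string.digits + string.ascii_lowercase)

    def readable_alphabet(self):
        return self.custom_chars(READABLE)


class RandomToken(_Token):
    def __init__(self, length=32):
        # Assumption: length counts random bytes, not output characters
        self._raw = secrets.token_bytes(length)

    def _bytes(self):
        return self._raw


class HashToken(_Token):
    def __init__(self, value="", algorithm="sha256"):
        if algorithm not in ("md5", "sha1", "sha256"):
            raise ValueError("unsupported algorithm: %s" % algorithm)
        self.digestmod = getattr(hashlib, algorithm)  # e.g. for hmac.new(...)
        self._hash = self.digestmod(value.encode())

    def update(self, value):
        self._hash.update(value.encode())

    def _bytes(self):
        return self._hash.digest()
```

Keeping all output formats on one base class means a new character set only needs one `custom_chars` call, which is the kind of centralization the section argues for.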
