Wagtail feature spec: Multi-tenancy

By Andy Babic, Senior Developer and Wagtail Consultant, Torchbox
23 Aug 2022

What do we mean by Multi-tenancy?

It’s a way to split up large multi-site Wagtail instances so that editors can more easily find and manage the things that are relevant to them.

It’s different to multi-site, in that it applies hard limits to what is shown in the CMS for each tenant, creating a user experience more akin to multi-instance (running multiple, separate instances of Wagtail), but with the option to share certain bits of content/data between tenants where appropriate.

Rather than depending entirely on user groups, role assignment, and tying data to sites to make views filterable, the visibility of content within each tenant is strictly controlled by a separate set of rules (managed from outside of Wagtail).

User groups and role assignment are still used to give users permission to see and manage tenanted content, but that is done on a per-tenant basis, using tenant-specific permission groups that aren’t visible across tenants.

It shares the same benefits as multi-site:

Because all tenants use the same database and media storage, cross-linking of data between tenants is possible.
You can support all tenants from a single running instance of Wagtail, saving you money on infrastructure.

As well as the same downsides:

There are obvious capacity limitations to serving many sites from a single instance of Wagtail, especially if Wagtail is serving frontend requests as well as backend ones, or generating a lot of image renditions.
Unless you pay close attention to model design and queries, increased volumes of data may cause certain features to run more slowly.

However, there are a few additional benefits over multi-site:

It’s a cleaner experience for editors: Less noise, and less manual filtering needed to find the content / data of interest.
The ‘hard filtering’ of content between tenants means that you can keep things that aren’t meant to be shared private. Think: front-end user data/preferences, feedback submissions, breaking news that shouldn’t yet be seen by the wider organisation.
Should an ‘Admin’ account be compromised somehow, the potential damage is restricted to the tenant(s) that account has access to.
You have dedicated user managers for each tenant. Their job is easier, because they only see or manage users and groups for a single tenant, and have more freedom to manage things in a way that works best for the sites in that tenant.

And a couple of extra downsides too:

What is and isn’t visible in each tenant needs to be configured outside of Wagtail by someone that knows what they are doing.
As a product owner, you need to have a clear idea about which content/sites belong to which tenant, and how much of that is shared with other tenants (With multi-site, you don’t really need to think about this, because most things are visible to everyone).
When adding custom models, you have to think a little bit more about how data and access works across multiple tenants (there are helpers, mixin classes and documentation to help with implementation, but it is an added layer of complexity that is inescapable for most things).
It can only be reliably enforced for editors using the Wagtail interface. Management commands and other server-run scripts won’t have any awareness of tenants unless you build it into them (e.g. requiring a tenant_id to be specified when running).
Some degree of user management will need to be done outside of Wagtail (for example, in a separately enabled Django-admin area), where there is cross-tenant visibility of users and groups.

A practical example

Editors sign into https://tenantone.companycms.org to manage site A
Editors sign into https://tenanttwo.companycms.org to manage sites B and C
Editors sign into https://tenantthree.companycms.org to manage sites D, E and F

Site A has a lot of pages that are referenced by the other sites, so read-only access to site A’s pages is granted to the other tenants, allowing them to be selected in link choosers etc. But, management of the pages remains the responsibility of https://tenantone.companycms.org

Sites B and C have the latest and greatest corporate-approved imagery, which is to be used across all other sites. So, read-only access to that tenant’s image collections is granted to the other tenants, allowing the images to be selected for use elsewhere. But, management remains the responsibility of https://tenanttwo.companycms.org.

A couple of the editors from https://tenantthree.companycms.org also work on sections of site A, so additional tenant access is configured just for them, and they can use the same credentials to successfully sign into https://tenantone.companycms.org OR https://tenantthree.companycms.org (but not https://tenanttwo.companycms.org).

NOTE: Using a separate URL for each tenant is just one way to do multi-tenancy. If you want to serve all tenants at the same URL, Wagtail can identify a default tenant for users based on their permissions, and activate that instead.

How it might be implemented in Wagtail

While requirements have varied a bit between past projects, broadly speaking, I believe the features of multi-tenancy fall into the following strands, which I’ll elaborate on further throughout this post.

Model design: How the tenants themselves are modelled, and how data relates to those tenants in different circumstances.
Tenant management: How/where tenants are defined and configured
Detecting the relevant tenant: How we determine which tenant is ‘relevant’ for any given session/request.
Switching between tenants: How we allow users with access to multiple tenants to see what those options are and switch between them.
Restricting content visibility: How we restrict access to data and/or features in Wagtail depending on the active tenant
User management and permissions: How this works within (and across multiple) tenants, and the interplay between tenant configuration and permissions

1. Model design

Since Wagtail has models to define sites and collections, it makes sense to introduce a Tenant model to store configuration for tenants, allowing them to reference (and be referenced by) other models, and to benefit from referential integrity at the database level.

1A. The default tenant

It will be important to create a ‘default’ Tenant (via a data migration), so that Wagtail behaves consistently whether or not a project intends to use multiple tenants.

Like the default Site created by Wagtail, it makes sense for this tenant to have some kind of ‘special status’, so that we can tell it apart from objects created by a developer, and hopefully make the introduction less disruptive to existing projects.

In addition to setting an is_default flag to True, I think it would be useful to have an explicit access_restricted flag, that could be set to False to allow any authenticated editor to access it (this could be convenient for testing new tenant additions, or supporting models where the tenants do not need to be protected).

Regardless of the default tenant configuration, we should go ahead and populate all of the relevant through-models and native_tenant fields to link everything to the default tenant, as this should mean that things continue to work, even if the flag were ‘unset’ by a curious developer.

1B. Reusable models and managers

TenantMember

This would be similar to the CollectionMember model that Wagtail includes for creating model objects that are grouped by collection, only for grouping objects by tenant instead. It would add a (required) native_tenant field to the model, which would default to the ‘default’ tenant, but would be set by Wagtail’s ‘create’ views to the active tenant.

TenantMemberQuerySet

A QuerySet subclass implementing the for_tenant() method, which will be used throughout Wagtail to identify TenantMember objects of various types for display.

The method will have one required argument: The Tenant object to find objects for.

It will also have one optional argument: An include_shared boolean with a default value of False. When True, the return value will also include any relevant objects from the SharedTenantMember table, making it useful in choosers and other contexts where it’s okay to show objects from other tenants.

SharedTenantMember

This concrete model will be used alongside TenantMember to make objects from one tenant visible within others. To start with, it would just have a ForeignKey to the Tenant model, and a GenericForeignKey allowing any custom object to be referenced. Though, this might be extended in future to allow for different kinds of sharing.
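To make the intended behaviour of for_tenant() a little more concrete, here is a minimal sketch in plain Python. It uses in-memory dataclasses as stand-ins for the Tenant, TenantMember and SharedTenantMember models rather than real Django models and querysets, and the Item and Share names (and the function signature) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    id: int
    name: str


@dataclass(frozen=True)
class Item:
    """Stand-in for any TenantMember object (hypothetical name)."""
    id: int
    title: str
    native_tenant: Tenant


@dataclass(frozen=True)
class Share:
    """Stand-in for a SharedTenantMember row: `item` is visible in `tenant`."""
    tenant: Tenant
    item: Item


def for_tenant(items, shares, tenant, include_shared=False):
    """Return items native to `tenant`, plus shared items when requested."""
    visible = [item for item in items if item.native_tenant == tenant]
    if include_shared:
        for share in shares:
            if share.tenant == tenant and share.item not in visible:
                visible.append(share.item)
    return visible


one, two = Tenant(1, "tenant one"), Tenant(2, "tenant two")
logo = Item(1, "Corporate logo", native_tenant=two)
draft = Item(2, "Draft banner", native_tenant=two)
photo = Item(3, "Office photo", native_tenant=one)
items = [logo, draft, photo]
shares = [Share(tenant=one, item=logo)]  # tenant two shares its logo with tenant one

listing = for_tenant(items, shares, one)  # what a listing view would show
chooser = for_tenant(items, shares, one, include_shared=True)  # what a chooser would show
```

In this sketch, the listing for tenant one contains only its own photo, while the chooser also offers the shared logo. The unshared draft from tenant two appears in neither.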

1C. Associating sites with tenants

On projects I’ve worked on previously, tenants have had a one-to-one relationship with Site, which would be the only site that could be managed while that tenant was active. I don’t think this is the right approach for Wagtail, for a few reasons:

Tenants should be seen as a layer ‘above’ sites, rather than a rough equivalent. Although it might seem rare, there's no practical reason why multiple sites shouldn’t be managed by (or served from) the same tenant, or why you shouldn't be able to add a site to a previously single-site tenant.
The multi-site features baked into Wagtail are useful and battle tested! Only allowing a single site to be managed via any given tenant could mean having to conditionally apply multi-site behaviour when tenants are activated, which feels messy.
If we plan to create a ‘default’ tenant for all projects (which we do), that tenant will be activated by default when users next access Wagtail. That shouldn’t mean that multi-site no longer works.

Since Wagtail is in control of the Site model, we can update it to subclass the TenantMember abstract model, binding instances to tenants when they are first created, and allowing them to be ‘shared’ between tenants via the SharedTenantMember model.

1D. Associating pages with tenants

While most pages would be indirectly related to tenants (via the Site to which they belong), it’s not always the case that pages ‘belong’ to a Site. For example, the process of creating a new site requires that the root page be created in advance, which is typically done just beforehand. It would be confusing if that newly created page disappeared before it could be selected as a root page.

Since Wagtail is in control of the Page model, we can update it to subclass the TenantMember abstract model, binding instances to tenants when they are first created. However, the for_tenant() implementation from TenantMemberQuerySet won’t suffice for pages. When called with include_shared=True, we’ll look for sites added to the SharedTenantMember table (instead of pages), and include pages from those sites in the result (Sharing individual pages between tenants will not be possible).
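The page-specific variant could look something like the sketch below. It is plain Python rather than a real queryset, and it simplifies heavily: pages carry a site_id here, whereas Wagtail works out a page’s site from its position in the tree. The pages_for_tenant name and the shape of the shared-sites mapping are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Page:
    id: int
    title: str
    native_tenant_id: int
    site_id: Optional[int] = None  # None: not (yet) part of any site


def pages_for_tenant(pages, shared_site_ids, tenant_id, include_shared=False):
    """Return pages native to the tenant, plus pages of sites shared with it.

    `shared_site_ids` maps a tenant id to the ids of sites shared with that
    tenant (a stand-in for SharedTenantMember rows that point at sites).
    """
    visible = [p for p in pages if p.native_tenant_id == tenant_id]
    if include_shared:
        site_ids = shared_site_ids.get(tenant_id, set())
        visible += [p for p in pages if p.site_id in site_ids and p not in visible]
    return visible


home_a = Page(1, "Site A home", native_tenant_id=1, site_id=10)
about_a = Page(2, "Site A about", native_tenant_id=1, site_id=10)
home_b = Page(3, "Site B home", native_tenant_id=2, site_id=20)
new_root = Page(4, "Future site C root", native_tenant_id=2)  # no site yet
pages = [home_a, about_a, home_b, new_root]
shared_site_ids = {2: {10}}  # site A is shared with tenant two

native_only = pages_for_tenant(pages, shared_site_ids, 2)
with_shared = pages_for_tenant(pages, shared_site_ids, 2, include_shared=True)
```

Note that the newly created root page stays visible to tenant two even though it does not belong to a site yet, because it is matched on its native tenant.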

1E. Associating images and documents with tenants

Wagtail already groups images and documents via Collections, and popular add-ons (such as wagtailmedia) typically use Collections to group items too. So, linking tenants to collections instead of directly to images or documents feels like the obvious solution here (especially since each item ALWAYS belongs to a collection - even if it’s just the root collection).

Since Wagtail is in control of the Collection model, we can update it to subclass the TenantMember abstract model, binding instances to tenants when they are first created, and allow them to be ‘shared’ between tenants via the SharedTenantMember model.

1F. Associating permission groups with tenants

Groups are very often created to manage content for a specific site (or part of one), so surfacing all groups in each tenant would not be appropriate. Similarly, a user administrator for any one tenant should be able to manage groups freely, without fear of affecting another tenant.

Groups are a unique case, because Wagtail doesn’t control the group model (they are a Django thing). We can get around this by adding a WagtailGroup model, which would have a native_tenant field to indicate which tenant it was created in.

We would use a data migration to create a WagtailGroup object for all existing groups, tying them to the ‘default’ tenant.

We would use a post_save() signal to create a WagtailGroup object for all newly created groups, which would be updated by the relevant view to set native_tenant to the active one.

1G. Associating users with tenants

Users are a special case when it comes to multi-tenancy. While it doesn’t really make sense to share user data between tenants in the same way as for other models, we do need to cater for users being granted access to multiple tenants.

There are two ways in which users are associated with tenants:

The tenant from which they were created (their native tenant)
Plus: Any additional tenants they are permitted to access

Because Wagtail has no control over the User model, we can’t use the TenantMember model like we do elsewhere. Instead, we can extend the existing UserProfile model to store this association, and use a data migration to link any existing users to the default tenant.

We would use a post_save() signal to create a UserProfile object for all newly created users, which would be updated by the relevant view to set native_tenant to the active one.

In order to explicitly grant users access to other tenants, developers could add additional tenants to a secondary_tenants ManyToManyField on the UserProfile model.
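As a rough illustration of how these two associations might combine, the plain-Python sketch below works out which tenants a user can access. The dataclasses stand in for the real models, the accessible_tenants helper is hypothetical, and treating access_restricted as a field available on every tenant (rather than just the default one) is an assumption on my part.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tenant:
    id: int
    is_default: bool = False
    access_restricted: bool = True


@dataclass
class UserProfile:
    """Stand-in for the extended UserProfile model."""
    native_tenant_id: int
    secondary_tenant_ids: set = field(default_factory=set)


def accessible_tenants(profile, tenants):
    """Tenants the user was created in, was explicitly granted, or that are open."""
    allowed = {profile.native_tenant_id} | profile.secondary_tenant_ids
    return [t for t in tenants if t.id in allowed or not t.access_restricted]


tenants = [Tenant(1), Tenant(2), Tenant(3)]
# An editor created in tenant three who also works on site A (tenant one)
editor = UserProfile(native_tenant_id=3, secondary_tenant_ids={1})
editor_tenants = accessible_tenants(editor, tenants)
```

This mirrors the practical example from earlier: the editor can sign into tenants one and three, but not tenant two.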

Cross-tenant username uniqueness

Because the same model data is used by all tenants, two users cannot exist with the same username, even if they are for separate tenants.

Because the model is outside of Wagtail’s control, we cannot tweak unique constraints to get around this. For now, the best solution is to document this limitation and outline the alternatives (using a different login/user for each tenant, or granting the same user access to multiple tenants).

Safely permissioning users for separate tenants

For projects that wish to use multiple tenants, we would need to provide a custom Django authentication backend that would override ModelBackend._get_group_permissions() to only return permissions from groups for the active tenant.
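The filtering such a backend would need to perform can be sketched in plain Python as below. This is only the core logic, not a working authentication backend: the function name, the dictionaries, and the permission strings are all made up for illustration, and a real implementation would query the Group and WagtailGroup tables instead.

```python
def permissions_for_active_tenant(user_group_ids, group_tenants, group_permissions, active_tenant_id):
    """Collect permissions only from the user's groups native to the active tenant.

    `group_tenants` maps group id -> native tenant id (the WagtailGroup link).
    `group_permissions` maps group id -> set of permission strings.
    """
    permissions = set()
    for group_id in user_group_ids:
        if group_tenants.get(group_id) == active_tenant_id:
            permissions |= group_permissions.get(group_id, set())
    return permissions


group_tenants = {10: 1, 11: 1, 30: 3}
group_permissions = {
    10: {"wagtailadmin.access_admin"},
    11: {"wagtailimages.add_image"},
    30: {"wagtailadmin.access_admin", "wagtaildocs.add_document"},
}
user_group_ids = [10, 30]  # one group in tenant one, one in tenant three

in_tenant_one = permissions_for_active_tenant(user_group_ids, group_tenants, group_permissions, 1)
in_tenant_three = permissions_for_active_tenant(user_group_ids, group_tenants, group_permissions, 3)
```

The same user ends up with a different, independent set of permissions depending on which tenant is active, and has none at all in a tenant where they belong to no groups.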

1H. Associating other Wagtail-controlled models with tenants

As mentioned on the Multi-site, multi-instance and multi-tenancy page of the Wagtail docs, there are a few other models that require thinking about too.

Core models:

Log entries
Workflows

Contrib app models:

Redirects
Search promotions
Global settings

With the exception of Global settings (which will be handled separately), all of these can be updated to use the abstract TenantMember model from the core app.

Tag models are another thing that will likely want to be tenant-specific too, so we can provide a multi-tenancy-compatible Tag model - where new tags would be added to the active tenant only.

1I. Associating snippet and other custom models with tenants

It would only be reasonable to provide developers with a way to split their custom model data between tenants too, so we’ll document the TenantMember and SharedTenantMember classes and encourage use of them for custom models.

2. Tenant management

If your projects are anything like the ones I’m involved in, clients are often reluctant to make decisions about permissions/groups until they absolutely have to, and it’s quite common for users to be created as ‘superusers’ (labelled “Admin” in Wagtail) for a while initially. Because of this, managing tenants (and by extension, tenant membership) via the Wagtail interface doesn’t make a lot of sense. If tenants are implemented properly, most Wagtail users shouldn’t be aware of their existence, even when clicking around as a superuser.

Should we facilitate tenant management outside of Wagtail?

Wagtail would define and register a polished ModelAdmin class to facilitate tenant management via the Django admin area. Developers needing to customise the behaviour could subclass the original, unregister it, then register the customised version in its place.

3. Detecting the relevant tenant

3A. For admin requests (in Wagtail)

We can take a lot of cues from Wagtail’s Site model here. For parity, we should add a for_admin_request() method to the Tenant model, which will be the main hook-in point for most Wagtail views.

It would be worthwhile adding a separate candidates_for_admin_request() method too, making that particular bit of code easier to reuse.

We will avoid adding Middleware, and simply call the method when we (or the developer) need to, caching data on the current request to speed up repeat calls.

We would always start by identifying the ‘candidate’ Tenants for the current user (those which they have some access to), using annotate() and Case to apply some kind of ‘relevancy order’ based on:

How well the hostname and port from the request match the tenant’s hostname and port field values
Whether the tenant is the ‘default’ one or not

Once we have the candidates, we would cache them on the request object for later reuse (e.g. by switcher widgets).

We would then look for an active_tenant value on the current session (and maybe even an active_tenant cookie as a fallback) and try to find a match among the candidates.

If no match was found, the most appropriate candidate would be used, and the active_tenant session and cookie values would be set to that tenant’s id.

If no candidates were found, a PermissionDenied error would be raised.

If there were no clear winner amongst the candidates, a custom MultiplePossibleTenants error would be raised.

The final match would be cached on the request object (as _wagtail_tenant) so that the work wouldn’t have to be repeated if called again.
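Pulling the steps above together, the detection flow could be sketched in plain Python like this. A SimpleNamespace stands in for the request, sorting in Python replaces the annotate()/Case ordering, and the cookie fallback is left out. The scoring weights, the _wagtail_tenant_candidates attribute name, and the rule that a tie between the top two candidates counts as ‘no clear winner’ are all assumptions for the sake of the example.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


class PermissionDenied(Exception):
    """Stand-in for django.core.exceptions.PermissionDenied."""


class MultiplePossibleTenants(Exception):
    """Raised when no single candidate stands out."""


@dataclass(frozen=True)
class Tenant:
    id: int
    hostname: str = ""
    port: Optional[int] = None
    is_default: bool = False


def relevancy(tenant, hostname, port):
    """Higher is better: hostname and port match, hostname only, then default."""
    if tenant.hostname == hostname and tenant.port == port:
        return 3
    if tenant.hostname == hostname:
        return 2
    return 1 if tenant.is_default else 0


def candidates_for_admin_request(request, accessible_tenants):
    """Order the user's tenants by relevancy, caching the result on the request."""
    if not hasattr(request, "_wagtail_tenant_candidates"):
        request._wagtail_tenant_candidates = sorted(
            accessible_tenants,
            key=lambda t: relevancy(t, request.hostname, request.port),
            reverse=True,
        )
    return request._wagtail_tenant_candidates


def for_admin_request(request, accessible_tenants):
    if hasattr(request, "_wagtail_tenant"):
        return request._wagtail_tenant  # repeat call: reuse the cached match
    candidates = candidates_for_admin_request(request, accessible_tenants)
    if not candidates:
        raise PermissionDenied("No tenants are available to this user")
    # Prefer whatever the user activated previously, if still a candidate
    active_id = request.session.get("active_tenant")
    match = next((t for t in candidates if t.id == active_id), None)
    if match is None:
        scores = [relevancy(t, request.hostname, request.port) for t in candidates]
        if len(scores) > 1 and scores[0] == scores[1]:
            raise MultiplePossibleTenants("Cannot choose between the top candidates")
        match = candidates[0]
        request.session["active_tenant"] = match.id
    request._wagtail_tenant = match
    return match


one = Tenant(1, "tenantone.companycms.org", 443)
three = Tenant(3, "tenantthree.companycms.org", 443)
request = SimpleNamespace(hostname="tenantthree.companycms.org", port=443, session={})
active = for_admin_request(request, [one, three])
```

Here, the request to tenant three’s domain activates tenant three and records its id on the session, so later requests in the same session resolve to it without any tie-breaking.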

3B. For admin API requests

These requests should always come from and be served by the admin domain, so I think we can safely reuse Tenant.for_admin_request() here, which can then be used for queryset filtering.

3C. For front-end requests (non headless projects)

Whilst the multi-tenancy projects I’ve worked on previously have all been headless, the vast majority of Wagtail projects are not, so we need to give developers a clear route to using tenanted data safely from front-end code too.

Technically, front-end requests should always be made within the context of a Site, so we can utilise that model’s association with tenants to inform this functionality.

The simplest way to handle this is to add separate for_site(), for_request() and candidates_for_request() methods to the Tenant model. The latter two methods would work in a similar way to for_admin_request() and candidates_for_admin_request(), but with the following exceptions:

They would not require the user to be authenticated
They would not take any session or cookie values into account
They would work purely based on the Site identified via Site.for_request(), and its native tenant.

3D. For regular API requests

This is a little more complicated. Ideally, consumers of the API will all be updated to indicate which tenant they are interested in, and the API would error if no such ID was provided, or no matching tenant existed. However, in the spirit of backwards compatibility, I don’t think we can do this.

TBC - Ideas welcome.

4. Switching between tenants

I have previously implemented multi-tenancy on projects where each tenant is accessed via a unique domain/subdomain, which can be used to identify the tenant of interest. While I feel this is something that would be sensible to support, I’m unsure how desirable it is for every case, so we should explore how multi-tenancy can work on the same domain too.

4A. Potential switcher interfaces

If all tenants were accessed on the same domain, we could show users something like Netflix’s profile-selection interface to choose a tenant after authenticating (complete with user-friendly labels, recognisable images, and fancy animations). A simple ‘Switch tenant’ button in the Wagtail header or sidebar could reload this interface to allow jumping between them.

Alternatively we could just pick and activate a tenant automatically, and provide a simple ‘switcher widget’ in the Wagtail header or sidebar to allow switching between them. This approach might work better for projects serving different tenants from dedicated domains, as the view could redirect users to the correct place. It might also be preferable for projects using MFA or SSO, where the login process might already have more steps than usual.

4B. The ActivateTenantView

This would be a simple view that would respond to POST requests containing the id of the tenant to activate, and a next value (indicating where the view should redirect to once the switch has been made successfully).

The id would be checked against a list of valid candidates for the request/user, and, if a match was found, the specified tenant would be ‘activated’ by setting the active_tenant session and cookie values, and setting the _wagtail_tenant cache attribute on the request.

If the selected tenant had a domain/port value different to the one being currently requested, the user would be redirected away to the Wagtail home page for that tenant, instead of it being activated (that tenant would take care of enforcing authentication again, if required).

If the user did not have access to the specified tenant, the view would raise a PermissionDenied error.

On successful switch, the view would check the next value for suitability, and redirect to that URL (if safe). If unsafe (or missing), the view would redirect to the Wagtail Home page.
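The decision logic for this view might look roughly like the plain-Python sketch below, which returns the URL to redirect to instead of an HTTP response. The "/admin/" home path, the candidate_hosts mapping, and both function names are assumptions; ports and the cookie are ignored for brevity. The is_safe_next() check is deliberately crude, and a real view would more likely lean on Django’s url_has_allowed_host_and_scheme() helper.

```python
from urllib.parse import urlparse

ADMIN_HOME = "/admin/"  # assumed location of the Wagtail home page


class PermissionDenied(Exception):
    """Stand-in for django.core.exceptions.PermissionDenied."""


def is_safe_next(url, current_host):
    """Rough check that `url` is a local path or stays on the current host."""
    if not url or "\\" in url:
        return False
    parsed = urlparse(url)
    if parsed.scheme or parsed.netloc:
        return parsed.scheme in ("http", "https") and parsed.netloc == current_host
    return url.startswith("/")


def activate_tenant(tenant_id, candidate_hosts, session, current_host, next_url=None):
    """Return the URL to redirect to, activating the tenant where appropriate.

    `candidate_hosts` maps the id of each tenant the user may access to that
    tenant's hostname ("" when the tenant has no dedicated domain).
    """
    if tenant_id not in candidate_hosts:
        raise PermissionDenied("User cannot access this tenant")
    tenant_host = candidate_hosts[tenant_id]
    if tenant_host and tenant_host != current_host:
        # Different domain: hand over to that tenant without activating it here
        return f"https://{tenant_host}{ADMIN_HOME}"
    session["active_tenant"] = tenant_id
    return next_url if is_safe_next(next_url, current_host) else ADMIN_HOME


session = {}
candidate_hosts = {1: "", 2: "", 3: "tenantthree.companycms.org"}
same_domain = activate_tenant(2, candidate_hosts, session, "cms.example", "/admin/pages/3/")
other_domain = activate_tenant(3, candidate_hosts, session, "cms.example", "/admin/pages/3/")
```

In the second call the session is left untouched, because the tenant lives on another domain and the user is simply sent to that tenant’s home page.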

5. Restricting content visibility

5A. In listing views

We’ll apply the for_tenant() filter to the main queryset, which should prevent data from other tenants from appearing.

5B. In edit views

We’ll apply the for_tenant() filter to the main queryset, which should result in a 404 error should users attempt to edit something from another tenant.

5C. In create/edit forms

Any form with a ModelChoiceField or ModelMultipleChoiceField could potentially expose data from other tenants, but filtering data at the form field level is difficult because forms and fields usually have no awareness of the current request (and by extension, the active tenant).

The cleanest solution I can think of here is to add a TenantAwareFormMixin with an overridden __init__() method that accepts a required tenant argument, saves a reference to it, calls super(), then calls some kind of util function to apply the for_tenant() filter to any field-specific querysets that support it.

We might need to take a slightly different approach to forms that are assembled by edit handlers, but I’m sure we can make it work.

We might also want to add some way to exclude certain fields from this patching process (e.g. by defining an allow_non_native_selection list of field names on the form), which could be useful when option filtering is handled some other way (e.g. in the chooser view), and selection of shared items from other tenants shouldn’t be classed as invalid.
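Here is a minimal sketch of how such a mixin could hang together. To keep it runnable without Django, FakeQuerySet, FakeChoiceField and BaseForm are tiny stand-ins for the real queryset, ModelChoiceField and Form classes, and restrict_fields_to_tenant() is a hypothetical name for the util function mentioned above.

```python
class FakeQuerySet:
    """Stand-in for a TenantMemberQuerySet over a list of dict rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def for_tenant(self, tenant, include_shared=False):
        return FakeQuerySet(r for r in self.rows if r["tenant"] == tenant)


class FakeChoiceField:
    """Stand-in for ModelChoiceField: all we need is a `queryset` attribute."""

    def __init__(self, queryset):
        self.queryset = queryset


class BaseForm:
    """Stand-in for django.forms.Form: copies declared fields per instance."""

    declared_fields = {}

    def __init__(self, *args, **kwargs):
        self.fields = {
            name: FakeChoiceField(f.queryset) for name, f in self.declared_fields.items()
        }


def restrict_fields_to_tenant(form, tenant):
    """Apply for_tenant() to every field queryset that supports it."""
    for name, field in form.fields.items():
        if name in form.allow_non_native_selection:
            continue  # filtering for this field is handled elsewhere
        queryset = getattr(field, "queryset", None)
        if hasattr(queryset, "for_tenant"):
            field.queryset = queryset.for_tenant(tenant)


class TenantAwareFormMixin:
    allow_non_native_selection = []

    def __init__(self, *args, tenant, **kwargs):
        self.tenant = tenant
        super().__init__(*args, **kwargs)
        restrict_fields_to_tenant(self, tenant)


authors = FakeQuerySet([{"name": "Ana", "tenant": 1}, {"name": "Ben", "tenant": 2}])
images = FakeQuerySet([{"name": "Logo", "tenant": 2}, {"name": "Photo", "tenant": 1}])


class ArticleForm(TenantAwareFormMixin, BaseForm):
    declared_fields = {
        "author": FakeChoiceField(authors),
        "hero_image": FakeChoiceField(images),
    }
    allow_non_native_selection = ["hero_image"]  # the chooser filters this one


form = ArticleForm(tenant=1)
author_names = [r["name"] for r in form.fields["author"].queryset.rows]
image_names = [r["name"] for r in form.fields["hero_image"].queryset.rows]
```

Making tenant a required keyword-only argument means any view that forgets to pass it fails loudly, rather than silently exposing unfiltered options.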

5D. In chooser views

Any native chooser views (or API endpoints that supply data to chooser views) should be updated to apply the for_tenant() filter to querysets wherever possible. Because choosers are the one place we can expect it to be okay for data from other tenants to appear, we can use for_tenant(include_shared=True) to allow those shared items to appear.

5E. Other places throughout Wagtail

The for_tenant() filter would need to be applied in a few other places throughout Wagtail too, all of which should be relatively simple:

The site-switcher dropdown that appears when editing site settings
The collection-switcher dropdown that appears on image and document listings
Tag filters that appear on image and document listings
Panels for the Wagtail admin homepage
Search views
All built-in reports

6. User management and permissions

6A. Group management changes

The group listing will only surface groups created in the active tenant.
The group creation/edit forms will be updated so that permissions can only be configured for pages native to the active tenant.
The group creation/edit forms will be updated so that only the ‘choose’ permission will be respected for collections that are only shared with the active tenant.

6B. User management changes

By default, users will only be visible to user admins for the tenant they were created in.
The user creation/edit forms will be updated so that only groups native to the active tenant are selectable under the Roles tab.
When updating a user’s roles via Wagtail, only group membership for the active tenant will be affected (membership of groups managed by tenants other than the active one will always be preserved).
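The role-update rule can be expressed as simple set logic. The sketch below is plain Python with made-up group ids; updated_group_ids() is a hypothetical helper, and group_tenants stands in for the link between each group and its native tenant.

```python
def updated_group_ids(current_group_ids, submitted_group_ids, group_tenants, active_tenant_id):
    """Work out a user's group membership after saving the Roles tab.

    Membership of groups native to other tenants is always preserved, and only
    submitted groups native to the active tenant are accepted.
    """
    preserved = {
        g for g in current_group_ids if group_tenants.get(g) != active_tenant_id
    }
    selected = {
        g for g in submitted_group_ids if group_tenants.get(g) == active_tenant_id
    }
    return preserved | selected


group_tenants = {10: 1, 11: 1, 30: 3}  # group id -> native tenant id
current = {10, 30}  # one role in tenant one, one role in tenant three

# A user admin in tenant one swaps group 10 for group 11
after_swap = updated_group_ids(current, {11}, group_tenants, active_tenant_id=1)

# A tampered form submission that tries to add a tenant three group is ignored
after_tamper = updated_group_ids({10}, {11, 30}, group_tenants, active_tenant_id=1)
```

In the first case the user keeps their tenant three role untouched. In the second, the group from tenant three is dropped from the submission rather than granted.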
When multi-tenancy is switched on, we’ll remove the “Admin” checkbox (which maps to the ‘is_superuser’ field) from user management forms, to prevent admins in one tenant from elevating a user’s permissions in others (superusers can still be created by those with server access - just not from within Wagtail).

6C. Login view changes

On the login page, if the tenant of interest can be determined from the domain being used, login credentials will be further validated to ensure the user can access that tenant.
If multiple tenants are defined, the validation error message will change slightly to suggest the user check that they are using the correct URL (regardless of the user’s access to other tenants).

6D. The interplay between tenant configuration and permissions

Tenant-specific visibility restrictions can be kept as a completely separate ‘layer’ of restriction to group/user-level permissions. We shouldn’t necessarily have to change permission policies etc to support multi-tenancy. Views can just apply additional tenant-specific filtering to querysets where appropriate (see native_to_active_tenant() and visible_to_active_tenant() from the POC).
Permissions are king: Group assignment still needs to be applied to non-superusers before they can access Wagtail. A custom Django Authentication backend will ensure that only permissions for groups native to the active tenant are respected, allowing users to have independent sets of permissions in each tenant.
