@adamcin
Last active October 25, 2016 12:20
Replace Adobe's Dispatcher module for Apache with one built on Sling.

What is Dispatcher?

From Dispatcher - docs.day.com:

"Dispatcher is Adobe Experience Manager's caching and/or load balancing tool. Using AEM's Dispatcher also helps to protect your AEM server from attack. Therefore, you can increase the security of your AEM instance by using the Dispatcher in conjunction with an enterprise-class web server."

Specifically, the Dispatcher module is used to perform these critical functions:

  1. URL-based Security Filter
  2. HTTP Response Cache
  3. Load Balancer

Of these roles, the security filter and the response cache tend to be tightly coupled with application design and behavior, and only loosely tied to production scaling concerns. They really should be developed alongside the application code itself, and packaged, deployed, and tested in the same continuous integration cycle.

The Load Balancer feature has a useful place between the cache and the render pool, but its ideal configuration is really specific to each environment, so it should not be so tightly coupled to the other two functions.

Configuring Apache/Dispatcher is a pain.

First, let me say that Dispatcher is a tried-and-true component in the overall AEM architecture. It doesn't necessarily need replacing, but I have a few gripes that I feel could be addressed to make the AEM developer experience much better, especially for deploying to a cloud environment. These generally have to do with improving mechanisms for configuration, deployment, and testing.

  1. AEM app configuration management and deployment management is awesome. Deploy a platform-independent CRX package with all code and all configurations to all servers over HTTP and let the Sling Run Modes in each environment determine which features of your application are active.
  2. Apache/Dispatcher configuration management sucks. Some of it is filesystem-specific; other aspects are platform-specific. And while the Apache configuration format allows references to environment variables, the dispatcher.any format does not.
  3. System integrators often use Windows for development, but generally deploy Apache on RHEL/CentOS. Developers therefore require a completely different Apache distribution and setup process, even before embarking on the significant effort involved in getting Apache configured in the same way as the Linux version.
  4. The coupling of Dispatcher to Apache encourages the use of many anti-patterns, like using mod_rewrite to implement both the security filters and internal path rewrites (so that one is tightly coupled with the other).
  5. Last but not least, the Dispatcher module is completely proprietary and is not open-source, in contrast to 80% of the application that it supports.

I want a Sling Reverse Proxy Module with Dispatcher features.

I think the security and caching features of Dispatcher would be better implemented as a reverse proxy module for Sling, rather than as a module for the web server. Only the proxy listener port and the cache root directory should be configurable using immutable startup parameters. The rest should be configurable using sling:OsgiConfig nodes deployed to the Sling repository.

Imagine the Possibilities!

Imagine that your local AEM quickstart publish install also included Dispatcher listening on a standard port like 8083, configured out-of-the-box with the recommended path filter.

Imagine that a maven build produced a single package artifact which also included the same filter and cache configuration that would be active in production.

Imagine that cache eviction/invalidation logic was extensible, perhaps via a custom Sling Servlet, or even via subscription to JMS.

Imagine that the Dispatcher-related items in the recommended security checklist could be implemented using a single content package deployed to all environments.

HTTP Pipeline Architecture

  1. Sling listens on the standard Felix HTTP port for proxy configuration/cache management requests, in addition to app server functionality.
  2. Proxy service listens on its own port to completely segregate internal HTTP traffic from external HTTP traffic.
  3. Proxy service should be highly scalable, and its API should encourage a reactive architecture.
  4. Proxy service is composed as a pipeline of layers (RenderPool <- Cache <- Filter... <- Listener), each of which is distinct from the others and configured separately.
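
The layered pipeline described above could be sketched roughly as follows. This is a minimal illustration rather than a proposed API: the names `ProxyPipeline`, `Layer`, `pathFilter`, `cache`, and `compose` are all hypothetical, the request/response types are stand-ins without headers or streaming bodies, and the choice of a 404 for filtered requests is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ProxyPipeline {
    /** Minimal stand-ins; a real proxy would carry headers and streamed bodies. */
    record Request(String method, String path) {}
    record Response(int status, String body) {}

    /** One layer of the pipeline: it may answer directly or delegate to the next layer. */
    interface Layer {
        Response handle(Request request, Function<Request, Response> next);
    }

    /** Filter layer (hypothetical): rejects paths outside an allowed prefix before the cache is consulted. */
    static Layer pathFilter(String allowedPrefix) {
        return (req, next) -> req.path().startsWith(allowedPrefix)
                ? next.apply(req)
                : new Response(404, "");
    }

    /** Cache layer (hypothetical): serves stored 200 responses for GET, otherwise delegates and stores. */
    static Layer cache(Map<String, Response> store) {
        return (req, next) -> {
            if (!"GET".equals(req.method())) {
                return next.apply(req);
            }
            Response cached = store.get(req.path());
            if (cached != null) {
                return cached;
            }
            Response fresh = next.apply(req);
            if (fresh.status() == 200) {
                store.put(req.path(), fresh);
            }
            return fresh;
        };
    }

    /** Composes layers listener-first, so the last layer in the list sits next to the render pool. */
    static Function<Request, Response> compose(List<Layer> layers,
                                               Function<Request, Response> renderPool) {
        Function<Request, Response> chain = renderPool;
        for (int i = layers.size() - 1; i >= 0; i--) {
            Layer layer = layers.get(i);
            Function<Request, Response> next = chain;
            chain = req -> layer.handle(req, next);
        }
        return chain;
    }

    public static void main(String[] args) {
        // The render pool is faked here; in practice it would forward to a publish instance.
        Function<Request, Response> renderPool =
                req -> new Response(200, "rendered " + req.path());
        Function<Request, Response> proxy = compose(
                List.of(pathFilter("/content/"), cache(new ConcurrentHashMap<>())),
                renderPool);
        System.out.println(proxy.apply(new Request("GET", "/content/site/en.html")).status());
        System.out.println(proxy.apply(new Request("GET", "/system/console")).status());
    }
}
```

Because each layer only sees the request and a handle to the next layer, the filter, cache, and render pool can be configured and tested separately, which is the property the list above asks for.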

Filter

  1. The proxy request security filter should be configurable using RAML, Swagger, or API Blueprint, that is, a portable HTTP API spec from which tests can be generated.
  2. Filter interface should allow custom request/response transformations, to allow for adding/removing request and response headers, as well as for request line rewriting similar to what is supported by Sling Resource Resolver Mappings.
  3. The request API should expose a Sling-like URL model, with first-class handling of selectors, extensions, suffixes, and query string parameters.
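
As a rough illustration of the Sling-like URL model in the last item, the sketch below splits a request target into resource path, selectors, extension, suffix, and query string. The `SlingUrl` record is hypothetical. It also makes a simplifying assumption: the resource path ends at the first dot in the path. Sling itself resolves the longest matching resource path against the repository, which a standalone proxy could not do without a lookup, so a path with a dot in an intermediate segment would be split differently here.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical URL model; extension, suffix, and query are null when absent. */
public record SlingUrl(String resourcePath, List<String> selectors,
                       String extension, String suffix, String query) {

    /** Splits a request target such as "/a/b.s1.s2.html/suffix?x=1" into its parts. */
    public static SlingUrl parse(String requestTarget) {
        String path = requestTarget;
        String query = null;
        int q = requestTarget.indexOf('?');
        if (q >= 0) {
            path = requestTarget.substring(0, q);
            query = requestTarget.substring(q + 1);
        }

        // Assumption: the resource path ends at the first dot (no repository lookup).
        int firstDot = path.indexOf('.');
        if (firstDot < 0) {
            return new SlingUrl(path, List.of(), null, null, query);
        }
        String resourcePath = path.substring(0, firstDot);

        // Everything from the first slash after the dot onward is the suffix.
        int slash = path.indexOf('/', firstDot);
        String dotted = slash < 0
                ? path.substring(firstDot + 1)
                : path.substring(firstDot + 1, slash);
        String suffix = slash < 0 ? null : path.substring(slash);

        String[] parts = dotted.isEmpty() ? new String[0] : dotted.split("\\.");
        if (parts.length == 0) {
            return new SlingUrl(resourcePath, List.of(), null, suffix, query);
        }
        // The last dotted token is the extension; any tokens before it are selectors.
        List<String> selectors = List.of(Arrays.copyOfRange(parts, 0, parts.length - 1));
        String extension = parts[parts.length - 1];
        return new SlingUrl(resourcePath, selectors, extension, suffix, query);
    }

    public static void main(String[] args) {
        System.out.println(parse("/content/site/en.print.a4.html/extra/info?x=1"));
    }
}
```

A filter written against a model like this could match on selectors or extensions directly instead of using regular expressions over the raw request line.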

Cache

  1. It should be possible to support in-memory AND on-disk caching.
  2. It should be possible to support caching of headers AND entities (like caching the Content-Type response header for URLs which do not have an extension).
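
One way to picture both cache requirements together is a small two-tier store that keeps the Content-Type header next to the entity. This is only a sketch under stated assumptions: `TwoTierCache`, its `Entry` record, and the on-disk layout (one directory per URL path holding a `body` file and a `content-type` file) are all hypothetical, and eviction, invalidation, and size limits are left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class TwoTierCache {
    /** A cached response: one header (Content-Type) plus the entity bytes. */
    record Entry(String contentType, byte[] body) {}

    private final Map<String, Entry> memory = new ConcurrentHashMap<>();
    private final Path root;

    TwoTierCache(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    /** Maps a URL path to a directory under the cache root, refusing paths that escape it. */
    private Path dirFor(String urlPath) {
        Path dir = root.resolve(urlPath.replaceFirst("^/+", "")).normalize();
        if (!dir.startsWith(root)) {
            throw new IllegalArgumentException("path escapes cache root: " + urlPath);
        }
        return dir;
    }

    /** Writes the entry to disk first, then keeps a copy in memory. */
    void put(String urlPath, Entry entry) throws IOException {
        Path dir = dirFor(urlPath);
        Files.createDirectories(dir);
        Files.write(dir.resolve("body"), entry.body());
        Files.writeString(dir.resolve("content-type"), entry.contentType());
        memory.put(urlPath, entry);
    }

    /** Checks memory first, then falls back to disk and promotes the entry to memory. */
    Optional<Entry> get(String urlPath) throws IOException {
        Entry hot = memory.get(urlPath);
        if (hot != null) {
            return Optional.of(hot);
        }
        Path dir = dirFor(urlPath);
        Path body = dir.resolve("body");
        Path type = dir.resolve("content-type");
        if (!Files.isRegularFile(body) || !Files.isRegularFile(type)) {
            return Optional.empty();
        }
        Entry loaded = new Entry(Files.readString(type), Files.readAllBytes(body));
        memory.put(urlPath, loaded);
        return Optional.of(loaded);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("proxy-cache");
        TwoTierCache cache = new TwoTierCache(root);
        // An extensionless URL whose Content-Type could not be inferred from the path alone.
        cache.put("/content/site/en", new Entry("text/html",
                "<html></html>".getBytes(StandardCharsets.UTF_8)));

        // A second instance on the same root has an empty memory tier and reads from disk.
        TwoTierCache restarted = new TwoTierCache(root);
        System.out.println(restarted.get("/content/site/en").map(Entry::contentType).orElse("miss"));
    }
}
```

Storing the header alongside the body means an extensionless URL can be served from cache with the Content-Type the renderer originally sent.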

@steeleforge

One of the advantages of pushing the work to the web server tier is that it is a separate, independent process, often on separate resources. Apache/Dispatcher also serves as a software load balancer when budget is an issue, and its performance is admirable. I would assume that in order to approach that response time, the pipeline would have to be optimized for serving cached renders, and would have to short-circuit/fail fast on security violations or cache-control hints from the request before hitting the more costly filters. I wonder what mileage can be gained from NIO.2 in Java 7. Cache eviction is a hard problem to reason about. Which remaining Dispatcher features would you leave outside of Sling, and where would you leave them (e.g. Apache/IIS, Varnish, nginx, etc.)?

@shoebappa

@kaisershahid

i would love to help with this
