GraphQL caching layer

Caching in GraphQL

Since Heighliner is the center point of our data across all future Apollos projects, and has the potential to inform the creation of Rock's GraphQL endpoint, efficiency at scale is critical. In order to feel instant for our users, we build a substantial layer of caching into our stack.

Why are we building this

Due to the newness of GraphQL and the new paradigms it creates, there is almost no writing about caching within a GraphQL app, and there is no tooling or library available that we can use. Caching (and, more importantly, cache invalidation) is, in my opinion, the best iterative path toward reactive GraphQL. GraphQL is also a declarative data system: unlike a REST API, a query may fetch from any number of data sources on each request, depending on the fields requested. Instead of interacting with the request AST, we cache at the data interface level, which supports cache hits across types and across varying requests.

We also don't run GraphQL as the only point of data mutation in our application stack. Even in the long term, Rock will manage its own data, as will other platforms we integrate with. We cannot rely on mutations in GraphQL to be the only method for cache invalidation.

At the end of the day, we are building this because speed is one of our core values. We believe that every 100ms we delay someone from interacting online is a potential next step not taken. As we launch the new native app and continue to provide complex interaction points for our people, we need a system that can keep up.

What we are building

We are writing a caching layer that sits on top of the Heighliner application and handles cache reads, cache writes, and cache deletes. It is used both per model and per method to allow granular cache control. It uses an abstracted interface so that any key:value caching backend can be implemented as the data store.
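As a rough sketch of what that abstraction could look like (the names CacheInterface, get, set, and del below are illustrative assumptions, not a confirmed API), a backend only needs to satisfy a small key:value contract:

export interface CacheInterface {
  get<T>(key: string): Promise<T | null>;
  set<T>(key: string, value: T, expiration?: number): Promise<void>;
  del(key: string): Promise<void>;
}

// Example backend: an in-memory store, useful for tests or local development.
export class InMemoryCache implements CacheInterface {
  private store = new Map<string, any>();
  public async get<T>(key: string): Promise<T | null> {
    return this.store.has(key) ? this.store.get(key) : null;
  }
  public async set<T>(key: string, value: T): Promise<void> {
    this.store.set(key, value);
  }
  public async del(key: string): Promise<void> {
    this.store.delete(key);
  }
}

Swapping in Redis, Memcached, or any other key:value store then only means providing another implementation of the same interface.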

Cache structure

Since GraphQL has its own type system, we use it to create our caching structure. Each modeled schema type (for instance Person) has a top level cache tag. Within each tag are two layers: Entity, which holds individual database objects, and Queries, which link an arbitrary set of arguments to a result of entity ids. // XXX developing...
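For illustration only (the exact key names here are placeholders while this structure is still being developed), the Person tag could end up holding entries along these lines:

// Person                                 <- top level cache tag
//   Entity:   Person:(Person:1234)       -> the cached Person object itself
//   Queries:  Person:<hash of arguments> -> the entity ids that answer that query, e.g. ["Person:1234"]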

Interface / API

@this.cache()

The base level Heighliner class that all models extend from exposes a caching decorator that can be used to denote that a method should interact with the caching layer. Application specific models can tailor this method to the data modeling of their application (i.e. the Rock model vs the EE model). @this.cache is used to describe the cache read / write methods.

@this.cache() is ONLY aware of the arguments and the model it is called within when creating caching rules. It has no knowledge of the result set from the query.

@this.cache takes the following arguments (all of which are optional):

Type string:

The first argument for @this.cache is the GraphQL type that is represented by the result set. It is a string value representing the name of the type. It defaults to the model's stated type using this.__type in the class definition.

Key (...args) => string:

The second argument should be a function which takes the arguments of the decorated method and returns a unique key that can be reproduced from the same arguments. It defaults to taking the first argument (assumed to be the id) and creating a global object id from it, using this.__type as the type name.

Conditional (...args) => boolean:

The third argument should be a function which takes the arguments of the decorated method and returns a truthy or falsy value determining whether the cache should be read. It defaults to looking at the last argument for a ["cache"] field on an object. If such an object exists and that key is found, its value is used. Otherwise, it defaults to always reading from the cache.

Cache Key

When a method is decorated with @this.cache, it is expected to return a promise which eventually resolves to a single object or a collection of objects (Object | Object[]) that have a unique id field specified by the application model (defaults to id). The resulting cache key used to fetch the data is Type:(Type:id).
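To make the defaults above concrete, here is a simplified read-through sketch of the decorator. It assumes the CacheInterface shape from earlier, a cacheStore instance on the base class, and a plain ${type}:${id} global id helper; none of these names are confirmed, and the real implementation will differ.

// Sketch only: ties together the type, key, and conditional defaults described above.
const globalId = (type: string, id: string | number) => `${type}:${id}`; // assumed helper

class HeighlinerModel {
  public __type: string;
  public cacheStore: CacheInterface; // the key:value backend, name assumed

  public cache(
    type: string = this.__type,
    key: (...args: any[]) => string = (id: string) => globalId(type, id),
    conditional: (...args: any[]) => boolean = (...args: any[]) => {
      const last = args[args.length - 1];
      return last && typeof last === "object" && "cache" in last ? last.cache : true;
    },
  ) {
    const store = this.cacheStore;
    return (target: any, name: string, descriptor: PropertyDescriptor) => {
      const original = descriptor.value;
      descriptor.value = async function (...args: any[]) {
        const cacheKey = `${type}:(${key(...args)})`;    // Type:(Type:id)
        if (conditional(...args)) {
          const hit = await store.get(cacheKey);         // cache read
          if (hit) return hit;
        }
        const result = await original.apply(this, args); // fall through to the data source
        await store.set(cacheKey, result);               // cache write
        return result;
      };
      return descriptor;
    };
  }
}

Note that in this sketch the conditional only gates the read; the write always happens, which matches @this.cache describing reads and writes while leaving invalidation to @this.invalidate.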

Example usage

@this.cache() // default caching settings
public async getFromId(id: string): Promise<IPerson> {
  return PersonTable.findOne({ where: { Id: id }});
}

// Person:(Person:1234) = { firstName: James, lastName: Baxley }


@this.cache(this.__type, ({ guid }) => guid)
public async getFromGuid({ guid }): Promise<IPerson> {
  return PersonTable.findOne({ where: { Guid: guid }});
}

// Person:(Person:3B013E6F-477F-427D-9571-C52C8037A3D6) = { firstName: James, lastName: Baxley }


@this.cache(this.__type, id => id, (id, { modifiedDate }) => modifiedDate < new Date())
public async find(id, { modifiedDate }): Promise<IPerson> {
  return PersonTable.findOne({ where: { Id: id }});
}

// Person:(Person:1234) = { firstName: James, lastName: Baxley } only if modified date is in the past

@this.invalidate()

The base level Heighliner class that all models extend from also exposes a cache invalidation decorator. It denotes that a method should interact with the caching layer and provides a way to invalidate that method's resulting cache. Application specific models can tailor this method to the data modeling of their application (i.e. the Rock model vs the EE model). @this.invalidate is used to describe the cache invalidation methods.

The goal of @this.invalidate is to define result => cache dependencies. It is a way of describing what makes up a query and when it should be removed. @this.cache applies the default @this.invalidate if one is not provided. When used explicitly, @this.invalidate should always come after @this.cache.

The caching system of Heighliner relies on an event driven IO (pub/sub) model to invalidate cache. When a model is saved (POST, PUT, PATCH) or removed (DEL), it triggers an event based on its entity type. All methods that have subscribed to this type using @this.invalidate can potentially be rerun to update the cache depending on their needs.

@this.invalidate takes the following arguments (all of which are optional):

Entity Types string[]:

The first argument is an array of database entity types which are potential invalidators for this model. These are used to register listeners on the cache event stream for model changes. It defaults to this.__entityTypes.

Comparison (entity, results, ...args) => boolean | Promise<boolean>:

The second argument is a function used to decide whether a changed entity should trigger a cache removal. It takes the changed entity, the results of the previous query, and the arguments from the wrapped method, and is expected to eventually return a truthy or falsy value determining if the cache should be cleared. It defaults to entity.id === result.id. You can think of this like shouldComponentUpdate in React.

Options { expiration: number, shouldRefetch: boolean, /* ??? */ }:

The third argument expects an object of general cache settings. The only currently supported keys are expiration, a value in seconds after which the cache entry expires, and shouldRefetch, which determines whether the cache should be refetched after being cleared.

Note: If no previous result is found, @this.invalidate is not run, because there is nothing to clear.
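As a sketch of how these pieces could fit together (the event payload { type, id, action } mirrors the example events below, while cacheEvents.subscribe, cacheStore, and lastCacheKey are assumed names rather than a confirmed API), the decorator registers a listener per entity type and clears the cached key whenever the comparison passes:

// Added to the HeighlinerModel sketch above; everything here is illustrative.
public invalidate(
  entityTypes: string[] = this.__entityTypes,
  comparison: (entity: any, results: any, ...args: any[]) => boolean | Promise<boolean> =
    (entity, results) => entity.id === results.id,
  options: { expiration?: number, shouldRefetch?: boolean } = {},
) {
  return (target: any, name: string, descriptor: PropertyDescriptor) => {
    const original = descriptor.value;
    descriptor.value = async function (...args: any[]) {
      const results = await original.apply(this, args);
      if (!results) return results; // no previous result, nothing to clear

      const cacheKey = this.lastCacheKey(name, args); // key written by @this.cache (assumed bookkeeping)
      entityTypes.forEach(type =>
        // listen on the cache event stream for saves / removals of this entity type
        this.cacheEvents.subscribe(type, async (entity: any) => {
          if (await comparison(entity, results, ...args)) {
            await this.cacheStore.del(cacheKey); // clear the stale entry
            if (options.shouldRefetch) {
              // optionally warm the cache again with fresh data
              await this.cacheStore.set(cacheKey, await original.apply(this, args), options.expiration);
            }
          }
        }),
      );
      return results;
    };
    return descriptor;
  };
}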

@this.cache() // default caching settings
public async getFromId(id: string): Promise<IPerson> { // id = 1
  return PersonTable.findOne({ where: { Id: id }});
}

// cache POST { type: Person, id: 1, action: "save" }

// types match:  ["Person"].indexOf("Person") > -1
// comparison matches: (Person, Result) => Person.id === Result.id


@this.cache("PhoneNumber")
@this.invalidate(["PhoneNumber"], ({ PersonId }, _, id) => PersonId === id)
public async getPhoneNumbersFromId(id: string): Promise<any> { // id = 1
  return PhoneNumberTable.find({ where: { PersonId: `${id}` } });
}

// cache POST { type: PhoneNumber, id: 100, action: "save" }

// types match:  ["PhoneNumber"].indexOf("PhoneNumber") > -1
// comparison matches: ({ PersonId }, _, id) => PersonId === id


@this.cache("GroupMember")
@this.invalidate(
  ["GroupMember"],
  ({ GroupId, Id }, results) => _.find(results, { GroupId }) || _.find(results, { Id })
)
public async getFamilyFromId(id: string | number): Promise<any> { // id = 1
  return GroupMember.db.query(`
    SELECT GroupMember.*
    FROM [GroupMember] gm
    LEFT JOIN [Group] g ON gm.[GroupId] = g.[Id]
    LEFT JOIN [GroupMember] GroupMember ON GroupMember.GroupId = g.Id
    WHERE gm.[PersonId] = ${id} AND g.[GroupTypeId] = 10
  `).then(([members]) => members);
}

// cache POST { type: GroupMember, id: 30, action: "save" }

// types match:  ["GroupMember"].indexOf("GroupMember") > -1
// comparison matches:
//    ({ GroupId, Id }, results) => _.find(results, { GroupId }) || _.find(results, { Id })


@this.cache("Group")
@this.invalidate(null, null, { expiration: 3600 })
public async findByAttributesAndQuery(/* args */) {
  // return WileySort™
}

// expires every hour

Goals

These two APIs should hopefully cover almost all use cases for caching. Ideally they are impactful enough to get us close to the goal of 95% cache hits. They should also be somewhat self-documenting in their usage and meaning.
