Post-Prisma data handling

As Prisma 1 is feature-frozen and Prisma 2 is heading in directions (code-first, no generated SDL schemas) that might not fit our needs, I've started researching some alternatives.

Data Layer

Prisma 2

PROS:

no separate server needed
nested mutations (OpenCRUD)
similar api
declarative datamodel (though not SDL)
MySQL server

CONS:

lots of regressions (no cascade deletes, no Json type, etc.)
code-first schemas, no automatic SDL generation
non-SDL datamodel
very far away from a stable release

Sequelize / Knex

PROS:

MySQL server, so no new technologies for ops to support
good documentation

CONS:

no nested mutation support
imperative datamodel (with a convoluted api)
code-first schemas, via third-party libraries (with small support communities)

Dgraph

PROS:

native graph db with first-class GraphQL support
declarative SDL datamodel (including interfaces, which would eliminate a lot of code and complexity)
edge properties support (facets) which would eliminate a lot of code and complexity (e.g. position on parent/child relations)
extensive search indices supported by default, including term and full-text search
very stable and fast
nested mutations
underlying RDF syntax simplifies maintenance and batch import / export of data
first class client-id / temporary-id support

CONS:

new tech, so more work for ops to support

Neo4J

PROS:

native graph db with first-class GraphQL support (though less powerful than Dgraph's)
declarative SDL datamodel (with partial support for interfaces)

CONS:

no nested mutation support
potentially slow
Cypher queries are hard to learn and reason about
new tech, so more work for ops to support

Ideas for Implementing Dgraph

Because Dgraph supports both GraphQL and GraphQL+-, we can take advantage of the full power of both.

GraphQL for direct passthroughs, and for datamodel SDL definitions
GraphQL+- for automatic aggregations, functions, and useful graph traversals (k-shortest-path, etc.)

Schema Changes

We'd still maintain two SDL schemas, one for the database and one for the API, and (like with Prisma 1) we'd be able to use and overwrite the db schema at the API layer. Instead of simply importing it with graphql-import, however, we might be able to make use of Apollo's schema federation.

Generating Data Layer Queries from Client Queries

If we add a @computed directive to the API schema, we'll be able to be more declarative when dealing with computed db fields and serving data from other (REST) APIs:

Setting @computed on a field would tell our Dgraph query generator to ignore that field when making calls to the db, as that field only exists at the API layer. This can also be used for fields (and types) that are composed from other APIs.

Setting @computed(from: ["fieldA", "fieldB"]) would tell our Dgraph query generator to ignore that field, but include the underlying db fields it depends on (if they're not otherwise being fetched). Thus our field resolvers wouldn't need to fetch any data from the db themselves.

Setting @count(from: "fieldA") would tell our Dgraph query generator to ignore that field, but include a count function against a different field. This allows for very fast aggregate queries in the db itself. In the future, we could have other directives that correspond to GraphQL+- features.
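To make the directive rules concrete, here is a minimal sketch of how a query generator might translate requested API fields into a database selection. It assumes the directive arguments have already been extracted into a plain lookup table; the names `FieldRule`, `DbSelection`, `contentRules` and `planDbSelection` are hypothetical and only illustrate the idea.

```typescript
// Hypothetical metadata the API layer would extract from @computed / @count directives.
type FieldRule =
  | { kind: "computed"; from: string[] } // @computed or @computed(from: [...])
  | { kind: "count"; from: string }; // @count(from: "...")

interface DbSelection {
  fields: string[]; // plain db fields to fetch
  counts: { alias: string; of: string }[]; // count aggregations to request
}

// Rules for the Content type used as the example in this document.
const contentRules: { [field: string]: FieldRule } = {
  title: { kind: "computed", from: ["rawTitle"] },
  user: { kind: "computed", from: ["userId"] },
  randomNumber: { kind: "computed", from: [] },
  attachmentCount: { kind: "count", from: "attachments" },
};

function planDbSelection(
  requested: string[],
  rules: { [field: string]: FieldRule }
): DbSelection {
  const fields: string[] = [];
  const counts: { alias: string; of: string }[] = [];
  const addField = (name: string): void => {
    if (fields.indexOf(name) === -1) fields.push(name); // skip fields already being fetched
  };
  for (const name of requested) {
    if (!Object.prototype.hasOwnProperty.call(rules, name)) {
      addField(name); // ordinary db field: pass straight through
      continue;
    }
    const rule = rules[name];
    if (rule.kind === "computed") {
      rule.from.forEach(addField); // fetch only the underlying dependencies
    } else {
      counts.push({ alias: name, of: rule.from }); // aggregate in the db instead
    }
  }
  return { fields, counts };
}

const selection = planDbSelection(
  ["id", "label", "title", "user", "randomNumber"],
  contentRules
);
console.log(selection.fields); // ["id", "label", "rawTitle", "userId"]
```

A non-empty `counts` array is also a convenient signal that the generator has to emit GraphQL+- rather than plain GraphQL.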

One huge win of using Dgraph is that each query to our API would generate at most a single query to the database, eliminating the N+1 problem for database access without complicated caching strategies or dataloaders. In the future, we could make this even more efficient by checking permissions beforehand and adding a filter to the generated Dgraph query, so the database only returns results the user is allowed to view. Right now we do that filtering after the db is accessed, which is wasteful and a potential attack vector.
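As a rough illustration of the permission idea, the sketch below merges a permission check into whatever filter the client sent before the database query is generated. The rule itself (a viewer only sees content whose `userId` matches their own) and the `withPermissionFilter` helper are hypothetical. It also assumes `userId` is indexed so it can be filtered on, and that the filter input accepts a nested `and` key.

```typescript
// Loose filter shape; a real implementation would use the generated filter input types.
type Filter = { [key: string]: unknown };

// Hypothetical rule: a viewer may only see content whose userId matches their own.
function withPermissionFilter(
  clientFilter: Filter | undefined,
  viewerId: number
): Filter {
  if (clientFilter === undefined) {
    return { userId: { eq: viewerId } };
  }
  // Nest the client's filter under `and` so it cannot override the permission check.
  return { userId: { eq: viewerId }, and: clientFilter };
}

const guardedFilter = withPermissionFilter(
  { title: { anyoftext: "api design" } },
  7
);
console.log(JSON.stringify(guardedFilter));
```

Keeping the permission check at the top level and nesting the client's filter means a client-supplied `userId` clause can narrow the result set but never widen it.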

Query Example

Database Schema

type Content {
  id: ID!
  label: String
  # By saving rich text as both plaintext and its raw JSON value,
  # we allow for rich searching and filtering
  title: String @search(by: [term, fulltext]) # Dgraph directive
  # Dgraph doesn't actually support scalar JSON values, so in reality this would be a string
  # that we'd parse / stringify as needed at the API layer
  rawTitle: Json
  attachments: [Content!]!
  userId: Int # This ID is used to call a User API
}
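The comments in the schema above describe storing rich text twice: once as searchable plaintext and once as a stringified raw value. A minimal sketch of that round trip at the API layer could look like the following. The `RichTextNode` shape is a made-up stand-in, since this document does not specify the actual rich text format, and `toDbTitle` / `fromDbRawTitle` are hypothetical helper names.

```typescript
// Hypothetical rich text node; the real editor format is not specified in this document.
interface RichTextNode {
  text: string;
  em?: boolean;
}

// Before writing: derive the searchable plaintext and stringify the raw value.
function toDbTitle(raw: RichTextNode[]): { title: string; rawTitle: string } {
  return {
    title: raw.map((node) => node.text).join(""), // indexed via @search
    rawTitle: JSON.stringify(raw), // stored as a plain string in the db
  };
}

// After reading: parse the stored string back into rich text nodes.
function fromDbRawTitle(rawTitle: string | null): RichTextNode[] {
  return rawTitle === null ? [] : (JSON.parse(rawTitle) as RichTextNode[]);
}

const stored = toDbTitle([
  { text: "The " },
  { text: "Cool", em: true },
  { text: " Bundle" },
]);
console.log(stored.title); // "The Cool Bundle"
```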

API Schema

extend type Content {
  title(format: Format): RichText @computed(from: ["rawTitle"])
  attachmentCount: Int! @count(from: "attachments")
  user: User @computed(from: ["userId"])
  randomNumber: Int @computed # no field dependencies
}

# If the User API also served GraphQL, this could be federated
# Assume that it's a REST API
type User {
  id: Int!
  name: String!
}

enum Format {
  HTML
  PLAINTEXT
  RAW # raw rich text JSON
}

Client sends a GraphQL query

{
  contents(filter: { title: { anyoftext: "api design" } }) {
    id
    label
    title(format: HTML)
    user {
      name
    }
    randomNumber
  }
}

API generates a single Dgraph query

If the incoming query doesn't ask for a field with @count, we can create a GraphQL query based on the incoming query and computed fields:

{
  contents(filter: { title: { anyoftext: "api design" } }) {
    id
    label
    rawTitle # from title
    userId # from user
  }
}

If the incoming query does ask for fields with @count (e.g. attachmentCount), generate a GraphQL+- query instead:

{
  contents(func: anyoftext(title, "api design")) {
    id
    label
    rawTitle # from title
    attachmentCount: count(attachments) # from attachmentCount, also used as the alias
    userId # from user
  }
}
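The choice between the two query styles above could be sketched roughly as follows. This assumes the requested fields have already been reduced to a list of plain db fields plus a list of count aggregations; `DbSelection` and `renderContentsQuery` are hypothetical names. The query text simply mirrors the two examples in this document rather than a verified Dgraph query.

```typescript
interface DbSelection {
  fields: string[]; // plain db fields to fetch
  counts: { alias: string; of: string }[]; // fields backed by @count
}

function renderContentsQuery(sel: DbSelection, searchText: string): string {
  // JSON.stringify quotes and escapes the search string (assumed close enough
  // to GraphQL string escaping for this sketch).
  const text = JSON.stringify(searchText);
  const needsGraphQLPlusMinus = sel.counts.length > 0;
  const root = needsGraphQLPlusMinus
    ? `contents(func: anyoftext(title, ${text}))`
    : `contents(filter: { title: { anyoftext: ${text} } })`;
  // The API field name doubles as the alias for the count aggregation.
  const lines = sel.fields.concat(
    sel.counts.map((c) => `${c.alias}: count(${c.of})`)
  );
  const body = lines.map((line) => `    ${line}`).join("\n");
  return `{\n  ${root} {\n${body}\n  }\n}`;
}

console.log(
  renderContentsQuery(
    {
      fields: ["id", "label", "rawTitle", "userId"],
      counts: [{ alias: "attachmentCount", of: "attachments" }],
    },
    "api design"
  )
);
```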

API passes the resulting data to resolvers

title resolver formats rawTitle based on format argument
user object resolver calls the user dataloader to fetch data from the user API via userId
randomNumber resolver generates and returns a random number

Note that we could implement the user loading via field resolvers on other types, but in a complicated data model it's cleaner to specify resolvers once, on nodes rather than edges.
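The three resolvers described above might look something like this sketch. Everything here is illustrative: the `RichTextNode` shape, the `Context.loadUser` function (standing in for the user dataloader) and the treatment of a missing `format` argument as `PLAINTEXT` are assumptions, not part of the original design.

```typescript
type Format = "HTML" | "PLAINTEXT" | "RAW";

interface RichTextNode {
  text: string;
  em?: boolean;
}
interface User {
  id: number;
  name: string;
}
// Row as returned by the generated database query.
interface ContentRow {
  rawTitle: string;
  userId: number | null;
}
// Hypothetical context; loadUser stands in for the user dataloader.
interface Context {
  loadUser(id: number): Promise<User | null>;
}

const escapeHtml = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

const contentResolvers = {
  // Formats rawTitle according to the format argument; no db access needed.
  title(parent: ContentRow, args: { format?: Format }): string | RichTextNode[] {
    const nodes = JSON.parse(parent.rawTitle) as RichTextNode[];
    switch (args.format) {
      case "RAW":
        return nodes;
      case "HTML":
        return nodes
          .map((n) => (n.em ? `<em>${escapeHtml(n.text)}</em>` : escapeHtml(n.text)))
          .join("");
      default:
        return nodes.map((n) => n.text).join("");
    }
  },
  // Resolves the user from the userId already fetched with the content.
  user(parent: ContentRow, _args: unknown, ctx: Context): Promise<User | null> {
    return parent.userId === null
      ? Promise.resolve(null)
      : ctx.loadUser(parent.userId);
  },
  // Exists only at the API layer, so it has no db dependencies.
  randomNumber(): number {
    return Math.floor(Math.random() * 100);
  },
};
```

None of these resolvers touch the database: each works only from the row produced by the single generated query, plus the external user API.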

Client receives resulting data

{
  "contents": [
    {
      "id": "0x10d8a7sd8s",
      "label": "Bundle",
      "title": "The <em>Cool</em> Bundle",
      "user": {
        "name": "Nelson Pecora"
      },
      "randomNumber": 42
    }
  ]
}

nelsonpecora/post-prisma.md

nelsonpecora commented Nov 8, 2019
