
@nelsonpecora
Last active November 13, 2019 06:20
Post-Prisma data handling

Because Prisma 1 is feature-frozen and Prisma 2 is heading in directions that might not fit our needs (code-first schemas, no generated SDL), I've started researching some alternatives.

Data Layer

Prisma 2

PROS:

  • no separate server needed
  • nested mutations (OpenCRUD)
  • API similar to Prisma 1
  • declarative datamodel (though not SDL)
  • MySQL server

CONS:

  • lots of regressions (no cascade deletes, no Json type, etc.)
  • code-first schemas, no automatic SDL generation
  • non-SDL datamodel
  • still very far from a stable release

Sequelize / Knex

PROS:

  • MySQL server, so no new technologies for ops to support
  • good documentation

CONS:

  • no nested mutation support
  • imperative datamodel (with a convoluted API)
  • code-first schemas via third-party libraries (with small support communities)

Dgraph

PROS:

  • native graph db with first-class GraphQL support
  • declarative SDL datamodel (including interfaces, which would eliminate a lot of code and complexity)
  • edge properties support (facets) which would eliminate a lot of code and complexity (e.g. position on parent/child relations)
  • extensive search indices supported by default, including term and full-text search
  • very stable and fast
  • nested mutations
  • underlying RDF syntax simplifies maintenance and batch import / export of data
  • first-class client-id / temporary-id support

CONS:

  • new tech, so more work for ops to support
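
To make the RDF, temporary-id, and facet points above concrete, here is a rough TypeScript sketch (not Dgraph client code) that turns a nested content tree into N-Quad lines for a single mutation. Each node gets a blank-node id such as `_:c0`, which Dgraph replaces with a real uid, and each child's position is stored as a facet on the `attachments` edge instead of on a separate join type. The `ContentNode` shape, the `toNQuads` name, and the bare predicate names are assumptions for illustration.

```typescript
// Hypothetical shape of a content tree for this sketch; not Dgraph's API.
interface ContentNode {
  label: string;
  title: string;
  attachments?: ContentNode[];
}

// Serialize a nested tree into N-Quad lines for one mutation. Blank-node ids
// (_:c0, _:c1, ...) act as temporary ids that Dgraph swaps for real uids.
function toNQuads(root: ContentNode): string[] {
  const lines: string[] = [];
  let counter = 0;

  const visit = (node: ContentNode): string => {
    const blank = `_:c${counter++}`;
    // JSON.stringify quotes and escapes the literal; adequate for ordinary text.
    lines.push(`${blank} <label> ${JSON.stringify(node.label)} .`);
    lines.push(`${blank} <title> ${JSON.stringify(node.title)} .`);
    (node.attachments ?? []).forEach((child, position) => {
      const childBlank = visit(child);
      // The child's position lives on the edge itself, as a facet.
      lines.push(`${blank} <attachments> ${childBlank} (position=${position}) .`);
    });
    return blank;
  };

  visit(root);
  return lines;
}
```

Nested data goes out as one flat batch of triples, which is also what makes bulk import and export straightforward.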

Neo4J

PROS:

  • native graph db with first-class GraphQL support (though less powerful than Dgraph's)
  • declarative SDL datamodel (with partial support for interfaces)

CONS:

  • no nested mutation support
  • potentially slow
  • Cypher queries are hard to learn and reason about
  • new tech, so more work for ops to support

Ideas for Implementing Dgraph

Because Dgraph supports both GraphQL and GraphQL+-, we can take advantage of the full power of both.

  • GraphQL for direct passthroughs, and for datamodel SDL definitions
  • GraphQL+- for automatic aggregations, functions, and graph traversal features (k-shortest paths, etc.)

Schema Changes

We'd still maintain two SDL schemas, one for the database and one for the API, and (as with Prisma 1) we'd be able to use and overwrite the db schema at the API layer. However, instead of simply importing it with graphql-import, we might be able to use Apollo's schema federation.

Generating Data Layer Queries from Client Queries

If we add a @computed directive to the API schema, we can be more declarative about computed db fields and about data served from other (REST) APIs:

Setting @computed on a field would tell our Dgraph query generator to ignore that field when making calls to the db, as that field only exists at the API layer. This can also be used for fields (and types) that are composed from other APIs.

Setting @computed(from: ["fieldA", "fieldB"]) would tell our Dgraph query generator to ignore that field, but include the underlying db fields it depends on (if they're not otherwise being fetched). Thus our field resolvers wouldn't need to fetch any data from the db themselves.

Setting @count(from: "fieldA") would tell our Dgraph query generator to ignore that field, but include a count function against a different field. This allows for very fast aggregate queries in the db itself. In the future, we could have other directives that correspond to GraphQL+- features.
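
Taken together, these directives give the query generator one simple rule per requested field. A minimal TypeScript sketch, assuming the directive arguments have already been read out of the API schema into a plain lookup table (`FieldSpec`, `toDbSelections`, and `contentSpecs` are hypothetical names):

```typescript
// How the generator might see each API field after reading its directives.
// The FieldSpec shape is hypothetical; the kinds mirror the directives above.
type FieldSpec =
  | { kind: "db" } // no directive: fetch the field as-is
  | { kind: "computed"; from: string[] } // @computed / @computed(from: [...])
  | { kind: "count"; from: string }; // @count(from: "...")

// Map the fields a client requested to the db selections for a single query.
function toDbSelections(
  requested: string[],
  specs: Record<string, FieldSpec>,
): { selections: string[]; needsCount: boolean } {
  const selections: string[] = [];
  const seen = new Set<string>();
  let needsCount = false;

  // Add a selection once, even if several fields depend on it.
  const add = (selection: string): void => {
    if (!seen.has(selection)) {
      seen.add(selection);
      selections.push(selection);
    }
  };

  for (const field of requested) {
    const spec: FieldSpec = specs[field] ?? { kind: "db" };
    switch (spec.kind) {
      case "db":
        add(field);
        break;
      case "computed":
        // Skip the API-only field, but fetch the db fields its resolver needs.
        spec.from.forEach(add);
        break;
      case "count":
        // The API field name doubles as the alias for the count expression.
        add(`${field}: count(${spec.from})`);
        needsCount = true;
        break;
    }
  }
  return { selections, needsCount };
}

// Specs matching the Content example's API schema.
const contentSpecs: Record<string, FieldSpec> = {
  title: { kind: "computed", from: ["rawTitle"] },
  attachmentCount: { kind: "count", from: "attachments" },
  user: { kind: "computed", from: ["userId"] },
  randomNumber: { kind: "computed", from: [] },
};
```

Requesting id, label, title, user, and randomNumber yields `id, label, rawTitle, userId`. Requesting attachmentCount as well adds the aliased `count(attachments)` selection and sets `needsCount`, which is the cue to switch to GraphQL+-.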

One huge win for using Dgraph would be that each query to our API generates at most one query to the database, eliminating the N+1 problem without complicated caching strategies or dataloaders.

In the future, we could make this even more efficient by checking permissions beforehand and adding a filter to the generated Dgraph query, so the database only returns results the user is allowed to view. Right now we do that filtering after the database has been queried, which is wasteful and a potential attack vector.
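
The permission-filter idea could look something like this sketch, which combines the client's filter with a permission rule before the query is sent. The `and` combinator, the `Filter` type, and the `ownContentOnly` rule are all hypothetical; the real rule would depend on our permission model.

```typescript
// Hypothetical filter object, shaped like the `filter` argument in the example
// queries. Combining filters with `and` is an assumption, not a confirmed API.
type Filter = { [field: string]: unknown };

// Add a permission rule to whatever the client asked for, so the database
// only returns nodes the current user may view.
function withPermissionFilter(clientFilter: Filter | undefined, permission: Filter): Filter {
  return clientFilter ? { and: [clientFilter, permission] } : permission;
}

// Example rule (hypothetical): users may only see content they own.
const ownContentOnly = (userId: number): Filter => ({ userId: { eq: userId } });
```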

Query Example

Database Schema

type Content {
  id: ID!
  label: String
  # By saving rich text as both plaintext and its raw JSON value,
  # we allow for rich searching and filtering
  title: String @search(by: [term, fulltext]) # dgraph directive
  # dgraph doesn't actually support scalar JSON values, so in reality this would be a string
  # that we'd parse / stringify as needed at the API layer
  rawTitle: Json
  attachments: [Content!]!
  userId: Int # This ID is used to call a User API
}

API Schema

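directive @computed(from: [String!]) on FIELD_DEFINITION
directive @count(from: String!) on FIELD_DEFINITION
scalar RichText # HTML, plaintext, or raw JSON depending on format

extend type Content {
  title(format: Format): RichText @computed(from: ["rawTitle"]) # overrides the db field
  attachmentCount: Int! @count(from: "attachments")
  user: User @computed(from: ["userId"])
  randomNumber: Int @computed # no field dependencies
}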

# If the User API also served GraphQL, this could be federated
# Assume that it's a REST API
type User {
  id: Int!
  name: String!
}

enum Format {
  HTML
  PLAINTEXT
  RAW # raw rich text JSON
}

Client sends a GraphQL query

{
  contents(filter: { title: { anyoftext: "api design" } }) {
    id
    label
    title(format: HTML)
    attachmentCount
    user {
      name
    }
    randomNumber
  }
}

API generates a single Dgraph query

If the incoming query doesn't ask for a field with @count, we can create a GraphQL query based on the incoming query and computed fields:

{
  contents(filter: { title: { anyoftext: "api design" } }) {
    id
    label
    rawTitle # from title
    userId # from user
  }
}

If the incoming query does ask for fields with @count (e.g. attachmentCount), we generate a GraphQL+- query instead:

{
  contents(func: anyoftext(title, "api design")) {
    id
    label
    rawTitle # from title
    attachmentCount: count(attachments) # from attachmentCount, also used as the alias
    userId # from user
  }
}
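
Choosing between the two query languages and printing the query text can be sketched as follows. It assumes the selections have already been resolved from the directives (e.g. `rawTitle` for `title`) and only handles the single title search from this example; `renderContentsQuery` is a hypothetical name, and a real generator would walk the parsed client query instead of taking a search string.

```typescript
// Print the single database query for the `contents` search in this example.
// GraphQL is used unless an @count field was requested, in which case the
// query switches to GraphQL+-, which supports count().
function renderContentsQuery(searchText: string, selections: string[], needsCount: boolean): string {
  // JSON.stringify yields a quoted, escaped string literal for ordinary text.
  const search = JSON.stringify(searchText);
  const root = needsCount
    ? `contents(func: anyoftext(title, ${search}))`
    : `contents(filter: { title: { anyoftext: ${search} } })`;
  const body = selections
    // In GraphQL+- a node's id is its uid, so alias it back to `id`.
    .map((s) => (needsCount && s === "id" ? "id: uid" : s))
    .map((s) => `    ${s}`)
    .join("\n");
  return `{\n  ${root} {\n${body}\n  }\n}`;
}
```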

API passes the resulting data to resolvers

  • title resolver formats rawTitle based on the format argument
  • user object resolver calls the user dataloader to fetch data from the User API via userId
  • randomNumber resolver generates and returns a random number
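
A sketch of the title resolver, assuming rawTitle holds a JSON array of text spans with an optional `em` flag. The actual rich text format isn't specified here, so `Span`, its fields, and defaulting to PLAINTEXT when no format is given are all assumptions.

```typescript
// Hypothetical rich text format: rawTitle holds a JSON array of spans.
type Span = { text: string; em?: boolean };
type Format = "HTML" | "PLAINTEXT" | "RAW";

// Escape text before embedding it in HTML output.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Field resolver for Content.title. The parent is the db result; the query
// generator fetched rawTitle (a JSON string) in place of title.
function resolveTitle(parent: { rawTitle: string | null }, args: { format?: Format }): unknown {
  if (parent.rawTitle === null) return null;
  const spans: Span[] = JSON.parse(parent.rawTitle);
  // Assumed default when the client omits format.
  const format: Format = args.format ?? "PLAINTEXT";
  switch (format) {
    case "RAW":
      return spans;
    case "HTML":
      return spans
        .map((s) => (s.em ? `<em>${escapeHtml(s.text)}</em>` : escapeHtml(s.text)))
        .join("");
    case "PLAINTEXT":
      return spans.map((s) => s.text).join("");
  }
}
```

With the spans `The `, `Cool` (emphasized), and ` Bundle`, the HTML format produces `The <em>Cool</em> Bundle`.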

Note that we could implement the user loading via field resolvers on other types, but in a complicated data model it's cleaner to specify resolvers once, on nodes rather than edges.
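
One way to read "on nodes rather than edges" is that User resolves its own fields from an `{ id }` reference, so any type that links to a user only has to hand over the id. The sketch below assumes that reading and uses a small hand-rolled batcher as a stand-in for the dataloader package, so every user requested during one synchronous pass goes out in a single call to the User API. `UserLoader`, `fetchUsers`, and the resolver layout are hypothetical.

```typescript
// Hypothetical User API record; the real fields would come from that API.
interface User {
  id: number;
  name: string;
}

type Pending = {
  id: number;
  resolve: (user: User | null) => void;
  reject: (error: unknown) => void;
};

// Minimal per-request batcher standing in for the dataloader package.
// Loads issued during one synchronous pass are queued, then sent to the
// User API as a single fetchUsers call; repeated ids reuse the cached promise.
class UserLoader {
  private queue: Pending[] = [];
  private cache = new Map<number, Promise<User | null>>();
  private fetchUsers: (ids: number[]) => Promise<User[]>;

  // fetchUsers is a hypothetical wrapper around one bulk REST request.
  constructor(fetchUsers: (ids: number[]) => Promise<User[]>) {
    this.fetchUsers = fetchUsers;
  }

  load(id: number): Promise<User | null> {
    const cached = this.cache.get(id);
    if (cached) return cached;
    const promise = new Promise<User | null>((resolve, reject) => {
      // The first load in a batch schedules the flush as a microtask.
      if (this.queue.length === 0) Promise.resolve().then(() => this.flush());
      this.queue.push({ id, resolve, reject });
    });
    this.cache.set(id, promise);
    return promise;
  }

  private flush(): void {
    const batch = this.queue;
    this.queue = [];
    this.fetchUsers(batch.map((p) => p.id)).then(
      (users) => {
        const byId = new Map(users.map((u): [number, User] => [u.id, u]));
        batch.forEach((p) => p.resolve(byId.get(p.id) ?? null));
      },
      (error) => batch.forEach((p) => p.reject(error)),
    );
  }
}

// Node-level resolvers: User loads its own data from an { id } reference, so
// Content (or any other type) only needs to supply the user's id.
const resolvers = {
  User: {
    name: async (ref: { id: number }, _args: unknown, ctx: { userLoader: UserLoader }) =>
      (await ctx.userLoader.load(ref.id))?.name ?? null,
  },
};
```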

Client receives resulting data

{
  "contents": [
    {
      "id": "0x10d8a7sd8s",
      "label": "Bundle",
      "title": "The <em>Cool</em> Bundle",
      "attachmentCount": 23,
      "user": {
        "name": "Nelson Pecora"
      },
      "randomNumber": 42
    }
  ]
}
@nelsonpecora (author) commented:

Re: "In the future, we could have other directives that correspond to GraphQL+- features."

I think a better approach might be something like neo4j-graphql's @cypher directive, where you specify the underlying GraphQL+- query directly. That way we wouldn't need a separate directive for every Dgraph-specific feature.
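
A rough sketch of what that could mean for the query generator, assuming a hypothetical `@dql(select: ...)` directive whose argument is copied verbatim into the GraphQL+- query. Under that assumption, @count becomes just one use of a single generic directive.

```typescript
// Hypothetical metadata for a single @cypher-style directive, e.g.
//   attachmentCount: Int! @dql(select: "count(attachments)")
// The directive's argument is copied verbatim into the GraphQL+- query.
interface DqlFieldSpec {
  select?: string;
}

// Build one GraphQL+- selection for an API field. Fields with a raw
// expression are aliased to the API field name; others pass through as-is.
function toDqlSelection(field: string, spec?: DqlFieldSpec): string {
  return spec?.select ? `${field}: ${spec.select}` : field;
}
```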
