@dsandip
Created July 25, 2019 13:24
Metrics in commercial product


Objective

The incoming log stream from Hasura instances can be analyzed to derive metrics about the instance and its performance.

These metrics can support features such as real-time requests per second, success/error rates, query whitelisting, regression tests, field-level analytics, etc.

These features can be broadly classified into four buckets:

  • Debugging
  • Alerting
  • Performance monitoring
  • API statistics and analytics

For the first release, we’ll focus on some basic features that could span these categories.

Metrics

Legend:

  • ✅: possible with current logs
  • ❓🕒: needs more work

Health

  • ✅ Is this instance alive?
    • We will periodically poll the instance’s health endpoint and also see if we’re receiving logs from the instance.
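
A minimal sketch of how the two signals above could be combined. The function name, the rule that both signals must be good, and the 5-minute silence threshold are assumptions made for illustration, not decisions recorded in this note.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_alive(
    health_ok: bool,
    last_log_at: Optional[datetime],
    now: datetime,
    max_silence: timedelta = timedelta(minutes=5),  # assumed threshold
) -> bool:
    """Return True only if the health poll passed AND logs arrived recently."""
    if not health_ok:
        return False
    if last_log_at is None:
        # No log has ever been received from this instance.
        return False
    return now - last_log_at <= max_silence


# Example: health poll passed, last log line was seen one minute ago.
now = datetime(2019, 7, 25, 13, 24, tzinfo=timezone.utc)
print(is_alive(True, now - timedelta(minutes=1), now))
```

The result of the health endpoint poll is passed in as a plain boolean so that the polling mechanism stays out of scope here.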

HTTP API Requests

  • Request ID
    • [query-log].detail.request_id
    • [http-log].detail.operation.request_id
  • Status Code: [http-log].detail.http_info.status
  • URL: [http-log].detail.http_info.url
  • IP address: [http-log].detail.http_info.ip
  • Method: [http-log].detail.http_info.method
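
The request ID is what ties a [query-log] entry to its [http-log] entry. The sketch below shows one way to join the two by that ID, assuming each log line has already been parsed into a dict and carries a top-level `type` key naming the log type (an assumption; only the `detail.*` paths come from the list above). The helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, Optional


def get_path(entry: Any, path: str) -> Any:
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    node = entry
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def request_id(entry: Dict[str, Any]) -> Optional[str]:
    """Read the request ID from the location used by each log type."""
    if entry.get("type") == "query-log":
        return get_path(entry, "detail.request_id")
    if entry.get("type") == "http-log":
        return get_path(entry, "detail.operation.request_id")
    return None


def summarize_requests(entries: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Merge http-log and query-log entries into one row per request ID."""
    rows: Dict[str, Dict[str, Any]] = {}
    for entry in entries:
        rid = request_id(entry)
        if rid is None:
            continue  # other log types, or entries without a request ID
        row = rows.setdefault(rid, {"request_id": rid})
        if entry.get("type") == "http-log":
            info = get_path(entry, "detail.http_info") or {}
            row.update(
                status=info.get("status"),
                url=info.get("url"),
                ip=info.get("ip"),
                method=info.get("method"),
            )
        else:
            row["operation_name"] = get_path(entry, "detail.query.operationName")
    return rows
```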

GraphQL query

  • [http-log].detail.http_info.url == "v1/graphql" or "v1alpha1/graphql"
  • [query-log] – only for GraphQL?
  • What was queried?
    • ✅ Operation name: [query-log].detail.query.operationName
    • ✅ Raw query: [query-log].detail.query.query
    • ✅ Query variables: [query-log].detail.query.variables
    • ✅ (on error): [http-log].detail.operation.query
    • ✅ Generated SQL: [query-log].detail.generated_sql
    • 🕒 Parsed tables/columns/fields
    • ❓ Is it from Hasura or a remote schema?
  • Who made the query?
    • ❓ Client ID (something a client would generate, like a mobile app vs website)
    • ✅ (X-Hasura-Role [, X-Hasura-x]) tuple that uniquely identifies the user
      • [http-log].detail.operation.user_vars
  • Was it successful?
    • ✅ Boolean status: failed if [http-log].level == "error"
    • ✅ Error indicators: [http-log].detail.operation.error is not null
  • What was the error?
    • ✅ Error body: [http-log].detail.operation.error
    • ✅ Query: [http-log].detail.operation.query
    • ✅ Error classification and categorisation
      • [http-log].detail.operation.error.code
  • 🕒 What was the response?
    • GraphQL JSON response body
    • Did any field return NULL?
  • How much time did it take?
    • ✅ Query execution time [http-log].detail.operation.query_execution_time
    • ❓ Total time taken to respond
  • How do I make it faster?
    • Analyze the query from console
    • Show the generated SQL from logs?
  • Non-GraphQL errors
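
As a rough illustration of the checks listed in this section, the sketch below decides whether an [http-log] entry is a GraphQL request and whether it succeeded. It assumes the entry is a parsed dict, that the logged URL may or may not carry a leading slash (so slashes are stripped before comparing), and that the error body is an object with a `code` key. The function names are hypothetical.

```python
from typing import Any, Dict

# The two GraphQL endpoints named in this section.
GRAPHQL_URLS = {"v1/graphql", "v1alpha1/graphql"}


def is_graphql_request(http_log: Dict[str, Any]) -> bool:
    """True if the entry's URL is one of the GraphQL endpoints."""
    info = (http_log.get("detail") or {}).get("http_info") or {}
    url = info.get("url") or ""
    return url.strip("/") in GRAPHQL_URLS


def graphql_outcome(http_log: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize success, error code and execution time for one entry."""
    operation = (http_log.get("detail") or {}).get("operation") or {}
    error = operation.get("error")
    # Either signal from the list above marks the request as failed.
    failed = http_log.get("level") == "error" or error is not None
    return {
        "success": not failed,
        "error_code": error.get("code") if isinstance(error, dict) else None,
        "query_execution_time": operation.get("query_execution_time"),
    }
```

Treating either signal as a failure is a design choice for the sketch; the two indicators may always agree in practice.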

Metadata actions

  • [http-log].detail.http_info.url == "v1/query"
  • What was the action?
    • ✅ on error [http-log].detail.operation.query
    • ❓ on success
  • Who executed it?
    • ❓ User information parsed from Collaborator-Token
    • [http-log].detail.operation.user_vars
  • Was it successful?
    • ✅ Boolean status: [http-log].level
  • What was the error?
    • ✅ Error body: [http-log].detail.operation.error
    • ✅ Error classification: [http-log].detail.operation.error.code
  • How much time did it take?
    • ✅ Execution time: [http-log].detail.operation.query_execution_time
    • ❓ Request latency
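
One possible shape for an audit record built from the fields above, given as a sketch only. It assumes a parsed [http-log] dict; the record layout and the function name are hypothetical. The items marked ❓ (action on success, Collaborator-Token user, request latency) are left out because the current logs may not carry them.

```python
from typing import Any, Dict, Optional


def metadata_audit_record(http_log: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Build an audit record for a v1/query request; return None for other URLs."""
    detail = http_log.get("detail") or {}
    url = (detail.get("http_info") or {}).get("url") or ""
    if url.strip("/") != "v1/query":
        return None
    operation = detail.get("operation") or {}
    error = operation.get("error")
    return {
        "user_vars": operation.get("user_vars"),
        "success": http_log.get("level") != "error",
        "error_code": error.get("code") if isinstance(error, dict) else None,
        # Per the list above, the query body is only known to be logged on error.
        "action": operation.get("query"),
        "execution_time": operation.get("query_execution_time"),
    }
```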

The items below might not be available in the first release.

Other API requests

  • What are the other APIs being called?
  • How many times?
  • Who is executing it?
  • Success/Error

Websocket requests

  • How many clients are connected?
  • What are they querying?
  • Are there errors?
  • How many subscriptions are optimised?
  • What is the time taken?
  • Are there subscriptions that are not optimised?
  • How can I optimise them?

Event triggers

  • How many events are being triggered?
  • What is the success rate?
  • What is the error rate?
  • Latencies from triggers?
  • Dead/alive status on triggers?

  • Number of API calls over time
    • Error/Success rates
    • Preferably real-time rps
    • Usage per client
  • List of GraphQL queries over time
    • With user metadata
    • Group by user metadata
  • Metadata API access audit log
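
To make "number of API calls over time" with error/success rates concrete, here is a small sketch that groups requests into fixed time buckets. It assumes each request has already been reduced to a hypothetical `(epoch_seconds, is_error)` pair; how the timestamp is read from a log line is not covered in this note, so it is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rates_over_time(
    events: Iterable[Tuple[float, bool]],
    bucket_seconds: int = 60,  # assumed bucket width
) -> Dict[int, Dict[str, float]]:
    """Count requests and errors per bucket and derive rps and error rate."""
    buckets = defaultdict(lambda: [0, 0])  # bucket start -> [requests, errors]
    for ts, is_error in events:
        start = int(ts // bucket_seconds) * bucket_seconds
        buckets[start][0] += 1
        if is_error:
            buckets[start][1] += 1
    return {
        start: {
            "requests": total,
            "errors": errors,
            "rps": total / bucket_seconds,  # average over the bucket
            "error_rate": errors / total,
        }
        for start, (total, errors) in sorted(buckets.items())
    }
```

A smaller `bucket_seconds` gets closer to a real-time rps figure at the cost of noisier rates.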