@dsandip
Created July 25, 2019 13:23
Metrics in commercial product

Objective

The incoming log stream from Hasura instances can be analyzed to derive metrics about each instance and its performance.

These metrics can support features such as real-time requests per second, success and error rates, query whitelisting, regression tests, and field-level analytics.

These features can be broadly classified into four buckets:

  • Debugging
  • Alerting
  • Performance monitoring
  • API statistics and analytics

For the first release, we’ll focus on some basic features that could span these categories.

Metrics

Legend:

  • ✅: possible with current logs
  • ❓🕒: needs more work

Health

  • ✅ Is this instance alive?
    • We will periodically poll the instance’s health endpoint and also see if we’re receiving logs from the instance.
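
The two liveness signals above can be combined in a small check. This is a minimal sketch, assuming the result of the health poll is already available as a boolean and that 60 seconds of log silence is an acceptable threshold. The function name `is_alive`, the parameter `max_log_age`, and the threshold value are illustrative and not part of Hasura.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_alive(
    health_ok: bool,
    last_log_at: Optional[datetime],
    now: datetime,
    max_log_age: timedelta = timedelta(seconds=60),  # assumed threshold
) -> bool:
    """Report the instance as alive only if both signals agree.

    health_ok   -- outcome of the most recent health endpoint poll
    last_log_at -- timestamp of the most recent log line received, if any
    """
    if not health_ok or last_log_at is None:
        return False
    return now - last_log_at <= max_log_age


# Hypothetical values for illustration.
now = datetime(2019, 7, 25, 13, 23, tzinfo=timezone.utc)
print(is_alive(True, now - timedelta(seconds=10), now))  # recent logs
print(is_alive(True, now - timedelta(minutes=5), now))   # logs have gone quiet
```

Requiring both signals is one possible policy; reporting them separately would make it easier to tell a dead instance from a broken log pipeline.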

HTTP API Requests

  • Request ID
    • [query-log].detail.request_id
    • [http-log].detail.operation.request_id
  • Status Code: [http-log].detail.http_info.status
  • URL: [http-log].detail.http_info.url
  • IP address: [http-log].detail.http_info.ip
  • Method: [http-log].detail.http_info.method
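
The field paths listed above can be read out of a parsed log record with a small helper. This is a sketch under assumptions: the record is already decoded from JSON into a dict, and the sample record is hypothetical, shaped only by the paths named in this section. `get_path` and `request_summary` are illustrative names.

```python
from typing import Any, Dict, Optional


def get_path(record: Dict[str, Any], path: str) -> Optional[Any]:
    """Walk a dotted path such as 'detail.http_info.status'; None if absent."""
    node: Any = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def request_summary(http_log: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the per-request fields listed for [http-log]."""
    return {
        "request_id": get_path(http_log, "detail.operation.request_id"),
        "status": get_path(http_log, "detail.http_info.status"),
        "url": get_path(http_log, "detail.http_info.url"),
        "ip": get_path(http_log, "detail.http_info.ip"),
        "method": get_path(http_log, "detail.http_info.method"),
    }


# Hypothetical record; real log lines may carry more fields.
sample = {
    "level": "info",
    "detail": {
        "operation": {"request_id": "req-1"},
        "http_info": {
            "status": 200,
            "url": "/v1/graphql",
            "ip": "127.0.0.1",
            "method": "POST",
        },
    },
}
print(request_summary(sample))
```

Returning None for a missing path keeps the summary usable when a log line lacks one of the fields.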

GraphQL query

  • [http-log].detail.http_info.url == "v1/graphql" or "v1alpha1/graphql"
  • [query-log] – only for GraphQL?
  • What was queried?
    • ✅ Operation name: [query-log].detail.query.operationName
    • ✅ Raw query: [query-log].detail.query.query
    • ✅ Query variables: [query-log].detail.query.variables
    • ✅ (on error): [http-log].detail.operation.query
    • ✅ Generated SQL: [query-log].detail.generated_sql
    • 🕒 Parsed tables/columns/fields
    • ❓ Is it from Hasura or a remote schema?
  • Who made the query?
    • ❓ Client ID (something a client would generate, like a mobile app vs website)
    • ✅ (X-Hasura-Role [, X-Hasura-x]) tuple that uniquely identifies the user
      • [http-log].detail.operation.user_vars
  • Was it successful?
    • ✅ Boolean status: [http-log].level == "error" (true means the request failed)
    • ✅ Error indicator: [http-log].detail.operation.error is not null
  • What was the error?
    • ✅ Error body: [http-log].detail.operation.error
    • ✅ Query: [http-log].detail.operation.query
    • ✅ Error classification and categorisation
      • [http-log].detail.operation.error.code
  • 🕒 What was the response?
    • GraphQL JSON response body
    • Did any field return NULL?
  • How much time did it take?
    • ✅ Query execution time [http-log].detail.operation.query_execution_time
    • ❓ Total time taken to respond
  • How do I make it faster?
    • Analyze the query from console
    • Show the generated SQL from logs?
  • Non GraphQL errors
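
Since the query text lives in [query-log] and the outcome lives in [http-log], the two need to be joined on the request ID to answer the questions above. The following is a rough sketch, assuming each parsed record names its kind in a top-level `type` field and that both log lines for a request are present in the batch. The function name and the sample records (including the error code value) are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def join_graphql_logs(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge [query-log] and [http-log] entries that share a request ID."""
    merged: Dict[str, Dict[str, Any]] = {}
    for rec in records:
        detail = rec.get("detail") or {}
        kind = rec.get("type")  # assumed discriminator field
        if kind == "query-log":
            rid = detail.get("request_id")
            if rid is None:
                continue
            query = detail.get("query") or {}
            merged.setdefault(rid, {}).update(
                operation_name=query.get("operationName"),
                query=query.get("query"),
                variables=query.get("variables"),
                generated_sql=detail.get("generated_sql"),
            )
        elif kind == "http-log":
            op = detail.get("operation") or {}
            rid = op.get("request_id")
            if rid is None:
                continue
            error = op.get("error")
            merged.setdefault(rid, {}).update(
                success=rec.get("level") != "error" and error is None,
                error_code=error.get("code") if isinstance(error, dict) else None,
                execution_time=op.get("query_execution_time"),
            )
    return [{"request_id": rid, **row} for rid, row in merged.items()]


# Hypothetical records for illustration.
records = [
    {"type": "query-log", "detail": {
        "request_id": "a",
        "query": {"operationName": "GetUsers",
                  "query": "query GetUsers { users { id } }",
                  "variables": None},
        "generated_sql": None}},
    {"type": "http-log", "level": "info", "detail": {
        "operation": {"request_id": "a", "query_execution_time": 0.012}}},
    {"type": "http-log", "level": "error", "detail": {
        "operation": {"request_id": "b",
                      "error": {"code": "validation-failed"}}}},
]
for row in join_graphql_logs(records):
    print(row)
```

In a streaming setting the join would need a time window or a cache keyed by request ID, which this batch sketch leaves out.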

Metadata actions

  • [http-log].detail.http_info.url == "v1/query"
  • What was the action?
    • ✅ on error [http-log].detail.operation.query
    • ❓ on success
  • Who executed it?
    • ❓ User information parsed from Collaborator-Token
    • [http-log].detail.operation.user_vars
  • Was it successful?
    • ✅ Boolean status: [http-log].level == "error" (true means the action failed)
  • What was the error?
    • ✅ Error body: [http-log].detail.operation.error
    • ✅ Error classification: [http-log].detail.operation.error.code
  • How much time did it take?
    • ✅ Execution time: [http-log].detail.operation.query_execution_time
    • ❓ Request latency
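
The metadata fields above could be gathered into one entry per request. This is a minimal sketch, assuming parsed records carry a top-level `type` field and that the URL may or may not have a leading slash. The function name and the sample records (including the `track_table` action and the error code) are hypothetical. As noted above, the action body is only expected on error, so `action` is None for successful requests.

```python
from typing import Any, Dict, Iterable, List


def metadata_audit_entries(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep [http-log] records for the metadata endpoint and summarize them."""
    entries: List[Dict[str, Any]] = []
    for rec in records:
        if rec.get("type") != "http-log":  # assumed discriminator field
            continue
        detail = rec.get("detail") or {}
        url = (detail.get("http_info") or {}).get("url") or ""
        if url.strip("/") != "v1/query":
            continue
        op = detail.get("operation") or {}
        error = op.get("error")
        entries.append({
            "user_vars": op.get("user_vars"),
            "success": rec.get("level") != "error",
            "error_code": error.get("code") if isinstance(error, dict) else None,
            "action": op.get("query"),  # logged on error only
            "execution_time": op.get("query_execution_time"),
        })
    return entries


# Hypothetical records for illustration.
records = [
    {"type": "http-log", "level": "info", "detail": {
        "http_info": {"url": "/v1/query"},
        "operation": {"user_vars": {"x-hasura-role": "admin"},
                      "query_execution_time": 0.2}}},
    {"type": "http-log", "level": "info", "detail": {
        "http_info": {"url": "/v1/graphql"}, "operation": {}}},
    {"type": "http-log", "level": "error", "detail": {
        "http_info": {"url": "/v1/query"},
        "operation": {"user_vars": {"x-hasura-role": "admin"},
                      "error": {"code": "already-tracked"},
                      "query": {"type": "track_table"}}}},
]
for entry in metadata_audit_entries(records):
    print(entry)
```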

The items below might not be available in the first release.

Other API requests

  • What are the other APIs being called?
  • How many times?
  • Who is executing it?
  • Success/Error

Websocket requests

  • How many clients are connected?
  • What are they querying?
  • Are there errors?
  • How many subscriptions are optimised?
  • How much time do they take?
  • Are there subscriptions that are not optimised?
  • How can I optimise them?

Event triggers

  • How many events are being triggered?
  • What is the success rate?
  • What is the error rate?
  • Latencies from triggers?
  • Dead/alive status on triggers?

  • Number of API calls over time
    • Error/Success rates
    • Preferably real-time rps
    • Usage per client
  • List of GraphQL queries over time
    • With user metadata
    • Group by user metadata
  • Metadata API access audit log
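
The request-rate and grouping items above can be sketched as a simple aggregation over [http-log] records. This assumes each record carries a top-level `timestamp` string that `datetime.fromisoformat` can parse, and that the role appears in `user_vars` under the key `x-hasura-role`. Both are assumptions about the log shape, and the function names and sample data are illustrative.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Dict, Iterable


def per_second_stats(http_logs: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Count requests and the share of errors in one-second buckets."""
    totals: Counter = Counter()
    errors: Counter = Counter()
    for rec in http_logs:
        ts = datetime.fromisoformat(rec["timestamp"])
        bucket = ts.replace(microsecond=0).isoformat()
        totals[bucket] += 1
        if rec.get("level") == "error":
            errors[bucket] += 1
    return {b: {"rps": n, "error_rate": errors[b] / n} for b, n in totals.items()}


def requests_per_role(http_logs: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Group request counts by the role found in user_vars."""
    counts: Counter = Counter()
    for rec in http_logs:
        op = (rec.get("detail") or {}).get("operation") or {}
        user_vars = op.get("user_vars") or {}
        counts[user_vars.get("x-hasura-role", "unknown")] += 1
    return dict(counts)


def _log(ts: str, level: str, role: str) -> Dict[str, Any]:
    """Build a hypothetical [http-log] record for the example."""
    return {"timestamp": ts, "level": level,
            "detail": {"operation": {"user_vars": {"x-hasura-role": role}}}}


logs = [
    _log("2019-07-25T13:23:00.100000+00:00", "info", "user"),
    _log("2019-07-25T13:23:00.900000+00:00", "error", "user"),
    _log("2019-07-25T13:23:01.000000+00:00", "info", "admin"),
]
print(per_second_stats(logs))
print(requests_per_role(logs))
```

Fixed one-second buckets are the simplest choice; a sliding window would give a smoother real-time rate at the cost of more state.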