The incoming log stream from Hasura instances can be analyzed to derive metrics about the instance and its performance.
These metrics can power features such as real-time requests per second, success/error rates, query whitelisting, regression tests, field-level analytics, etc.
These features can be broadly classified into four buckets:
- Debugging
- Alerting
- Performance monitoring
- API statistics and analytics
For the first release, we’ll focus on some basic features that could span these categories.
Legend:
- ✅: possible with current logs
- ❓🕒: needs more work
- ✅ Is this instance alive?
- We will periodically poll the instance’s health endpoint and also see if we’re receiving logs from the instance.
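  A minimal sketch of the liveness poller, assuming the instance exposes Hasura's /healthz endpoint (the base URL and polling interval below are illustrative placeholders, not part of this spec):

  ```python
  # Liveness poller (sketch). Assumes the instance exposes Hasura's /healthz
  # endpoint; BASE_URL and POLL_INTERVAL are illustrative placeholders.
  import time
  import urllib.error
  import urllib.request

  BASE_URL = "https://my-hasura-instance.example.com"  # placeholder instance URL
  POLL_INTERVAL = 15  # seconds, arbitrary choice

  def is_alive(base_url: str) -> bool:
      """Return True if the health endpoint answers with HTTP 200."""
      try:
          with urllib.request.urlopen(f"{base_url}/healthz", timeout=5) as resp:
              return resp.status == 200
      except (urllib.error.URLError, OSError):
          return False

  if __name__ == "__main__":
      while True:
          print("alive" if is_alive(BASE_URL) else "unreachable")
          time.sleep(POLL_INTERVAL)
  ```

  In practice this would feed an alert when the endpoint is unreachable or when no logs have arrived within the polling window.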
- Request ID: [query-log].detail.request_id, [http-log].detail.operation.request_id
- Status Code: [http-log].detail.http_info.status
- URL: [http-log].detail.http_info.url
- IP address: [http-log].detail.http_info.ip
- Method: [http-log].detail.http_info.method
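A minimal sketch of pulling these request-level fields out of a single JSON log line; the envelope shape (top-level type, level and detail keys) is inferred from the field paths above and should be treated as an assumption:

```python
# Sketch: extract the request-level fields listed above from one JSON log line.
# The envelope shape (top-level "type" and "detail") is an assumption inferred
# from the field paths in this document.
import json
from typing import Optional

def parse_request_fields(raw_line: str) -> Optional[dict]:
    """Return request-level fields for an http-log entry, else None."""
    entry = json.loads(raw_line)
    if entry.get("type") != "http-log":
        return None
    detail = entry.get("detail", {})
    http_info = detail.get("http_info", {})
    operation = detail.get("operation", {})
    return {
        "request_id": operation.get("request_id"),
        "status": http_info.get("status"),
        "url": http_info.get("url"),
        "ip": http_info.get("ip"),
        "method": http_info.get("method"),
    }
```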
- GraphQL requests
- ✅ [http-log].detail.http_info.url == "v1/graphql" or "v1alpha1/graphql"
- ❓ [query-log] (only emitted for GraphQL?)
- What was queried?
- ✅ Operation name: [query-log].detail.query.operationName
- ✅ Raw query: [query-log].detail.query.query
- ✅ Query variables: [query-log].detail.query.variables
- ✅ (on error): [http-log].detail.operation.query
- ✅ Generated SQL: [query-log].detail.generated_sql
- 🕒 Parsed tables/columns/fields
- ❓ Is it from Hasura or a remote schema?
- Who made the query?
- ❓ Client ID (something a client would generate, like a mobile app vs website)
- ✅ (X-Hasura-Role [, X-Hasura-x]) tuple that uniquely identifies the user: [http-log].detail.operation.user_vars
- Was it successful?
- ✅ Boolean status: [http-log].level == "error"
- ✅ Error indicators: [http-log].detail.error is not null
- What was the error?
- ✅ Error body: [http-log].detail.operation.error
- ✅ Query: [http-log].detail.operation.query
- ✅ Error classification and categorisation: [http-log].detail.operation.error.code
- 🕒 What was the response?
- GraphQL JSON response body
- Did any field return NULL?
- How much time did it take?
- ✅ Query execution time: [http-log].detail.operation.query_execution_time
- ❓ Total time taken to respond
- How do I make it faster?
- Analyze the query from the console
- Show the generated SQL from the logs? (see the correlation sketch below)
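Because the query text and generated SQL live in [query-log] while status, errors and timing live in [http-log], building one record per GraphQL operation likely means joining the two log types on request_id. A minimal sketch of that correlation, under the same envelope assumptions as above:

```python
# Sketch: join query-log and http-log entries on request_id to build one
# record per GraphQL operation. Field paths follow the lists above; the
# envelope shape and URL format remain assumptions.
import json
from typing import Iterable, Iterator

# Endpoints as listed above; deployments may report a leading slash.
GRAPHQL_URLS = ("v1/graphql", "v1alpha1/graphql", "/v1/graphql", "/v1alpha1/graphql")

def graphql_operations(lines: Iterable[str]) -> Iterator[dict]:
    pending: dict = {}  # request_id -> fields captured from query-log
    for raw in lines:
        entry = json.loads(raw)
        detail = entry.get("detail", {})
        if entry.get("type") == "query-log":
            query = detail.get("query", {})
            pending[detail.get("request_id")] = {
                "operation_name": query.get("operationName"),
                "raw_query": query.get("query"),
                "variables": query.get("variables"),
                "generated_sql": detail.get("generated_sql"),
            }
        elif entry.get("type") == "http-log":
            if detail.get("http_info", {}).get("url") not in GRAPHQL_URLS:
                continue
            operation = detail.get("operation", {})
            rid = operation.get("request_id")
            record = pending.pop(rid, {})
            record.update({
                "request_id": rid,
                "user_vars": operation.get("user_vars"),
                # error signal per the bullets above: level == "error"
                "success": entry.get("level") != "error",
                "error": operation.get("error"),
                "query_execution_time": operation.get("query_execution_time"),
            })
            yield record
```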
- Non-GraphQL errors: [http-log].detail.http_info.url == "v1/query"
- What was the action?
- ✅ on error: [http-log].detail.operation.query
- ❓ on success
- Who executed it?
- ❓ User information parsed from Collaborator-Token
- ❓ [http-log].detail.operation.user_vars
- Was it successful?
- ✅ Boolean status: [http-log].level
- What was the error?
- ✅ Error body: [http-log].detail.operation.error
- ✅ Error classification: [http-log].detail.operation.error.code (aggregation by code is sketched after this section)
- How much time did it take?
- ✅ Execution time: [http-log].detail.operation.query_execution_time
- ❓ Request latency
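For the error classification items above (GraphQL and v1/query alike), the codes in [http-log].detail.operation.error.code can simply be counted per window; a minimal sketch:

```python
# Sketch: count http-log errors by detail.operation.error.code.
# "unknown" is a placeholder bucket for entries without a code.
import json
from collections import Counter
from typing import Iterable

def error_counts_by_code(lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for raw in lines:
        entry = json.loads(raw)
        if entry.get("type") != "http-log" or entry.get("level") != "error":
            continue
        error = entry.get("detail", {}).get("operation", {}).get("error")
        code = error.get("code", "unknown") if isinstance(error, dict) else "unknown"
        counts[code] += 1
    return counts
```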
The items below might not be available in the first release.
- What are the other APIs being called?
- How many times?
- Who is executing it?
- Success/Error
- How many clients are connected?
- What are they querying?
- Are there errors?
- How many subscriptions are optimised?
- How much time do they take?
- Are there subscriptions that are not optimised?
- How can I optimise them?
- How many events are being triggered?
- What is the success rate?
- What is the error rate?
- Latencies from triggers?
- Dead/alive status on triggers?
- Number of API calls over time
- Error/Success rates
- Preferably real-time rps (see the sliding-window sketch after this list)
- Usage per client
- List of GraphQL queries over time
- With user metadata
- Group by user metadata
- Metadata API access audit log
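For the request-rate and error-rate items above, a near-real-time view can be approximated with a sliding window over the per-request records derived earlier; the 60-second window below is an arbitrary assumption:

```python
# Sketch: sliding-window requests-per-second and error rate over the stream of
# per-request records. Window size and in-memory deque are illustrative choices.
import time
from collections import deque
from typing import Deque, Optional, Tuple

class SlidingWindowStats:
    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.events: Deque[Tuple[float, bool]] = deque()  # (timestamp, was_error)

    def record(self, was_error: bool, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.events.append((now, was_error))
        self._evict(now)

    def _evict(self, now: float) -> None:
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()

    def requests_per_second(self) -> float:
        self._evict(time.time())
        return len(self.events) / self.window

    def error_rate(self) -> float:
        self._evict(time.time())
        if not self.events:
            return 0.0
        return sum(1 for _, err in self.events if err) / len(self.events)
```

Per-client usage (the "usage per client" item) would follow the same shape, keyed by the user_vars or client ID extracted earlier.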