- dependencies: `opentelemetry-sdk`, `azure-monitor-opentelemetry-exporter`
- set `APPLICATIONINSIGHTS_CONNECTION_STRING` (or remove the AzMon exporter)
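A minimal setup sketch under those assumptions: it routes spans to Azure Monitor when `APPLICATIONINSIGHTS_CONNECTION_STRING` is set and falls back to the console exporter otherwise (the fallback logic is illustrative, not part of the original notes):

```python
# Sketch only: export spans to Azure Monitor when the connection string is set,
# otherwise print them to the console (which produces output like the JSON below).
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
if os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING"):
    from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter

    # The exporter picks up APPLICATIONINSIGHTS_CONNECTION_STRING from the environment.
    provider.add_span_processor(BatchSpanProcessor(AzureMonitorTraceExporter()))
else:
    provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
```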
```json
{
    "name": "completions gpt-4",
    "context": {
        "trace_id": "0x6194e8840c82d3d4976b004835d78696",
        "span_id": "0x988bf35e7dd5ba02",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": null,
    "start_time": "2024-05-31T20:46:32.954871Z",
    "end_time": "2024-05-31T20:46:32.954871Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "gen_ai.system": "openai"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.25.0",
            "service.name": "unknown_service"
        },
        "schema_url": ""
    }
}
```
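For context, a sketch of how a span like the one above could be emitted manually; the tracer name is a placeholder, and the span name/attribute simply mirror the sample output:

```python
from opentelemetry import trace

tracer = trace.get_tracer("sample-gen-ai-app")  # hypothetical instrumentation scope name

# Produces a span comparable to the console output above.
with tracer.start_as_current_span("completions gpt-4") as span:
    span.set_attribute("gen_ai.system", "openai")
    # ... call the model and record further gen_ai.* attributes here ...
```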
```json
{
    "body": {
        "input": "foo",
        "output": "bar"
    },
    "severity_number": "<SeverityNumber.UNSPECIFIED: 0>",
    "severity_text": null,
    "attributes": {
        "event.name": "gen_ai.evaluation",
        "gen_ai.evaluation.status": "contains_apology",
        "gen_ai.evaluation.score": 42
    },
    "dropped_attributes": 0,
    "timestamp": "2024-05-31T20:46:33.137853Z",
    "observed_timestamp": "2024-05-31T20:46:33.137853Z",
    "trace_id": "0x6194e8840c82d3d4976b004835d78696",
    "span_id": "0x988bf35e7dd5ba02",
    "trace_flags": 1,
    "resource": ""
}
```
...
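A hedged sketch of how such an evaluation event could be emitted through the experimental logs API (module paths like `opentelemetry.sdk._logs` and signatures may shift between versions; the scope name and processor choice are illustrative):

```python
# Sketch only: emit an evaluation event as a log record correlated with the
# current gen-ai span, using the console exporter to get output like the above.
import time

from opentelemetry import trace
from opentelemetry.sdk._logs import LoggerProvider, LogRecord
from opentelemetry.sdk._logs.export import ConsoleLogExporter, SimpleLogRecordProcessor

logger_provider = LoggerProvider()
logger_provider.add_log_record_processor(SimpleLogRecordProcessor(ConsoleLogExporter()))
logger = logger_provider.get_logger("sample-evaluator")  # hypothetical scope name

# Correlate the evaluation event with the gen-ai span it evaluates.
span_context = trace.get_current_span().get_span_context()
logger.emit(
    LogRecord(
        timestamp=time.time_ns(),
        trace_id=span_context.trace_id,
        span_id=span_context.span_id,
        trace_flags=span_context.trace_flags,
        body={"input": "foo", "output": "bar"},
        attributes={
            "event.name": "gen_ai.evaluation",
            "gen_ai.evaluation.status": "contains_apology",
            "gen_ai.evaluation.score": 42,
        },
    )
)
```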
Some additional thoughts:

- `input`/`output` are already available on the parent gen-ai span. Recording them again on the evaluation event would be a duplication. I guess they should be configurable (opt-in).
- When thinking about evaluation as a single metric, it seems to be too broad. I.e. we can do this:
  - When evaluation is reported as an event, it's reported with the `gen_ai.evaluation.*` prefix. It can be `gen_ai.evaluation.contains_apology`, `gen_ai.evaluation.groundedness`, etc.
  - When evaluation is reported as a metric, it's reported as a `gen_ai.evaluation.*` metric: `gen_ai.evaluation.contains_apology` is just a counter, `gen_ai.evaluation.groundedness` is a histogram, etc. (see the sketch after this list).

  This way we keep different signals consistent, and it's also easy to find everything related to evaluation by matching events/metrics whose names start with `gen_ai.evaluation.*`.
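A sketch of the metric side of that idea, using the stable OTel metrics API (the meter name, descriptions, recorded values, and attribute keys are illustrative):

```python
from opentelemetry import metrics

meter = metrics.get_meter("sample-evaluator")  # hypothetical scope name

# Boolean-style evaluators map naturally to a counter...
contains_apology = meter.create_counter(
    "gen_ai.evaluation.contains_apology",
    description="Number of responses flagged as containing an apology",
)
# ...while score-based evaluators map to a histogram.
groundedness = meter.create_histogram(
    "gen_ai.evaluation.groundedness",
    description="Distribution of groundedness scores",
)

# Example recordings; attribute keys are placeholders.
contains_apology.add(1, {"gen_ai.system": "openai"})
groundedness.record(0.8, {"gen_ai.system": "openai"})
```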