Skip to content

Instantly share code, notes, and snippets.

@jjjordanmsft
Last active December 6, 2017 14:36
Show Gist options
  • Save jjjordanmsft/d70fad89740290601a6ec0ba512b508d to your computer and use it in GitHub Desktop.
Save jjjordanmsft/d70fad89740290601a6ec0ba512b508d to your computer and use it in GitHub Desktop.
Proposal for Go SDK interface

Background

The problem

The Go SDK's interfaces are in dire need of a facelift. Presently, much of the telemetry data is private to the SDK and only settable through the constructor methods. Requests are in particularly bad shape:

type RequestTelemetry struct {
	BaseTelemetry
	data *requestData
}

type requestData struct {
	domain
	Id           string             `json:"id"`
	Name         string             `json:"name"`
	StartTime    string             `json:"startTime"` // yyyy-mm-ddThh:mm:ss.fffffff-hh:mm
	Duration     string             `json:"duration"`  // d:hh:mm:ss.fffffff
	ResponseCode string             `json:"responseCode"`
	Success      bool               `json:"success"`
	HttpMethod   string             `json:"httpMethod"`
	Url          string             `json:"url"`
	Measurements map[string]float32 `json:"measurements"`
}

func NewRequestTelemetry(name, httpMethod, url string, timestamp time.Time, duration time.Duration, responseCode string, success bool) *RequestTelemetry

Just a few of the handful of issues with this:

  • The Id is not available to callers, either to be set or get elsewhere (for instance, for use in operation id fields or parent operation id, or to correlate with a request ID generated by the application).
  • The only way to build telemetry objects are through methods with many arguments. Go does not support function overloads or named/default arguments.
  • Properties and Measurements are all but inaccessible to all telemetry types. I tacked something on for properties, but it was only a half-measure.

There are presently several Track methods. All but the generic Track are shortcuts that do nothing interesting:

func (tc *telemetryClient) TrackMetric(name string, value float64) {
	tc.Track(NewMetricTelemetry(name, value))
}

Other efforts

There is some effort to make the track interfaces consistent across all of the SDK's.

Some Go background

I'm aware that many of the people reading this have little experience with Go, so I'll quickly highlight some quirks of the language that are relevant to this particular problem (if you do have some experience, please forgive the hand-waving):

  • Go does not support function overloads or optional/default arguments. It supports a variable number of arguments, but that's it.
  • Go Interfaces are similar to Typescript interfaces, in that it employs duck-typing. Only methods can be specified on an interface. If a type declares all of the methods in the interface, then it can be used as that interface.
  • Go does not have the notion of a default constructor. In fact it doesn't really have the notion of constructors at all in the spirit of Java/C#/C++. It's simply a term used for a method that starts with New and returns a new instance of a type. You must call the methods explicitly. There's no way to add implicit default values to members.
  • Structs can 'inherit' from other structs, which winds up including all of their members and methods in the struct (there are more details to this). But you cannot cast an instance of InheritedType to a pointer of BaseType.
    • Struct literals get a bit more verbose when one wants to also initialize the members of the base type.
    • Polymorphism is achieved through interfaces only.
  • If a type or member starts with a lower-case letter, it is not accessible outside of the package. Upper-case makes it public. In a sense, Go has two levels of access: public and internal (in C# terms).

Other solutions considered

I made a few attempts at what the interfaces ought to look like before settling on this proposal. I'll try to briefly describe each and what was wrong with them.

Telemetry objects as thin data contract wrappers

The jjjordan/fixdatamodel branch features a step in the right direction, starting with data contracts generated from the public Bond schemas.

Presently, each telemetry type is a simple wrapper around a data contract:

type EventTelemetry struct {
	BaseTelemetry
	Data *contracts.EventData
}

func NewEventTelemetry(name string) *EventTelemetry {
	data := contracts.NewEventData()
	data.Name = name

	return &EventTelemetry{
		Data: data,
		BaseTelemetry: BaseTelemetry{
			Timestamp: currentClock.Now(),
			context:   NewTelemetryContext(),
		},
	}
}

Everything is public. The user is free to poke around the Data and set whatever they like if something isn't available through the constructor. For example, this enables a Request constructor to ensure that all required data is present, and then allow the user to set up any extended data before finally submitting the telemetry. This is very flexible, but data contracts tend to change and other SDKs aren't so liberal about exposing them directly to users.

Node-like telemetry objects

This approach tried to emulate Node.JS's conventions around using objects to emulate named parameter references. For example, requests:

type RequestTelemetry struct {
	Id           string
	Name         string
	Url          string
	Duration     time.Duration
	ResponseCode string
	Success      bool
	Timestamp    time.Time
	// ...
}

Then, this could be submitted like this:

client.Track(&appinsights.RequestTelemetry{
	Url:          req.Url,
	ResponseCode: string(resp.StatusCode),
	Duration:     time.Now().Sub(startTime),
	Name:         fmt.Sprintf("%s %s", req.Method, req.Url),
	Timestamp:    startTime,
})

This has a couple of drawbacks:

  • For things like Success, we cannot detect whether it is false because it was explicitly set to false by the user, or if it was just unspecified. This means the user must be verbose, which somewhat defeats the purpose. Nullable types aren't really available outside of pointers, and it's even more verbose to produce a pointer to a literal bool. The same is true for zeros with floats/ints.
  • Things like Timestamp, Name, and Id suffer from the same problem, but we may be able to reliably detect if these are empty. Nobody's submitting telemetry at January 1st, 0001. They're probably not submitting a request without a name.
  • ... but in the case of Id, if left unspecified, the user can never access it in case s/he wants to let the SDK generate it and reference it elsewhere. The best we can do is make RandomId public and force the user to specify it in the struct literal.
  • Telemetry contexts don't come free anymore. You now need to build your own with appinsights.NewTelemetryContext, add stuff to it, and then finally supply it:
context := appinsights.NewTelemetryContext()
context.Location().SetIp(req.RemoteAddr)
client.Track(&appinsights.RequestTelemetry{
	Context: context,
	// ...
})

Proposal

Constructor methods take the minimum required parameters. Everything else shall be set after the fact via methods and properties. (OK, this seems a little obvious).

Telemetry types

General

item := appinsights.NewXyzTelemetry(/* Some arguments */)
// Properties related directly to the telemetry item can be set directly on the object
item.Name = "Hello"

// Overriding context information can be done through the TelemetryContext instance supplied by the constructor
item.Context.Location().SetIp("127.0.0.1")

// Some common properties exist too
item.Timestamp = time.Now()

// Properties and measurements exist here as well (or not, depending on the item type)
item.Properties["resource_group"] = "fotonvgeus3"
item.Measurements["queue_length"] = 33.0

// Properties/Measurements from the Telemetry interface can be accessed via GetProperties/GetMeasurements
var telem appinsights.Telemetry = item
telem.GetProperties()["common_prop"] = "foo"
// ... be careful though, some of them (MetricsTelemetry.Measurements, for example) return nil.

// Finally, track the item.
client.Track(item)

Prototypes:

type Telemetry interface {
	Time() time.Time
	TelemetryContext() *TelemetryContext
	TelemetryData() TelemetryData
	GetProperties() map[string]string
	GetMeasurements() map[string]float64
}

type BaseTelemetry struct {
	Timestamp    time.Time
	Properties   map[string]string
	Measurements map[string]float64
	Context      *TelemetryContext
}

// This interface is implemented by the data contracts generated from the Bond schema.
type TelemetryData interface {
	EnvelopeName() string
	BaseType() string
}

func (item *BaseTelemetry) Time() time.Time { /* ... */ }
func (item *BaseTelemetry) TelemetryContext() *TelemetryContext { /* ... */ }
func (item *BaseTelemetry) GetProperties() map[string]string { /* ... */ }
func (item *BaseTelemetry) GetMeasurements() map[string]float64 { /* ... */ }

Link to code

Events

ev := appinsights.NewEventTelemetry("Event name")
ev.Timestamp = time.Now()
client.Track(ev)

// Or just this if you don't mind the defaults for everything:
client.TrackEvent("foo")

Prototypes:

type EventTelemetry struct {
	BaseTelemetry
	Name string
}

func NewEventTelemetry(name string) *EventTelemetry { /* ... */ }

Link to code

I mean... events are pretty basic.

Requests

t := appinsights.NewRequestTelemetry(req.Method, req.Url, time.Now().Sub(startTime), string(resp.StatusCode))
t.Context.Location().SetIp(request.RemoteAddr)
t.Timestamp = startTime
client.Track(t)

Success gets auto-calculated based on the responseCode, but can be set on the object after the fact. The timestamp is calculated by current_time - duration

Prototypes:

type RequestTelemetry struct {
	BaseTelemetry
	Id           string
	Name         string
	Url          string
	Duration     time.Duration
	ResponseCode string
	Success      bool
}

func NewRequestTelemetry(method, url string, duration time.Duration, responseCode string) *RequestTelemetry { /* ... */ }

Link to code

Traces

func NewTraceTelemetry(message string, severityLevel contracts.SeverityLevel) *TraceTelemetry { /* ... */ }

tr := NewTraceTelemetry("Hello world", appinsights.Verbose)
tr.Properties["proc"] = "whatever"
client.Track(tr)

// Or
client.TrackTrace("Message", appinsights.Warning)

It is possible to use a trick and make severityLevel "optional", but this is ill-advised.

Prototypes:

type TraceTelemetry struct {
	BaseTelemetry
	Message       string
	SeverityLevel contracts.SeverityLevel
}

func NewTraceTelemetry(message string, severityLevel contracts.SeverityLevel) *TraceTelemetry { /* ... */ }

Link to code

Metrics

Due to the number of permutations possible with metrics telemetry, it can be a little tricky. There's at least a couple ways to go about it:

  • A constructor method that accepts no arguments -- metrics can be set by methods on that type instead.
  • Multiple types that represent metrics telemetry. Perhaps, MetricTelemetry (singular) for one value and MetricsTelemetry for multiple metrics.
  • This could also be implemented as multiple constructors for different cardinalities.

But, we'll stick with the zero-parameter constructor like it looks like .Net uses:

// Single value
metric := appinsights.NewMetricTelemetry("name", 42.0)
client.Track(metric)

// Multiple value
metrics := appinsights.NewMetricsTelemetry()

// Measurements
metrics.AddMeasurement("name", 42.0)
metrics.AddMeasurements(map[string]float64{
	"pi": 3.141,
	"e": 2.718,
	"phi": 1.618,
})

// Aggregations
metrics.AddAggregation("agg", 42.0, 40.0, 45.0, 0.3, 5) /* See prototype, below */
metrics.Metrics = append(metrics.Metrics, &contracts.DataPoint{ // <- This is the raw data contract.
	Name:   "group",
	Kind:   appinsights.Aggregation,
	Value:  41.0,
	Min:    30.0,
	Max:    50.0,
	StdDev: 2.4,
	Count:  20,
})

client.Track(metrics)

Note: I ran across this comment in the data contracts (and it's in the Bond schema as well):

	// List of metrics. Only one metric in the list is currently supported by
	// Application Insights storage. If multiple data points were sent only the
	// first one will be used.

Does this still apply? I know that other SDK's do support multi-metric, and that support on the portal is incomplete.

Prototypes:

type MetricTelemetry struct {
	BaseTelemetry
	Name  string
	Value float64
}

func NewMetricTelemetry(name string, value float64) *MetricTelemetry { /* ... */ }

type MetricsTelemetry struct {
	BaseTelemetry
	Metrics []*contracts.DataPoint
}

func NewMetricsTelemetry() *MetricsTelemetry { /* ... */ }
func (metrics *MetricsTelemetry) AddMeasurement(name string, value float64) { /* ... */ }
func (metrics *MetricsTelemetry) AddAggregation(name string, value, min, max, stddev float64, count int) { /* ... */ }
func (metrics *MetricsTelemetry) AddMeasurements(measurements map[string]float64) { /* ... */ }

Link to code

Remote dependencies

func NewRemoteDependency(type, target string, success bool) *RemoteDependencyTelemetry { /* ... */ }

dep := appinsights.NewRemoteDependency("HTTP", "https://api.applicationinsights.io/...", true)
// Set data
dep.Data = "Query"
dep.ResultCode = string(response.StatusCode)
dep.Name = "???"

// Set time:
dep.Timestamp = startTime
dep.Duration = time.Now().Sub(startTime)

// Alternately
dep.MarkTime(startTime, time.Now())

client.Track(dep)

// Not a fan of this one:
client.TrackRemoteDependency("HTTP", "https://api.applicationinsights.io/...", true)

Prototypes:

type RemoteDependencyTelemetry struct {
	BaseTelemetry
	Name       string
	Id         string
	ResultCode string
	Duration   time.Duration
	Success    bool
	Data       string
	Type       string
	Target     string
}

func NewRemoteDependencyTelemetry(dependencyType, target string, success bool) *RemoteDependencyTelemetry { /* ... */ }
func (telem *RemoteDependencyTelemetry) MarkTime(startTime, endTime time.Time) { /* ... */ }

Link to code

Exceptions

Go does not have an exception type with an attached callstack. The error type maps OK to the exception message, but stacks are conveyed by using some esoteric functions from the runtime package. Tracking a panic might look something like this:

func NewExceptionTelemetry(err interface{}) *ExceptionTelemetry { /* ... */ }
// err can be an error type or a string

defer func() {
	if r := recover(); r != nil {
		exc := appinsights.NewExceptionTelemetry(r)
		client.Track(exc)
		
		// Optional: re-throw
		panic(r)
	}
}()

It would need to be called from the error site in any case because it will need to grab the callstack. It is preferable to this:

// ...
if r := recover(); r != nil {
	stack := make([]uintptr, 64)
	n := runtimer.Callers(2, stack)
	exc := appinsights.NewExceptionTelemetry(r, stack[:n])
	// ...
}

Prototypes:

type ExceptionTelemetry struct {
	BaseTelemetry
	Error         interface{}
	Frames        []*contracts.StackFrame
	SeverityLevel contracts.SeverityLevel
}

func NewExceptionTelemetry(err interface{}) *ExceptionTelemetry { /* ... */ }
func GetCallstack(skip int) []*contracts.StackFrame { /* ... */ }

Link to code

Availability

func NewAvailabilityTelemetry(name string, duration time.Duration, success bool) *AvailabilityTelemetry { /* ... */ }

availability := appinsights.NewAvailabilityTelemetry("ping test", time.Now().Sub(startTime), result)
availability.RunLocation = "The moon"
availability.Id = /* A guid or something? */
availability.Message = "OK"
client.Track(availability)

client.TrackAvailability("ping test", time.Now().Sub(startTime), result)

Prototypes:

type AvailabilityTelemetry struct {
	BaseTelemetry
	Id          string
	Name        string
	Duration    time.Duration
	Success     bool
	RunLocation string
	Message     string
}

func NewAvailabilityTelemetry(name string, duration time.Duration, success bool) *AvailabilityTelemetry { /* ... */ }

Link to code

Page views

func NewPageViewTelemetry(url string) *PageViewTelemetry { /* ... */ }

pageView := appinsights.NewPageViewTelemetry("https://path.to/page")
pageView.Name = "Optional name"
pageView.Duration = time.Second /* Optional visit duration */
client.Track(pageView)

client.TrackPageView("https://path.to/page")

Prototypes:

type PageViewTelemetry struct {
	BaseTelemetry
	Url      string
	Duration time.Duration
	Name     string
}

func NewPageViewTelemetry(url string) *PageViewTelemetry { /* ... */ }

Link to code

Track methods

Yet another point is what methods should exist on the TelemetryClient type. Obviously, Track should accept any telemetry type and is already there, but the question comes to whether we ought to have shortcut methods, and how should they look?

One could argue that the constructor methods ought to cover the basic cases, and it's not worth adding redundant track methods, but there is something to be said for making simple things simple.

Link to code

Operation correlation

Go doesn't have inbuilt thread local storage or even using (C#) or with (Python), so it's difficult to emulate the other SDKs here. appinsights.Operation will implement the TelemetryClient interface and place the operation Id on each item that goes through it.

operation := client.StartOperation(telemetryItem, "Operation name")
// If telemetryItem has an operation ID, events tracked via operation will get it as its parent operation id.
// If telemetryItem does not have an operation ID, it will be generated randomly and set on it.

// In either case, it can be overridden by the user in this manner:
operation.Id = "foobar"

operation.TrackTrace("This is a sub-event", appinsights.Verbose)
// ...

// Sends telemetryItem and all telemetry tracked in the operation.
operation.Complete()

Not yet implemented

Flushing the buffer

(This section is already implemented -- please comment if there are issues)

To flush the buffer, just call the Flush method on the telemetry channel:

client.Channel().Flush()

This will return immediately, with no indication that the buffer has been flushed. If you need to flush right before exit, you should Close() the channel instead:

client.Channel().Close()

Will also return immediately. You can wait for a notification that all items that were in the buffer have been sent by waiting on the channel that gets returned by that method:

<-client.Channel().Close()

It can be time boxed in the usual way:

select {
case <-client.Channel().Close():
	// Success
case <-time.NewTimer(30 * time.Second).C:
	// Timeout
}

By default, the above will not retry failed submissions. To enable retries (with a time limit), supply a duration:

<-client.Channel.Close(time.Minute)
@bketelsen
Copy link

RE: Variadic parameters for functions (Again, there may be a way to make this a variadic function to optionally accept multiple strings that compose the data) Please don't do this. It turns into the magic hash at the end of every Ruby on Rails function call. If the value is useful, require it as a parameter, if it's optional, set a default and allow it to be overridden.

RE: Page Views. 95% of the production code I've written over the past 7 years in Go is serving HTTP requests. There's no good reason not to include page view metrics in the AI client.

In general your proposal looks great, and it's clear you've put a lot of thought into it. These are my only comments. Thank you for doing this.

@jjjordanmsft
Copy link
Author

Page views are typically logged on the client side of a web request (notably, the Javascript SDK), and include things like browser load time. The server's perspective is represented by the Request telemetry type. But I guess there's no great reason for omitting them entirely -- might as well cover everything. Even if it's not likely to be used there's always that one guy who wants them.

@OsvaldoRosado
Copy link

W.r.t operation correlation. Is it possible to do this the Node way (wrapping code that goes asynchronous in order to keep track of a context stack)?

Using an operation object like you describe is clever, but leaves no real solutions for autocollection.

@jjjordanmsft
Copy link
Author

Node pulls that off with continuation local storage, .Net does it with thread local storage. There is no equivalent for Go. Except for maybe one -- but do note the opening lines of its README (and be sure to read "What are people saying").

The proper pattern for passing that sort of stuff around is the key-value storage in contexts. You can get one from any Request object. On the other hand, automatic collection is something I've not thought about a ton, and perhaps I shouldn't attempt to implement operation contexts until I better understand their requirements.

@SergeyKanzhelev
Copy link

Please add Source for RequestTelemetry. It is used for app-to-app correlation.

Yes, metrics are still only work with a single measurement inside.

One of the reason to not expose bond-specific Data objects in C# was the idea of potentially changing schema without changing programming interfaces. It also allows to use native types like DateTime and Uri instead of formatted strings. I think it worked OK for C#, but it was developed over changing schemas and backend evolution. May not be such a huge deal now.

One major change in schema we are working on is migration to Microsoft Common Schema 3.0. However this migration may be too big to hide in programming interfaces and will require major version rev anyway.

I like the proposed design

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment