
@jsquire
Last active July 22, 2020 18:45
Messaging: Stress Requirements Notes

Stress Thoughts

Requirements

Hosting

  • The platform hosts multiple developer-authored test scenarios, each encapsulating an end-to-end use case for the client library at varying levels of complexity.

  • The platform can execute test scenarios for a prolonged duration spanning multiple days, monitoring scenarios and managing those that become unresponsive or unhealthy.

  • The platform allows access to Azure services.

  • The platform allows scheduled runs and ad-hoc runs.

  • The version of the SDK package can be specified for a given run, defaulting to the latest nightly build.

Configuration

  • Test scenarios can be configured per-run, using configuration items specific to the given scenario.

  • Configuration elements support at least strings as a primitive type; test scenarios are responsible for any deserialization needed to support complex types or other primitives.
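
A minimal sketch of how a scenario might consume per-run configuration under these requirements; IScenarioConfiguration and all other names here are hypothetical, not an existing API:

    using System;

    // Hypothetical configuration surface: the platform hands the scenario string
    // values, and the scenario owns any parsing into richer types.
    public interface IScenarioConfiguration
    {
        string GetValue(string key);
    }

    public class PublishEventsScenario
    {
        public void Configure(IScenarioConfiguration configuration)
        {
            // Strings are the guaranteed primitive; the scenario deserializes as needed.
            int batchSize = int.Parse(configuration.GetValue("BatchSize"));
            TimeSpan interval = TimeSpan.Parse(configuration.GetValue("PublishInterval"));
        }
    }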

Metrics and Reporting

  • The platform allows test scenarios to report metrics in a recurring fashion, where each report is considered a snapshot for that specific point in time.

  • The platform collects host environment metrics scoped to a test scenario, including CPU and memory usage, on a recurring basis.

  • Test scenarios can define custom metrics, which support at least string-based key-value pairs.

  • The platform provides reports for scenario metrics in real time on request and at the end of a run.

  • Scenario metric reports can be sent via email and/or posted to Teams upon completion.

  • Scenarios have a means to surface errors for logging, capturing exception details such as the message and stack trace, along with a textual description of the context that is independent of the exception itself.
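
A minimal sketch of the reporting surface these requirements imply; all of the names here are hypothetical, not an existing API:

    using System;
    using System.Collections.Generic;

    // Hypothetical reporting surface: each metrics call is a point-in-time
    // snapshot, and errors carry the exception plus scenario-supplied context.
    public interface IScenarioReporter
    {
        // Recurring snapshot of string-based key-value metrics.
        void ReportMetrics(IReadOnlyDictionary<string, string> metrics);

        // Surfaces an error, capturing the exception's message and stack trace
        // along with a textual description of context independent of the exception.
        void ReportError(Exception exception, string context);
    }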

Nice to Have

Hosting

  • The platform allows for configurable fault injection with respect to network connectivity, DNS, latency, and other "chaos monkey"-type factors.

  • The platform allows for Azure resource setup/clean-up in a similar manner to test infrastructure (see the sketch after this list).

  • The version of client libraries for a run can be based on a direct upload, allowing for a private build to be used for testing.
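
A sketch of the resource setup/clean-up hooks mentioned above, mirroring the shape of typical test fixtures; the type and member names are hypothetical:

    using System.Threading.Tasks;

    // Hypothetical lifecycle hooks, similar in spirit to test infrastructure:
    // resources are created before the run and removed when it ends.
    public abstract class ScenarioResourceFixture
    {
        // e.g., provision an Event Hubs namespace for the run
        public abstract Task SetUpAsync();

        // e.g., delete the namespace, even when the run fails
        public abstract Task CleanUpAsync();
    }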

Configuration

  • Test scenario configuration supports a common set of elements, such as run duration, and is extensible with elements specific to that scenario.

  • Test scenario configuration can be defined as a set containing one or more elements; configuration sets may be applied to one or more scenarios to reduce duplication.
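
One possible shape for a reusable configuration set, sketched with hypothetical names; common elements are defined once, with scenario-specific elements layered on top:

    using System.Collections.Generic;

    // Hypothetical: a named set of common configuration elements that can be
    // applied to multiple scenarios, reducing duplication.
    var longHaulDefaults = new Dictionary<string, string>
    {
        ["RunDuration"]  = "3.00:00:00",   // common element: run for three days
        ["ReportPeriod"] = "00:05:00"      // common element: snapshot every five minutes
    };

    // A scenario copies the set and adds elements specific to itself.
    var scenarioConfig = new Dictionary<string, string>(longHaulDefaults)
    {
        ["BatchSize"] = "100"
    };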

Metrics and Reporting

  • Scenarios may define a threshold for pass/fail based on individual metrics and on the final state of metrics when the scenario run has completed.

  • The platform supports custom scenario metrics of different types, ideally allowing for a label, raw value, and percentage component. Example:
    Events Read: 12,000 (99.89%)

  • The platform supports formulas for custom metrics, which may be based on other metrics (sketched after this list). Example:
    Events Read: { this } ({ this / totalSent }%)

  • Metrics can be assigned to a display category; when displaying metrics, the platform groups them by category. Example:

    Processing
    ==========================================
    Events Published :  12,229
    Events Read      :  12,228 (99.99%)
    
    Errors
    ==========================================
    General Exceptions : 100
    Send Exceptions    :  10 (10.00%)
    Read Exceptions    :  90 (90.00%)
    
  • The platform allows opting into SDK log events using the AzureEventSourceListener or a similar construct, allowing the log level to be specified (see the sketch after this list).

  • The platform allows routing informational events and error events to different sinks, with the error-based events defaulting to the same sink as scenario exceptions.

  • The platform allows for observing logs using a tail-style approach.
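
A sketch of how a categorized, formula-based metric with a pass/fail threshold might be declared, per the bullets above; MetricDefinition and its parameters are hypothetical, not an existing type:

    // Hypothetical declaration combining a label, display category, rendering
    // formula, and a pass/fail threshold evaluated at the end of the run.
    var eventsRead = new MetricDefinition(
        Label: "Events Read",
        Category: "Processing",
        Formula: "{ this } ({ this / totalSent }%)",
        FailThreshold: "{ this / totalSent } < 0.95");

    // Illustrative shape only.
    public record MetricDefinition(string Label, string Category, string Formula, string? FailThreshold = null);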
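For the SDK log opt-in, the .NET Azure SDK already provides AzureEventSourceListener in Azure.Core.Diagnostics; a minimal sketch of enabling it at a chosen level:

    using System.Diagnostics.Tracing;
    using Azure.Core.Diagnostics;

    // Route Azure SDK log events at or above the chosen level to the console
    // for the duration of the run.
    using AzureEventSourceListener listener =
        AzureEventSourceListener.CreateConsoleLogger(EventLevel.Informational);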

@MiYanni commented Jul 22, 2020

I mentioned this before, and I don't know if it is doable, but integrating BenchmarkDotNet information into our .NET tests might be useful. Mike mentioned that, without it, he had a way to get memory information for the tests. Personally, I haven't used BenchmarkDotNet, so I don't know what metrics would be useful to have from there. Thoughts?

@heaths commented Jul 22, 2020

For metrics, I would also like to see managed heap allocations and heap size across GC generations. I want to start driving practical performance improvements in more expensive code paths (both in terms of speed and memory).
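
For context on the two comments above: BenchmarkDotNet's MemoryDiagnoser reports allocated bytes and GC collection counts per generation, which covers much of what is being asked for; a minimal sketch (the benchmark body is illustrative):

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    [MemoryDiagnoser]  // adds allocated memory and Gen 0/1/2 collection counts
    public class SendBenchmarks
    {
        [Benchmark]
        public byte[] AllocateBody() => new byte[1024];  // illustrative work
    }

    public class Program
    {
        public static void Main() => BenchmarkRunner.Run<SendBenchmarks>();
    }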
