@andreasgerstmayr
Created August 12, 2018 04:10

Extending BCC support for Performance Co-Pilot and Vector

The goal of this project was to integrate several BCC tools into Performance Co-Pilot (PCP) and Vector. Integrating the BCC tools into the PCP framework provides a number of benefits, such as 24/7 monitoring, archiving, and exporting of the collected metrics to other systems. Furthermore, Vector can consume the metrics in near real-time and display them in the browser with meaningful visualizations, e.g. heat maps and flame graphs. Adequate visualization of collected performance metrics eases the identification and resolution of performance issues.

Integrated BCC Tools

The following BCC tools were integrated into PCP and Vector:

  • execsnoop: traces new processes
  • runqlat: records the scheduler run queue latency as a histogram
  • profile: records stack traces at a specific interval
  • biolatency*: records block device I/O latency as a histogram
  • biotop*: summarizes which processes are performing block I/O
  • ext4dist, xfsdist, zfsdist: trace read/write/open/fsync latencies as histograms
  • tcplife*: summarizes TCP sessions
  • tcpretrans: traces TCP retransmits
  • tcptop: summarizes TCP throughput by host and port

* A PCP module already existed for these tools; the tcplife and biotop modules required a few changes.
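
Several of the tools above report latencies as a histogram. As a rough illustration of that idea only (this is not the code of BCC or of the PCP modules), the following Python sketch groups latency samples into power-of-two buckets. The function names, the sample values, and the exact bucket boundaries are assumptions made for this example.

```python
from collections import Counter


def log2_slot(value):
    """Return the power-of-two bucket index of a non-negative integer.

    Slot 0 holds the value 0, slot 1 holds 1, slot 2 holds 2..3,
    slot 3 holds 4..7, and so on (assumed layout for this sketch).
    """
    if value < 0:
        raise ValueError("latency must be non-negative")
    return value.bit_length()


def slot_range(slot):
    """Return the inclusive (low, high) value range covered by a slot."""
    if slot == 0:
        return (0, 0)
    return (1 << (slot - 1), (1 << slot) - 1)


def build_histogram(latencies_us):
    """Count samples per bucket, keyed by the bucket's (low, high) range."""
    counts = Counter(log2_slot(v) for v in latencies_us)
    return {slot_range(slot): counts[slot] for slot in sorted(counts)}


# Hypothetical latency samples in microseconds, not measured data.
hypothetical_latencies_us = [0, 1, 3, 5, 6, 7, 40, 900]
histogram = build_histogram(hypothetical_latencies_us)

for (low, high), count in histogram.items():
    print(f"{low:>6} -> {high:<6} : {count}")
```

Power-of-two buckets keep the number of histogram slots small even when latencies span several orders of magnitude, which is what makes such data convenient to render as a heat map.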

New Vector Widget Types

Status

All main goals and one stretch goal (profile) have been implemented and merged into their respective repositories.

The latest stable PCP version 4.1.1 includes all new BCC tools. The Vector widgets are merged into the master branch of the Vector repository.

Code

Both projects are hosted on GitHub, and every change was integrated by pull requests:

Challenges and Learnings

PCP

A considerable amount of time was spent implementing and debugging the automated QA tests. Race conditions and background processes occasionally influenced the test results. Because these bugs appeared only intermittently, debugging was troublesome, especially on Travis CI, where each complete run took about 20 minutes and the log output was the only source of information.

Another difficult-to-catch bug occurred in the last module, profile. This module used a background thread, which ran at a glacial pace, although the same code ran at normal pace in the main thread. The problem was that the Python Global Interpreter Lock (GIL) was not released while the Python PMDA was waiting for new instructions, so the background thread was effectively blocked (thanks to Marko Myllynen and Frank Ch. Eigler for debugging this issue!).
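
The healthy case can be sketched in plain Python. This is a minimal illustration, not the PMDA code: the `Sampler` class, its interval, and the half-second wait are hypothetical. The main thread waits in `time.sleep`, a blocking call that releases the GIL, so the background thread keeps collecting samples. The bug described above corresponds to the main thread waiting inside native code that keeps holding the GIL, which pure Python cannot reproduce directly.

```python
import threading
import time


class Sampler:
    """Hypothetical background collector that counts sampling rounds."""

    def __init__(self, interval_s=0.01):
        self.interval_s = interval_s
        self.samples = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Runs in the background thread; it needs the GIL for every iteration.
        while not self._stop.is_set():
            self.samples += 1
            self._stop.wait(self.interval_s)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


sampler = Sampler()
sampler.start()

# The main thread "waits for new instructions". time.sleep releases the GIL
# while blocking, so the sampler thread is free to run in the meantime.
time.sleep(0.5)

sampler.stop()
samples_collected = sampler.samples
print(f"samples collected while the main thread waited: {samples_collected}")
```

If the wait in the main thread held the GIL for its whole duration, the loop in `_run` could not advance until the wait returned, which matches the slow background thread described above.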

Vector

Vector uses nvd3, which requires d3 version 3. However, d3-heatmap2 and d3-flame-graph require d3 version 4. The "fix" for d3-heatmap2 was a tiny wrapper around the renamed functions, but d3-flame-graph used new features of d3. Therefore, I also included d3 version 4 in the project, renamed its global variable to d3v4, and modified the module definition of the included d3-flame-graph to use this variable instead. A large refactoring of Vector is in progress at the moment (replacing AngularJS with React and nvd3 with semiotic, and upgrading d3 to version 4), so this workaround will be obsolete soon.

Acknowledgements

I would like to thank my mentors Marko Myllynen and Martin Spier for their support, code reviews and responsiveness. Also, I would like to thank Nathan Scott for merging my PRs and for the occasional pings about failed Travis CI builds, Frank Ch. Eigler for helping debug the GIL issue, and Mark Goodwin for fixing the PMDA shutdown issue. Furthermore, thanks to Brendan Gregg for creating the original BCC tools. Finally, I would like to thank Google for giving me this opportunity.

@andreasgerstmayr

A few screenshots of the new Vector widgets:

[Screenshot: biolatency]

[Screenshot: tcplife]

[Screenshot: profile]
