@ashwoods
Created July 19, 2019 13:02
integration problems

Root cause analysis for integration incident

The purpose of this document is to detail the current integration problems and possible remedies and workarounds.

Debugging and troubleshooting are cumbersome as errors are systematic and happen at several places at the same time.

Error chain:

Goal: vps-gazeoverlay 1.1.0 was to be installed. The changes were dependent on vps-image-filters 1.5.0. This is normally done by just bumping the version of the gstreamer element package in the Dockerfile and rebuilding/publishing the docker image. A ticket with the release information (correct version numbers) was created and assigned to me. This was already a great improvement over prior releases, where dependency versions had to be tracked down manually for each package.

-> Error 1: The first error happened at this stage: the image could not be built because the vps-gazeoverlay package wasn't found.

possible causes:

- package was built on CI but not published.
+ commit was not tagged and therefore not set for publishing (most probably the tag wasn't pushed)

This error was remedied by cloning and tagging the release and prompting the CI to build/publish the package.
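A check for tags that exist locally but were never pushed could be sketched as follows. This is a minimal sketch, not the team's tooling: the function name and the sample tag names are hypothetical, and it assumes the second argument holds the text printed by `git ls-remote --tags <remote>`, which lists one `<sha> refs/tags/<name>` pair per line.

```python
def tags_missing_on_remote(local_tags, ls_remote_output):
    """Return the local tags that do not appear in `git ls-remote --tags` output."""
    remote_tags = set()
    for line in ls_remote_output.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].startswith("refs/tags/"):
            continue
        name = parts[1][len("refs/tags/"):]
        # Annotated tags are listed twice; the peeled entry ends in "^{}".
        if name.endswith("^{}"):
            name = name[:-3]
        remote_tags.add(name)
    return [tag for tag in local_tags if tag not in remote_tags]


# Hypothetical sample: the remote only knows tag 1.0.0.
sample = (
    "1111111111111111111111111111111111111111\trefs/tags/1.0.0\n"
    "2222222222222222222222222222222222222222\trefs/tags/1.0.0^{}\n"
)
print(tags_missing_on_remote(["1.0.0", "1.1.0"], sample))  # ['1.1.0']
```

Run before a release, a check like this would flag a tag that was created but not pushed, which was the most probable cause here.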

-> Error 3: After rebuilding the image a test was run, but the pipeline was broken. A simple test was also non-functional. As other changes were made elsewhere on the system, a rollback and retest were done to confirm that this change was the cause.

possible causes:

-+ integration was done incorrectly
-+ element non-functional

Both turned out to be true. Going through the unit tests, I recreated a smaller pipeline test that was non-functional. Further analysis of the changelogs and git history revealed a mistake in the integration.

The dependency vps-image-filters version 1.5.0 had also not been tagged. In the meantime Selbi had released 2.0.0, which satisfied the dependency requirement of vps-gazeoverlay, because that requirement was configured with a lower bound but no upper bound. Selbi had changed the interface of vps-image-filters and, following semantic versioning, had bumped the major version to indicate this change.

The remedy was to tag/build the dependency and pin the upstream dependency in the Dockerfile.
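Why a lower bound alone let 2.0.0 through can be illustrated with a small sketch. The helper names are hypothetical, and it assumes plain `MAJOR.MINOR.PATCH` version strings; the actual requirement syntax used by the vps-gazeoverlay package is not shown in this document.

```python
def parse(version):
    """Turn '1.5.0' into (1, 5, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, lower, upper=None):
    """True if lower <= version, and version < upper when an upper bound is given."""
    v = parse(version)
    if v < parse(lower):
        return False
    if upper is not None and v >= parse(upper):
        return False
    return True


# Lower bound only: the breaking 2.0.0 release is accepted.
print(satisfies("2.0.0", lower="1.5.0"))                 # True
# With an upper bound on the next major version it is rejected.
print(satisfies("2.0.0", lower="1.5.0", upper="2.0.0"))  # False
print(satisfies("1.5.0", lower="1.5.0", upper="2.0.0"))  # True
```

Pinning in the Dockerfile fixes the image build. An upper bound in the package's own requirement would additionally protect other consumers of the package.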

-> Error 4: After building, a short test was run. The pipeline was functional and not throwing errors, but the element was not drawing the overlay. In this case the simple test (made with videotestsrc in greyscale) was functional. Later on, other formats and resolutions were tested.

possible causes:

- pipeline incompatibility (tees, encoder, etc.)
- bug in buffer handling
- bug in drawing algo
+ bug in interface implementation

Remedy: a short intro on how the tc builds work and on the production base container, so that a test environment could be set up. A bug was ultimately found (apparently a missing I420 format implementation) and fixed by Selbi, Morti and Josef.
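Since the greyscale test passed while another format failed, a test matrix over formats and resolutions could be generated along these lines. This is only a sketch: the element name `gazeoverlay`, the format list, and the resolutions are assumptions for illustration, not the real test setup. The pipeline strings use standard `gst-launch-1.0` syntax with `videotestsrc`, a caps filter and `fakesink`.

```python
from itertools import product

# Assumed test matrix; extend with whatever production actually uses.
FORMATS = ["GRAY8", "I420", "NV12", "RGB"]
RESOLUTIONS = [(320, 240), (1280, 720)]


def build_test_pipelines(element="gazeoverlay", num_buffers=30):
    """Build one gst-launch pipeline description per format/resolution pair."""
    pipelines = []
    for fmt, (width, height) in product(FORMATS, RESOLUTIONS):
        caps = f"video/x-raw,format={fmt},width={width},height={height}"
        pipelines.append(
            f"videotestsrc num-buffers={num_buffers} ! {caps} ! {element} ! fakesink"
        )
    return pipelines


for pipeline in build_test_pipelines():
    print("gst-launch-1.0 " + pipeline)
```

Running these only shows that each format negotiates and the pipeline completes. Error 4 ran without errors and still drew nothing, so the output frames would also need to be inspected or compared against a reference.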

Possible root causes

  • Communication of release versions (tracking down dependencies); improved by the ticket.
  • Inconsistent and/or failure-prone release management for components.
  • Component testing and the application environments are too far apart.
  • Synchronization of settings is error-prone, as they are unversioned and live in different repositories.

Fixes already in the pipeline.

  • Fix the "special cases and workarounds" in eyetracking-base and move build to CI.
  • Automate some basic pipeline tests in eyetracking-base
  • Intro for the gstreamer team to the CI and release management, so that their future builds are tagged and published.
  • Adding build status in Bitbucket (especially for PRs).
  • Making the eyetracking production environment more accessible upstream (moving pipeline development to the base instead of hog).

Other recommendations:

  • release and maintenance training and process descriptions: developers' tickets are closed only when builds are green, deployed and tested
  • configuration has to be maintained as software packages
  • add dev/nightly builds to CI for easier testing of unreleased versions.
  • make eyetracking-base the default env for all gstreamer work.
  • create a checklist of manual tests that gstreamer devs have to perform before releasing a new element
  • create basic automated integration tests for a calibration/eyetracking routine.