@samueleresca
Created August 2, 2020 17:00
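import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel}

// fetchNumTitles, calcMaxExpectedPhoneRatio and `data` (a Spark DataFrame) are defined elsewhere.
val numTitles = fetchNumTitles()
val maxExpectedPhoneRatio = calcMaxExpectedPhoneRatio()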
val result = VerificationSuite()
  .onData(data)
  // Error level: the main fields must be fully populated and the key columns unique.
  .addCheck(Check(CheckLevel.Error, "Completeness + uniqueness on main fields")
    .areComplete(Seq("customerId", "title", "impressionStart", "impressionEnd", "deviceType", "priority"))
    // The column combination must be unique across all rows (uniqueness == 1.0).
    .hasUniqueness(Seq("customerId", "countryResidence", "deviceType", "title"), Check.IsOne)
    // The approximate number of distinct titles must not exceed the known catalogue size.
    .hasApproxCountDistinct("title", value => numTitles >= value)
    .hasHistogramValues("deviceType", distribution => maxExpectedPhoneRatio >= distribution("phone").ratio))
  // Error level: value ranges and ordering must be consistent.
  .addCheck(Check(CheckLevel.Error, "Check consistency of range-based data")
    .isNonNegative("count")
    .isLessThan("impressionStart", "impressionEnd")
    .isContainedIn("priority", Array("hi", "lo")))
  // Warning level: residence columns should be almost perfectly correlated.
  .addCheck(Check(CheckLevel.Warning, "Check correlation")
    .isLessThan("impressionStart", "impressionEnd")
    .hasCorrelation("countryResidence", "cityResidence", corr => corr >= .99)
    .hasCorrelation("countryResidence", "zipCode", corr => corr >= .99))
  .run()
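
The snippet stops once the verification has run. As a minimal sketch of a possible next step, the value returned by `run()` could be inspected roughly as follows. It assumes Deequ is on the classpath; the helper name `reportFailures` is hypothetical and not part of the original gist.

```scala
import com.amazon.deequ.VerificationResult
import com.amazon.deequ.checks.CheckStatus
import com.amazon.deequ.constraints.ConstraintStatus

// Hypothetical helper: prints every constraint that did not succeed.
def reportFailures(result: VerificationResult): Unit = {
  if (result.status == CheckStatus.Success) {
    println("All checks passed.")
  } else {
    // Flatten the per-check results into a single list of constraint results.
    val constraintResults = result.checkResults
      .flatMap { case (_, checkResult) => checkResult.constraintResults }

    constraintResults
      .filter(_.status != ConstraintStatus.Success)
      .foreach { r => println(s"${r.constraint}: ${r.message.getOrElse("no message")}") }
  }
}
```

Calling `reportFailures(result)` after the suite above would list the failing constraints from both the `Error` and the `Warning` checks. The overall `result.status` reflects the most severe level among the checks that failed.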