@samueleresca
Last active January 26, 2021 04:48
DataFrame dataSetDE = LoadDataSetDE();
DataFrame dataSetUS = LoadDataSetUS();
DataFrame dataSetCN = LoadDataSetCN();
// We initialize a new check for the following data fields
var check = new Check(CheckLevel.Warning, "generic check")
    .IsComplete("manufacturerName")
    .ContainsURL("manufacturerName", val => val == 0.0)
    .IsContainedIn("countryCode", new[] { "DE", "US", "CN" });
// We create a new Analysis instance with the corresponding RequiredAnalyzers defined in the check
Analysis analysis = new Analysis(check.RequiredAnalyzers());
// We create a new in-memory state provider for each countryCode defined in the dataset
InMemoryStateProvider deStates = new InMemoryStateProvider();
InMemoryStateProvider usStates = new InMemoryStateProvider();
InMemoryStateProvider cnStates = new InMemoryStateProvider();
// These calls store the resulting analyzer states in a separate state provider for each dataset
AnalysisRunner.Run(dataSetDE, analysis, saveStatesWith: deStates);
AnalysisRunner.Run(dataSetUS, analysis, saveStatesWith: usStates);
AnalysisRunner.Run(dataSetCN, analysis, saveStatesWith: cnStates);
// Next, we can compute the metrics for the whole table from the partition states
// This only aggregates the previously saved states; it doesn't perform any computation on the data
AnalyzerContext tableMetrics = AnalysisRunner.RunOnAggregatedStates(dataSetDE.Schema(), analysis,
    new[] { deStates, usStates, cnStates });
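
// A minimal sketch of why aggregating states is cheap, assuming that a completeness
// state boils down to two counts: non-null rows and total rows. The tuple shape,
// the variable names, and every number below are hypothetical illustrations; they
// are not part of the deequ.NET API and are not taken from the datasets above.
(long NonNull, long Total)[] partitionStates =
{
    (90, 100), // hypothetical state for the DE partition
    (45, 50),  // hypothetical state for the US partition
    (15, 50),  // hypothetical state for the CN partition
};

// Merging is component-wise addition, so no row of the original data is read again.
(long NonNull, long Total) merged = (0, 0);
foreach (var state in partitionStates)
{
    merged = (merged.NonNull + state.NonNull, merged.Total + state.Total);
}

// Completeness for the whole table, derived from the merged counts only.
// Returning NaN for an empty table is a choice made for this sketch.
// For the hypothetical counts above this is 150 / 200 = 0.75.
double completeness = merged.Total == 0
    ? double.NaN
    : (double)merged.NonNull / merged.Total;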