@luisquintanilla
Last active November 2, 2022 17:58
Use transformer scopes to ignore labels when scoring
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install packages"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div><div></div><div></div><div><strong>Installed Packages</strong><ul><li><span>Microsoft.ML, 2.0.0-preview.22551.1</span></li></ul></div></div>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#r \"nuget: Microsoft.ML, 2.0.0-preview.22551.1\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add using statements"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"using Microsoft.ML;\n",
"using Microsoft.ML.Data;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize MLContext"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var ctx = new MLContext();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define text loader options"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var options = new TextLoader.Options\n",
"{\n",
" Separators=new [] {','},\n",
" HasHeader=false,\n",
" Columns=new [] {\n",
" new TextLoader.Column(\"Features\",DataKind.Single,0,3),\n",
" new TextLoader.Column(\"Label\",DataKind.String,4)\n",
" }\n",
"};"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create text loader"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var textLoader = ctx.Data.CreateTextLoader(options);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load data into IDataView"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var data = textLoader.Load(@\"C:\\Datasets\\iris.data.txt\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preview schema"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th><i>index</i></th><th>Name</th><th>Index</th><th>IsHidden</th><th>Type</th><th>Annotations</th></tr></thead><tbody><tr><td>0</td><td>Features</td><td><div class=\"dni-plaintext\">0</div></td><td><div class=\"dni-plaintext\">False</div></td><td><table><thead><tr><th>Dimensions</th><th>IsKnownSize</th><th>ItemType</th><th>Size</th><th>RawType</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ 4 ]</div></td><td><div class=\"dni-plaintext\">True</div></td><td><div class=\"dni-plaintext\">{ Single: RawType: System.Single }</div></td><td><div class=\"dni-plaintext\">4</div></td><td><div class=\"dni-plaintext\">Microsoft.ML.Data.VBuffer&lt;System.Single&gt;</div></td></tr></tbody></table></td><td><table><thead><tr><th>Schema</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ ]</div></td></tr></tbody></table></td></tr><tr><td>1</td><td>Label</td><td><div class=\"dni-plaintext\">1</div></td><td><div class=\"dni-plaintext\">False</div></td><td><table><thead><tr><th>RawType</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">System.ReadOnlyMemory&lt;System.Char&gt;</div></td></tr></tbody></table></td><td><table><thead><tr><th>Schema</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ ]</div></td></tr></tbody></table></td></tr></tbody></table>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data.Schema"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define training pipeline"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
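},
"outputs": [],
"source": [
"// MapValueToKey is appended with TransformerScope.Training, so it is not part of the scoring model.\n",
"// Note: SdcaMaximumEntropy() reads the default \"Features\" column, not \"NormalizedFeatures\".\n",
"var pipeline = \n",
" ctx.Transforms.NormalizeMinMax(\"NormalizedFeatures\",\"Features\")\n",
" .Append(ctx.Transforms.Conversion.MapValueToKey(\"Label\"), TransformerScope.Training)\n",
" .Append(ctx.MulticlassClassification.Trainers.SdcaMaximumEntropy())\n",
" .Append(ctx.Transforms.Conversion.MapKeyToValue(\"PredictedLabel\"));"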
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var model = pipeline.Fit(data);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Transform (automatic)\n",
"\n",
"Calling `Transform` on the trained model automatically sets the scope to scoring, so the input data does not need a `Label` column."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Remove Label column from IDataView"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th><i>index</i></th><th>Name</th><th>Index</th><th>IsHidden</th><th>Type</th><th>Annotations</th></tr></thead><tbody><tr><td>0</td><td>Features</td><td><div class=\"dni-plaintext\">0</div></td><td><div class=\"dni-plaintext\">False</div></td><td><table><thead><tr><th>Dimensions</th><th>IsKnownSize</th><th>ItemType</th><th>Size</th><th>RawType</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ 4 ]</div></td><td><div class=\"dni-plaintext\">True</div></td><td><div class=\"dni-plaintext\">{ Single: RawType: System.Single }</div></td><td><div class=\"dni-plaintext\">4</div></td><td><div class=\"dni-plaintext\">Microsoft.ML.Data.VBuffer&lt;System.Single&gt;</div></td></tr></tbody></table></td><td><table><thead><tr><th>Schema</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ ]</div></td></tr></tbody></table></td></tr></tbody></table>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"var idvWithoutLabels = \n",
" ctx.Transforms.DropColumns(\"Label\")\n",
" .Fit(data)\n",
" .Transform(data);\n",
"\n",
"idvWithoutLabels.Schema"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make batch predictions"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var batchPredictions = model.Transform(idvWithoutLabels);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preview batch prediction schema"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th><i>index</i></th><th>Name</th><th>Index</th><th>IsHidden</th><th>Type</th><th>Annotations</th></tr></thead><tbody><tr><td>0</td><td>Features</td><td><div class=\"dni-plaintext\">0</div></td><td><div class=\"dni-plaintext\">False</div></td><td><table><thead><tr><th>Dimensions</th><th>IsKnownSize</th><th>ItemType</th><th>Size</th><th>RawType</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ 4 ]</div></td><td><div class=\"dni-plaintext\">True</div></td><td><div class=\"dni-plaintext\">{ Single: RawType: System.Single }</div></td><td><div class=\"dni-plaintext\">4</div></td><td><div class=\"dni-plaintext\">Microsoft.ML.Data.VBuffer&lt;System.Single&gt;</div></td></tr></tbody></table></td><td><table><thead><tr><th>Schema</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ ]</div></td></tr></tbody></table></td></tr><tr><td>1</td><td>NormalizedFeatures</td><td><div class=\"dni-plaintext\">1</div></td><td><div class=\"dni-plaintext\">False</div></td><td><table><thead><tr><th>Dimensions</th><th>IsKnownSize</th><th>ItemType</th><th>Size</th><th>RawType</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ 4 ]</div></td><td><div class=\"dni-plaintext\">True</div></td><td><div class=\"dni-plaintext\">{ Single: RawType: System.Single }</div></td><td><div class=\"dni-plaintext\">4</div></td><td><div class=\"dni-plaintext\">Microsoft.ML.Data.VBuffer&lt;System.Single&gt;</div></td></tr></tbody></table></td><td><table><thead><tr><th>Schema</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ { IsNormalized: Boolean: Name: IsNormalized, Index: 0, IsHidden: False, Type: { Boolean: RawType: System.Boolean }, Annotations: { : Schema: [ ] } } ]</div></td></tr></tbody></table></td></tr><tr><td>2</td><td>PredictedLabel</td><td><div class=\"dni-plaintext\">2</div></td><td><div class=\"dni-plaintext\">True</div></td><td><table><thead><tr><th>Count</th><th>RawType</th></tr></thead><tbody><tr><td><div 
class=\"dni-plaintext\">3</div></td><td><div class=\"dni-plaintext\">System.UInt32</div></td></tr></tbody></table></td><td><table><thead><tr><th>Schema</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ { ScoreColumnKind: String: Name: ScoreColumnKind, Index: 0, IsHidden: False, Type: { String: RawType: System.ReadOnlyMemory&lt;System.Char&gt; }, Annotations: { : Schema: [ ] } }, { ScoreValueKind: String: Name: ScoreValueKind, Index: 1, IsHidden: False, Type: { String: RawType: System.ReadOnlyMemory&lt;System.Char&gt; }, Annotations: { : Schema: [ ] } }, { KeyValues: Vector&lt;String, 3&gt;: Name: KeyValues, Index: 2, IsHidden: False, Type: { Vector&lt;String, 3&gt;: Dimensions: [ 3 ], IsKnownSize: True, ItemType: { String: RawType: System.ReadOnlyMemory&lt;System.Char&gt; }, Size: 3, RawType: Microsoft.ML.Data.VBuffer&lt;System.ReadOnlyMemory&lt;System.Char&gt;&gt; }, Annotations: { : Schema: [ ] } }, { ScoreColumnSetId: Key&lt;UInt32, 0-2147483646&gt;: Name: ScoreColumnSetId, Index: 3, IsHidden: False, Type: { Key&lt;UInt32, 0-2147483646&gt;: Count: 2147483647, RawType: System.UInt32 }, Annotations: { : Schema: [ ] } } ]</div></td></tr></tbody></table></td></tr><tr><td>3</td><td>PredictedLabel</td><td><div class=\"dni-plaintext\">3</div></td><td><div class=\"dni-plaintext\">False</div></td><td><table><thead><tr><th>RawType</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">System.ReadOnlyMemory&lt;System.Char&gt;</div></td></tr></tbody></table></td><td><table><thead><tr><th>Schema</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ ]</div></td></tr></tbody></table></td></tr><tr><td>4</td><td>Score</td><td><div class=\"dni-plaintext\">4</div></td><td><div class=\"dni-plaintext\">False</div></td><td><table><thead><tr><th>Dimensions</th><th>IsKnownSize</th><th>ItemType</th><th>Size</th><th>RawType</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ 3 ]</div></td><td><div class=\"dni-plaintext\">True</div></td><td><div 
class=\"dni-plaintext\">{ Single: RawType: System.Single }</div></td><td><div class=\"dni-plaintext\">3</div></td><td><div class=\"dni-plaintext\">Microsoft.ML.Data.VBuffer&lt;System.Single&gt;</div></td></tr></tbody></table></td><td><table><thead><tr><th>Schema</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ { ScoreColumnKind: String: Name: ScoreColumnKind, Index: 0, IsHidden: False, Type: { String: RawType: System.ReadOnlyMemory&lt;System.Char&gt; }, Annotations: { : Schema: [ ] } }, { ScoreColumnSetId: Key&lt;UInt32, 0-2147483646&gt;: Name: ScoreColumnSetId, Index: 1, IsHidden: False, Type: { Key&lt;UInt32, 0-2147483646&gt;: Count: 2147483647, RawType: System.UInt32 }, Annotations: { : Schema: [ ] } }, { ScoreValueKind: String: Name: ScoreValueKind, Index: 2, IsHidden: False, Type: { String: RawType: System.ReadOnlyMemory&lt;System.Char&gt; }, Annotations: { : Schema: [ ] } }, { TrainingLabelValues: Vector&lt;String, 3&gt;: Name: TrainingLabelValues, Index: 3, IsHidden: False, Type: { Vector&lt;String, 3&gt;: Dimensions: [ 3 ], IsKnownSize: True, ItemType: { String: RawType: System.ReadOnlyMemory&lt;System.Char&gt; }, Size: 3, RawType: Microsoft.ML.Data.VBuffer&lt;System.ReadOnlyMemory&lt;System.Char&gt;&gt; }, Annotations: { : Schema: [ ] } }, { SlotNames: Vector&lt;String, 3&gt;: Name: SlotNames, Index: 4, IsHidden: False, Type: { Vector&lt;String, 3&gt;: Dimensions: [ 3 ], IsKnownSize: True, ItemType: { String: RawType: System.ReadOnlyMemory&lt;System.Char&gt; }, Size: 3, RawType: Microsoft.ML.Data.VBuffer&lt;System.ReadOnlyMemory&lt;System.Char&gt;&gt; }, Annotations: { : Schema: [ ] } } ]</div></td></tr></tbody></table></td></tr></tbody></table>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"batchPredictions.Schema"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using PredictionEngine"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set scope to scoring\n",
"\n",
"This allows you to not pass in the label column."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var scoringModel = model.GetModelFor(TransformerScope.Scoring);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define model input and output schema"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"public class ModelInput\n",
"{\n",
" [VectorType(4)]\n",
" public float[] Features {get;set;}\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"public class ModelOutput\n",
"{\n",
" [VectorType(3)]\n",
" public float[] Score {get;set;}\n",
" \n",
" public string PredictedLabel{get;set;} \n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create prediction engine"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var predictionEngine = \n",
" ctx.Model.CreatePredictionEngine<ModelInput,ModelOutput>(scoringModel);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define sample input to predict"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var input = new ModelInput {Features = new [] {5.1f,3.5f,1.4f,0.2f}};"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make single prediction"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"var prediction = predictionEngine.Predict(input);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### View prediction"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [
{
"data": {
"text/html": [
"<table><thead><tr><th>Score</th><th>PredictedLabel</th></tr></thead><tbody><tr><td><div class=\"dni-plaintext\">[ 0.9999666, 3.3485834E-05, 2.356079E-22 ]</div></td><td><div class=\"dni-plaintext\">Iris-setosa</div></td></tr></tbody></table>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"prediction"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".NET (C#)",
"language": "C#",
"name": ".net-csharp"
},
"language_info": {
"name": "C#"
}
},
"nbformat": 4,
"nbformat_minor": 2
}