Skip to content

Instantly share code, notes, and snippets.

@OnorioCatenacci
Last active December 4, 2022 08:25
Show Gist options
  • Save OnorioCatenacci/6314438 to your computer and use it in GitHub Desktop.
Save OnorioCatenacci/6314438 to your computer and use it in GitHub Desktop.
Machine Learning Dojo
// This F# dojo is directly inspired by the
// Digit Recognizer competition from Kaggle.com:
// http://www.kaggle.com/c/digit-recognizer
// The datasets below are simply shorter versions of
// the training dataset from Kaggle.
// The goal of the dojo will be to
// create a classifier that uses training data
// to recognize hand-written digits, and
// evaluate the quality of our classifier
// by looking at predictions on the validation data.
// This file provides some guidance through the problem:
// each section is numbered, and
// solves one piece you will need. Sections contain
// general instructions,
// [ YOUR CODE GOES HERE! ] tags where you should
// make the magic happen, and
// <F# QUICK-STARTER> blocks. These are small
// F# tutorials illustrating aspects of the
// syntax which could come in handy. Run them,
// see what happens, and tweak them to fit your goals!
// 0. GETTING READY
// Create a new F# Library project, and
// copy the entire contents of this file
// in "Script.fsx"
// <F# QUICK-STARTER>
// With F# Script files (.fsx) and F# Interactive,
// you can "live code" and see what happens.
// Try typing let x = 42 in the script file,
// right-click and select "Execute in interactive".
// let "binds" the value on the right to a name.
// Try now typing x + 3;; in the F# Interactive window.
// ';;' indicates "execute now whatever I just typed".
// Now right-click the following 2 lines and execute:
let greet name =
printfn "Hello, %s" name
// let also binds a name to a function.
// greet is a function with one argument, name.
// You should be able to run this in F# Interactive:
// greet "World";;
// </F# QUICK-STARTER>
// Then, load data files from the following location:
// training set of 5,000 examples:
// http://brandewinder.blob.core.windows.net/public/trainingsample.csv
// validation set of 500 examples, to test your model:
// http://brandewinder.blob.core.windows.net/public/validationsample.csv
// 1. GETTING SOME DATA
// First let's read the contents of "trainingsample.csv"
// We will need System and System.IO to work with files,
// let's right-click / run in interactive,
// to have these namespaces loaded:
open System
open System.IO
// the following might come in handy:
//File.ReadAllLines(path)
// returns an array of strings for each line
let trainingSample = File.ReadAllLines("/Users/Onorio_Development/Desktop/trainingsample.csv")
// 2. EXTRACTING COLUMNS
// Break each line of the file into an array of string,
// separating by commas, using Array.map
// <F# QUICK-STARTER>
// Array.map quick-starter:
// Array.map takes an array, and transforms it
// into another array by applying a function to it.
// Example: starting from an array of strings:
let strings = [| "Machine"; "Learning"; "with"; "F#"; "is"; "fun" |]
// we can transform it into a new array,
// containing the length of each string:
let lengths = Array.map (fun (s:string) -> s.Length) strings
// We can make it look nicer, using pipe-forward:
let lengths2 = strings |> Array.map (fun s -> s.Length)
// </F# QUICK-STARTER>
// the following function might help
let csvToSplit = "1,2,3,4,5"
let splitResult = csvToSplit.Split(',')
let splitByCommas fileString =
fileString |> Array.map(fun (s:string) -> s.Split(','))
let trainingArr = splitByCommas trainingSample
// 3. CLEANING UP HEADERS
// Did you note that the file has headers? We want to get rid of it.
// <F# QUICK-STARTER>
// Array slicing quick starter:
// let's start with an Array of ints:
let someNumbers = [| 0 .. 10 |] // create an array from 0 to 10
// you can access Array elements by index:
let first = someNumbers.[0]
// you can also slice the array:
let twoToFive = someNumbers.[ 1 .. 4 ] // grab a slice
let upToThree = someNumbers.[ .. 2 ]
// </F# QUICK-STARTER>
let removeHeaders = arr.[1..]
let noheaders = removeHeaders trainingArr
let myMath = function | Some x -> failwith "Screw you" | _ -> failwith "Whatever"
let myMath2 x =
match x with
| Some y -> failwith "Screw you"
| _ -> failwith "Whatever"
// 4. CONVERTING FROM STRINGS TO INTS
// Now that we have an array containing arrays of strings,
// and the headers are gone, we need to transform it
// into an array of arrays of integers.
// Array.map seems like a good idea again :)
// The following might help:
let castedInt = (int)"42"
// or, alternatively:
let convertedInt = Convert.ToInt32("42")
let convert (s:string) =
Convert.ToInt32 s
//let processOuterArray a =
// a |> processArray (
//noheaders |> Array.map (fun (a:Array) -> a |> Array.map Convert.ToInt32)
let toIntegers noheaders =
noheaders
|> Array.map (fun line ->
line |> Array.map (fun pix -> convert pix))
let integerPixels = toIntegers noheaders
// 5. CONVERTING ARRAYS TO RECORDS
// Rather than dealing with a raw array of ints,
// for convenience let's store these into an array of Records
// Record quick starter: we can declare a
// Record (a lightweight, immutable class) type that way:
type Example = { Label:int; Pixels:int[] }
// and instantiate one this way:
//let example = { Label = 1; Pixels = [| 1; 2; 3; |] }
type numberImage = { Number:int; Pixels:int[] }
let makeNumberImage (arrInt:int[]) = {Number = arrInt.[0]; Pixels = arrInt.[1..]}
let trainingSet = Array.map makeNumberImage integerPixels
// 6. COMPUTING DISTANCES
// We need to compute the distance between images
// Math reminder: the euclidean distance is
// distance [ x1; y1; z1 ] [ x2; y2; z2 ] =
// (x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) + (z1-z2)*(z1-z2) + ...
// <F# QUICK-STARTER>
// Array.map2 could come in handy here.
// Array.map2 quick start example
// Suppose we have 2 arrays:
let point1 = [| 0; 1; 2 |]
let point2 = [| 3; 4; 5 |]
// Array.map2 takes 2 arrays at a time
// and maps pairs of elements, for instance:
let map2Example =
Array.map2 (fun p1 p2 -> p1 + p2) point1 point2
// This simply computes the sums for point1 and point2,
// but we can easily turn this into a function now:
let map2PointsExample (P1: int[]) (P2: int[]) =
Array.map2 (fun p1 p2 -> p1 + p2) P1 P2
// </F# QUICK-STARTER>
// Having a function like
let distance (p1: int[]) (p2: int[]) = 42
// would come in very handy right now,
// except that in this case,
// 42 is likely not the right answer
let findImgDifference (image1Pixels:int[]) (image2Pixels:int[]) =
Array.map2(fun p1 p2 -> (p1 - p2) * (p1 - p2)) image1Pixels image2Pixels |> Array.sum
//printfn "%A" (findImgDifference trainingSet.[0].Pixels trainingSet.[1].Pixels)
// 7. WRITING THE CLASSIFIER FUNCTION
// We are now ready to write a classifier function!
// The classifier should take a set of pixels
// (an array of ints) as an input, search for the
// closest example in our sample, and predict
// the value of that closest element.
// <F# QUICK-STARTER>
// Array.minBy can be handy here, to find
// the closest element in the Array of examples.
// Array.minBy quick start:
// suppose we have an Array of Example:
let someData =
[| { Label = 0; Pixels = [| 0; 1 |] };
{ Label = 1; Pixels = [| 9; 2 |] };
{ Label = 2; Pixels = [| 3; 4 |] }; |]
// We can find for instance
// the element with largest first pixel
let findThatGuy =
someData
|> Array.maxBy (fun x -> x.Pixels.[0])
// </F# QUICK-STARTER>
// <F# QUICK-STARTER>
// F# and closures work very well together
let immutableValue = 42
let functionWithClosure (x: int) =
if x > immutableValue // using outside value
then true
else false
// </F# QUICK-STARTER>
// The classifier function should probably
// look like this - except that this one will
// classify everything as a 0:
let classify (unknown:int[]) =
// do something smart here
// like find the Example with
// the shortest distance to
// the unknown element...
// and use the training examples
// in a closure...
let closestImg = trainingSet
|> Array.minBy (fun trainingItem -> findImgDifference unknown trainingItem.Pixels)
closestImg.Number
// [ YOUR CODE GOES HERE! ]
// 8. EVALUATING THE MODEL AGAINST VALIDATION DATA
// Now that we have a classifier, we need to check
// how good it is.
// This is where the 2nd file, validationsample.csv,
// comes in handy. For each Example in the 2nd file,
// we know what the true Label is, so we can compare
// that value with what the classifier says.
// You could now check for each 500 example in that file
// whether your classifier returns the correct answer,
// and compute the % correctly predicted.
let validationSample = File.ReadAllLines("/Users/Onorio_Development/Desktop/validationsample.csv")
let validationImages = (splitByCommas validationSample) >> removeHeaders
let array = removeHeaders validationImages
//>> removeHeaders
//>> toIntegers
//>> Array.map makeNumberImage
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment