Last active
December 15, 2015 16:29
Part II on entropy, working toward the ID3 algorithm.
;; Fill in this definition of seq-entropy, which takes a second
;; parameter: a function that returns 'true' when a given item
;; is positive.
(defn seq-entropy
  "Calculate the entropy of sequence 'sq', treating every item for
  which 'pos-func' returns true as a positive sample."
  [sq pos-func]
  ;; ...
  )
;; ... validate it against your bit strings by passing 'pos?' as the
;; discriminator function, then use it for the rest of this assignment.

;; next, we'll redefine entropy to the simplified measurement used in
;; the example we're pulling from
;; http://www.doc.ic.ac.uk/~sgc/teaching/pre2012/v231/lecture11.html
;; ('log2' is not defined in this file; presumably it carries over from Part I)
(defn entropy
  "Entropy(S) = -p+ * log2(p+)"
  [p-pos]
  (* (- p-pos) (log2 p-pos)))
;; play with this to get a feel for the range it produces; note that
;; the maximum score is now about 0.53 (at p+ = 1/e, roughly 0.37),
;; and that p+ = 0.5 scores exactly 0.5
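;; A quick way to see that range, as a sketch: sample the formula at a few
;; probabilities. 'simplified-entropy' is a hypothetical stand-alone copy of
;; 'entropy' above (with log2 inlined) so that this snippet runs on its own.
(defn simplified-entropy
  "-p * log2(p) for a probability p in (0, 1]."
  [p]
  (* (- p) (/ (Math/log p) (Math/log 2))))

;; pairs of [p score]; the score at p = 1/e is the largest in this list
(for [p [0.1 0.25 (/ 1 Math/E) 0.5 0.75 1.0]]
  [p (simplified-entropy p)])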
;; (remember to change your seq-entropy function to use this new entropy API)
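;; One possible shape for the updated function, offered as a sketch rather
;; than the intended solution. The name 'seq-entropy-sketch' and the local
;; 'log2' helper are assumptions made here so the snippet runs on its own;
;; in the assignment you would call your own 'log2' and the 'entropy' above.
(defn seq-entropy-sketch
  "Simplified entropy -p * log2(p), where p is the fraction of items in
  'sq' for which 'pos-func' returns a truthy value."
  [sq pos-func]
  (let [log2  (fn [x] (/ (Math/log x) (Math/log 2)))
        p-pos (/ (count (filter pos-func sq)) (double (count sq)))]
    ;; note: yields NaN when no item is positive, since log2(0) is -Infinity
    (* (- p-pos) (log2 p-pos))))

;; e.g. three positives out of four items
(seq-entropy-sketch [1 1 1 -1] pos?)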
;; here's the old familiar weekend data
(def weekend-data
  [[:weather :parents :money :decision] ;; column headings
   [:sunny   :yes     :rich  :cinema]
   [:sunny   :no      :rich  :tennis]
   [:windy   :yes     :rich  :cinema]
   [:rainy   :yes     :poor  :cinema]
   [:rainy   :no      :rich  :stay-in]
   [:rainy   :yes     :poor  :cinema]
   [:windy   :no      :poor  :cinema]
   [:windy   :no      :rich  :shopping]
   [:windy   :yes     :rich  :cinema]
   [:sunny   :no      :rich  :tennis]])
(rest weekend-data)
;; removes the first row, which is all headings

;; returns the last column, minus headings
(map last (rest weekend-data))
;; seq-entropy using the decision of :cinema as the positive
(seq-entropy (map last (rest weekend-data)) (partial = :cinema))
;; explain what (partial = :cinema) returns
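;; For experimenting with that question, a small sketch: bind the result to
;; a name (the name 'cinema?' is made up here) and call it on a few values.
(let [cinema? (partial = :cinema)]
  [(cinema? :cinema) (cinema? :tennis)])
;; => [true false]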
;; seq-entropy for each distinct decision in decisions
;; (pause to appreciate the beauty of the notation)
(let [decisions (map last (rest weekend-data))]
  (for [decision (distinct decisions)]
    (seq-entropy decisions (partial = decision))))
;; => (0.44217935649972373 0.46438561897747244 0.33219280948873625
;;     0.33219280948873625)
;; ^ compare these values to those at the worked example here:
;; http://www.doc.ic.ac.uk/~sgc/teaching/pre2012/v231/lecture11.html
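;; Where those four numbers come from, as a sketch: count how often each
;; decision occurs. The vector below is the decision column of weekend-data,
;; copied out by hand so that the snippet runs on its own.
(frequencies [:cinema :tennis :cinema :cinema :stay-in
              :cinema :cinema :shopping :cinema :tennis])
;; => {:cinema 6, :tennis 2, :stay-in 1, :shopping 1}
;; so p+ is 0.6, 0.2, 0.1 and 0.1; for example -0.6 * log2(0.6) is about 0.442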
;; group rows by the value of the nth column, in this case 0 (weather)
(group-by #(nth % 0) (rest weekend-data))
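;; A sketch of where such a grouping can lead, assuming the next step is to
;; look at the decisions inside each group. The 'rows' vector is a three-row
;; sample of the data above, repeated here so the snippet runs on its own.
(let [rows [[:sunny :yes :rich :cinema]
            [:sunny :no  :rich :tennis]
            [:windy :yes :rich :cinema]]]
  (into {}
        (for [[weather group] (group-by #(nth % 0) rows)]
          [weather (map last group)])))
;; => {:sunny (:cinema :tennis), :windy (:cinema)}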
;; press: C-u C-x C-e on the next uncommented line, inspect and explain what's
;; returned and why it's useful to us.