@jackrusher
Last active December 15, 2015 16:29
Part II on entropy, working toward the ID3 algorithm.
;; Fill in this definition of seq-entropy, which takes a second
;; parameter: a predicate function that returns true when a given item
;; counts as positive.
(defn seq-entropy
  "Calculate the entropy of sequence 'sq', treating every item for which
  'pos-func' returns true as a positive sample."
  [sq pos-func]
  ;; ...
  )
;; ... validate it against your bit strings by passing 'pos?' as the
;; discriminator function, then use it for the rest of this assignment.
;; next, we'll redefine entropy to the simplified measurement used in
;; the example we're pulling from
;; http://www.doc.ic.ac.uk/~sgc/teaching/pre2012/v231/lecture11.html
(defn entropy
  "Entropy(S) = -p+ * log2(p+)"
  [p-pos]
  (* (- p-pos) (log2 p-pos)))
;; play with this to get a feel for the range it produces; note that
;; an even split (p+ = 0.5) now scores 0.5, and the maximum is about
;; 0.53, reached at p+ = 1/e (roughly 0.37)
;; (remember to change your seq-entropy function to use this new entropy API)
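;; A minimal sketch of one way seq-entropy might look with the new API
;; (an illustration, not the assignment's reference solution). The names
;; log2-sketch, entropy-sketch and seq-entropy-sketch are hypothetical,
;; chosen so they don't clobber your own definitions; log2-sketch assumes
;; that log2 from Part I is a plain base-2 logarithm.
(defn log2-sketch
  "Base-2 logarithm of x."
  [x]
  (/ (Math/log x) (Math/log 2)))

(defn entropy-sketch
  "Same formula as entropy above: -p+ * log2(p+). Returns 0.0 when p+ is 0
  (an assumption made here to avoid 0 * -Infinity, which is NaN)."
  [p-pos]
  (if (zero? p-pos)
    0.0
    (* (- p-pos) (log2-sketch p-pos))))

(defn seq-entropy-sketch
  "Entropy of the proportion of items in sq for which pos-func is truthy.
  Assumes sq is non-empty."
  [sq pos-func]
  (let [p-pos (double (/ (count (filter pos-func sq)) (count sq)))]
    (entropy-sketch p-pos)))

;; e.g. half of these numbers are positive, so p+ = 0.5:
(seq-entropy-sketch [1 -1 1 -1] pos?)
;; => 0.5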
;; here's the old familiar weekend data
(def weekend-data
  [[:weather :parents :money :decision] ;; column headings
   [:sunny :yes :rich :cinema]
   [:sunny :no :rich :tennis]
   [:windy :yes :rich :cinema]
   [:rainy :yes :poor :cinema]
   [:rainy :no :rich :stay-in]
   [:rainy :yes :poor :cinema]
   [:windy :no :poor :cinema]
   [:windy :no :rich :shopping]
   [:windy :yes :rich :cinema]
   [:sunny :no :rich :tennis]])
;; removes the first row, which is all headings
(rest weekend-data)
;; returns the last column, minus headings
(map last (rest weekend-data))
;; seq-entropy using the decision of :cinema as the positive
(seq-entropy (map last (rest weekend-data)) (partial = :cinema))
;; explain what (partial = :cinema) returns
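;; One way to see it, as a small illustration: (partial = :cinema) returns
;; a new function that calls = with :cinema as its first argument, followed
;; by whatever arguments it is given. The name cinema? is hypothetical.
(let [cinema? (partial = :cinema)]
  (map cinema? [:cinema :tennis :cinema]))
;; => (true false true)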
;; seq-entropy for each distinct decision in decisions
;; (pause to appreciate the beauty of the notation)
(let [decisions (map last (rest weekend-data))]
  (for [decision (distinct decisions)]
    (seq-entropy decisions (partial = decision))))
;; => (0.44217935649972373 0.46438561897747244 0.33219280948873625
;;     0.33219280948873625)
;; ^ compare these values with those in the worked example here:
;; http://www.doc.ic.ac.uk/~sgc/teaching/pre2012/v231/lecture11.html
;; group rows by the value of the nth column, in this case 0 (weather)
(group-by #(nth % 0) (rest weekend-data))
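;; For a feel of the shape group-by returns, here is the same call on a
;; made-up three-row table (toy-rows is hypothetical, not part of the
;; weekend data): a map from each distinct column value to a vector of the
;; rows carrying that value, in their original order.
(def toy-rows
  [[:sunny :cinema]
   [:rainy :stay-in]
   [:sunny :tennis]])

(group-by #(nth % 0) toy-rows)
;; => {:sunny [[:sunny :cinema] [:sunny :tennis]],
;;     :rainy [[:rainy :stay-in]]}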
;; press: C-u C-x C-e on the next uncommented line, inspect and explain what's
;; returned and why it's useful to us.