
@hlin117
Last active January 10, 2019 13:05
MDLP discretization in Python
# A benchmark used to check that my MDLP results are correct,
# with the R package "discretization" as the reference.
library(discretization)
data(iris)
mdlp(iris)$Disc.data
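The R `mdlp()` function uses Fayyad and Irani's entropy-based cut search with the minimum description length principle as its stopping rule. A cut that splits a set S of N examples into S1 and S2 is accepted only if its information gain exceeds (log2(N - 1) + Δ) / N, where Δ = log2(3^k - 2) - [k·Ent(S) - k1·Ent(S1) - k2·Ent(S2)] and k, k1, k2 count the classes present in S, S1, and S2. The function below is a minimal Python sketch of that acceptance test alone; the name `mdl_accepts` is hypothetical and this is not the R package's code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdl_accepts(left, right):
    """MDL test for splitting S = left + right (both assumed non-empty)."""
    s = list(left) + list(right)
    n = len(s)
    ent_s, ent_l, ent_r = entropy(s), entropy(left), entropy(right)
    # Information gain of the cut: Ent(S) minus the weighted child entropies.
    gain = ent_s - (len(left) / n) * ent_l - (len(right) / n) * ent_r
    # Number of distinct classes in S, S1 and S2.
    k, k1, k2 = len(set(s)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * ent_l - k2 * ent_r)
    threshold = (math.log2(n - 1) + delta) / n
    return gain > threshold
```

For example, `mdl_accepts([0] * 10, [1] * 10)` accepts a cut that separates two pure classes, while `mdl_accepts([0, 1, 0, 1], [1, 0, 1, 0])` rejects a cut whose gain is zero.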
#!/usr/bin/env python
"""NOTE: sklearn.preprocessing does not have MDLP yet. It is a feature
that I am trying to add in my pull request, so the import below only
works with that branch installed.
"""
from sklearn.datasets import load_iris
from sklearn.preprocessing import MDLP
iris = load_iris()
X = iris.data
Y = iris.target
# My version of MDLP also lets users specify which columns to
# discretize; the R function assumes all columns are continuous.
# That option is not used here because the iris dataset contains
# only continuous attributes.
mdlp = MDLP()
conv_X = mdlp.fit_transform(X, Y)
# Unlike the mdlp function in R, this version lets users specify a
# minimum depth for the discretization, overriding the minimum
# description length principle. This can occasionally be useful.
# The minimum depth is not always reached, especially when the
# algorithm finds a cut that yields zero-entropy (pure) partitions,
# since a pure partition cannot be split further.
mdlp = MDLP(min_depth=100)
conv_X = mdlp.fit_transform(X, Y)
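Because `sklearn.preprocessing.MDLP` only exists in the pull-request branch, here is a minimal NumPy sketch of recursive MDLP cut search that also mimics the two options described in the script: restricting discretization to chosen columns, and a minimum depth. The names `mdlp_cut_points` and `mdlp_discretize` are hypothetical, and so is the meaning given to `min_depth` (cuts made above that recursion depth skip the MDL test). This is an illustration under those assumptions, not the code from the pull request.

```python
import numpy as np


def _entropy(y):
    """Shannon entropy (bits) of an array of non-negative integer labels."""
    if y.size == 0:
        return 0.0
    p = np.bincount(y) / y.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def _mdl_accepts(y, y_left, y_right):
    """Fayyad-Irani MDL stopping rule for one binary cut of y."""
    n = y.size
    ent, ent_l, ent_r = _entropy(y), _entropy(y_left), _entropy(y_right)
    gain = ent - (y_left.size * ent_l + y_right.size * ent_r) / n
    k, k1, k2 = (np.unique(a).size for a in (y, y_left, y_right))
    delta = np.log2(3.0 ** k - 2) - (k * ent - k1 * ent_l - k2 * ent_r)
    return gain > (np.log2(n - 1) + delta) / n


def mdlp_cut_points(x, y, min_depth=0, depth=0):
    """Recursively find cut points for one continuous feature x.

    Hypothetical min_depth semantics: cuts at recursion depth < min_depth
    skip the MDL test; deeper cuts must pass it.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    order = np.argsort(x, kind="mergesort")
    x, y = x[order], y[order]
    n = x.size
    if n < 2 or _entropy(y) == 0.0:
        return []  # pure (zero-entropy) block: nothing left to separate
    best_e, best_i = None, None
    # Candidate boundaries lie between consecutive distinct values.
    # (O(n^2) per level: fine for a sketch, slow for large n.)
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue
        e = (i * _entropy(y[:i]) + (n - i) * _entropy(y[i:])) / n
        if best_e is None or e < best_e:
            best_e, best_i = e, i
    if best_i is None:
        return []  # all values identical: no boundary exists
    y_left, y_right = y[:best_i], y[best_i:]
    if depth >= min_depth and not _mdl_accepts(y, y_left, y_right):
        return []
    cut = (x[best_i - 1] + x[best_i]) / 2.0
    return (mdlp_cut_points(x[:best_i], y_left, min_depth, depth + 1)
            + [cut]
            + mdlp_cut_points(x[best_i:], y_right, min_depth, depth + 1))


def mdlp_discretize(X, y, columns=None, min_depth=0):
    """Replace the chosen columns of X with 1-based MDLP bin codes."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    if columns is None:
        columns = range(X.shape[1])  # treat every column as continuous
    for j in columns:
        cuts = np.asarray(mdlp_cut_points(X[:, j], y, min_depth), dtype=float)
        # Count of cuts <= value, plus 1, so codes start at 1.
        out[:, j] = np.searchsorted(cuts, X[:, j], side="right") + 1
    return out
```

The codes start at 1 because that is the numbering both outputs in this gist appear to use. Columns left out of `columns` are copied through unchanged, which is how a categorical attribute would be skipped.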
[[1 2 1 1]
[1 1 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 1 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 1 1 1]
[1 1 1 1]
[2 2 1 1]
[2 2 1 1]
[1 2 1 1]
[1 2 1 1]
[2 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 1 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 1 1 1]
[1 2 1 1]
[1 2 1 1]
[1 1 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 1 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[2 2 2 2]
[2 2 2 2]
[2 2 2 2]
[1 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 2 2 2]
[1 1 2 2]
[2 1 2 2]
[1 1 2 2]
[1 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 2 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 2 2 3]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 3 2]
[2 1 2 2]
[2 1 2 2]
[1 1 2 2]
[1 1 2 2]
[2 1 2 2]
[2 1 3 2]
[1 1 2 2]
[2 2 2 2]
[2 2 2 2]
[2 1 2 2]
[2 1 2 2]
[1 1 2 2]
[1 1 2 2]
[2 1 2 2]
[2 1 2 2]
[1 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[1 1 2 2]
[2 1 2 2]
[2 2 3 3]
[2 1 3 3]
[2 1 3 3]
[2 1 3 3]
[2 1 3 3]
[2 1 3 3]
[1 1 2 2]
[2 1 3 3]
[2 1 3 3]
[2 2 3 3]
[2 2 3 3]
[2 1 3 3]
[2 1 3 3]
[2 1 3 3]
[2 1 3 3]
[2 2 3 3]
[2 1 3 3]
[2 2 3 3]
[2 1 3 3]
[2 1 3 2]
[2 2 3 3]
[2 1 2 3]
[2 1 3 3]
[2 1 2 3]
[2 2 3 3]
[2 2 3 3]
[2 1 2 3]
[2 1 2 3]
[2 1 3 3]
[2 1 3 2]
[2 1 3 3]
[2 2 3 3]
[2 1 3 3]
[2 1 3 2]
[2 1 3 2]
[2 1 3 3]
[2 2 3 3]
[2 2 3 3]
[2 1 2 3]
[2 2 3 3]
[2 2 3 3]
[2 2 3 3]
[2 1 3 3]
[2 2 3 3]
[2 2 3 3]
[2 1 3 3]
[2 1 3 3]
[2 1 3 3]
[2 2 3 3]
[2 1 3 3]]
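To use the R table that follows as a benchmark, one option is to parse it and count, per column, how often the two programs assign the same bin code. That table is whitespace-separated with a quoted header and quoted row names (the layout R's `write.table` produces by default), so Python's `shlex.split` can tokenize it. `read_r_table` and `column_agreement` are hypothetical helpers. Comparing codes directly assumes both programs number bins from 1 in increasing order, which both outputs appear to do; equal codes do not prove that the cut points are identical.

```python
import shlex


def read_r_table(text):
    """Parse write.table-style output: a quoted header line, then rows whose
    first field is a quoted row name and whose last field is the class label.
    Returns (header, rows) with the bin codes converted to ints."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = shlex.split(lines[0])
    n_codes = len(header) - 1  # every header column except the class label
    rows = []
    for ln in lines[1:]:
        fields = shlex.split(ln)[1:]  # drop the row name
        rows.append([int(v) for v in fields[:n_codes]] + fields[n_codes:])
    return header, rows


def column_agreement(a, b):
    """Per-column fraction of rows on which two code matrices agree."""
    if len(a) != len(b):
        raise ValueError("both discretizations must cover the same rows")
    n_cols = len(a[0])
    return [sum(ra[j] == rb[j] for ra, rb in zip(a, b)) / len(a)
            for j in range(n_cols)]
```

Passing the R codes (each row without its species label) and the Python matrix as a list of rows gives one agreement score per attribute.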
"Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
"1" 1 3 1 1 "setosa"
"2" 1 2 1 1 "setosa"
"3" 1 2 1 1 "setosa"
"4" 1 2 1 1 "setosa"
"5" 1 3 1 1 "setosa"
"6" 1 3 1 1 "setosa"
"7" 1 3 1 1 "setosa"
"8" 1 3 1 1 "setosa"
"9" 1 1 1 1 "setosa"
"10" 1 2 1 1 "setosa"
"11" 1 3 1 1 "setosa"
"12" 1 3 1 1 "setosa"
"13" 1 2 1 1 "setosa"
"14" 1 2 1 1 "setosa"
"15" 2 3 1 1 "setosa"
"16" 2 3 1 1 "setosa"
"17" 1 3 1 1 "setosa"
"18" 1 3 1 1 "setosa"
"19" 2 3 1 1 "setosa"
"20" 1 3 1 1 "setosa"
"21" 1 3 1 1 "setosa"
"22" 1 3 1 1 "setosa"
"23" 1 3 1 1 "setosa"
"24" 1 2 1 1 "setosa"
"25" 1 3 1 1 "setosa"
"26" 1 2 1 1 "setosa"
"27" 1 3 1 1 "setosa"
"28" 1 3 1 1 "setosa"
"29" 1 3 1 1 "setosa"
"30" 1 2 1 1 "setosa"
"31" 1 2 1 1 "setosa"
"32" 1 3 1 1 "setosa"
"33" 1 3 1 1 "setosa"
"34" 1 3 1 1 "setosa"
"35" 1 2 1 1 "setosa"
"36" 1 2 1 1 "setosa"
"37" 1 3 1 1 "setosa"
"38" 1 3 1 1 "setosa"
"39" 1 2 1 1 "setosa"
"40" 1 3 1 1 "setosa"
"41" 1 3 1 1 "setosa"
"42" 1 1 1 1 "setosa"
"43" 1 2 1 1 "setosa"
"44" 1 3 1 1 "setosa"
"45" 1 3 1 1 "setosa"
"46" 1 2 1 1 "setosa"
"47" 1 3 1 1 "setosa"
"48" 1 2 1 1 "setosa"
"49" 1 3 1 1 "setosa"
"50" 1 2 1 1 "setosa"
"51" 3 2 2 2 "versicolor"
"52" 3 2 2 2 "versicolor"
"53" 3 2 3 2 "versicolor"
"54" 1 1 2 2 "versicolor"
"55" 3 1 2 2 "versicolor"
"56" 2 1 2 2 "versicolor"
"57" 3 2 2 2 "versicolor"
"58" 1 1 2 2 "versicolor"
"59" 3 1 2 2 "versicolor"
"60" 1 1 2 2 "versicolor"
"61" 1 1 2 2 "versicolor"
"62" 2 2 2 2 "versicolor"
"63" 2 1 2 2 "versicolor"
"64" 2 1 2 2 "versicolor"
"65" 2 1 2 2 "versicolor"
"66" 3 2 2 2 "versicolor"
"67" 2 2 2 2 "versicolor"
"68" 2 1 2 2 "versicolor"
"69" 3 1 2 2 "versicolor"
"70" 2 1 2 2 "versicolor"
"71" 2 2 3 3 "versicolor"
"72" 2 1 2 2 "versicolor"
"73" 3 1 3 2 "versicolor"
"74" 2 1 2 2 "versicolor"
"75" 3 1 2 2 "versicolor"
"76" 3 2 2 2 "versicolor"
"77" 3 1 3 2 "versicolor"
"78" 3 2 3 2 "versicolor"
"79" 2 1 2 2 "versicolor"
"80" 2 1 2 2 "versicolor"
"81" 1 1 2 2 "versicolor"
"82" 1 1 2 2 "versicolor"
"83" 2 1 2 2 "versicolor"
"84" 2 1 3 2 "versicolor"
"85" 1 2 2 2 "versicolor"
"86" 2 3 2 2 "versicolor"
"87" 3 2 2 2 "versicolor"
"88" 3 1 2 2 "versicolor"
"89" 2 2 2 2 "versicolor"
"90" 1 1 2 2 "versicolor"
"91" 1 1 2 2 "versicolor"
"92" 2 2 2 2 "versicolor"
"93" 2 1 2 2 "versicolor"
"94" 1 1 2 2 "versicolor"
"95" 2 1 2 2 "versicolor"
"96" 2 2 2 2 "versicolor"
"97" 2 1 2 2 "versicolor"
"98" 3 1 2 2 "versicolor"
"99" 1 1 2 2 "versicolor"
"100" 2 1 2 2 "versicolor"
"101" 3 2 3 3 "virginica"
"102" 2 1 3 3 "virginica"
"103" 3 2 3 3 "virginica"
"104" 3 1 3 3 "virginica"
"105" 3 2 3 3 "virginica"
"106" 3 2 3 3 "virginica"
"107" 1 1 2 2 "virginica"
"108" 3 1 3 3 "virginica"
"109" 3 1 3 3 "virginica"
"110" 3 3 3 3 "virginica"
"111" 3 2 3 3 "virginica"
"112" 3 1 3 3 "virginica"
"113" 3 2 3 3 "virginica"
"114" 2 1 3 3 "virginica"
"115" 2 1 3 3 "virginica"
"116" 3 2 3 3 "virginica"
"117" 3 2 3 3 "virginica"
"118" 3 3 3 3 "virginica"
"119" 3 1 3 3 "virginica"
"120" 2 1 3 2 "virginica"
"121" 3 2 3 3 "virginica"
"122" 2 1 3 3 "virginica"
"123" 3 1 3 3 "virginica"
"124" 3 1 3 3 "virginica"
"125" 3 2 3 3 "virginica"
"126" 3 2 3 3 "virginica"
"127" 3 1 3 3 "virginica"
"128" 2 2 3 3 "virginica"
"129" 3 1 3 3 "virginica"
"130" 3 2 3 2 "virginica"
"131" 3 1 3 3 "virginica"
"132" 3 3 3 3 "virginica"
"133" 3 1 3 3 "virginica"
"134" 3 1 3 2 "virginica"
"135" 2 1 3 2 "virginica"
"136" 3 2 3 3 "virginica"
"137" 3 3 3 3 "virginica"
"138" 3 2 3 3 "virginica"
"139" 2 2 3 3 "virginica"
"140" 3 2 3 3 "virginica"
"141" 3 2 3 3 "virginica"
"142" 3 2 3 3 "virginica"
"143" 2 1 3 3 "virginica"
"144" 3 2 3 3 "virginica"
"145" 3 2 3 3 "virginica"
"146" 3 2 3 3 "virginica"
"147" 3 1 3 3 "virginica"
"148" 3 2 3 3 "virginica"
"149" 3 3 3 3 "virginica"
"150" 2 2 3 3 "virginica"