Last active
January 10, 2019 13:05
MDLP discretization in Python
# A benchmark I'm using to show that my results from MDLP are correct
# Using the R package "discretization" for comparison
library(discretization)
data(iris)
mdlp(iris)$Disc.data
#!/usr/bin/env python
"""NOTE: sklearn.preprocessing does not have MDLP yet. It is a feature
that I am trying to add in my pull request.
"""
from sklearn.datasets import load_iris
from sklearn.preprocessing import MDLP

iris = load_iris()
X = iris.data
Y = iris.target

"""My version of MDLP allows users to also specify which columns to
discretize. The R function assumes all columns are continuous.
I don't use this feature here, because the iris dataset contains only
continuous attributes.
"""
mdlp = MDLP()
conv_X = mdlp.fit_transform(X, Y)

"""Unlike the mdlp function in R, I allow users to specify a minimum
depth for the discretization, which overrides the minimum description
length principle. This could be useful sometimes.
The minimum depth will sometimes not be achieved, especially if the
algorithm can find a cut with zero entropy.
"""
mdlp = MDLP(min_depth=100)
conv_X = mdlp.fit_transform(X, Y)
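The script above relies on the minimum description length stopping rule without showing it. The following is a minimal sketch of the test as it is usually stated for Fayyad and Irani's method, assuming that is the criterion both implementations follow. The function names `entropy` and `mdlp_accepts_cut` are hypothetical and are not part of the pull request or of the R package. The sketch assumes at least two samples in total.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy, in bits, of a 1-D collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mdlp_accepts_cut(y_left, y_right):
    """Return True if the MDLP test accepts splitting into y_left / y_right.

    Hypothetical helper for illustration only (assumes n >= 2 samples).
    """
    y_all = np.concatenate([y_left, y_right])
    n = len(y_all)
    ent, ent_l, ent_r = entropy(y_all), entropy(y_left), entropy(y_right)

    # Information gain of the candidate cut point.
    gain = ent - (len(y_left) * ent_l + len(y_right) * ent_r) / n

    # Number of distinct classes in the whole set and on each side.
    k, k_l, k_r = (len(np.unique(a)) for a in (y_all, y_left, y_right))
    delta = np.log2(3 ** k - 2) - (k * ent - k_l * ent_l - k_r * ent_r)

    # The cut is kept only if the gain pays for the cost of encoding it.
    return bool(gain > (np.log2(n - 1) + delta) / n)
```

A cut that separates two classes cleanly passes the test, while a cut that leaves both sides as mixed as the whole set does not. A `min_depth` setting such as the one in the script would keep splitting even when this test returns False.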
[[1 2 1 1]
[1 1 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 1 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 1 1 1]
[1 1 1 1]
[2 2 1 1]
[2 2 1 1]
[1 2 1 1]
[1 2 1 1]
[2 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 1 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 1 1 1]
[1 2 1 1]
[1 2 1 1]
[1 1 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 1 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[1 2 1 1]
[2 2 2 2]
[2 2 2 2]
[2 2 2 2]
[1 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 2 2 2]
[1 1 2 2]
[2 1 2 2]
[1 1 2 2]
[1 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 2 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 2 2 3]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 3 2]
[2 1 2 2]
[2 1 2 2]
[1 1 2 2]
[1 1 2 2]
[2 1 2 2]
[2 1 3 2]
[1 1 2 2]
[2 2 2 2]
[2 2 2 2]
[2 1 2 2]
[2 1 2 2]
[1 1 2 2]
[1 1 2 2]
[2 1 2 2]
[2 1 2 2]
[1 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[2 1 2 2]
[1 1 2 2]
[2 1 2 2]
[2 2 3 3]
[2 1 3 3]
[2 1 3 3]
[2 1 3 3]
[2 1 3 3]
[2 1 3 3]
[1 1 2 2]
[2 1 3 3]
[2 1 3 3]
[2 2 3 3]
[2 2 3 3]
[2 1 3 3]
[2 1 3 3]
[2 1 3 3]
[2 1 3 3]
[2 2 3 3]
[2 1 3 3]
[2 2 3 3]
[2 1 3 3]
[2 1 3 2]
[2 2 3 3]
[2 1 2 3]
[2 1 3 3]
[2 1 2 3]
[2 2 3 3]
[2 2 3 3]
[2 1 2 3]
[2 1 2 3]
[2 1 3 3]
[2 1 3 2]
[2 1 3 3]
[2 2 3 3]
[2 1 3 3]
[2 1 3 2]
[2 1 3 2]
[2 1 3 3]
[2 2 3 3]
[2 2 3 3]
[2 1 2 3]
[2 2 3 3]
[2 2 3 3]
[2 2 3 3]
[2 1 3 3]
[2 2 3 3]
[2 2 3 3]
[2 1 3 3]
[2 1 3 3]
[2 1 3 3]
[2 2 3 3]
[2 1 3 3]]
"Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
"1" 1 3 1 1 "setosa"
"2" 1 2 1 1 "setosa"
"3" 1 2 1 1 "setosa"
"4" 1 2 1 1 "setosa"
"5" 1 3 1 1 "setosa"
"6" 1 3 1 1 "setosa"
"7" 1 3 1 1 "setosa"
"8" 1 3 1 1 "setosa"
"9" 1 1 1 1 "setosa"
"10" 1 2 1 1 "setosa"
"11" 1 3 1 1 "setosa"
"12" 1 3 1 1 "setosa"
"13" 1 2 1 1 "setosa"
"14" 1 2 1 1 "setosa"
"15" 2 3 1 1 "setosa"
"16" 2 3 1 1 "setosa"
"17" 1 3 1 1 "setosa"
"18" 1 3 1 1 "setosa"
"19" 2 3 1 1 "setosa"
"20" 1 3 1 1 "setosa"
"21" 1 3 1 1 "setosa"
"22" 1 3 1 1 "setosa"
"23" 1 3 1 1 "setosa"
"24" 1 2 1 1 "setosa"
"25" 1 3 1 1 "setosa"
"26" 1 2 1 1 "setosa"
"27" 1 3 1 1 "setosa"
"28" 1 3 1 1 "setosa"
"29" 1 3 1 1 "setosa"
"30" 1 2 1 1 "setosa"
"31" 1 2 1 1 "setosa"
"32" 1 3 1 1 "setosa"
"33" 1 3 1 1 "setosa"
"34" 1 3 1 1 "setosa"
"35" 1 2 1 1 "setosa"
"36" 1 2 1 1 "setosa"
"37" 1 3 1 1 "setosa"
"38" 1 3 1 1 "setosa"
"39" 1 2 1 1 "setosa"
"40" 1 3 1 1 "setosa"
"41" 1 3 1 1 "setosa"
"42" 1 1 1 1 "setosa"
"43" 1 2 1 1 "setosa"
"44" 1 3 1 1 "setosa"
"45" 1 3 1 1 "setosa"
"46" 1 2 1 1 "setosa"
"47" 1 3 1 1 "setosa"
"48" 1 2 1 1 "setosa"
"49" 1 3 1 1 "setosa"
"50" 1 2 1 1 "setosa"
"51" 3 2 2 2 "versicolor"
"52" 3 2 2 2 "versicolor"
"53" 3 2 3 2 "versicolor"
"54" 1 1 2 2 "versicolor"
"55" 3 1 2 2 "versicolor"
"56" 2 1 2 2 "versicolor"
"57" 3 2 2 2 "versicolor"
"58" 1 1 2 2 "versicolor"
"59" 3 1 2 2 "versicolor"
"60" 1 1 2 2 "versicolor"
"61" 1 1 2 2 "versicolor"
"62" 2 2 2 2 "versicolor"
"63" 2 1 2 2 "versicolor"
"64" 2 1 2 2 "versicolor"
"65" 2 1 2 2 "versicolor"
"66" 3 2 2 2 "versicolor"
"67" 2 2 2 2 "versicolor"
"68" 2 1 2 2 "versicolor"
"69" 3 1 2 2 "versicolor"
"70" 2 1 2 2 "versicolor"
"71" 2 2 3 3 "versicolor"
"72" 2 1 2 2 "versicolor"
"73" 3 1 3 2 "versicolor"
"74" 2 1 2 2 "versicolor"
"75" 3 1 2 2 "versicolor"
"76" 3 2 2 2 "versicolor"
"77" 3 1 3 2 "versicolor"
"78" 3 2 3 2 "versicolor"
"79" 2 1 2 2 "versicolor"
"80" 2 1 2 2 "versicolor"
"81" 1 1 2 2 "versicolor"
"82" 1 1 2 2 "versicolor"
"83" 2 1 2 2 "versicolor"
"84" 2 1 3 2 "versicolor"
"85" 1 2 2 2 "versicolor"
"86" 2 3 2 2 "versicolor"
"87" 3 2 2 2 "versicolor"
"88" 3 1 2 2 "versicolor"
"89" 2 2 2 2 "versicolor"
"90" 1 1 2 2 "versicolor"
"91" 1 1 2 2 "versicolor"
"92" 2 2 2 2 "versicolor"
"93" 2 1 2 2 "versicolor"
"94" 1 1 2 2 "versicolor"
"95" 2 1 2 2 "versicolor"
"96" 2 2 2 2 "versicolor"
"97" 2 1 2 2 "versicolor"
"98" 3 1 2 2 "versicolor"
"99" 1 1 2 2 "versicolor"
"100" 2 1 2 2 "versicolor"
"101" 3 2 3 3 "virginica"
"102" 2 1 3 3 "virginica"
"103" 3 2 3 3 "virginica"
"104" 3 1 3 3 "virginica"
"105" 3 2 3 3 "virginica"
"106" 3 2 3 3 "virginica"
"107" 1 1 2 2 "virginica"
"108" 3 1 3 3 "virginica"
"109" 3 1 3 3 "virginica"
"110" 3 3 3 3 "virginica"
"111" 3 2 3 3 "virginica"
"112" 3 1 3 3 "virginica"
"113" 3 2 3 3 "virginica"
"114" 2 1 3 3 "virginica"
"115" 2 1 3 3 "virginica"
"116" 3 2 3 3 "virginica"
"117" 3 2 3 3 "virginica"
"118" 3 3 3 3 "virginica"
"119" 3 1 3 3 "virginica"
"120" 2 1 3 2 "virginica"
"121" 3 2 3 3 "virginica"
"122" 2 1 3 3 "virginica"
"123" 3 1 3 3 "virginica"
"124" 3 1 3 3 "virginica"
"125" 3 2 3 3 "virginica"
"126" 3 2 3 3 "virginica"
"127" 3 1 3 3 "virginica"
"128" 2 2 3 3 "virginica"
"129" 3 1 3 3 "virginica"
"130" 3 2 3 2 "virginica"
"131" 3 1 3 3 "virginica"
"132" 3 3 3 3 "virginica"
"133" 3 1 3 3 "virginica"
"134" 3 1 3 2 "virginica"
"135" 2 1 3 2 "virginica"
"136" 3 2 3 3 "virginica"
"137" 3 3 3 3 "virginica"
"138" 3 2 3 3 "virginica"
"139" 2 2 3 3 "virginica"
"140" 3 2 3 3 "virginica"
"141" 3 2 3 3 "virginica"
"142" 3 2 3 3 "virginica"
"143" 2 1 3 3 "virginica"
"144" 3 2 3 3 "virginica"
"145" 3 2 3 3 "virginica"
"146" 3 2 3 3 "virginica"
"147" 3 1 3 3 "virginica"
"148" 3 2 3 3 "virginica"
"149" 3 3 3 3 "virginica"
"150" 2 2 3 3 "virginica"
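Since the benchmark is meant to check one listing against the other, it may help to load both into arrays. The sketch below parses the two text formats shown above (a printed NumPy array and an R table with row names and a trailing species column) and counts the distinct bin labels per column. The helper names are hypothetical, and the sample strings are just the first three rows of each listing.

```python
import numpy as np

# First three rows of each listing, copied from the outputs above.
PY_OUT = """[[1 2 1 1]
[1 1 1 1]
[1 2 1 1]]"""

R_OUT = '''"Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
"1" 1 3 1 1 "setosa"
"2" 1 2 1 1 "setosa"
"3" 1 2 1 1 "setosa"'''


def parse_numpy_print(text):
    """Parse the text of a printed 2-D integer array into an ndarray."""
    rows = []
    for line in text.strip().splitlines():
        # Drop the surrounding brackets, then split on whitespace.
        rows.append([int(v) for v in line.strip(" []").split()])
    return np.array(rows)


def parse_r_table(text):
    """Parse the R table text, keeping only the four discretized columns."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        # fields[0] is the quoted row name, fields[5] the quoted species.
        rows.append([int(v) for v in fields[1:5]])
    return np.array(rows)


def bins_per_column(X):
    """Number of distinct bin labels used in each column."""
    return [len(np.unique(X[:, j])) for j in range(X.shape[1])]


py_bins = parse_numpy_print(PY_OUT)
r_bins = parse_r_table(R_OUT)
print(bins_per_column(py_bins), bins_per_column(r_bins))
```

Bin labels need not line up between two implementations, so counting bins per column is a coarse but label-independent first check. Pasting the full listings into `PY_OUT` and `R_OUT` would extend the comparison to all 150 rows.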