zouzias · December 15, 2017 10:57
diff --git a/gistfile1.txt b/gistfile1.txt
 See comment https://github.com/jpmml/jpmml-sklearn/issues/38 (vruusmann commented on Apr 19)

 If you want to apply one-hot-encoding to string columns, then you should simply use the sklearn.preprocessing.LabelBinarizer transformer class for that. It has exactly the same effect as a sequence of LabelEncoder followed by OneHotEncoder.

 mapper = DataFrameMapper([
  ("country_name", LabelBinarizer())
 ])

 The OneHotEncoder transformation makes sense if your input data contains categorical integer columns.
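A minimal sketch of the two cases, using only scikit-learn and NumPy. The sample values and variable names are illustrative assumptions and do not come from the original comment. It applies LabelBinarizer to a string column and OneHotEncoder to an integer column that holds the same categories as codes.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

# String column: LabelBinarizer learns the classes (sorted: DE, FR, US)
# and emits one indicator column per class.
countries = ["US", "DE", "FR", "US"]
binarizer = LabelBinarizer()
country_onehot = binarizer.fit_transform(countries)

# Integer column: OneHotEncoder expects 2-D input of shape
# (n_samples, n_features), so the codes are given as a single column.
codes = np.array([[2], [0], [1], [2]])
encoder = OneHotEncoder()
code_onehot = encoder.fit_transform(codes).toarray()  # sparse -> dense

print(binarizer.classes_)
print(country_onehot)
print(code_onehot)
```

Both results contain the same indicator pattern here. One caveat: with exactly two classes, LabelBinarizer returns a single 0/1 column rather than two columns, so the outputs differ in that case.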

 Currently, sklearn_pandas.DataFrameMapper is unable to apply [LabelEncoder(), OneHotEncoder()] on a string column due to the "matrix transpose" problem described earlier in that issue thread. You could additionally open an issue with the sklearn_pandas project, and ask for their opinion about it.
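The snippet below is a sketch of the shape mismatch. It assumes the "matrix transpose" problem means that LabelEncoder returns a 1-D array while OneHotEncoder expects a 2-D column; the linked issue has the authoritative description. The sample values are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# LabelEncoder maps each string to an integer code and returns a 1-D array.
labels = LabelEncoder().fit_transform(["US", "DE", "FR", "US"])
print(labels.shape)  # (4,)

# OneHotEncoder works on (n_samples, n_features) input, so the codes
# have to be turned into a single column first.
column = labels.reshape(-1, 1)
print(column.shape)  # (4, 1)
```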

 It would be possible to make [LabelEncoder(), OneHotEncoder()] work by developing a custom Scikit-Learn transformer that handles "matrix transpose". For example, [LabelEncoder(), MatrixTransposer(), OneHotEncoder()]. This MatrixTransposer operation would be a no-op from the PMML perspective.
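A rough sketch of what such a transformer could look like. MatrixTransposer is not an existing scikit-learn or sklearn_pandas class. The implementation below is an assumption that treats "matrix transpose" as reshaping a 1-D array into a single column. The loop applies the steps one after another by hand; it does not reproduce DataFrameMapper's internals, and the PMML conversion side is not covered.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import LabelEncoder, OneHotEncoder


class MatrixTransposer(BaseEstimator, TransformerMixin):
    """Hypothetical helper: reshape a 1-D array into one column."""

    def fit(self, X, y=None):
        # Stateless: nothing to learn from the data.
        return self

    def transform(self, X):
        # (n_samples,) -> (n_samples, 1)
        return np.asarray(X).reshape(-1, 1)


# Apply the three steps in sequence, feeding each output to the next step.
steps = [LabelEncoder(), MatrixTransposer(), OneHotEncoder()]
data = ["US", "DE", "FR", "US"]
for step in steps:
    data = step.fit_transform(data)

encoded = data.toarray()  # OneHotEncoder returns a sparse matrix by default
print(encoded)
```

Because the helper only changes the array shape and leaves the values alone, it adds no data transformation of its own, which fits the remark that it would be a no-op from the PMML perspective.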