@perryism
Last active April 7, 2023 00:55
Convert an XGBoost model to PMML 4.3

Overview

We are developing a model that needs to be integrated into an external system that only supports PMML 4.3.

Challenge

We were using the latest xgboost and the latest sklearn2pmml, and together they could only produce PMML 4.4.

These are some of the errors we were getting:

SEVERE: Failed to parse PKL
RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams
SEVERE: Failed to parse learner
java.io.IOException: Expected 27-element array of zeroes, got [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
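A quick way to tell which PMML version a converter actually emitted is to read the root PMML element: a PMML 4.3 document declares version="4.3" and the http://www.dmg.org/PMML-4_3 namespace. The sketch below uses only Python's standard library; pmml_version and the script name check_pmml.py are our own hypothetical names, not part of any PMML tool.

```python
import sys
import xml.etree.ElementTree as ET


def pmml_version(source):
    """Return the version attribute of the root PMML element.

    source may be a file path or an open file object.
    """
    root = ET.parse(source).getroot()
    # Namespaced tags look like "{http://www.dmg.org/PMML-4_3}PMML"
    local_name = root.tag.split("}")[-1]
    if local_name != "PMML":
        raise ValueError(f"root element is <{local_name}>, not <PMML>")
    return root.get("version")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_pmml.py XGBoostAudit.pmml
    print(pmml_version(sys.argv[1]))
```

After a successful downgrade, running it against XGBoostAudit.pmml should print 4.3 rather than 4.4.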

Downgrades

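Two components need to be downgraded.

  1. xgboost

This post suggests that xgboost needs to be pinned to version 1.0.0.

  2. sklearn2pmml

This post mentions that PMML 4.4 output was introduced sometime in May 2020, so I looked through the JPMML-XGBoost releases and picked an executable jar from around that date. The latest sklearn2pmml is still installed, but only for its make_feature_map helper; the conversion itself is done with that standalone jar.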
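Because the whole workflow depends on xgboost staying at 1.0.0, it can help to fail fast at the top of train.py if a different version was resolved. This is a minimal sketch, assuming Python 3.8+ (for importlib.metadata); REQUIRED_VERSIONS and check_pins are hypothetical names. The JPMML-XGBoost jar is not a Python package, so its version still has to be kept consistent in the Dockerfile itself.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical pin table: xgboost must stay at 1.0.0 for the PMML 4.3 workflow
REQUIRED_VERSIONS = {"xgboost": "1.0.0"}


def check_pins(required=REQUIRED_VERSIONS, get_version=version):
    """Raise RuntimeError listing every package that is missing or at the wrong version.

    get_version is injectable so the check can be exercised without the packages installed.
    """
    problems = []
    for package, expected in required.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{package} {installed} is installed (expected {expected})")
    if problems:
        raise RuntimeError("; ".join(problems))
```

Calling check_pins() before clf.fit(...) stops the build with a readable message instead of surfacing later as a converter error.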

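Dockerfile:

# Debian bullseye still packages openjdk-11; newer python images use later Debian releases that do not
FROM python:3.8-bullseye
RUN apt-get update && apt-get install -y openjdk-11-jdk
# "scikit-learn" is the real PyPI name; the "sklearn" alias is deprecated
RUN python -m pip install --trusted-host pypi.org --trusted-host pypi.python.org --trusted-host files.pythonhosted.org scikit-learn pandas numpy xgboost==1.0.0 sklearn2pmml ipython
WORKDIR /app
# Download and run the same JPMML-XGBoost jar version
RUN wget https://github.com/jpmml/jpmml-xgboost/releases/download/1.4.0/jpmml-xgboost-executable-1.4.0.jar
COPY train.py .
# Produces iris.model and Audit.fmap
RUN python train.py
RUN java -jar jpmml-xgboost-executable-1.4.0.jar --model-input iris.model --fmap-input Audit.fmap --target-name target --pmml-output XGBoostAudit.pmml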
train.py:

import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn2pmml.xgboost import make_feature_map
from xgboost import XGBClassifier

# Load the iris dataset into a DataFrame with an integer "target" column
iris = datasets.load_iris()
df = pd.DataFrame(data=np.c_[iris["data"], iris["target"]],
                  columns=iris["feature_names"] + ["target"])
df["target"] = df["target"].astype(int)

X = df[iris["feature_names"]]
X_train, X_test, y_train, y_test = train_test_split(X, df["target"], test_size=0.2, random_state=42)

clf = XGBClassifier()
clf.fit(X_train, y_train)
# A file name without a .json extension is saved in XGBoost's binary format
clf.save_model("iris.model")

# Build the feature map from the feature columns only, not the target
audit_fmap = make_feature_map(X_train, enable_categorical=False)
audit_fmap.save("Audit.fmap")
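For a purely numeric frame like iris, the feature map is simple enough to write by hand, which also shows what Audit.fmap holds. XGBoost describes the format as one "<featureid> <featurename> <q or i or int>" line per feature, where q marks a quantitative feature. The sketch below writes tab-separated lines under that assumption; write_feature_map is a hypothetical helper, and sklearn2pmml's make_feature_map may differ in details (for example, for categorical columns), so treat it as an illustration rather than a drop-in replacement.

```python
def write_feature_map(feature_names, path, feature_type="q"):
    """Write an XGBoost feature map with one tab-separated "index name type" line per feature.

    feature_type "q" marks a quantitative (continuous) feature.
    """
    with open(path, "w", newline="") as f:
        for index, name in enumerate(feature_names):
            # Tabs or newlines inside a name would corrupt the line-oriented format
            if "\t" in name or "\n" in name:
                raise ValueError(f"feature name {name!r} contains a tab or newline")
            f.write(f"{index}\t{name}\t{feature_type}\n")
```

For example, write_feature_map(iris["feature_names"], "Audit.fmap") lists the four measurements in the same column order that was passed to clf.fit, which is what the converter relies on to map feature indices back to names.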