Gist shaypal5/821b0bbd316d496f1e8a10ebca47b81e (last active August 1, 2022 17:36)
An example of an advanced initialization of a complex pdpipe pipeline for processing pandas DataFrames.
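One reading of such an initialization is that constructor arguments decide which stages the pipeline contains, so the pipeline's structure can be tuned like any other hyperparameter. Below is a minimal, dependency-free sketch of that pattern under stated assumptions: `SketchPipeline`, `drop_column` and `drop_rows_above` are hypothetical names, records are plain dicts rather than pandas DataFrames, and none of it is pdpipe's API or the gist author's implementation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for this sketch only; pdpipe itself works on DataFrames.
Record = Dict[str, object]
Stage = Tuple[str, Callable[[List[Record]], List[Record]]]


def drop_column(name: str) -> Stage:
    """Return a (description, function) stage that removes `name` from every record."""
    return (
        f"Drop column {name!r}",
        lambda rows: [{k: v for k, v in r.items() if k != name} for r in rows],
    )


def drop_rows_above(column: str, max_val: float) -> Stage:
    """Return a stage that keeps only records whose `column` value is <= max_val."""
    return (
        f"Drop rows with {column} > {max_val}",
        lambda rows: [r for r in rows if r[column] <= max_val],
    )


class SketchPipeline:
    """Builds its stage list from keyword arguments, then applies it in order."""

    def __init__(self, savings_max_val: float = 101, drop_gender: bool = False):
        # Stages that are always present; the row-drop threshold is a parameter.
        self.stages: List[Stage] = [
            drop_column("Name"),
            drop_rows_above("Savings", savings_max_val),
        ]
        # Optional stage: appended only when the flag asks for it.
        if drop_gender:
            self.stages.append(drop_column("Gender"))

    def transform(self, rows: List[Record]) -> List[Record]:
        # Each stage receives the output of the previous one.
        for _desc, func in self.stages:
            rows = func(rows)
        return rows

    def __repr__(self) -> str:
        lines = ["A sketch pipeline:"]
        lines += [f"[{i:>2}] {desc}" for i, (desc, _func) in enumerate(self.stages)]
        return "\n".join(lines)


rows = [
    {"Name": "Ann", "Gender": "F", "Savings": 50},
    {"Name": "Bo", "Gender": "M", "Savings": 150},
]
print(SketchPipeline(drop_gender=True).transform(rows))  # [{'Savings': 50}]
print(SketchPipeline())
```

With `drop_gender=False` the Gender stage is simply never created, so the resulting pipeline is shorter rather than containing a disabled stage.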
```python
>>> mp = MyPipelineAndModel(
...     savings_max_val=101,
...     drop_gender=False,
...     standardize=True,
...     ohencode_country=True,
...     savings_bin_val=1,
...     pca_threshold=25,
...     fit_intercept=True)
>>> mp
<PdPipeline -> LogisticRegression>
>>> mp.estimator
LogisticRegression()
>>> mp.pipeline
A pdpipe pipeline:
[ 0] Drop columns Columns with at least 0.2 missing value rate
[ 1] Drop rows by label values
[ 2] Encode label values
[ 3] Drop columns 'Name'
[ 4] Apply dataframe method set_index with kwargs {'keys': 'id'}
[ 5] Drop rows by qualifier <RowQualifier: Qualify rows with X[Savings] >
     101>
[ 6] Assign column Viking with df[Country].isin(['Denmark', 'Finland']) &
     ~df[Bearded]
[ 7] Assign column YearlyGrands with df[Savings] * 1000 / df[Age]
[ 8] Bin Savings by [1].
[ 9] One-hot encode 'Country'
[10] Tokenize Quote
[11] Stemming tokens in Quote...
[12] Remove stopwords from Quote
[13] Count-vectorizing column Quote.
[14] Decompose columns Columns that start with Quote with PCA
[15] Encode 'Savings_bin', 'Gender'
[16] Scale columns Columns of dtypes <class 'numpy.number'>
[17] Drop columns 'Bearded'
[18] Transform input dataframes to the following schema: <Learnable Schema>
[19] Validates conditions
```
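To make a few of the listed stages concrete, here is a sketch of what stages [6], [7] and [8] compute, using plain dict records instead of a DataFrame. The function names are hypothetical, the integer bin index is a simplification (pdpipe's own Bin stage labels bins with strings), and the `Savings_bin` column name is borrowed from stage [15] of the listing.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence

# Hypothetical record type for this sketch; pdpipe operates on whole DataFrames.
Record = Dict[str, object]


def assign_viking(row: Record) -> Record:
    """Like stage [6]: True for Danish or Finnish rows without a beard."""
    is_viking = row["Country"] in ("Denmark", "Finland") and not row["Bearded"]
    return {**row, "Viking": is_viking}


def assign_yearly_grands(row: Record) -> Record:
    """Like stage [7]: YearlyGrands = Savings * 1000 / Age."""
    return {**row, "YearlyGrands": row["Savings"] * 1000 / row["Age"]}


def bin_index(value: float, edges: Sequence[float]) -> int:
    """Simplified stage [8]: index of the bin that `value` falls into.

    With edges [1], values below 1 map to bin 0 and values >= 1 to bin 1.
    """
    return bisect_right(edges, value)


def apply_stages(rows: List[Record], savings_edges: Sequence[float] = (1,)) -> List[Record]:
    """Apply the three stages in the listed order, returning new records."""
    out = []
    for row in rows:
        row = assign_viking(row)
        row = assign_yearly_grands(row)
        row = {**row, "Savings_bin": bin_index(row["Savings"], savings_edges)}
        out.append(row)
    return out


people = [
    {"Country": "Denmark", "Bearded": False, "Savings": 2, "Age": 40},
    {"Country": "Finland", "Bearded": True, "Savings": 0.5, "Age": 25},
]
for record in apply_stages(people):
    print(record["Viking"], record["YearlyGrands"], record["Savings_bin"])
# True 50.0 1
# False 20.0 0
```

The listing shows these expressions evaluated column-wise on the whole DataFrame (for example `df[Savings] * 1000 / df[Age]`); the per-record loop here only keeps the example free of dependencies.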