@shaypal5
Last active August 1, 2022 17:36
An example of an advanced initialization of a complex pdpipe pipeline for processing pandas dataframes. 🐼🚿
>>> mp = MyPipelineAndModel(
...     savings_max_val=101,
...     drop_gender=False,
...     standardize=True,
...     ohencode_country=True,
...     savings_bin_val=1,
...     pca_threshold=25,
...     fit_intercept=True)
>>> mp
<PdPipeline -> LogisticRegression>
>>> mp.estimator
LogisticRegression()
>>> mp.pipeline
A pdpipe pipeline:
[ 0] Drop columns Columns with at least 0.2 missing value rate
[ 1] Drop rows by label values
[ 2] Encode label values
[ 3] Drop columns 'Name'
[ 4] Apply dataframe method set_index with kwargs {'keys': 'id'}
[ 5] Drop rows by qualifier <RowQualifier: Qualify rows with X[Savings] > 101>
[ 6] Assign column Viking with df[Country].isin(['Denmark', 'Finland']) & ~df[Bearded]
[ 7] Assign column YearlyGrands with df[Savings] * 1000 / df[Age]
[ 8] Bin Savings by [1].
[ 9] One-hot encode 'Country'
[10] Tokenize Quote
[11] Stemming tokens in Quote...
[12] Remove stopwords from Quote
[13] Count-vectorizing column Quote.
[14] Decompose columns Columns that start with Quote with PCA
[15] Encode 'Savings_bin', 'Gender'
[16] Scale columns Columns of dtypes <class 'numpy.number'>
[17] Drop columns 'Bearded'
[18] Transform input dataframes to the following schema: <Learnable Schema>
[19] Validates conditions
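
To make stages 5 to 7 of the listing above concrete, here is a minimal sketch of what they appear to compute, written in plain pandas rather than pdpipe. It is inferred only from the stage descriptions and is not the code behind MyPipelineAndModel; the toy dataframe and its values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the stage listing.
df = pd.DataFrame(
    {
        "Savings": [50, 150, 20],
        "Country": ["Denmark", "Finland", "Israel"],
        "Bearded": [False, False, True],
        "Age": [25, 40, 20],
    }
)

savings_max_val = 101  # the value passed as savings_max_val in the session

# Stage 5: drop rows that satisfy Savings > savings_max_val.
df = df[~(df["Savings"] > savings_max_val)]

# Stage 6: Viking is True for non-bearded rows from Denmark or Finland.
df = df.assign(
    Viking=df["Country"].isin(["Denmark", "Finland"]) & ~df["Bearded"]
)

# Stage 7: YearlyGrands = Savings * 1000 / Age.
df = df.assign(YearlyGrands=df["Savings"] * 1000 / df["Age"])

print(df)
```

In the pipeline itself these steps are pdpipe stages, so they are applied in order to any dataframe passed through the pipeline; the sketch only mirrors their arithmetic on a single dataframe.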