@shaypal5
Last active July 9, 2022 16:06
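Another minimal example of some pdpipe features: chaining and adding stages into a pipeline, a stage failing on an unmet precondition, and running a slice of the pipeline with verbose output.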
>>> import pandas as pd
>>> import pdpipe as pdp
>>> df = pd.DataFrame(
...     [[23, 'Jo', 45], [19, 'Bo', 72], [15, 'Di', 12], [5, 'Jo', 0]],
...     columns=['age', 'name', 'salary'])
>>> df
   age name  salary
0   23   Jo      45
1   19   Bo      72
2   15   Di      12
3    5   Jo       0
>>> pipeline = pdp.DropDuplicates('name').Bin({'salary': [0, 20, 50]}) \
... + pdp.SetIndex('name').ColDrop('name')
>>> pipeline
A pdpipe pipeline:
[ 0] Drop duplicates in columns 'name'
[ 1] Bin salary by [0, 20, 50].
[ 2] Set indexes.
[ 3] Drop columns 'name'
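Both the `+` used to build `pipeline` and the `pipeline[0:3]` slice used further down treat a pipeline as an ordered list of stages. The toy model below sketches that idea under stated assumptions: the `ToyPipeline` class and the "stage is a plain DataFrame-to-DataFrame function" representation are hypothetical and are not pdpipe's actual implementation.

```python
from typing import Callable, List, Union

import pandas as pd

# Hypothetical representation: a stage is any DataFrame -> DataFrame function.
Stage = Callable[[pd.DataFrame], pd.DataFrame]


class ToyPipeline:
    """Toy model of stage composition; not pdpipe's actual implementation."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = list(stages)

    def __add__(self, other: "ToyPipeline") -> "ToyPipeline":
        # Adding two pipelines concatenates their stage lists, in order.
        return ToyPipeline(self.stages + other.stages)

    def __len__(self) -> int:
        return len(self.stages)

    def __getitem__(self, index: Union[int, slice]) -> Union[Stage, "ToyPipeline"]:
        # A slice yields a new, shorter pipeline; an int yields a single stage.
        if isinstance(index, slice):
            return ToyPipeline(self.stages[index])
        return self.stages[index]

    def __call__(self, df: pd.DataFrame) -> pd.DataFrame:
        # Each stage receives the previous stage's output.
        for stage in self.stages:
            df = stage(df)
        return df


df = pd.DataFrame(
    [[23, "Jo", 45], [19, "Bo", 72], [15, "Di", 12], [5, "Jo", 0]],
    columns=["age", "name", "salary"],
)
dedupe = ToyPipeline([lambda d: d.drop_duplicates(subset="name")])
index_by_name = ToyPipeline([lambda d: d.set_index("name")])
pipe = dedupe + index_by_name

print(len(pipe))                 # 2
print(pipe[0:1](df).shape)       # (3, 3): only the deduplication stage ran
print(pipe(df).index.tolist())   # ['Jo', 'Bo', 'Di']
```

Because slicing returns another pipeline rather than a bare list, a prefix such as `pipe[0:1]` can be called on a DataFrame directly, which is the same usage pattern as `pipeline[0:3](df, ...)` in the session.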
>>> pipeline(df)
FailedPreconditionError: Pipeline stage failed because not all columns 'name' were found in the input dataframe.
The above exception was the direct cause of the following exception:
...
PipelineApplicationError: Exception raised in stage [ 3] PdPipelineStage: Drop columns 'name'
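Stage [ 3] fails because stage [ 2], `SetIndex('name')`, has already moved `name` out of the columns and into the index, so `ColDrop('name')` finds no such column. The sketch below reproduces the same situation in plain pandas; it mirrors the cause of the error but not pdpipe's precondition machinery, and the binning stage is omitted since it does not touch `name`.

```python
import pandas as pd

df = pd.DataFrame(
    [[23, "Jo", 45], [19, "Bo", 72], [15, "Di", 12], [5, "Jo", 0]],
    columns=["age", "name", "salary"],
)

# Stages [ 0] and [ 2] in plain pandas: set_index moves 'name' from the
# columns into the index (drop=True is the pandas default).
indexed = df.drop_duplicates(subset="name").set_index("name")
print("name" in indexed.columns)  # False
print(indexed.index.name)         # name

# Dropping 'name' as a column now fails. pdpipe reports this as a failed
# precondition of ColDrop; plain pandas raises a KeyError instead.
drop_failed = False
try:
    indexed.drop(columns="name")
except KeyError:
    drop_failed = True
print(drop_failed)  # True
```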
>>> pipeline[0:3](df, verbose=True)
- Drop duplicates in columns 'name'
1 rows dropped.
- Bin salary by [0, 20, 50].
salary: 100%|████████████████| 1/1 [00:00<00:00, 338.39it/s]
- Set indexes.
      age salary
name
Jo     23  20-50
Bo     19    50≤
Di     15   0-20
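The result of `pipeline[0:3]` can be approximated in plain pandas as a rough sketch. The `bin_label` helper is hypothetical and only mimics the labels printed above: it assumes half-open intervals `[lo, hi)` and an invented `<lo` label for values below the first edge, so pdpipe's own boundary handling may differ.

```python
import bisect
from typing import Sequence

import pandas as pd

EDGES = [0, 20, 50]


def bin_label(value: float, edges: Sequence[float]) -> str:
    """Hypothetical labeler mimicking the '0-20' / '20-50' / '50≤' labels.

    Assumes half-open intervals [lo, hi); not pdpipe's implementation.
    """
    edges = sorted(edges)
    if value < edges[0]:
        return f"<{edges[0]}"  # assumed label for values below the first edge
    # Position of the last edge that is <= value.
    i = bisect.bisect_right(edges, value) - 1
    if i == len(edges) - 1:
        return f"{edges[-1]}≤"  # open-ended top bin
    return f"{edges[i]}-{edges[i + 1]}"


df = pd.DataFrame(
    [[23, "Jo", 45], [19, "Bo", 72], [15, "Di", 12], [5, "Jo", 0]],
    columns=["age", "name", "salary"],
)

result = (
    df.drop_duplicates(subset="name")  # stage [ 0]: keeps the first 'Jo' row
    .assign(  # stage [ 1]: replace salary values with bin labels
        salary=lambda d: d["salary"].map(lambda v: bin_label(v, EDGES))
    )
    .set_index("name")  # stage [ 2]: 'name' becomes the index
)
print(result)
```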