shaypal5 · July 9, 2022 16:06
diff --git a/pdpipe_2nd_look.py b/pdpipe_2nd_look.py
 >>> df = pd.DataFrame(
 ...   [[23, 'Jo', 45], [19, 'Bo', 72], [15, 'Di', 12], [5, 'Jo', 0]],
 ...   columns=['age', 'name', 'salary'])
 >>> df
   age name  salary
 0   23   Jo      45
 1   19   Bo      72
 2   15   Di      12
 3    5   Jo       0
 >>> pipeline = pdp.DropDuplicates('name').Bin({'salary': [0, 20, 50]}) \
 ...   + pdp.SetIndex('name').ColDrop('name')
 >>> pipeline
 A pdpipe pipeline:
 [ 0]  Drop duplicates in columns 'name'
 [ 1]  Bin salary by [0, 20, 50].
 [ 2]  Set indexes.
 [ 3]  Drop columns 'name'
 >>> pipeline(df)
 FailedPreconditionError: Pipeline stage failed because not all columns 'name' were found in the input dataframe.
 The above exception was the direct cause of the following exception:
 ...
 PipelineApplicationError: Exception raised in stage [ 3] PdPipelineStage: Drop columns 'name'
 >>> pipeline[0:3](df, verbose=True)
 - Drop duplicates in columns 'name'
 1 rows dropped.
 - Bin salary by [0, 20, 50].
 salary: 100%|████████████████| 1/1 [00:00<00:00, 338.39it/s]
 - Set indexes.
      age salary
 name
 Jo     23  20-50
 Bo     19    50≤
 Di     15   0-20
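
The failure at stage [ 3] can be reproduced with plain pandas. The sketch below is not pdpipe code. It assumes that `SetIndex('name')` behaves like `DataFrame.set_index('name')` with the pandas default `drop=True`, which is consistent with the final output above, where `name` is the index and not a column. The binning stage is left out because it does not affect the `name` column. The variable names `deduped`, `indexed`, `name_is_column`, and `drop_failed` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [[23, 'Jo', 45], [19, 'Bo', 72], [15, 'Di', 12], [5, 'Jo', 0]],
    columns=['age', 'name', 'salary'])

# Rough equivalent of stage 0: keep the first row seen for each name.
deduped = df.drop_duplicates(subset='name')

# Rough equivalent of stage 2 (assumption: drop=True, the pandas default).
# This moves 'name' out of the columns and into the index.
indexed = deduped.set_index('name')

# After set_index, 'name' is the index name and no longer a column label.
name_is_column = 'name' in indexed.columns
index_name = indexed.index.name

# Rough equivalent of stage 3: dropping a column that is no longer there.
try:
    indexed.drop(columns=['name'])
    drop_failed = False
except KeyError:
    # pandas raises KeyError when the label is missing from the axis.
    drop_failed = True

print(name_is_column, index_name, drop_failed)
```

Under that assumption, the trailing `ColDrop('name')` stage is redundant after `SetIndex('name')`, which is why slicing the pipeline to `pipeline[0:3]` in the session above runs cleanly.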