This is how I would do it. I have resorted to only using .where (or np.where or np.select):

    (df
     .droplevel(axis='columns', level=0)
     .assign(subassign=-1)
     .assign(subassign=lambda df_: df_.subassign.where(
         (df_.B >= 5) | (df_.B.isna()),
         df_.A.str.split('-').str[-1]
       )
       .astype(int)
     )
    )
OK, but you have just postponed the .astype operation until after the .mask method call has finished. mask and where are the same method with inverse logic.
This would be equivalent:

    (
        df_mask
        .assign(subassign=-1)
        .mask(
            lambda df: df.B < 5,
            lambda df: df.assign(subassign=lambda df_: df_.A.str.split("-").str[-1])
        )
        .astype(dict(subassign=int))
    )
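To make the where/mask symmetry concrete, here is a minimal sketch on a made-up Series (the values are an assumption for illustration, not the data from this thread). It checks that the condition used with where above is the exact inverse of the one used with mask, including for the missing value.

```python
import pandas as pd

# Made-up sample column; the real data from the thread is not shown here.
b = pd.Series([1, 7, None, 3], name="B")

keep_condition = (b >= 5) | b.isna()   # condition passed to .where
replace_condition = b < 5              # condition passed to .mask

# NaN compares False with "<", so the two conditions are exact inverses.
assert (~keep_condition).equals(replace_condition)

via_where = b.where(keep_condition, -1)     # keep where True, else -1
via_mask = b.mask(replace_condition, -1)    # replace where True with -1

assert via_where.equals(via_mask)
```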
This was just an example of a problem that occurs when using chained operations: you want to change a subset of the data with a transformation that would raise an error on data outside that subset.
Furthermore, the transformation is called on the whole dataframe, even if you change only a relatively small part of it and most of the transformation's result is thrown away. This is especially wasteful with any kind of string operation, as showcased here (because you leave numpy land and are working in the Python domain).
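A small sketch of that waste, assuming made-up data and a hypothetical helper last_token that records every value it is called on. Only one row needs changing, yet the replacement values are computed for every row before mask picks from them.

```python
import pandas as pd

# Made-up data: only the first row has B < 5.
df = pd.DataFrame({"A": ["x-1", "y-2", "z-3", "w-4"], "B": [1, 7, 8, 9]})

seen = []  # records every value the transformation is called on


def last_token(value):
    """Hypothetical stand-in for the string transformation in the thread."""
    seen.append(value)
    return value.split("-")[-1]


needs_change = df["B"] < 5
default = pd.Series("-1", index=df.index)

# The second argument is evaluated for all rows before mask selects from it.
result = default.mask(needs_change, df["A"].map(last_token))

print(int(needs_change.sum()), "row changed,", len(seen), "rows transformed")
```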
Direct mutation is clearly superior here:
- No unnecessary calculations are performed
- You don't have to change the operation because it throws an error in unrelated parts of the dataframe. Thus you only have to think about the parts you want to change.
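As a rough sketch of what I mean by direct mutation, assuming made-up data with the column names A, B and subassign from above: the string transformation only ever sees the selected rows, so the value "no-number" (which would break the int conversion) is never touched.

```python
import pandas as pd

# Made-up data; "no-number" and None would fail if the whole column were converted.
df = pd.DataFrame(
    {
        "A": ["x-1", "y-2", "no-number", None],
        "B": [1.0, 3.0, 7.0, None],
    }
)

df["subassign"] = -1
subset = df["B"] < 5

# Transform and convert only the rows that are actually going to change.
df.loc[subset, "subassign"] = (
    df.loc[subset, "A"].str.split("-").str[-1].astype(int)
)
```

The default of -1 stays in place for every row outside the subset, and no dtype juggling is needed afterwards.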
WRT .assign on hierarchical columns. You are correct, it doesn't appear that .assign can create the inner column name. I guess I've never run into this as I try to flatten columns ASAP. 🤷♀️
I really don't understand this sentiment, but it is not the first time I've read it. Hierarchical columns can be useful for grouping related data. Otherwise you would have to use multiple dataframes and SQL-like association dataframes, or resort to ugly filter calls to select these groups.
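A minimal sketch of the kind of grouping I have in mind, with invented group and column names (temperature, pressure, min, max are assumptions for illustration): the outer level selects a whole group at once, and xs selects the same inner column across groups.

```python
import pandas as pd

# Invented two-level columns: the outer level groups related measurements.
columns = pd.MultiIndex.from_product(
    [["temperature", "pressure"], ["min", "max"]]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# Select one whole group via the outer level.
temperature = df["temperature"]

# Select the same inner column across all groups.
maxima = df.xs("max", axis="columns", level=1)
```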
In my view, the only problem with it is that it's not well supported in pandas' functional API. One could provide something like an .assign_map method (analogous to str.format_map) with pandas_flavor or similar.
My sentiment is that I've (sample size 1, but consulted with Pandas, used Pandas for years, and taught Pandas to thousands) never had a need for this. I'm not saying it might not happen. But perhaps that is why support is lacking... 🤷♀️