A common task in nushell is to create a pipeline and then go back and update each column with `into blah`, where `blah` is the nushell value type you want to convert it to. While this is helpful, it's a bit cumbersome. We do have some shortcuts, though. For example, if all the columns you want to update are ints, you can just do `$table | into int col1 col2 col3`. That's great if you have a lot of columns of the same type. However, that doesn't happen all the time.
What I'd like to introduce is a method of inferring the nushell value type of each cell. I've thought about doing this a few different ways.
- Add a flag to the `table` command, like `table --infer`, that would go through each cell and try to infer what it is.
- Another option that I've thought of is a new command called `into value` that essentially does the same thing.
One place this would really come in handy is when we have to use `detect columns` or `from ssv`. Those commands are used frequently when consuming the output of traditional non-nushell external tools. Here's an example I created for someone the other day who was sorely missing the bash `ls` output. So, I created this one-liner.
^ls -lh | detect columns --no-headers --skip 1 --combine-columns 5..7 | update column5 {|r| $r.column5 | into datetime} | into int column1 column3 | into filesize column4 | rename perms links inode filesize datetime name
That is helpful, but all the `update blah blah blah` is noise that I'd love to remove with something like this:
^ls -lh | detect columns --no-headers --skip 1 --combine-columns 5..7 | into value | rename perms links inode filesize datetime name
or
^ls -lh | detect columns --no-headers --skip 1 --combine-columns 5..7 | table --infer | rename perms links inode filesize datetime name
The thought of iterating through each cell makes me think of one word: slow. It would be nice to do it per column, but nushell doesn't require every value in a column to share one datatype. By the way, this is what polars does: it can infer a datatype per column.
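To make the per-column idea concrete, here is a minimal standalone Rust sketch (Rust being the language nushell is written in). It is not how polars or nushell implement anything; `ColType`, `cell_type`, `unify`, and `infer_column` are hypothetical names invented for illustration, and only ints, floats, and strings are considered. The point is that a column can stop scanning as soon as it has been widened to string, which helps with the "slow" concern when a column is obviously text.

```rust
/// Hypothetical column types for this sketch; NOT nushell's or polars' real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Int,
    Float,
    Str,
}

/// Classify a single cell: int first, then finite float, otherwise string.
fn cell_type(cell: &str) -> ColType {
    let s = cell.trim();
    if s.parse::<i64>().is_ok() {
        ColType::Int
    } else if s.parse::<f64>().map_or(false, |f| f.is_finite()) {
        ColType::Float
    } else {
        ColType::Str
    }
}

/// Widen two types to the narrowest type that can hold both.
fn unify(a: ColType, b: ColType) -> ColType {
    match (a, b) {
        (ColType::Int, ColType::Int) => ColType::Int,
        (ColType::Int, ColType::Float)
        | (ColType::Float, ColType::Int)
        | (ColType::Float, ColType::Float) => ColType::Float,
        _ => ColType::Str,
    }
}

/// Infer one type for a whole column. Mixed columns fall back to `Str`,
/// and so does an empty column.
fn infer_column<'a>(cells: impl IntoIterator<Item = &'a str>) -> ColType {
    let mut current: Option<ColType> = None;
    for cell in cells {
        let t = cell_type(cell);
        let merged = match current {
            None => t,
            Some(c) => unify(c, t),
        };
        if merged == ColType::Str {
            // Str is the widest type, so there is no need to look further.
            return ColType::Str;
        }
        current = Some(merged);
    }
    current.unwrap_or(ColType::Str)
}
```

With this shape, a column like `["1", "2.5"]` widens to `Float`, while a single non-numeric cell turns the whole column into `Str` and ends the scan early.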
I'm wondering if we could/should have some value precedence, where you try value conversions in a certain order and then default to string when nothing works.
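A rough standalone Rust sketch of what such a precedence could look like is below. Everything in it is an assumption made for illustration: `Inferred` is a stand-in enum rather than nushell's real value type, the order (int, float, bool, filesize, then string) is just one possible choice, and the filesize parser only understands single-letter `K`/`M`/`G` suffixes with binary multiples.

```rust
/// Illustrative stand-in for a nushell value; NOT nushell's real `Value` type.
#[derive(Debug, Clone, PartialEq)]
enum Inferred {
    Int(i64),
    Float(f64),
    Bool(bool),
    Filesize(u64), // size in bytes
    Str(String),
}

fn try_int(s: &str) -> Option<Inferred> {
    s.parse::<i64>().ok().map(Inferred::Int)
}

fn try_float(s: &str) -> Option<Inferred> {
    // Reject "inf" and "NaN", which `f64::from_str` would otherwise accept.
    s.parse::<f64>().ok().filter(|f| f.is_finite()).map(Inferred::Float)
}

fn try_bool(s: &str) -> Option<Inferred> {
    match s {
        "true" => Some(Inferred::Bool(true)),
        "false" => Some(Inferred::Bool(false)),
        _ => None,
    }
}

/// Parse sizes such as "4.0K" or "12M" (assumed binary multiples).
fn try_filesize(s: &str) -> Option<Inferred> {
    let unit = s.chars().last()?;
    let multiplier: u64 = match unit.to_ascii_uppercase() {
        'K' => 1 << 10,
        'M' => 1 << 20,
        'G' => 1 << 30,
        _ => return None,
    };
    let number: f64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    if number.is_finite() && number >= 0.0 {
        Some(Inferred::Filesize((number * multiplier as f64) as u64))
    } else {
        None
    }
}

type Attempt = fn(&str) -> Option<Inferred>;

/// Conversions are tried in this order; the first success wins.
const PRECEDENCE: [Attempt; 4] = [try_int, try_float, try_bool, try_filesize];

/// Infer a value for one cell, defaulting to the original string.
fn infer(cell: &str) -> Inferred {
    let s = cell.trim();
    PRECEDENCE
        .iter()
        .find_map(|attempt| attempt(s))
        .unwrap_or_else(|| Inferred::Str(cell.to_string()))
}
```

Keeping the order in one table makes the trade-off visible: because `try_int` comes before `try_bool`, a cell containing `1` becomes an int here, and changing that is a one-line reorder rather than a rewrite.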
There's also the consideration that something can be represented in multiple value types. For instance, a unix epoch is an int and can be shown that way, but more often you want to see it as a datetime value. I also think bools can be 1 or 0, representing true or false; at least, `into bool` will do that conversion for you.
Another idea I had was to use cached regular expressions to determine the nushell value/datatype. I was thinking this is what polars does, but I need to look closer at it.