@fdncred
Last active August 31, 2023 19:11
inferring values

Inferring nushell values proposal

Problem

A common task in nushell is to build a pipeline and then go back and convert each column with into blah, where blah is the nushell type you want that column to become. That works, but it's a bit cumbersome. We do have some shortcuts: for example, if all the columns you want to convert are ints, you can just do $table | into int col1 col2 col3. That's great when a lot of columns share the same type, but that doesn't happen all the time.

Solution

What I'd like to introduce is a way of inferring the nushell value type of each cell. I've thought about doing this a few different ways.

Options

  1. Add a flag to the table command, like table --infer, that goes through each cell and tries to infer what it is.
  2. Add a new command called into value that does essentially the same thing.

A more general use case

One place this would really come in handy is when we have to use detect columns or from ssv. Those commands are used frequently when consuming the output of traditional non-nushell external tools. Here's a one-liner I created the other day for someone who missed the bash ls output.

^ls -lh | detect columns --no-headers --skip 1 --combine-columns 5..7 | update column5 {|r| $r.column5 | into datetime} | into int column1 column3 | into filesize column4 | rename perms links inode filesize datetime name

That is helpful, but all the update blah blah blah is noise that I'd love to remove with something like this:

^ls -lh | detect columns --no-headers --skip 1 --combine-columns 5..7 | into value | rename perms links inode filesize datetime name

or

^ls -lh | detect columns --no-headers --skip 1 --combine-columns 5..7 | table --infer | rename perms links inode filesize datetime name

How do we do this

The thought of iterating through each cell makes me think of one word: slow. It would be nice to infer per column instead, but nushell doesn't require every value in a column to have the same type. By the way, this is what polars does: it can infer datatypes per column.
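To make the per-column idea concrete, here is a minimal sketch in Python (used purely for illustration, it is not nushell code). It assumes every cell arrives as a string, as it would from detect columns, and picks the first type that every cell in a column satisfies. The function names and the int/float/string type list are hypothetical, not part of any existing nushell or polars API.

```python
from typing import Callable


def _is_int(s: str) -> bool:
    try:
        int(s)
        return True
    except ValueError:
        return False


def _is_float(s: str) -> bool:
    # Note: Python's float() also accepts "nan" and "inf"; a real
    # implementation would probably want a stricter check.
    try:
        float(s)
        return True
    except ValueError:
        return False


# Hypothetical check order, most specific first; "string" is the fallback.
CHECKS: list[tuple[str, Callable[[str], bool]]] = [
    ("int", _is_int),
    ("float", _is_float),
]


def infer_column_type(cells: list[str]) -> str:
    """Return the first type that every cell in the column satisfies."""
    if not cells:
        return "string"
    for name, check in CHECKS:
        if all(check(cell) for cell in cells):
            return name
    return "string"


def infer_table_types(rows: list[dict[str, str]]) -> dict[str, str]:
    """Infer one type per column for a table given as a list of records."""
    columns = sorted({key for row in rows for key in row})
    return {
        col: infer_column_type([row[col] for row in rows if col in row])
        for col in columns
    }
```

A single non-conforming cell drops the whole column back to string, which is the price of deciding once per column instead of once per cell.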

I'm wondering if we could/should have some value precedence, where you try value conversions in a certain order and then default to string when nothing works.
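A rough sketch of what such a precedence could look like, written in Python for illustration only. The order used here (int, float, bool, datetime, then string) is an assumption made for the example, not a decided design, and infer_value is a hypothetical name.

```python
from datetime import datetime


def _to_bool(s: str) -> bool:
    lowered = s.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"not a bool: {s!r}")


# Assumed precedence: each converter raises ValueError when it does not apply.
CONVERTERS = [
    ("int", int),
    ("float", float),
    ("bool", _to_bool),
    ("datetime", datetime.fromisoformat),
]


def infer_value(cell: str):
    """Try each conversion in order; fall back to the original string."""
    for name, convert in CONVERTERS:
        try:
            return name, convert(cell)
        except ValueError:
            continue
    return "string", cell
```

The order matters: because int is tried before bool and datetime, a cell like "1" always comes back as an int in this sketch, even if the column was meant to hold bools or epochs.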

There's also the consideration that a value can be represented by more than one type. For instance, a unix epoch is an int and can be shown that way, but more often you want to see it as a datetime. Similarly, bools can be written as 1 or 0 for true or false; at least, into bool will do that conversion for you.
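The ambiguity is easy to see if you list every type a cell could plausibly be instead of picking one. This is a small Python illustration; candidate_types and its rules are hypothetical and deliberately simplistic.

```python
def candidate_types(cell: str) -> list[str]:
    """List every type this cell could plausibly be read as."""
    candidates = []
    if cell.isascii() and cell.isdigit():
        candidates.append("int")
        # Any non-negative integer could also be a unix epoch in seconds.
        candidates.append("datetime")
        if cell in ("0", "1"):
            candidates.append("bool")
    elif cell.lower() in ("true", "false"):
        candidates.append("bool")
    # Everything can always stay a string.
    candidates.append("string")
    return candidates
```

For the cell "1" this returns four candidates, so pure inference cannot settle the question; some precedence rule or user hint has to break the tie.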

Another idea I had was to use cached regular expressions to determine nushell value/datatype. I was thinking this is what polars does but I need to look closer at it.
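Here is a minimal sketch of the cached-regex idea, in Python for illustration. The patterns are compiled once at import time and reused for every cell. The pattern set, including the filesize pattern aimed at ls -lh output like 4.0K, is an assumption made up for this example; it is not taken from polars or nushell.

```python
import re

# Compiled once and reused for every cell; checked in order, first match wins.
PATTERNS = [
    ("int", re.compile(r"[+-]?\d+")),
    ("float", re.compile(r"[+-]?\d+\.\d+")),
    ("filesize", re.compile(r"\d+(\.\d+)?\s?[KMGT]i?B?", re.IGNORECASE)),
    ("bool", re.compile(r"true|false", re.IGNORECASE)),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
]


def classify(cell: str) -> str:
    """Return the name of the first pattern matching the whole cell."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(cell):
            return name
    return "string"
```

This only classifies the cell; the actual conversion would still be a separate step. The appeal is that a failed match is cheap compared with attempting a full parse for every candidate type.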
