Sets need ♥️

When looking at RuboCop's code, I noticed a big number of frozen arrays being used only to later call include? on them. This is O(n) instead of O(1).

Trying to convert them to Sets causes major compatibility issues, as well as very frustrating situations (See set.join and array + set) and where the fact that they are now Sets makes them much less efficient (See array & set).

Here are the improvements that would improve Sets:

Set#join

I'd like to add #join to Set:

# Now:
Set[1, 2, 3].join('x') # => NoMethodError
# After
Set[1, 2, 3].join('x') # => "1x2x3"

I'd make this join never recursive. (I've never wanted to use the recursivity of Array#join, but it may very well just be my particular experiences)

Reminder: Why there is no Enumerable#join: https://bugs.ruby-lang.org/issues/1893

Array `<operator>` Set

We have set <op> array work fine:

Set[1] + [2] # => Set[1, 2]

Nothing works in the reverse order:

[1] + Set[2] # => no implicit conversion of Set into Array
# should be:
[1] + Set[2] # => [1, 2]

Array & set

Note that the situation is particularly frustrating for & and |. If someone wants to do ary & set (and keep the results in the order of ary), one has to do ary & set.to_a which will, internally, do a to_set, so we have set.to_a.to_set!! (assuming ary is over SMALL_ARRAY_LEN == 16 size, otherwise it's still O(n) instead of O(1)).

See order issue as to why this can not (officially) be done any other way.

Reminder:

ary & ary.reverse # => ary
Set[*ary] & Set[*ary.reverse]  # => Set[*ary.reverse], officially order indeterminate

Element order

Officially, set elements have uncertain order. This predades when Hash started being ordered (Ruby 1.9.0, Xmas 2007). Sets have since been de-facto insertion-ordered. FYI, in those 13 years, there have been about 70 commits to lib/set.rb.

I have the impression that a non-negligible amount of code in the wild rely on sets being ordered, at least under most circumstances. I feel that this should be officialized.

If sets are truly unordered, then why do we hesitate to make an optimization of & and |: https://bugs.ruby-lang.org/issues/15281

Hash#key_set

To create a set from hash keys currently implies a temporary array for all keys, rehashing all those keys and rebuilding a hash. Instead, the hash could be copied and its values set to true.

h = {a: 1}
# Now:
Set.new(h.keys) # => Set[:a]
# After
h.key_set # => Set[:a], efficiently.

Frozen set literals

I would like a shorthand syntax for frozen sets of symbols or of strings.

%ws{hello world} # => Set['hello', 'world'].freeze
%is{hello world} # => Set[:hello, :world].freeze

We should consider these sets to return a unique frozen to_a. The strings should be frozen. These literals would be created once at parse time (like regex):

def foo
  p %ws{hello world}.object_id
end
foo
foo # => prints the same id twice

Reminder: Ruby has literal notations for Rational and Complex. I've sadly never had to use either. I would venture to say that Complex is much less used than Sets.

Reminder: previous discussion for builtin syntax was not for frozen literal, strings or symbols specifically: https://bugs.ruby-lang.org/issues/5478

flatten

Quite minor, but Set#flatten should be accept an argument to level the level of flattening.

<=> missing

Quite minor, but Set#<=> should be defined.

The official stated reason for Set to not implement is that some sets are not comparable. That is exactly what nil result type is for.

Set[1] < Set[1, 2] # => true
[Set[1], Set[1, 2]] # => ArgumentError, should be [Set[1], Set[1, 2]]
[Set[1], Set[2]] # => ArgumentError, ok, not comparable

Great ideas!

I have the impression that a non-negligible amount of code in the wild rely on sets being ordered, at least under most circumstances. I feel that this should be officialized.

I've often wondered by this myself.

Reminder: Ruby has literal notations for Rational and Complex. I've sadly never had to use either. I would venture to say that Complex is much less used than Sets.

Excellent observation! I never had to use those literals myself. I've rarely touched Sets in Ruby, but I use them all the time in other languages where they are first-class citizens.

Reminder: previous discussion for builtin syntax but nothing for strings and symbols specifically: https://bugs.ruby-lang.org/issues/5478

I've been a big proponent of sets being promoted to Ruby's core. Unfortunately, as the issue indicates, there has been little upstream interest in this.

marcandre/Set.md

Sets need ♥️

Set#join

Array `<operator>` Set

Array & set

Element order

Hash#key_set

Frozen set literals

flatten

<=> missing

bbatsov commented Jun 10, 2020

Uh oh!

marcandre/Set.md

Sets need ♥️

Set#join

Array <operator> Set

Array & set

Element order

Hash#key_set

Frozen set literals

flatten

<=> missing

bbatsov commented Jun 10, 2020

Uh oh!

Array `<operator>` Set