When looking at RuboCop
's code, I noticed a big number of frozen arrays being used only to later call include?
on them. This is O(n)
instead of O(1)
.
Trying to convert them to Set
s causes major compatibility issues, as well as very frustrating situations (See set.join
and array + set
) and where the fact that they are now Set
s makes them much less efficient (See array & set
).
Here are the improvements that would improve Set
s:
I'd like to add #join
to Set
:
# Now:
Set[1, 2, 3].join('x') # => NoMethodError
# After
Set[1, 2, 3].join('x') # => "1x2x3"
I'd make this join never recursive. (I've never wanted to use the recursivity of Array#join, but it may very well just be my particular experiences)
Reminder: Why there is no Enumerable#join
: https://bugs.ruby-lang.org/issues/1893
We have set <op> array
work fine:
Set[1] + [2] # => Set[1, 2]
Nothing works in the reverse order:
[1] + Set[2] # => no implicit conversion of Set into Array
# should be:
[1] + Set[2] # => [1, 2]
Note that the situation is particularly frustrating for &
and |
.
If someone wants to do ary & set
(and keep the results in the order of ary
), one has to do ary & set.to_a
which will, internally, do a to_set
, so we have set.to_a.to_set
!! (assuming ary
is over SMALL_ARRAY_LEN == 16
size, otherwise it's still O(n)
instead of O(1)
).
See order issue as to why this can not (officially) be done any other way.
Reminder:
ary & ary.reverse # => ary
Set[*ary] & Set[*ary.reverse] # => Set[*ary.reverse], officially order indeterminate
Officially, set elements have uncertain order. This predades when Hash started being ordered (Ruby 1.9.0, Xmas 2007). Sets have since been de-facto insertion-ordered. FYI, in those 13 years, there have been about 70 commits to lib/set.rb
.
I have the impression that a non-negligible amount of code in the wild rely on sets being ordered, at least under most circumstances. I feel that this should be officialized.
If sets are truly unordered, then why do we hesitate to make an optimization of &
and |
: https://bugs.ruby-lang.org/issues/15281
See also: https://bugs.ruby-lang.org/issues/14069
To create a set from hash keys currently implies a temporary array for all keys, rehashing all those keys and rebuilding a hash. Instead, the hash could be copied and its values set to true
.
h = {a: 1}
# Now:
Set.new(h.keys) # => Set[:a]
# After
h.key_set # => Set[:a], efficiently.
I would like a shorthand syntax for frozen sets of symbols or of strings.
%ws{hello world} # => Set['hello', 'world'].freeze
%is{hello world} # => Set[:hello, :world].freeze
We should consider these sets to return a unique frozen to_a
. The strings should be frozen. These literals would be created once at parse time (like regex):
def foo
p %ws{hello world}.object_id
end
foo
foo # => prints the same id twice
Reminder: Ruby has literal notations for Rational
and Complex
. I've sadly never had to use either.
I would venture to say that Complex
is much less used than Sets
.
Reminder: previous discussion for builtin syntax was not for frozen literal, strings or symbols specifically: https://bugs.ruby-lang.org/issues/5478
Quite minor, but Set#flatten
should be accept an argument to level the level of flattening.
Quite minor, but Set#<=>
should be defined.
The official stated reason for Set
to not implement is that some sets are not comparable. That is exactly what nil
result type is for.
Set[1] < Set[1, 2] # => true
[Set[1], Set[1, 2]] # => ArgumentError, should be [Set[1], Set[1, 2]]
[Set[1], Set[2]] # => ArgumentError, ok, not comparable
Great ideas!
I've often wondered by this myself.
Excellent observation! I never had to use those literals myself. I've rarely touched Sets in Ruby, but I use them all the time in other languages where they are first-class citizens.
I've been a big proponent of sets being promoted to Ruby's core. Unfortunately, as the issue indicates, there has been little upstream interest in this.