Inspired by Lucian Ghinda's article about code review/refactoring with AI, just an experimenting on looking at the code the way I usually do that.
It is not an attempt to demonstrate "how everybody should write code," rather a way of sharing how I approach code style and refatoring. (I.e. not to communicate "you should do this," but just "here are some techniques and considerations I use, maybe you'll find it interesting").
So...
Here's the original code:
class Parser
def initialize(list, rubygems_client: RubyGems::Client.new)
@list = list
@rubygems_client = rubygems_client
end
def parse
return parse_gemfile_format if gemfile_format?
parse_single_line_format
end
private
attr_reader :list, :rubygems_client
def gemfile_format? = lines.any? { gem_method_call?(_1) }
def parse_gemfile_format
ast = Prism.parse(list)
root_node = ast.value
parsed_gems = []
root_node.statements.body.each do |call_node|
case call_node.name
when :gem
name = gem_name_from_call_node(call_node)
parsed_gems << name
when :group
names = gem_names_from_group(call_node)
parsed_gems.concat(names) if names.present?
end
end
parsed_gems.compact_blank.map { rubygems_client.info(_1) }
end
def parse_single_line_format
lines.filter_map do |line|
gem_name = line.strip
gem_info = rubygems_client.info(gem_name) unless gem_name.empty?
gem_info
end.compact_blank
end
def gem_names_from_group(call_node)
statements_node = call_node&.block&.body
statements_node.body.map { gem_name_from_call_node(_1) }
end
def gem_name_from_call_node(call_node) = call_node&.arguments&.arguments&.first&.unescaped
def gem_method_call?(line) = line.strip.include?("gem ")
def lines = list.lines
end
Concern 1: I don't like guard conditions to branch by equally possible branches. I prefer to use them to, indeed, guard something (like "if the argument is wrong, quick raise" or "if some exceptional condition happens, quick return"), but in cases like this I prefer bare if
to empahsize "here are two equally possible/interesting branches."
def parse
if gemfile_format?
parse_gemfile_format
else
parse_single_line_format
end
end
Next, my favorite technique to understand, and maybe adjust, relatively complex algorithmic code, is to start with one method. Once upon a time I was a proponent of the "many small methods with descriptive names" culture, but eventually I started to believe that (reasonably) big methods that try to tell the entire story is at least a good starting point to see the whole story.
So... Let's see how it looks:
def parse
if lines.any? { gem_method_call?(_1) }
ast = Prism.parse(list)
root_node = ast.value
parsed_gems = []
root_node.statements.body.each do |call_node|
case call_node.name
when :gem
name = gem_name_from_call_node(call_node)
parsed_gems << name
when :group
names = gem_names_from_group(call_node)
parsed_gems.concat(names) if names.present?
end
end
parsed_gems.compact_blank.map { rubygems_client.info(_1) }
else
lines.filter_map do |line|
gem_name = line.strip
gem_info = rubygems_client.info(gem_name) unless gem_name.empty?
gem_info
end.compact_blank
end
end
This is so far just one round of flattening (three main methods inlined), let's take this a bit further.
Having a condition:
if lines.any? { gem_method_call?(_1) }
...and embedding the call:
if lines.any? { _1.strip.include?("gem ") }
...and now to atomize it into chained operations (which allows to use a form of any?
that accepts pattern):
if lines.map(&:strip).any?(/gem /)
Now, the smaller branch:
lines.filter_map do |line|
gem_name = line.strip
gem_info = rubygems_client.info(gem_name) unless gem_name.empty?
gem_info
end.compact_blank
I would also try to atomize the chained calls:
lines.map(&:strip).reject(&:empty?).map { rubygems_client.info(_1) }
Now, I think I saw this lines.map(&:strip)
somewhere 🤔
Let's extract it:
lines = list.lines.map(&:strip).reject(&:empty?)
if lines.any?(/gem /)
# ..ignore Prism branch for a moment...
else
lines.map { rubygems_client.info(_1) }
end
Now to the Prism branch!
So far, we have this:
ast = Prism.parse(list)
root_node = ast.value
parsed_gems = []
root_node.statements.body.each do |call_node|
case call_node.name
when :gem
name = gem_name_from_call_node(call_node)
parsed_gems << name
when :group
names = gem_names_from_group(call_node)
parsed_gems.concat(names) if names.present?
end
end
parsed_gems.compact_blank.map { rubygems_client.info(_1) }
One round of "more functional iterators" and "more chains":
Prism.parse(list).value.statements.body.flat_map { |call_node|
case call_node.name
when :gem
gem_name_from_call_node(call_node)
when :group
gem_names_from_group(call_node)
end
}.compact.map { rubygems_client.info(_1) }
(I tend to "embed every local var that is needed only on the next line" initially, with then, possibly, extracting some non-trivial things to name and debug them.)
Now, one more round of "embed everything":
Prism.parse(list).value.statements.body.flat_map { |call_node|
case call_node.name
when :gem
call_node.arguments&.arguments&.first&.unescaped
when :group
call_node.block&.body&.body&.map { _1.arguments&.arguments&.first&.unescaped }
end
}.compact.map { rubygems_client.info(_1) }
(I made one less &.
hre where we definitely should have an object. Without testing and looking deeper into Prism docs, it is not clear what else is impossible to be nil
)
Now, the content of "gem name from call node" is actually repeated, so it might be reasonable to have in method... Or, actually, we might notice that it dangles on the end of every case
branch, so we just can put it outside:
Prism.parse(list).value.statements.body.flat_map { |call_node|
case call_node.name
when :gem then call_node
when :group then call_node.block&.body&.body
end
}.compact
.filter_map { _1.arguments&.arguments&.first&.unescaped }
.map { rubygems_client.info(_1) }
So, here is more or less the whole picture...
def parse
lines = list.lines.map(&:strip).reject(&:empty?)
if lines.any?(/gem /)
Prism.parse(list).value.statements.body.flat_map { |call_node|
case call_node.name
when :gem then call_node
when :group then call_node.block&.body&.body
end
}.compact
.filter_map { _1.arguments&.arguments&.first&.unescaped }
.map { rubygems_client.info(_1) }
else
lines.map { rubygems_client.info(_1) }
end
end
...but wait, actually every branch ends with the call to rubygems_client
, so...
def parse
lines = list.lines.map(&:strip).reject(&:empty?)
if lines.any?(/gem /)
Prism.parse(list).value.statements.body.flat_map { |call_node|
case call_node.name
when :gem then call_node
when :group then call_node.block&.body&.body
end
}.compact.filter_map { _1.arguments&.arguments&.first&.unescaped }
else
lines
end.map { rubygems_client.info(_1) }
end
...but wait, now it is obvious that one of the branches does no processing whatsoever, so...
class Parser
def initialize(list, rubygems_client: RubyGems::Client.new)
@list = list
@rubygems_client = rubygems_client
end
private attr_reader :list, :rubygems_client
def parse
gem_names = list.lines.map(&:strip).reject(&:empty?)
if gem_names.any?(/gem /)
# It is actually a Gemfile, parse it with prism!
gem_names = Prism.parse(list).value.statements.body.flat_map { |call_node|
case call_node.name
when :gem then call_node
when :group then call_node.block&.body&.body
end
}.compact.filter_map { _1.arguments&.arguments&.first&.unescaped }
end
gem_names.map { rubygems_client.info(_1) }
end
end
Now, this code can be split back into methods once it matures (though with this amount of code, I'd let it live for some time), but the sequence/structure of these methods would be different from the original. Now we localized the most non-trivial thing (Prism parsing), and split the whole process into two major steps (guess gem names → get gems info) instead of two branches (get names, then infos from text file / get names, then infos from Gemfile).
We can now also change the body of big flat_map
(find all gem "foo"
calls) to something more robust, as long as it returns necessary call nodes, without affecting any of the rest of the code logic.
Hope it was entertaining :) For me, it was!