Today at dayjob, my architect: "You should update the README as well. You know, I don't like having all the library environment variables copied into the service README. Can you just link to the submodule?"
Me, actually quite liking that all the variables are right there in the service README: "I wish there was an include directive for markdown."
Me, having access to Sonnet API via OpenRouter: "How hard is ad-hoc Markdown parsing anyway?"
"Hey Sonnet, can you..."
Hi! Need your help with a ruby/rake tool. I'm trying to ensure that my README includes environment variable snippets that are consistent with the submodules the service pulls in. Those have their own READMEs documenting their respective environment variables. So my idea is to throw in a very simple ruby markdown parser that just splits the file at sections (lines starting with #) and include directives (lines of the form
[//] # (include path#section)
and then just ensures that the text following the include directive matches the section in the included file, outputting a diff and an error if it fails. Can you write that up for me please?
"Absolutely!"
Here's the implementation:
require 'rake'
require 'rake/tasklib'

class ReadmeValidator
  class ValidationError < StandardError; end

  def initialize(readme_path = 'README.md')
    @readme_path = readme_path
    @errors = []
  end

  def validate
    content = File.read(@readme_path)
    sections = parse_into_sections(content)
    sections.each do |section|
      validate_includes(section)
    end
    raise ValidationError, @errors.join("\n") if @errors.any?
  end

  private

  def parse_into_sections(content)
    sections = []
    current_section = { title: nil, content: [] }
    content.lines.each do |line|
      if line.start_with?('#')
        # Save previous section if it exists
        sections << current_section if current_section[:content].any?
        current_section = { title: line.strip, content: [] }
      else
        current_section[:content] << line
      end
    end
    # Don't forget the last section
    sections << current_section if current_section[:content].any?
    sections
  end

  def validate_includes(section)
    content = section[:content].join
    # Find all include directives
    includes = content.scan(/\[\/\/\] # \(include (.*?)#(.*?)\)/)
    includes.each do |file_path, section_name|
      begin
        included_content = extract_section_from_file(file_path, section_name)
        position = content.index("[//] # (include #{file_path}##{section_name})")
        following_content = content[position..].lines[1..].join # Skip the include line
        # Find where the next include starts (if any)
        next_include = following_content.index("[//] # (include")
        following_content = following_content[0...next_include] if next_include
        # Compare and record any differences
        if !following_content.strip.start_with?(included_content.strip)
          @errors << generate_diff_message(file_path, section_name, included_content.strip, following_content.strip)
        end
      rescue Errno::ENOENT
        @errors << "Could not find file: #{file_path}"
      end
    end
  end

  def extract_section_from_file(file_path, section_name)
    content = File.read(file_path)
    sections = parse_into_sections(content)
    target_section = sections.find { |s| s[:title].gsub('#', '').strip == section_name.strip }
    raise ValidationError, "Section '#{section_name}' not found in #{file_path}" unless target_section
    target_section[:content].join
  end

  def generate_diff_message(file_path, section_name, expected, actual)
    <<~ERROR
      Mismatch in include from #{file_path}##{section_name}
      Expected:
      #{expected}
      Got:
      #{actual}
      Please update the README to match the source content.
    ERROR
  end
end

namespace :readme do
  desc "Validate README includes match their sources"
  task :validate_includes do
    begin
      ReadmeValidator.new.validate
      puts "README includes are up to date!"
    rescue ReadmeValidator::ValidationError => e
      puts "README validation failed:"
      puts e.message
      exit 1
    end
  end
end
Jesus Christ, that's a whopper of code. No way I'm getting that through review on a lark. Hey, Sonnet, "out of interest, just for a joke..."
Nice! Out of interest, how hard can you golf this using all ruby language techniques?
Oh, this is a fun challenge! Here's an aggressively golfed version that maintains the same functionality:
And then it threw this at me.
class ReadmeValidator
  def self.validate(f='README.md')
    c=File.read(f)
    c.scan(/\[\/\/\] # \(include (.*?)#(.*?)\)/).map{|p,s|
      [c[c.index("[//] # (include #{p}##{s})")..].lines[1..].take_while{|l|!/\[\/\/\] # \(include/.match?(l)}.join.strip,
       File.read(p).split(/^#/).find{|x|x.strip.start_with?(s)}.to_s.sub(/.*\n/,'').strip]
    }.select{|a,b|a!=b}.tap{|e|
      raise "Mismatches:\n"+e.map{|a,b|"Expected:\n#{b}\n\nGot:\n#{a}"}.join("\n\n") if e.any?
    }
  end
end
task :validate_readme do; ReadmeValidator.validate; rescue=>e; puts e.message; exit 1 end
Jesus Christ a second time! That can't work... can it? No way. At any rate, I can't debug this. However:
Pffh, nice! On the basis of that golfed version, can you expand it back out with somewhat split lines and proper variable names?
Here's the "ungolfed but elegant" version, keeping the smart techniques from the golfed version but making it readable:
class ReadmeValidator
  def self.validate(readme_path = 'README.md')
    content = File.read(readme_path)
    content.scan(/\[\/\/\] # \(include (.*?)#(.*?)\)/)
      .map { |file_path, section_name|
        include_pos = content.index("[//] # (include #{file_path}##{section_name})")
        actual_content = content[include_pos..]
          .lines[1..]
          .take_while { |line| !/\[\/\/\] # \(include/.match?(line) }
          .join
          .strip
        expected_content = File.read(file_path)
          .split(/^#/)
          .find { |section| section.strip.start_with?(section_name) }
          .to_s
          .sub(/.*\n/, '')
          .strip
        [actual_content, expected_content]
      }
      .select { |actual, expected| actual != expected }
      .tap { |errors|
        if errors.any?
          error_message = "Mismatches:\n" +
            errors.map { |actual, expected|
              "Expected:\n#{expected}\n\nGot:\n#{actual}"
            }.join("\n\n")
          raise error_message
        end
      }
  end
end

task :validate_readme do
  ReadmeValidator.validate
rescue => error
  puts error.message
  exit 1
end
And yeah, cards on the table. This code doesn't work. But it's like 10% away from code that does work.
AIs would be so much better if they could just run their code to try it out. Keep in mind it's oneshotting this without stopping or pausing to think!
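The post doesn't itemize what was broken, but two differences stand out when comparing the generated listings with the cleaned-up one further down. The directive pattern gains a colon (`[//]: #` is the usual Markdown comment idiom, while the prompt spelled it `[//] #`), and the heading split changes from `/^#/` to `/^##* /`. Here is a minimal Ruby sketch of what those two changes do, using a made-up directive line and a made-up submodule README. Whether these were the exact failures hit in practice is an assumption.

```ruby
# Patterns copied from the listings in this post; all sample strings are invented.
old_directive = /\[\/\/\] # \(include (.*?)#(.*?)\)/   # generated versions
new_directive = /\[\/\/\]: # \(include (.*?)#(.*?)\)/  # cleaned-up version

# A directive written in the usual Markdown comment form (note the colon).
directive = "[//]: # (include lib/README.md#Environment Variables)"

old_hits = directive.scan(old_directive)
new_hits = directive.scan(new_directive)

# Hypothetical submodule README whose target section is a second-level heading.
library_readme = "# Library\n\nIntro.\n\n## Environment Variables\n\nFOO_URL: where foo lives\n"

# Same lookup both versions use: split at headings, pick the part starting with the name.
find_section = lambda do |pattern, name|
  library_readme.split(pattern).find { |part| part.strip.start_with?(name) }
end

old_section = find_section.call(/^#/, "Environment Variables")
new_section = find_section.call(/^##* /, "Environment Variables")

p old_hits     # => []
p new_hits     # => [["lib/README.md", "Environment Variables"]]
p old_section  # => nil (the part still begins with a leftover "#")
p new_section
```

With the old split, only first-level headings lose their marker cleanly, so a `##` section is never found and the comparison runs against an empty string.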
So yeah, finally, here's the actual working code, after I've cleaned it up a bit:
def validate_markdown(path)
  content = File.read(path)
  content.scan(/\[\/\/\]: # \(include (.*?)#(.*?)\)/)
    .map { |target_path, section_name|
      remainder = content.split("[//]: # (include #{target_path}##{section_name})")[1].strip
      section = File.read(target_path)
        .split(/^##* /)
        .find { |section| section.strip.start_with?(section_name) }
        .to_s
        .strip
      [remainder, section]
    }
    .select { |remainder, section| !remainder.start_with?(section) }
    .tap { |errors|
      if errors.any?
        raise "#{path} failed validation!\n" +
          errors.map { |actual, expected|
            "Expected prefix:\n#{expected}\n\nGot:\n#{actual}"
          }.join("\n\n")
      end
    }
end

task :validate_readme do
  validate_markdown 'README.md'
end
This is obviously hacky as hell and applies approximately zero principle or good design. But it works, and it's small enough I might actually get it through review.
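To see the cleaned-up validator in action, here is a rough, self-contained sketch that runs it against throwaway files in a temp directory. The file layout, the `FOO_URL` variable, and the `run_demo` helper are all invented for illustration. One thing this sketch suggests, going only by the listing above: because the split pattern strips the leading `#` markers, the expected text begins with the section title as plain text, so the demo README repeats the title that way.

```ruby
require 'tmpdir'

# Copied from the listing above so the sketch runs on its own.
def validate_markdown(path)
  content = File.read(path)
  content.scan(/\[\/\/\]: # \(include (.*?)#(.*?)\)/)
    .map { |target_path, section_name|
      remainder = content.split("[//]: # (include #{target_path}##{section_name})")[1].strip
      section = File.read(target_path)
        .split(/^##* /)
        .find { |section| section.strip.start_with?(section_name) }
        .to_s
        .strip
      [remainder, section]
    }
    .select { |remainder, section| !remainder.start_with?(section) }
    .tap { |errors|
      if errors.any?
        raise "#{path} failed validation!\n" +
          errors.map { |actual, expected|
            "Expected prefix:\n#{expected}\n\nGot:\n#{actual}"
          }.join("\n\n")
      end
    }
end

# Hypothetical section text as it would appear in the submodule README (minus the "## ").
section_body = "Environment Variables\n\n- `FOO_URL`: where foo lives\n"

# Hypothetical helper: writes both READMEs, runs the validator,
# and returns :ok or the failure message.
run_demo = lambda do |readme_tail|
  Dir.mktmpdir do |dir|
    Dir.chdir(dir) do
      begin
        Dir.mkdir('lib')
        File.write('lib/README.md', "# Library\n\nIntro.\n\n## " + section_body)
        File.write('README.md',
                   "# Service\n\n[//]: # (include lib/README.md#Environment Variables)\n" + readme_tail)
        validate_markdown('README.md')
        :ok
      rescue RuntimeError => e
        e.message
      end
    end
  end
end

in_sync = run_demo.call(section_body)
stale = run_demo.call("Environment Variables\n\n- `FOO_URL`: an outdated description\n")

puts in_sync  # the copy matches the source section
puts stale    # failure message listing the expected prefix and the actual text
```

In a Rakefile, the failing case surfaces as an unhandled exception, which is enough to make the task exit non-zero.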
Then I went through the obligatory final step of showing Sonnet the fixed code and having it sycophantically congratulate me on really basic improvements, at which point it asked:
Would you like to share this as an open source tool? It seems like it could be useful for others maintaining documentation that needs to stay in sync across repositories.
I mean, I think the code isn't the important thing here! This is trivial. Anyone with fairly basic Ruby knowledge could shit this out in about twenty minutes, and anyone with AI prompting knowledge in five. But then it occurs to me that these are very much not universal skills. The important part isn't the code. The important part is that if you know what questions to ask, you don't need deep Ruby knowledge. You just tell the AI "use your deep Ruby knowledge" and then do a very minimal amount of fixing-up. If I hadn't spoken Ruby, the only change would have been that I'd have had to ask for tests, and then feed the test output into Sonnet until it fixed the very minor issues on its own.
The point isn't markdown include parsing. The point isn't even AI prompting. The point is that the effort ceiling on trivial-to-simple tasks that can be executed by pasting text from a website is permanently lowered. AI tools as they are aren't going to save or destroy the world, but they are at least as much of an improvement on StackOverflow as StackOverflow was on Google, or as Google was on a system manual. The payoff is that small inconveniences become trivial, and it's hard to overstate how much any reduction in the cost of inconveniences compounds over time in terms of personal agency.
Now excuse me, I have to open a PR.