How to include sections from another Markdown file with Ruby (with AI)

Today at dayjob, my architect: "You should update the README as well. You know, I don't like having all the library environment variables copied into the service README. Can you just link to the submodule?"

Me, actually quite liking that all the variables are right there in the service README: "I wish there was an include directive for markdown."

Me, having access to Sonnet API via OpenRouter: "How hard is ad-hoc Markdown parsing anyway?"

"Hey Sonnet, can you..."

Hi! Need your help with a ruby/rake tool. I'm trying to ensure that my README includes environment variable snippets that are consistent with the submodules the service pulls in. Those have their own READMEs documenting their respective environment variables. So my idea is to throw in a very simple ruby markdown parser that just splits the file at sections (lines starting with #) and include directives (lines of the form [//] # (include path#section)) and then just ensures that the text following the include directive matches the section in the included file, outputting a diff and an error if it fails. Can you write that up for me please?

"Absolutely!"

Here's the implementation:

require 'rake'
require 'rake/tasklib'

class ReadmeValidator
  class ValidationError < StandardError; end
  
  def initialize(readme_path = 'README.md')
    @readme_path = readme_path
    @errors = []
  end

  def validate
    content = File.read(@readme_path)
    sections = parse_into_sections(content)
    
    sections.each do |section|
      validate_includes(section)
    end
    
    raise ValidationError, @errors.join("\n") if @errors.any?
  end

  private

  def parse_into_sections(content)
    sections = []
    current_section = { title: nil, content: [] }
    
    content.lines.each do |line|
      if line.start_with?('#')
        # Save previous section if it exists
        sections << current_section if current_section[:content].any?
        current_section = { title: line.strip, content: [] }
      else
        current_section[:content] << line
      end
    end
    
    # Don't forget the last section
    sections << current_section if current_section[:content].any?
    sections
  end

  def validate_includes(section)
    content = section[:content].join
    
    # Find all include directives
    includes = content.scan(/\[\/\/\] # \(include (.*?)#(.*?)\)/)
    
    includes.each do |file_path, section_name|
      begin
        included_content = extract_section_from_file(file_path, section_name)
        position = content.index("[//] # (include #{file_path}##{section_name})")
        following_content = content[position..].lines[1..].join # Skip the include line
        
        # Find where the next include starts (if any)
        next_include = following_content.index("[//] # (include")
        following_content = following_content[0...next_include] if next_include
        
        # Compare and record any differences
        if !following_content.strip.start_with?(included_content.strip)
          @errors << generate_diff_message(file_path, section_name, included_content.strip, following_content.strip)
        end
      rescue Errno::ENOENT
        @errors << "Could not find file: #{file_path}"
      end
    end
  end

  def extract_section_from_file(file_path, section_name)
    content = File.read(file_path)
    sections = parse_into_sections(content)
    
    target_section = sections.find { |s| s[:title].gsub('#', '').strip == section_name.strip }
    raise ValidationError, "Section '#{section_name}' not found in #{file_path}" unless target_section
    
    target_section[:content].join
  end

  def generate_diff_message(file_path, section_name, expected, actual)
    <<~ERROR
      Mismatch in include from #{file_path}##{section_name}
      Expected:
      #{expected}
      
      Got:
      #{actual}
      
      Please update the README to match the source content.
    ERROR
  end
end

namespace :readme do
  desc "Validate README includes match their sources"
  task :validate_includes do
    begin
      ReadmeValidator.new.validate
      puts "README includes are up to date!"
    rescue ReadmeValidator::ValidationError => e
      puts "README validation failed:"
      puts e.message
      exit 1
    end
  end
end

Jesus Christ, that's a whopper of code. No way I'm getting that through review on a lark. Hey, Sonnet, "out of interest, just for a joke..."

Nice! Out of interest, how hard can you golf this using all ruby language techniques?

Oh, this is a fun challenge! Here's an aggressively golfed version that maintains the same functionality:

And then it threw this at me.

class ReadmeValidator
  def self.validate(f='README.md')
    c=File.read(f)
    c.scan(/\[\/\/\] # \(include (.*?)#(.*?)\)/).map{|p,s|
      [c[c.index("[//] # (include #{p}##{s})")..].lines[1..].take_while{|l|!/\[\/\/\] # \(include/.match?(l)}.join.strip,
       File.read(p).split(/^#/).find{|x|x.strip.start_with?(s)}.to_s.sub(/.*\n/,'').strip]
    }.select{|a,b|a!=b}.tap{|e|
      raise "Mismatches:\n"+e.map{|a,b|"Expected:\n#{b}\n\nGot:\n#{a}"}.join("\n\n") if e.any?
    }
  end
end

task :validate_readme do; ReadmeValidator.validate; rescue=>e; puts e.message; exit 1 end

Jesus Christ a second time! That can't work... can it? No way. At any rate, I can't debug this. However:

Pffh, nice! On the basis of that golfed version, can you expand it back out with somewhat split lines and proper variable names?

Here's the "ungolfed but elegant" version, keeping the smart techniques from the golfed version but making it readable:

class ReadmeValidator
  def self.validate(readme_path = 'README.md')
    content = File.read(readme_path)
    
    content.scan(/\[\/\/\] # \(include (.*?)#(.*?)\)/)
          .map { |file_path, section_name|
      include_pos = content.index("[//] # (include #{file_path}##{section_name})")
      
      actual_content = content[include_pos..]
                        .lines[1..]
                        .take_while { |line| !/\[\/\/\] # \(include/.match?(line) }
                        .join
                        .strip
      
      expected_content = File.read(file_path)
                          .split(/^#/)
                          .find { |section| section.strip.start_with?(section_name) }
                          .to_s
                          .sub(/.*\n/, '')
                          .strip
      
      [actual_content, expected_content]
    }
    .select { |actual, expected| actual != expected }
    .tap { |errors|
      if errors.any?
        error_message = "Mismatches:\n" + 
          errors.map { |actual, expected| 
            "Expected:\n#{expected}\n\nGot:\n#{actual}" 
          }.join("\n\n")
        raise error_message
      end
    }
  end
end

task :validate_readme do
  ReadmeValidator.validate
rescue => error
  puts error.message
  exit 1
end

And yeah, cards on the table. This code doesn't work. But it's like 10% away from code that does work.
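To give a flavor of that 10%: comparing the listings above with the final version further down, one visible difference is the directive pattern. The AI versions look for `[//] # (include ...)`, while the final version looks for `[//]: # (include ...)`, with a colon. The colon form is the usual Markdown comment trick (a link reference definition that renders as nothing). A minimal sketch of what that changes, where `directive_matches` and the `lib/README.md#Environment variables` target are made up for illustration:

```ruby
# Hypothetical helper: run both directive patterns against one line.
def directive_matches(line)
  {
    # Pattern from the AI-generated listings (no colon).
    without_colon: line.scan(/\[\/\/\] # \(include (.*?)#(.*?)\)/),
    # Pattern from the final listing (with colon).
    with_colon: line.scan(/\[\/\/\]: # \(include (.*?)#(.*?)\)/)
  }
end

matches = directive_matches("[//]: # (include lib/README.md#Environment variables)")
p matches[:without_colon] # prints []
p matches[:with_colon]    # prints [["lib/README.md", "Environment variables"]]
```

The other visible changes are in the comparison itself: the final version keeps the section's title line and checks for a prefix instead of exact equality.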

AIs would be so much better if they could just run their code to try it out. Keep in mind it's oneshotting this without stopping or pausing to think!

So yeah, finally, here's the actual working code, after I've cleaned it up a bit:

def validate_markdown(path)
  content = File.read(path)

  # Each directive looks like: [//]: # (include path/to/file.md#Section title)
  content.scan(/\[\/\/\]: # \(include (.*?)#(.*?)\)/)
        .map { |target_path, section_name|
    # Text following the directive in the including file.
    remainder = content.split("[//]: # (include #{target_path}##{section_name})")[1].strip

    # Matching section of the target file: title line (leading #s split off) plus body.
    section = File.read(target_path)
                        .split(/^##* /)
                        .find { |candidate| candidate.strip.start_with?(section_name) }
                        .to_s
                        .strip

    [remainder, section]
  }
  .select { |remainder, section| !remainder.start_with?(section) }
  .tap { |errors|
    if errors.any?
      raise "#{path} failed validation!\n" +
        errors.map { |actual, expected|
          "Expected prefix:\n#{expected}\n\nGot:\n#{actual}"
        }.join("\n\n")
    end
  }
end

task :validate_readme do
  validate_markdown 'README.md'
end

This is obviously hacky as hell and applies approximately zero principle or good design. But it works, and it's small enough I might actually get it through review.
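For a sense of what this looks like in use, here is a rough sketch that builds a throwaway project in a temp directory and runs the check twice: once in sync, once after the library README drifts. The file contents, the `MYLIB_*` variables and the `demo_include_check` helper are all invented for the demo; `validate_markdown` is repeated from the listing above so the block runs on its own. One thing to note from reading the code: the section is compared with its leading `#`s split off, so the copied text after the directive has to start with the bare title.

```ruby
require 'tmpdir'
require 'fileutils'

# Same logic as the listing above, repeated so this sketch is self-contained.
def validate_markdown(path)
  content = File.read(path)

  content.scan(/\[\/\/\]: # \(include (.*?)#(.*?)\)/)
        .map { |target_path, section_name|
    remainder = content.split("[//]: # (include #{target_path}##{section_name})")[1].strip
    section = File.read(target_path)
                  .split(/^##* /)
                  .find { |candidate| candidate.strip.start_with?(section_name) }
                  .to_s
                  .strip
    [remainder, section]
  }
  .select { |remainder, section| !remainder.start_with?(section) }
  .tap { |errors|
    if errors.any?
      raise "#{path} failed validation!\n" +
        errors.map { |actual, expected|
          "Expected prefix:\n#{expected}\n\nGot:\n#{actual}"
        }.join("\n\n")
    end
  }
end

# Hypothetical demo: returns [result_when_in_sync, error_message_after_drift].
def demo_include_check
  lib_readme = <<~'MD'
    # mylib

    ## Environment variables

    - `MYLIB_TIMEOUT`: request timeout in seconds

    ## License

    MIT
  MD

  service_readme = <<~'MD'
    # myservice

    [//]: # (include lib/README.md#Environment variables)

    Environment variables

    - `MYLIB_TIMEOUT`: request timeout in seconds

    ## Running

    bundle exec rake
  MD

  Dir.mktmpdir do |dir|
    Dir.chdir(dir) do
      FileUtils.mkdir_p('lib')
      File.write('lib/README.md', lib_readme)
      File.write('README.md', service_readme)

      # In sync: no mismatches, so the method returns an empty array.
      in_sync = validate_markdown('README.md')

      # Simulate the library documenting a variable the service README lacks.
      File.write('lib/README.md',
                 lib_readme.sub("seconds\n", "seconds\n- `MYLIB_RETRIES`: retry count\n"))
      drift_error = begin
        validate_markdown('README.md')
        nil
      rescue RuntimeError => e
        e.message
      end

      [in_sync, drift_error]
    end
  end
end

in_sync, drift_error = demo_include_check
puts "in sync: #{in_sync.empty?}"
puts drift_error
```

Include paths are resolved relative to the current working directory, which is why the demo changes into the temp directory before validating.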

Then I went through the obligatory final step of showing Sonnet the fixed code and having it sycophantically congratulate me on really basic improvements, at which point it asked:

Would you like to share this as an open source tool? It seems like it could be useful for others maintaining documentation that needs to stay in sync across repositories.

I mean, I think the code isn't the important thing here! This is trivial. Anyone with fairly basic Ruby knowledge could shit this out in about twenty minutes, and anyone with AI prompting knowledge in five. But then it occurs to me that these are very much not universal skills. The important part isn't the code. The important part is that if you know what questions to ask, you don't need deep Ruby knowledge. You just tell the AI "use your deep Ruby knowledge" and then do a very minimal amount of fixing-up. If I hadn't spoken Ruby, the only change would have been that I'd have to ask for tests, and then feed the test output into Sonnet until it fixed the very minor issues on its own.

The point isn't Markdown include parsing. The point isn't even AI prompting. The point is that the effort ceiling on trivial-to-simple tasks, the kind that can be executed by pasting text from a website, is permanently lowered. AI tools as they are aren't going to save or destroy the world, but they are at least as big an improvement on StackOverflow as StackOverflow was on Google, or as Google was on a system manual. The payoff is that small inconveniences become trivial, and it's hard to overstate how much any reduction in the cost of inconveniences compounds over time in terms of personal agency.

Now excuse me, I have to open a PR.

FeepingCreature/sonnet_is_fun.md