Last active
September 3, 2015 12:59
This script rewraps the paragraphs and list items of a Markdown file to 80-character lines. Headers, tables, fenced and indented code, and link references are left unchanged.
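The wrapping itself is a greedy fill: words are appended to the current line until the next word would push it past the limit, and continuation lines of a list item are indented. Here is a minimal standalone sketch of that idea. `wrap_words` and its parameters are hypothetical names used for illustration only, not part of the script, and the sketch may differ from the script in edge cases.

```ruby
# Hypothetical helper (not part of the script): greedy word wrap.
# text   - the paragraph or list item, possibly spanning several lines
# width  - maximum line length (assumed default: 80)
# indent - prefix for continuation lines, e.g. '  ' under a '* ' bullet
def wrap_words(text, width = 80, indent = '')
  text.split.each_with_object([]) do |word, lines|
    if lines.empty?
      # The first word always starts the first line.
      lines << word
    elsif lines.last.size + 1 + word.size <= width
      # The word still fits on the current line (1 accounts for the space).
      lines[-1] = "#{lines.last} #{word}"
    else
      # Otherwise start a new, indented continuation line.
      lines << "#{indent}#{word}"
    end
  end
end

puts wrap_words('* one two three four five', 12, '  ')
```

A single word longer than `width` still gets a line of its own, so that line can exceed the limit; the sketch never splits words.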
| def usage | |
| "#{$0} FILE" | |
| end | |
| file = ARGV.first | |
| $stderr.puts usage unless (file && File.file?(file)) | |
| class MarkdownTruncate | |
| def initialize | |
| @output = [] | |
| @section_truncate = SectionTruncate.new | |
| end | |
| def truncate(text) | |
| @output = [] | |
| text.split(/\n/).each do |line| | |
| if is_empty?(line) | |
| close_section_if_started | |
| @output << '' | |
| elsif header_or_table?(line) || link_reference?(line) || block_quote?(line) | |
| close_section_if_started | |
| @output << line.rstrip | |
| elsif indented?(line) && !@section_truncate.started? | |
| @output << line.rstrip | |
| elsif list_item(line) | |
| close_section_if_started | |
| @section_truncate.add(line.rstrip, item_indent(line.rstrip)) | |
| else | |
| @section_truncate.add(line.rstrip) | |
| end | |
| end | |
| close_section_if_started | |
| @output.join("\n") | |
| end | |
| private | |
| def is_empty?(line) | |
| line.strip.empty? | |
| end | |
| def link_reference?(line) | |
| line.lstrip =~ /^ {0,3}\[[^\]]+\]:/ | |
| end | |
| def block_quote?(line) | |
| if line =~ /^```/ | |
| @in_block_quote = !@in_block_quote | |
| true # a fence line, opening or closing, is always left as-is | |
| else | |
| @in_block_quote | |
| end | |
| end | |
| def indented?(line) | |
| line =~ /^\s/ | |
| end | |
| def header_or_table?(line) | |
| line =~ /^(#|===|---|\|)/ | |
| end | |
| def list_item(line) | |
| line =~ /^(?:(?:\*[^*])|(?:\d+\.)|(?:-[^-]))/ | |
| end | |
| def close_section_if_started | |
| @output.concat(@section_truncate.truncate) if @section_truncate.started? | |
| end | |
| def item_indent(line) | |
| space_count = line.match(/^(?:\s*(?:\*|\d+\.|-)\s*)/).end(0) | |
| ' ' * space_count | |
| end | |
| class SectionTruncate | |
| def initialize | |
| reset | |
| end | |
| def add(line, indent=nil) | |
| @section << line | |
| @indent = indent if indent | |
| end | |
| def started? | |
| !@section.empty? | |
| end | |
| def truncate | |
| truncated = @section.join("\n").split(/\s+/).inject(['']) do |memo, word| | |
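| if memo.last.empty? # first word: avoids a blank first line when it is overlong | |
| memo[memo.size - 1] = word | |
| elsif memo.last.size + word.size <= 79 # +1 for the joining space keeps lines within 80 | |
| memo[memo.size - 1] = "#{memo.last} #{word}" | |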
| else | |
| memo << "#{@indent}#{word}" | |
| end | |
| memo | |
| end | |
| reset | |
| truncated | |
| end | |
| private | |
| def reset | |
| @section = [] | |
| @indent = '' | |
| end | |
| end | |
| end | |
| if file && File.extname(file) == '.md' # format only *.md files; this also keeps the script from running under rspec | |
| content = IO.read(file) | |
| new_content = MarkdownTruncate.new.truncate(content) | |
| IO.write(file, new_content) | |
| end | |
| # require 'rspec' | |
| # | |
| # describe MarkdownTruncate do | |
| # describe '#truncate' do | |
| # context 'for a one line paragraph' do | |
| # it 'truncates to 80 character lines' do | |
| # expect(MarkdownTruncate.new.truncate(input)).to eql expected | |
| # end | |
| # end | |
| # context 'when empty space ' | |
| # end | |
| # | |
| # def input | |
| # <<-END | |
| # # Some Header | |
| # | |
| # This is a test line. It can be quite long. It starts pretty simple, then gets more and more | |
| # complex. Pretty soon nobody knows what is happening.' | |
| # | |
| # Now here | |
| # is another line | |
| # that is fowl. What | |
| # can you do to help | |
| # me look more beautiful? See someone edited me on github directly and so I don't have any great formatting whatsoever... | |
| # | |
| # What happens to extra | |
| # whitespace? | |
| # | |
| # Now I have a list for you | |
| # | |
| # * one - if there is a long item like this, what are you going to do to format me. Are you going to leave me along, or wrap? | |
| # This is important because if at | |
| # all possible, I want consistency. | |
| # * two - if you do decide to wrap, that would be awesome. Help me from being gross please. | |
| # * three | |
| # | |
| # 1. How about a numbered list? | |
| # 2. Will you also wrap this numbered list? I really want to know. It is important we consider when people have these lists. | |
| # 34. What is your answer? | |
| # | |
| # Can you handle code? Because code should not | |
| # be wrapped under any circumstances. Newlines matter in code. So if there is a long line marked as code, TOUGH!! | |
| # | |
| # - this is | |
| # - and unordered | |
| # - list with hyphens | |
| # | |
| # Look This is some header | |
| # === | |
| # | |
| # ```ruby | |
| # def boo_yeah | |
| # puts 'Look! a block quote!!' | |
| # puts 'That runs two lines. And what happens if the current line is loooooooooooong? Nothing! That\'s what!' | |
| # end | |
| # ```` | |
| # | |
| # Another Header | |
| # ------------------------------------------------------------------------------------------------------------------ | |
| # | |
| # | O | N | P | Expected | Std Dev | Unit | | |
| # |-----|-----|-----|---------:|--------:|------| | |
| # | 2 | 4 | 8 | 4.3 | 1 | Day and oh by the way this line is super long but should be left alone | | |
| # | |
| # **Starting with asterisks** should not be interpreted the same as a list item. | |
| # It should handle really long lines. | |
| # | |
| # In addition, if there is a paragraph that goes on and on and just **so happens** | |
| # to have asterisks begin a line, those asterisks should not be considered a list. | |
| # | |
| # **Another problem** - for some reason, having a line start with bold and then | |
| # is already truncated properly, fails. | |
| # | |
| # * For some reason, there is this bug that when there is a list item that has | |
| # already been formatted, the output switches this line for the line above. | |
| # | |
| # [import_calibration]: images/import_calibration.png "Leave Link References Alone!!" | |
| # END | |
| # end | |
| # | |
| # def expected | |
| # out = <<-END | |
| # # Some Header | |
| # | |
| # This is a test line. It can be quite long. It starts pretty simple, then gets | |
| # more and more complex. Pretty soon nobody knows what is happening.' | |
| # | |
| # Now here is another line that is fowl. What can you do to help me look more | |
| # beautiful? See someone edited me on github directly and so I don't have any | |
| # great formatting whatsoever... | |
| # | |
| # What happens to extra whitespace? | |
| # | |
| # Now I have a list for you | |
| # | |
| # * one - if there is a long item like this, what are you going to do to format | |
| # me. Are you going to leave me along, or wrap? This is important because if at | |
| # all possible, I want consistency. | |
| # * two - if you do decide to wrap, that would be awesome. Help me from being | |
| # gross please. | |
| # * three | |
| # | |
| # 1. How about a numbered list? | |
| # 2. Will you also wrap this numbered list? I really want to know. It is important | |
| # we consider when people have these lists. | |
| # 34. What is your answer? | |
| # | |
| # Can you handle code? Because code should not | |
| # be wrapped under any circumstances. Newlines matter in code. So if there is a long line marked as code, TOUGH!! | |
| # | |
| # - this is | |
| # - and unordered | |
| # - list with hyphens | |
| # | |
| # Look This is some header | |
| # === | |
| # | |
| # ```ruby | |
| # def boo_yeah | |
| # puts 'Look! a block quote!!' | |
| # puts 'That runs two lines. And what happens if the current line is loooooooooooong? Nothing! That\'s what!' | |
| # end | |
| # ```` | |
| # | |
| # Another Header | |
| # ------------------------------------------------------------------------------------------------------------------ | |
| # | |
| # | O | N | P | Expected | Std Dev | Unit | | |
| # |-----|-----|-----|---------:|--------:|------| | |
| # | 2 | 4 | 8 | 4.3 | 1 | Day and oh by the way this line is super long but should be left alone | | |
| # | |
| # **Starting with asterisks** should not be interpreted the same as a list item. | |
| # It should handle really long lines. | |
| # | |
| # In addition, if there is a paragraph that goes on and on and just **so happens** | |
| # to have asterisks begin a line, those asterisks should not be considered a list. | |
| # | |
| # **Another problem** - for some reason, having a line start with bold and then is | |
| # already truncated properly, fails. | |
| # | |
| # * For some reason, there is this bug that when there is a list item that has | |
| # already been formatted, the output switches this line for the line above. | |
| # | |
| # [import_calibration]: images/import_calibration.png "Leave Link References Alone!!" | |
| # END | |
| # out.rstrip | |
| # end | |
| # end |