@jwliechty
Last active September 3, 2015 12:59
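This script rewrites a Markdown file in place, rewrapping paragraphs and list items so that no line exceeds 80 characters. Headers, tables, fenced code, indented code, and link references are left untouched.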
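The wrapping step on its own is a greedy word fill, and can be sketched separately from the Markdown handling. This is a minimal sketch, not the gist's code: `wrap_words` and its `width` and `indent` parameters are hypothetical names chosen for illustration, and the sketch assumes words are separated by plain whitespace.

```ruby
# Hypothetical helper (not part of the gist): greedy word wrap.
# Words are appended to the current line while the line, a separating
# space, and the next word still fit within `width` columns.
def wrap_words(text, width: 80, indent: '')
  text.split.each_with_object([]) do |word, lines|
    if lines.empty?
      # The first line starts without the hanging indent.
      lines << word
    elsif lines.last.size + 1 + word.size > width
      # The word does not fit: start a continuation line with the indent.
      lines << "#{indent}#{word}"
    else
      lines[-1] = "#{lines.last} #{word}"
    end
  end
end

# A long list item; continuation lines align under the item text.
puts wrap_words('* ' + 'word ' * 30, indent: '  ')
```

A word longer than `width` is placed on a line of its own rather than split, which keeps long URLs intact.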
def usage
  "#{$0} FILE"
end

file = ARGV.first
$stderr.puts usage unless (file && File.file?(file))

class MarkdownTruncate
  def initialize
    @output = []
    @section_truncate = SectionTruncate.new
  end

  # Returns +text+ with paragraphs and list items rewrapped to 80 columns.
  def truncate(text)
    @output = []
    text.split(/\n/).each do |line|
      if is_empty?(line)
        # A blank line ends the current paragraph or list item.
        close_section_if_started
        @output << ''
      elsif header_or_table?(line) || link_reference?(line) || block_quote?(line)
        # These lines are emitted verbatim and never wrapped.
        close_section_if_started
        @output << line.rstrip
      elsif indented?(line) && !@section_truncate.started?
        # Indented text outside a paragraph is treated as code.
        @output << line.rstrip
      elsif list_item(line)
        # Each list item is its own section with a hanging indent.
        close_section_if_started
        @section_truncate.add(line.rstrip, item_indent(line.rstrip))
      else
        @section_truncate.add(line.rstrip)
      end
    end
    close_section_if_started
    @output.join("\n")
  end

  private

  def is_empty?(line)
    line.strip.empty?
  end

  def link_reference?(line)
    line.lstrip =~ /^ {0,3}\[[^\]]+\]:/
  end

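  # Despite the name, this tracks fenced code blocks delimited by ``` lines.
  # Both the opening and the closing fence count as part of the block, so an
  # opening fence is never merged into a paragraph that directly precedes it.
  def block_quote?(line)
    if line =~ /^```/
      @in_block_quote = !@in_block_quote
      true
    else
      @in_block_quote
    end
  end
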
  def indented?(line)
    line =~ /^\s/
  end

  def header_or_table?(line)
    line =~ /^(#|===|---|\|)/
  end

  def list_item(line)
    line =~ /^(?:(?:\*[^*])|(?:\d+\.)|(?:-[^-]))/
  end

  def close_section_if_started
    @output.concat(@section_truncate.truncate) if @section_truncate.started?
  end

  def item_indent(line)
    space_count = line.match(/^(?:\s*(?:\*|\d+\.|-)\s*)/).end(0)
    ' ' * space_count
  end

  class SectionTruncate
    def initialize
      reset
    end

    def add(line, indent=nil)
      @section << line
      @indent = indent if indent
    end

    def started?
      !@section.empty?
    end

    def truncate
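      truncated = @section.join("\n").split(/\s+/).inject(['']) do |memo, word|
        # Append to the current line when it is still empty (so an over-long
        # first word does not leave a blank line behind) or when the word,
        # plus a separating space, fits within 80 columns.
        if memo.last.empty? || memo.last.size + word.size <= 79
          current = memo.last.empty? ? '' : "#{memo.last} "
          memo[memo.size - 1] = "#{current}#{word}"
        else
          memo << "#{@indent}#{word}"
        end
        memo
      end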
      reset
      truncated
    end

    private

    def reset
      @section = []
      @indent = ''
    end
  end
end

# Format only *.md files. This guard also keeps the script from rewriting
# anything when the file is loaded by rspec.
if file && File.extname(file) == '.md'
  content = IO.read(file)
  new_content = MarkdownTruncate.new.truncate(content)
  IO.write(file, new_content)
end

# require 'rspec'
#
# describe MarkdownTruncate do
#   describe '#truncate' do
#     context 'for a one line paragraph' do
#       it 'truncates to 80 character lines' do
#         expect(MarkdownTruncate.new.truncate(input)).to eql expected
#       end
#     end
#     context 'when empty space '
#   end
#
#   def input
#     <<-END
# # Some Header
#
# This is a test line. It can be quite long. It starts pretty simple, then gets more and more
# complex. Pretty soon nobody knows what is happening.'
#
# Now here
# is another line
# that is fowl. What
# can you do to help
# me look more beautiful? See someone edited me on github directly and so I don't have any great formatting whatsoever...
#
# What happens to extra
# whitespace?
#
# Now I have a list for you
#
# * one - if there is a long item like this, what are you going to do to format me. Are you going to leave me along, or wrap?
# This is important because if at
# all possible, I want consistency.
# * two - if you do decide to wrap, that would be awesome. Help me from being gross please.
# * three
#
# 1. How about a numbered list?
# 2. Will you also wrap this numbered list? I really want to know. It is important we consider when people have these lists.
# 34. What is your answer?
#
# Can you handle code? Because code should not
# be wrapped under any circumstances. Newlines matter in code. So if there is a long line marked as code, TOUGH!!
#
# - this is
# - and unordered
# - list with hyphens
#
# Look This is some header
# ===
#
# ```ruby
# def boo_yeah
# puts 'Look! a block quote!!'
# puts 'That runs two lines. And what happens if the current line is loooooooooooong? Nothing! That\'s what!'
# end
# ````
#
# Another Header
# ------------------------------------------------------------------------------------------------------------------
#
# | O | N | P | Expected | Std Dev | Unit |
# |-----|-----|-----|---------:|--------:|------|
# | 2 | 4 | 8 | 4.3 | 1 | Day and oh by the way this line is super long but should be left alone |
#
# **Starting with asterisks** should not be interpreted the same as a list item.
# It should handle really long lines.
#
# In addition, if there is a paragraph that goes on and on and just **so happens**
# to have asterisks begin a line, those asterisks should not be considered a list.
#
# **Another problem** - for some reason, having a line start with bold and then
# is already truncated properly, fails.
#
# * For some reason, there is this bug that when there is a list item that has
# already been formatted, the output switches this line for the line above.
#
# [import_calibration]: images/import_calibration.png "Leave Link References Alone!!"
#     END
#   end
#
#   def expected
#     out = <<-END
# # Some Header
#
# This is a test line. It can be quite long. It starts pretty simple, then gets
# more and more complex. Pretty soon nobody knows what is happening.'
#
# Now here is another line that is fowl. What can you do to help me look more
# beautiful? See someone edited me on github directly and so I don't have any
# great formatting whatsoever...
#
# What happens to extra whitespace?
#
# Now I have a list for you
#
# * one - if there is a long item like this, what are you going to do to format
# me. Are you going to leave me along, or wrap? This is important because if at
# all possible, I want consistency.
# * two - if you do decide to wrap, that would be awesome. Help me from being
# gross please.
# * three
#
# 1. How about a numbered list?
# 2. Will you also wrap this numbered list? I really want to know. It is important
# we consider when people have these lists.
# 34. What is your answer?
#
# Can you handle code? Because code should not
# be wrapped under any circumstances. Newlines matter in code. So if there is a long line marked as code, TOUGH!!
#
# - this is
# - and unordered
# - list with hyphens
#
# Look This is some header
# ===
#
# ```ruby
# def boo_yeah
# puts 'Look! a block quote!!'
# puts 'That runs two lines. And what happens if the current line is loooooooooooong? Nothing! That\'s what!'
# end
# ````
#
# Another Header
# ------------------------------------------------------------------------------------------------------------------
#
# | O | N | P | Expected | Std Dev | Unit |
# |-----|-----|-----|---------:|--------:|------|
# | 2 | 4 | 8 | 4.3 | 1 | Day and oh by the way this line is super long but should be left alone |
#
# **Starting with asterisks** should not be interpreted the same as a list item.
# It should handle really long lines.
#
# In addition, if there is a paragraph that goes on and on and just **so happens**
# to have asterisks begin a line, those asterisks should not be considered a list.
#
# **Another problem** - for some reason, having a line start with bold and then is
# already truncated properly, fails.
#
# * For some reason, there is this bug that when there is a list item that has
# already been formatted, the output switches this line for the line above.
#
# [import_calibration]: images/import_calibration.png "Leave Link References Alone!!"
#     END
#     out.rstrip
#   end
# end