@stemar
Last active September 15, 2019 04:34
Ruby helpers to filter out unwanted HTML class names or style properties, including an example that filters out Microsoft Office classes and styles
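class String
  # Removes every item matching args[:match] from each attribute_key="..." value, in place.
  # args[:split] splits the value into items, args[:join] rejoins the kept items,
  # and args[:append] is added to any value that is still non-empty.
  def filter_html_attribute(attribute_key, args={})
    gsub!(/#{attribute_key}="([^"]*)"/) do
      attribute_value = $1.split(args[:split]).delete_if {|i| i.match(args[:match]) }.join(args[:join])
      attribute_value += args[:append].to_s unless attribute_value.empty?
      # The block's return value becomes the replacement text
      "#{attribute_key}=\"#{attribute_value}\""
    end
    # Drop attributes that were left empty
    gsub!(/ #{attribute_key}=""/, "")
    # gsub! returns nil when nothing was replaced, so return the receiver explicitly
    self
  end

  # Filters class names; items are separated by whitespace
  def filter_html_class(args={})
    args = {split: /\s+/, join: " "}.merge(args)
    filter_html_attribute("class", args)
  end

  # Filters style declarations; items are separated by semicolons
  def filter_html_style(args={})
    args = {split: /\s*;\s*/, join: "; ", append: ";"}.merge(args)
    filter_html_attribute("style", args)
  end
end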
html = <<HTML
<p class="MsoNormal">Lorem ipsum dolor sit amet</p>
<p class="first MsoNormal last">Lorem ipsum dolor sit amet</p>
<p class="first MsoNormal">Lorem ipsum dolor sit amet</p>
<p class="3DMsoNormal last"><span style="mso-border-insideh:none;mso-border-insidev:none;">Lorem ipsum dolor sit amet</span></p>
<p><span style="color:red; mso-border-insideh:none;mso-border-insidev:none; background: white; ">Lorem ipsum dolor sit amet</span></p>
HTML
puts html
class String
  # Filter out Microsoft Office classes and styles (modifies the string in place)
  def filter_out_mso
    filter_html_class(match: /(M|m)so\S+/)
    filter_html_style(match: /^(M|m)so/)
    # Return the receiver so the call can be passed straight to puts
    self
  end
end
puts html.filter_out_mso
# Result:
# <p>Lorem ipsum dolor sit amet</p>
# <p class="first last">Lorem ipsum dolor sit amet</p>
# <p class="first">Lorem ipsum dolor sit amet</p>
# <p class="last"><span>Lorem ipsum dolor sit amet</span></p>
# <p><span style="color:red; background: white;">Lorem ipsum dolor sit amet</span></p>
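
The helpers above reopen String and modify the receiver with gsub!, so html itself is changed by the call to filter_out_mso. Where that side effect is unwanted, the same split / reject / join idea could be sketched as a standalone module that returns a new string. This is only a sketch under assumptions: HtmlAttributeFilter, filter, and strip_mso are hypothetical names that do not appear in the gist, and unlike the original it requires whitespace before the attribute name so that an attribute such as data-class is not touched.

```ruby
# Hypothetical non-mutating variant of the helpers above.
# HtmlAttributeFilter, filter, and strip_mso are illustrative names, not part of the gist.
module HtmlAttributeFilter
  module_function

  # Returns a new string in which items matching `match` are removed from
  # every key="..." attribute. Attributes left with no items are dropped.
  def filter(html, key, match:, split:, join:, append: "")
    html.gsub(/(\s+)#{Regexp.escape(key)}="([^"]*)"/) do
      # Capture both groups before any other regexp call can reset the match data
      leading, value = Regexp.last_match.captures
      kept = value.split(split).reject { |item| item.empty? || item.match?(match) }
      kept.empty? ? "" : %(#{leading}#{key}="#{kept.join(join)}#{append}")
    end
  end

  # Same patterns as filter_out_mso, applied without touching the input string
  def strip_mso(html)
    without_classes = filter(html, "class", match: /(M|m)so\S+/, split: /\s+/, join: " ")
    filter(without_classes, "style", match: /^(M|m)so/, split: /\s*;\s*/, join: "; ", append: ";")
  end
end

original = '<p class="first MsoNormal"><span style="color:red; mso-border-insideh:none;">Lorem</span></p>'.freeze
puts HtmlAttributeFilter.strip_mso(original)
puts original # the frozen input is left as it was
```

Because filter uses gsub rather than gsub!, it also works on frozen strings and never returns nil when nothing matches.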