@elhoyos
Last active December 30, 2015 19:39
Nokogiri adds \n between tags when using to_html and to_xml


Credits: These were identified and collected by @amirisnino

to_html

Example 1:

require 'nokogiri'
string = 'a<div>b<br><br></div>'
html_doc = Nokogiri::HTML::DocumentFragment.parse(string)
html_doc.to_html
=> "a<div>b<br><br>\n</div>"

Example 2:

require 'nokogiri'
string2 = '<div>4. <i>some question</i></div><div><br></div>'
html_doc = Nokogiri::HTML::DocumentFragment.parse(string2)
html_doc.to_html
=> "<div>4. <i>some question</i>\n</div><div><br></div>"

Workaround

html_doc.children.map {|e| e.serialize(:save_with => 0)}.join

Also visible in rgrove/sanitize#71

to_xml

Example 1:

require 'nokogiri'
string2 = "<div>4. <i>some question</i></div><div><br></div>"
xml = Nokogiri::XML::DocumentFragment.parse(string2)
xml.to_xml
=> "<div>4. <i>some question</i></div><div>\n  <br/>\n</div>"

xml.children.to_s
=> "<div>4. <i>some question</i></div><div>\n  <br/>\n</div>"

Example 2:

require 'nokogiri'
string2 = "<div>4. <i>some question</i></div><div><br></div>"
xml = Nokogiri::XML::Document.parse(string2)
xml.to_xml
=> "<?xml version=\"1.0\"?>\n<div>4. <i>some question</i></div>\n"
xml.children.to_s
=> "<div>4. <i>some question</i></div>"

Workaround

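WARNING: THIS REMOVES ORIGINAL DATA TAGS. An XML::Document keeps a single root element, so the second <div> is dropped, as Example 2 shows.

Parse using Nokogiri::XML::Document and serialize with xml.children.to_s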
