@ihower
Created April 25, 2011 02:39
parsing XML chunk by chunk
# <?xml version="1.0" encoding="UTF-8"?>
# <DataFeeds>
# <Item>
# ....
# </Item>
# <Item>
# ...
# </Item>
# </DataFeeds>
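require "nokogiri"

File.open("really_big_file.xml") do |file|
  item_i = 0
  chunk = nil # nil while we are outside an <Item> element

  while (line = file.gets)
    # START TAG: begin collecting a new item
    if line =~ /<Item>/
      item_i += 1
      chunk = ""
    end

    # Skip the XML declaration, the <DataFeeds> wrapper, and anything else outside an item
    next if chunk.nil?

    chunk << line

    # END TAG: the item is complete, so parse only this chunk
    if line =~ /<\/Item>/
      doc = Nokogiri::XML(chunk) # the chunk is XML, so use the XML parser rather than Nokogiri::HTML
      # ....
      chunk = nil
    end
  end # while end
end # file end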
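
The chunking loop can also be separated from the parsing step, which makes it easy to try on a small in-memory sample before pointing it at a large file. What follows is a minimal sketch, not part of the original gist: the helper name `each_item_chunk` and the sample data are hypothetical, and it assumes, as in the layout shown above, that every `<Item>` and `</Item>` tag sits on its own line. It uses only the standard library; each yielded string could then be handed to Nokogiri.

```ruby
require "stringio"

# Hypothetical helper: yields the text of each <Item>...</Item> block read from io.
# Assumes every <Item> and </Item> tag sits on its own line.
def each_item_chunk(io)
  return enum_for(:each_item_chunk, io) unless block_given?

  chunk = nil # nil while we are outside an <Item> element
  io.each_line do |line|
    chunk = +"" if line.include?("<Item>")
    next if chunk.nil? # skips the declaration and the <DataFeeds> wrapper

    chunk << line
    if line.include?("</Item>")
      yield chunk
      chunk = nil
    end
  end
end

# Made-up sample data for illustration only.
sample = StringIO.new(<<~XML)
  <?xml version="1.0" encoding="UTF-8"?>
  <DataFeeds>
  <Item>
  <Name>first</Name>
  </Item>
  <Item>
  <Name>second</Name>
  </Item>
  </DataFeeds>
XML

chunks = each_item_chunk(sample).to_a
puts chunks.length
```

Because only one item is held in memory at a time, the same helper should work with `File.open("really_big_file.xml") { |file| each_item_chunk(file) { |chunk| ... } }`.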