@l3thal
Last active December 24, 2015 17:29
word count from pptx
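#!/usr/bin/ruby
# Sums the number of unique words found on each slide of a .pptx file.
require 'zip'      # rubyzip >= 1.0, where Zip::ZipFile was renamed to Zip::File
require 'nokogiri'

# One-liner variant that prints the unique-word count of each slide:
# Zip::File.open("sample.pptx") { |zip| zip.each { |entry| puts Nokogiri::XML(entry.get_input_stream.read).text.split.uniq.length if entry.name =~ /ppt\/slides\/slide[0-9]+\.xml$/ } }

class Pptx
  SLIDE_ENTRY = %r{ppt/slides/slide[0-9]+\.xml$}
  DRAWINGML_NS = { 'a' => 'http://schemas.openxmlformats.org/drawingml/2006/main' }

  # Adds up the per-slide counts; the original overwrote the count on every
  # slide, so only one slide's count was ever returned.
  def self.word_count(file)
    Zip::File.open(file) do |zip|
      zip.entries.select { |entry| slide?(entry) }
         .sum { |entry| unique_words_in(slide_text(entry)) }
    end
  end

  # Joins paragraphs (<a:p>) with spaces so the last word of one paragraph
  # is not fused with the first word of the next.
  def self.slide_text(entry)
    doc = Nokogiri::XML(entry.get_input_stream.read)
    doc.xpath('//a:p', DRAWINGML_NS).map(&:text).join(' ')
  end

  def self.unique_words_in(text)
    text.split.uniq.length
  end

  def self.slide?(entry)
    entry.name.match?(SLIDE_ENTRY)
  end
end

puts Pptx.word_count("sample.pptx")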
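
The gist is titled "word count", but the script counts unique words per slide. The sketch below shows the difference on plain strings, so it runs without rubyzip, Nokogiri, or a .pptx file. The names `slide_entry?`, `total_words`, and `unique_words` are hypothetical helpers for illustration and are not part of the original gist. Anchoring the pattern at both ends is an assumption about the intent, not something the gist does.

```ruby
# Hypothetical helpers for illustration; names are not from the original gist.

# Same pattern as the gist, but anchored at both ends (an assumption) so that
# a nested path such as "backup/ppt/slides/slide1.xml" is not treated as a slide.
SLIDE_ENTRY_STRICT = %r{\Appt/slides/slide[0-9]+\.xml\z}

def slide_entry?(name)
  SLIDE_ENTRY_STRICT.match?(name.to_s)
end

# Every whitespace-separated token counts, repeats included.
def total_words(text)
  text.split.length
end

# Each distinct token counts once (case-sensitive, like the gist's `uniq`).
def unique_words(text)
  text.split.uniq.length
end

sample = "the quick fox jumps over the lazy fox"
puts total_words(sample)   # 8
puts unique_words(sample)  # 6
```

If a plain word total is what is wanted, swapping the `uniq.length` call in the class for `length` would be the corresponding change.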