Diederick Lawson dkln
dkln / gist:72716
Created March 2, 2009 11:43
Extract text from a Microsoft Word file (meant for search engine indexing)
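def parse_word(file)
  buffer = String.new
  # Read in binary mode and keep only the lines that contain a NUL byte.
  File.open(file, 'rb') do |f|
    f.each_line { |line| buffer << line << ' ' if line.include?("\0") }
  end
  # Non-bang gsub never returns nil; sub! replaced only the first punctuation mark.
  buffer.gsub(/[^a-zA-Z0-9\s,.\-@\/_]/, '').gsub(/[,.\-\/@_]/, ' ').split
end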
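
A minimal sketch of the same filtering applied to a string that is already in memory, which makes it possible to try the approach without a Word file at hand. The helper name `extract_tokens` and the sample bytes are assumptions made for illustration and are not part of the gist.

```ruby
# Hypothetical helper (not part of the gist): the same filtering steps,
# applied to an in-memory string instead of a file on disk.
def extract_tokens(data)
  # Keep only the lines that contain a NUL byte, as parse_word does.
  kept = data.each_line.select { |line| line.include?("\0") }.join(' ')
  kept.gsub(/[^a-zA-Z0-9\s,.\-@\/_]/, '')  # drop NULs and other disallowed bytes
      .gsub(/[,.\-\/@_]/, ' ')             # punctuation becomes a word break
      .split
end

# Made-up sample bytes: two lines contain a NUL byte, one does not.
sample = "Hello\0 wor\x01ld,foo\nplain line without nul\nmail@example.com\0\n".b
p extract_tokens(sample)
```

The line without a NUL byte is dropped entirely, and punctuation such as `@` and `.` splits `mail@example.com` into separate tokens, which may or may not be what a search index wants.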