@taiar
Last active May 14, 2019 16:57
Ruby script to find "strange" characters (code points above 255, i.e. outside Latin-1) in Excel XLSX documents.
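# Installation: requires the loofah gem (gem install loofah) and the unzip command-line tool
#
# Usage: ruby find.rb name_of_the.xlsx
require 'loofah'
require 'tmpdir'

# Characters whose code point is above this value are reported (255 is the last Latin-1 code point).
CODEPOINT_THRESHOLD = 255
FILE_EXT = '.xml'

file = ARGV[0] or abort("Usage: ruby #{$PROGRAM_NAME} name_of_the.xlsx")
content = +''

# An XLSX file is a ZIP archive of XML parts. Extract it into a fresh temporary
# directory, which Dir.mktmpdir deletes automatically when the block returns
# (so an existing ./tmp directory is never touched).
Dir.mktmpdir do |tmp_dir|
  # Passing arguments separately avoids shell quoting problems with spaces in file names.
  system('unzip', '-q', file, '-d', tmp_dir) or abort("unzip failed for #{file}")
  Dir.glob(File.join(tmp_dir, '**', "*#{FILE_EXT}")).each do |path|
    # Strip the markup and keep only the text content of each XML part.
    content << Loofah.fragment(File.read(path, encoding: 'UTF-8')).scrub!(:strip).text
  end
end

# Print every occurrence (sorted) of a character whose code point exceeds the threshold.
content.each_char.select { |c| c.ord > CODEPOINT_THRESHOLD }.sort.each { |c| puts "'#{c}' -> #{c.ord}" }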
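The script prints one line per occurrence, so a character that appears a thousand times in the workbook is listed a thousand times. A minimal sketch of a more compact report, assuming the extracted text is already in a string: the hypothetical helpers `strange_chars` and `report` below (not part of the original gist) count each distinct character whose code point exceeds the threshold and print it in `U+XXXX` notation. They use only the Ruby standard library (`Enumerable#tally` needs Ruby 2.7 or later), so they can be tried without an XLSX file.

```ruby
# Hypothetical helper, not part of the original gist: returns a Hash mapping each
# character whose code point exceeds +threshold+ to its number of occurrences,
# ordered by code point.
def strange_chars(text, threshold = 255)
  text.each_char
      .select { |c| c.ord > threshold }
      .tally
      .sort_by { |c, _count| c.ord }
      .to_h
end

# Hypothetical helper: prints one line per distinct character, showing the
# character, its code point in U+XXXX notation and its number of occurrences.
def report(counts)
  counts.each do |c, count|
    puts format("'%s' -> U+%04X (%d)", c, c.ord, count)
  end
end

sample = "Café – naïve “quotes” and a non-breaking\u00A0space – again"
report(strange_chars(sample))
# '–' -> U+2013 (2)
# '“' -> U+201C (1)
# '”' -> U+201D (1)
```

In the sample, `é`, `ï` and the non-breaking space (U+00A0) are not reported because their code points are at most 255; the en dash and the curly quotes are. In the script above, `report(strange_chars(content, CODEPOINT_THRESHOLD))` could replace the final line.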