
@azakordonets
Created November 2, 2015 13:18
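This script counts the characters in a batch of .docx files, excluding spaces, tabs and newlines. Only the main body text (word/document.xml) is counted. Headers, footers, footnotes and comments are stored in separate parts of the archive and are skipped. Place the script in the folder that contains the .docx files and run 'sh count_symbols_in_docx_file.sh' in a terminal. Use a UTF-8 locale: in the C locale, non-ASCII letters are treated as non-printable, removed, and not counted.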
#!/bin/bash
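# For every .docx in this folder: extract the body XML, strip tags and
# non-printable characters, drop whitespace, then count the characters left.
for file in *.docx; do
  [ -e "$file" ] || continue  # no .docx files: the pattern stays unexpanded
  number=$(unzip -p "$file" word/document.xml |
    sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g' |
    tr -d '[:space:]' | wc -m | tr -d '[:space:]')
  echo "File -> $file , symbols count: $number"
done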
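The following sketch is a hypothetical refactor and is not part of the original gist. It moves the pipeline into two shell functions so that a single file can be checked by passing it as an argument. The function names count_xml_text_chars and count_docx_chars are invented here. The sketch also counts each of the five predefined XML entities (&amp;, &lt;, &gt;, &quot;, &apos;) as one character, while the loop above counts "&amp;" as five. It assumes a sed that accepts -E, which GNU and BSD sed both do.

```shell
#!/bin/sh
# Hypothetical helper (not from the original gist): reads WordprocessingML
# on stdin and prints the number of non-whitespace text characters.
count_xml_text_chars() {
  # Strip tags, turn each predefined XML entity into a one-character
  # placeholder, then drop non-printable characters.
  sed -E -e 's/<[^>]+>//g' \
         -e 's/&(amp|lt|gt|quot|apos);/_/g' \
         -e 's/[^[:print:]]+//g' |
    tr -d '[:space:]' |  # remove spaces, tabs and newlines
    wc -m |              # count the remaining characters
    tr -d '[:space:]'    # trim the padding some wc versions add
}

# Hypothetical wrapper: count the body text of one .docx given as $1.
count_docx_chars() {
  unzip -p "$1" word/document.xml | count_xml_text_chars
}
```

For example, after sourcing this file, count_docx_chars report.docx prints the count for a hypothetical report.docx. Because count_xml_text_chars reads plain XML from stdin, you can test it without unzip.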