@ap-Codkelden
Last active June 9, 2020 09:22
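#!/usr/bin/env bash
# Convert large CP1251-encoded XML dumps to UTF-8 in 100 MB chunks.
# Note: each source file is deleted once it has been split.
#
# To test on a small sample first:
# head -c 10M 17.1-EX_XML_EDR_UO_FULL_29.05.2020.xml > test.xml
# XML_FILES=(test.xml)
XML_FILES=(17*.xml)
for f in "${XML_FILES[@]}"
do
    echo "$f"
    # CP1251 is a single-byte encoding, so cutting on a byte boundary
    # cannot split a character in half.
    split -d -a 3 -b 100M "$f" _s || exit 1
    rm "$f"
    for sf in _s*
    do
        echo "$sf"
        # strip carriage returns (CRLF -> LF)
        sed -i 's/\r//g' "$sf"
        # recode caused an error, so iconv is used instead:
        # recode CP1251..UTF-8 $sf
        iconv -f CP1251 -t UTF-8 "$sf" > "$sf.new" && mv -f "$sf.new" "$sf" || exit 1
        # the first chunk holds the XML declaration; update its encoding
        if [[ $sf == _s000 ]]; then
            sed -i 's/="windows-1251"/="utf-8"/g' "$sf"
        fi
    done
    # reassemble the converted chunks into p_<original name>
    cat _s* > "p_$f"
    rm _s*
done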
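
A possible follow-up check, not part of the original gist: after conversion, each `p_*.xml` output can be verified as valid UTF-8 by round-tripping it through `iconv`, which exits non-zero when it meets an invalid byte sequence. The helper name `is_valid_utf8` is hypothetical, and the `p_*.xml` pattern assumes the `p_` prefix written by the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original gist): succeeds only if the
# file decodes cleanly as UTF-8. The converted output is discarded.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Assumes the converted files carry the p_ prefix.
for out in p_*.xml
do
    # skip the unexpanded pattern when nothing matches
    [ -e "$out" ] || continue
    if is_valid_utf8 "$out"; then
        echo "$out: valid UTF-8"
    else
        echo "$out: invalid UTF-8" >&2
    fi
done
```

This only confirms that the bytes form well-formed UTF-8; it does not check that the XML itself is well-formed.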