#!/bin/bash
# Recursive file conversion windows-1251 --> utf-8
# Place this file in the root of your site, add execute permission and run
# Converts *.php, *.html, *.css, *.js files.
# To add file type by extension, e.g. *.cgi, add '-o -name "*.cgi"' to the find command
find ./ -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" -type f |
while read file
do
echo " $file"
mv $file $file.icv
iconv -f WINDOWS-1251 -t UTF-8 $file.icv > $file
rm -f $file.icv
done
Great script. Works perfectly, even on cygwin (which I have to use at work). Thanks a lot.
Saved my day.
thank you!
Thanks! Really good script.
Confirmed working on cygwin, many thanks @akost!
That script is bad, since iconv doesn't detect whether a file is already UTF-8. It will ruin your files if run on a directory with files in mixed encodings. Running iconv more than once on a file that contains non-ASCII text is guaranteed to screw up your files too.
What you actually should use for this operation is enca, since it will correctly detect input encoding and act accordingly.
After installing enca, just run this one-liner and your files will be UTF-8 in no time:
find ./ -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" -type f | while read file; do enca -x UTF-8 "$file"; done;
thanks @anonymous2ch
worked like a charm
Thank you so much!!!!
Thanks!
That script is bad, since iconv doesn't detect whether a file is already UTF-8.
Yes. I too often see something like
Какое унижение для противника!
It's UTF-8 text that was converted to UTF-8 a second time on the assumption that it was CP1251.
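To see how that kind of garbage arises, here is a small sketch. It assumes bash and an iconv build that knows WINDOWS-1251; the sample word "Привет" is my own choice, not something from the gist.

```shell
#!/bin/bash
# The Russian word "Привет" already encoded as UTF-8 (12 bytes),
# written as octal escapes so this script's own encoding does not matter.
original=$(printf '\320\237\321\200\320\270\320\262\320\265\321\202')

# Run the gist's conversion on text that is already UTF-8: every single
# byte is read as a separate CP1251 character and re-encoded.
double_encoded=$(printf '%s' "$original" | iconv -f WINDOWS-1251 -t UTF-8)

printf 'before: %s\n' "$original"
printf 'after:  %s\n' "$double_encoded"

# Compare sizes in bytes: the damaged text is noticeably longer.
original_bytes=$(printf '%s' "$original" | wc -c)
double_bytes=$(printf '%s' "$double_encoded" | wc -c)
echo "$original_bytes -> $double_bytes bytes"
```

On a UTF-8 terminal the "after" line shows the familiar run of "Р" and "С" characters mixed with symbols.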
@anonymous2ch That helped :)
One catch: when file names contain spaces, the script doesn't work. Here is a corrected version of the script:
find ./ -name ".txt" -o -name ".html" -o -name ".css" -o -name ".js" -type f |
while read file
do
echo " $file"
mv "$file" "$file".icv
iconv -f WINDOWS-1251 -t UTF-8 "$file".icv > "$file"
rm -f "$file".icv
done
@1nt3g3r, your script won't work. You missed the * in the filename patterns. To make it work, the first line should look like this:
find ./ -name "*.txt" -o -name "*.html" -o -name "*.css" -o -name "*.js" -type f |
However, your variant works much better than the original poster's. It works even with unprintable characters in the filenames. Thanks!
That script is bad, since iconv doesn't detect whether a file is already UTF-8.
Yes. I too often see something like
Какое унижение для противника!
It's UTF-8 text that was converted to UTF-8 a second time on the assumption that it was CP1251.
find ./ -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" -o -name "*.txt" \) |
while IFS= read -r file
do
    # Skip files that file(1) already reports as UTF-8
    if ! file -bi "$file" | grep -q 'utf-8'
    then
        echo " $file"
        mv "$file" "$file".icv
        iconv -f WINDOWS-1251 -t UTF-8 "$file".icv > "$file"
        rm -f "$file".icv
    fi
done
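A variation on the same idea, as a rough sketch rather than a drop-in replacement: instead of asking file(1), let iconv itself check whether a file already decodes as valid UTF-8, and only replace the original when the conversion succeeds. It assumes bash (for read -d '') and a find that supports -print0. The function name convert_tree is made up for this example. A CP1251 file whose bytes happen to form valid UTF-8 would be skipped, which should be rare for real Cyrillic text.

```shell
#!/bin/bash
# Sketch: convert CP1251 files to UTF-8, skipping files that already
# decode cleanly as UTF-8. convert_tree is a hypothetical helper name.
convert_tree() {
    local root=$1
    # \( ... \) groups the -name tests so that -type f applies to all of them;
    # -print0 with read -d '' survives spaces and newlines in file names.
    find "$root" -type f \( -name '*.php' -o -name '*.html' -o -name '*.css' -o -name '*.js' \) -print0 |
    while IFS= read -r -d '' file
    do
        # Valid UTF-8 (including plain ASCII) passes this check: leave it alone.
        if iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1
        then
            echo "skip    $file"
            continue
        fi
        # Convert into a temporary file; touch the original only on success.
        if iconv -f WINDOWS-1251 -t UTF-8 "$file" > "$file.icv"
        then
            # cat instead of mv keeps the original file's permissions.
            cat "$file.icv" > "$file"
            rm -f "$file.icv"
            echo "convert $file"
        else
            rm -f "$file.icv"
            echo "failed  $file" >&2
        fi
    done
}

# Usage: convert_tree ./
```

Because already-valid files are skipped, running it twice on the same tree should leave the second run with nothing to convert.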
For lots of Russian filenames with spaces and so on, plus codepage autodetection, this works best for me (macOS):
find ./ -name "*.sql" -type f | while read file; do enca -L russian -x UTF-8 "$file"; done;
For lots of Russian filenames with spaces and so on, plus codepage autodetection, this works best for me (macOS):
find ./ -name "*.sql" -type f | while read file; do enca -L russian -x UTF-8 "$file"; done;
Just a quick note that this requires enca to be installed (brew install enca) and might fail if, say, a CP-1251 file was incorrectly saved as UTF-8.
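For a file that has already been damaged that way (UTF-8 text run through a CP1251-to-UTF-8 conversion once), one round of damage can often be undone by converting in the opposite direction. This is only a sketch, assuming bash and iconv; the function name fix_double_encoded is made up for this example. It refuses to touch a file if the result is not valid UTF-8.

```shell
#!/bin/bash
# Sketch: undo one round of "UTF-8 read as CP1251" damage.
# fix_double_encoded is a hypothetical helper name.
fix_double_encoded() {
    local file=$1
    # Mapping the characters back to CP1251 bytes restores the original
    # UTF-8 byte stream; the second iconv only checks that it is valid UTF-8.
    if iconv -f UTF-8 -t WINDOWS-1251 "$file" > "$file.fix" 2> /dev/null &&
       iconv -f UTF-8 -t UTF-8 "$file.fix" > /dev/null 2>&1
    then
        # cat instead of mv keeps the original file's permissions.
        cat "$file.fix" > "$file"
        rm -f "$file.fix"
    else
        # Healthy Cyrillic UTF-8 ends up here and is left untouched.
        rm -f "$file.fix"
        echo "not repaired: $file" >&2
        return 1
    fi
}

# Usage: fix_double_encoded "path/to/damaged file.html"
```

Keep a backup before trying this on real data.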
Thanks