@akost
Created April 4, 2012 19:06
Bash script for recursive file conversion windows-1251 --> utf-8
#!/bin/bash
# Recursive file conversion windows-1251 --> utf-8
# Place this file in the root of your site, add execute permission and run
# Converts *.php, *.html, *.css, *.js files.
# To add a file type by extension, e.g. *.cgi, add '-o -name "*.cgi"' inside the \( ... \) group
# Known limitation: $file is unquoted below, so file names containing spaces break
find ./ -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" \) |
while read file
do
echo " $file"
mv $file $file.icv
iconv -f WINDOWS-1251 -t UTF-8 $file.icv > $file
rm -f $file.icv
done
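
When several `-name` tests are joined with `-o`, a trailing `-type f` applies only to the last of them unless the tests are grouped, because `-o` binds more loosely than the implicit AND. Without grouping, a directory named, say, `cache.php` would also reach the loop. A minimal sketch of the grouped form, assuming bash; `list_targets` is a hypothetical helper name, not part of the gist:

```shell
#!/bin/bash
# Hypothetical helper: print regular files with the listed extensions under a directory.
# The \( ... \) group makes -type f apply to every -name alternative.
list_targets() {
  local root=$1
  find "$root" -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" \)
}

# Usage: list_targets ./
```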
@aysenz

aysenz commented Aug 29, 2017

Thanks!

@shuravban

That script is bad, since iconv doesn't detect whether a file is already UTF-8.

Yes. Too often I see something like
Какое унижение для противника!
It's UTF-8 text converted to UTF-8 again on the assumption that it was cp1251.
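
A file damaged this way can often be repaired by running the conversion in the opposite direction: encoding the broken text back to WINDOWS-1251 yields the original UTF-8 bytes. A minimal sketch, assuming an `iconv` that knows both encodings; `undo_double_conversion` is a hypothetical helper name. It only helps if the mistaken conversion completed in the first place. Byte 0x98 is unassigned in cp1251 and occurs in the UTF-8 form of some letters (for example `И`, bytes D0 98), so `iconv` may already have stopped with an error on such text.

```shell
#!/bin/bash
# Hypothetical helper: undo one accidental cp1251 -> UTF-8 pass over a file
# that was already UTF-8. Writes the repaired text to stdout.
undo_double_conversion() {
  iconv -f UTF-8 -t WINDOWS-1251 "$1"
}

# Usage: undo_double_conversion broken.html > fixed.html
```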

@mitya12342

@anonymous2ch That helped )

@1nt3g3r

1nt3g3r commented Jan 20, 2018

One caveat: when file names contain spaces, the script doesn't work. A corrected version of the script:

find ./ -name ".txt" -o -name ".html" -o -name ".css" -o -name ".js" -type f |
while read file
do
echo " $file"
mv "$file" "$file".icv
iconv -f WINDOWS-1251 -t UTF-8 "$file".icv > "$file"
rm -f "$file".icv
done

@pasha-pivo

@1nt3g3r, your script won't work. You missed the * in the filename patterns. To make it work, the first line should look like this:

find ./ -name "*.txt" -o -name "*.html" -o -name "*.css" -o -name "*.js" -type f |

However, your variant works much better than the topic starter's. It works even with unprintable characters in the filenames. Thanks!
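
Quoting `"$file"` takes care of spaces, but a plain `read` still trims leading and trailing whitespace, treats backslashes as escapes, and splits names that contain a newline. A sketch of a stricter loop, assuming bash (for `read -d ''`) and a `find` that supports `-print0` (GNU and BSD find both do); `for_each_target` is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: run a command once per matching file.
# Names are passed NUL-delimited, so any byte that is legal in a path survives.
for_each_target() {
  local root=$1
  shift
  find "$root" -type f \( -name "*.txt" -o -name "*.html" -o -name "*.css" -o -name "*.js" \) -print0 |
  while IFS= read -r -d '' file
  do
    "$@" "$file"
  done
}

# Usage: for_each_target ./ echo
```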

@gevmarlen

That script is bad, since iconv doesn't detect whether a file is already UTF-8.

Yes. Too often I see something like
Какое унижение для противника!
It's UTF-8 text converted to UTF-8 again on the assumption that it was cp1251.

find ./ -type f \( -name "*.php" -o -name "*.html" -o -name "*.css" -o -name "*.js" -o -name "*.txt" \) |
while IFS= read -r file
do
  # Skip files that file(1) already reports as UTF-8
  if ! file -bi "$file" | grep -q 'utf-8'
  then
    echo " $file"
    mv "$file" "$file".icv
    iconv -f WINDOWS-1251 -t UTF-8 "$file".icv > "$file"
    rm -f "$file".icv
  fi
done
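
The `file -bi` check depends on `file` being installed and on its heuristics. An alternative sketch asks `iconv` itself whether the file decodes as UTF-8, assuming `iconv` exits non-zero on invalid input (glibc and GNU libiconv do); `is_valid_utf8` and `convert_if_needed` are hypothetical helper names. Pure ASCII files also pass the check and are left alone, which is harmless because ASCII is identical in both encodings.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if the file decodes cleanly as UTF-8.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical helper: convert cp1251 -> UTF-8 only when the file is not valid UTF-8.
# The original is replaced only after iconv has succeeded.
convert_if_needed() {
  local file=$1
  if is_valid_utf8 "$file"; then
    echo "skipped:   $file"
  elif iconv -f WINDOWS-1251 -t UTF-8 "$file" > "$file.icv"; then
    mv "$file.icv" "$file"
    echo "converted: $file"
  else
    rm -f "$file.icv"
    echo "failed:    $file" >&2
    return 1
  fi
}

# Usage: convert_if_needed index.html
```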

@catmater

catmater commented Oct 9, 2020

For many Russian filenames with spaces etc., and with automatic codepage detection, this works best for me (macOS):
find ./ -name "*.sql" -type f | while read file; do enca -L russian -x UTF-8 "$file"; done;

@definiteIymaybe

For many Russian filenames with spaces etc., and with automatic codepage detection, this works best for me (macOS):
find ./ -name "*.sql" -type f | while read file; do enca -L russian -x UTF-8 "$file"; done;

Just a quick note: that requires enca to be installed (brew install enca), and it might fail if, say, a CP-1251 file was incorrectly saved as UTF-8.
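
A small guard can make that dependency explicit before the loop starts. A sketch, assuming bash; `require_tool` and `convert_sql_with_enca` are hypothetical names, and the `enca` invocation is the one quoted in this thread:

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the named command is on PATH.
require_tool() {
  command -v "$1" > /dev/null 2>&1 || {
    echo "error: '$1' is not installed" >&2
    return 1
  }
}

# Hypothetical wrapper around the enca one-liner from this thread.
convert_sql_with_enca() {
  require_tool enca || return 1
  find "${1:-.}" -type f -name "*.sql" -print0 |
  while IFS= read -r -d '' file
  do
    # Report files that enca fails on and keep going.
    enca -L russian -x UTF-8 "$file" || echo "enca failed on: $file" >&2
  done
}

# Usage: convert_sql_with_enca ./
```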
