alegomes · May 27, 2012 00:28 · sumonst21 · Jan 1, 2021
diff --git a/iconv b/iconv
 I had a dataset that was not UTF-8, so I had to find out which charset it used. The 'file' command didn't help:

 $ file file_name.csv 
 file_name.csv: Non-ISO extended-ASCII C++ program text, with very long lines, with CRLF line terminators 

 So, I made this bash script to figure out its encoding:

 First, I converted the file to UTF-8 from every source encoding that 'iconv' supports, writing one output file per encoding:

 $ for f in $(iconv -l); do echo "Converting $f ..."; iconv -f "$f" -t UTF-8 < file_name.csv > "file_name.$f.csv"; done

 Then, I searched for the converted files that contain a known word:

 $ grep -li "são paulo" *
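
 The two steps can also be folded into one loop that skips the intermediate files. The following is only a sketch under a few assumptions: guess_encoding is a hypothetical helper name (not part of iconv), the candidate encodings are passed as arguments, and a conversion that makes iconv exit with an error is treated as "not this encoding".

```shell
#!/usr/bin/env bash
# guess_encoding FILE WORD ENCODING...
# Hypothetical helper: prints every candidate encoding whose conversion to
# UTF-8 succeeds and contains WORD (matched case-insensitively).
guess_encoding() {
    local input=$1 needle=$2
    shift 2
    local enc tmp
    tmp=$(mktemp) || return 1
    for enc in "$@"; do
        # Keep the candidate only if iconv exits cleanly AND the word is found.
        if iconv -f "$enc" -t UTF-8 < "$input" > "$tmp" 2>/dev/null &&
           grep -qi -- "$needle" "$tmp"; then
            printf '%s\n' "$enc"
        fi
    done
    rm -f "$tmp"
}
```

 For example, `guess_encoding file_name.csv "são paulo" ISO-8859-1 WINDOWS-1252 UTF-16LE` checks three likely candidates. Passing `$(iconv -l)` instead tries everything, although the format of that list differs between iconv implementations. Several encodings may be printed, since many of them map the accented characters to the same bytes.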