private string ConvertStringToUtf8Bom(string source) {
    var data = Encoding.UTF8.GetBytes(source);
    var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
    var encoder = new UTF8Encoding(true);
    return encoder.GetString(result);
}
Thanks! 👍
@dojo90 @milovidov983 What's the point of the conversion? str and str2 look like the same string afterwards:
var str = "aîn";
var str2 = ConvertStringToUtf8Bom(str);
Console.WriteLine(str2);
At the time I wrote this function, it seemed to solve my problem: Excel immediately recognized the correct Cyrillic encoding when opening the resulting CSV file.
By the way, it looks like the strings are not equivalent. Or am I wrong?
https://dotnetfiddle.net/mYgvdl
Example
using System;
using System.Text;
using System.Linq;

public class Program {
    public static void Main() {
        var str = "aîn";
        var str2 = ConvertStringToUtf8Bom(str);
        Console.WriteLine(str2);
        Console.WriteLine(str == str2);
        var str3 = "aîn";
        Console.WriteLine(str == str3);
    }

    static string ConvertStringToUtf8Bom(string source) {
        var data = Encoding.UTF8.GetBytes(source);
        var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
        var encoder = new UTF8Encoding(true);
        return encoder.GetString(result);
    }
}
// Output:
/*
aîn
False
True
*/
@milovidov983, thanks. I tried printing the byte arrays, and it makes sense.
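var str = "aîn";
var str2 = ConvertStringToUtf8Bom(str);
Console.WriteLine(str2 == str);
Console.WriteLine($"length of {str} is {str.Length}");
var bytes1 = Encoding.UTF8.GetBytes(str);
Console.WriteLine(GetHexString(bytes1));
Console.WriteLine();
Console.WriteLine($"length of {str2} is {str2.Length}");
var bytes2 = Encoding.UTF8.GetBytes(str2);
Console.WriteLine(GetHexString(bytes2));

// GetHexString was not shown in the original comment; this is one possible
// definition (an assumption) that produces space-separated hex bytes.
static string GetHexString(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");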
Output:
False
length of aîn is 3
61 C3 AE 6E

length of aîn is 4
EF BB BF 61 C3 AE 6E
By the way, I use a StreamWriter with Encoding.UTF8 to create new files, and it writes the BOM automatically.
https://github.com/dotnet/runtime/blob/6ef4b2e7aba70c514d85c2b43eac1616216bea55/src/libraries/System.Private.CoreLib/src/System/IO/StreamWriter.cs#L273
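To illustrate the StreamWriter approach mentioned above, here is a minimal sketch. The class name BomFileDemo and its two helper methods are made up for this example; the sketch assumes the file is newly created or truncated, since the preamble is only written at the start of an empty stream.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of the gist.
public static class BomFileDemo
{
    // Writes text with Encoding.UTF8, whose preamble is EF BB BF,
    // and returns the raw bytes that ended up on disk.
    public static byte[] WriteWithBom(string path, string text)
    {
        using (var writer = new StreamWriter(path, false, Encoding.UTF8))
        {
            writer.Write(text);
        }
        return File.ReadAllBytes(path);
    }

    // Same, but with an encoding instance that has no preamble.
    public static byte[] WriteWithoutBom(string path, string text)
    {
        using (var writer = new StreamWriter(path, false, new UTF8Encoding(false)))
        {
            writer.Write(text);
        }
        return File.ReadAllBytes(path);
    }

    public static void Main()
    {
        var path = Path.GetTempFileName();
        try
        {
            // "a\u00EEn" is the "aîn" string from the thread, with î escaped.
            Console.WriteLine(BitConverter.ToString(WriteWithBom(path, "a\u00EEn")));
            // EF-BB-BF-61-C3-AE-6E
            Console.WriteLine(BitConverter.ToString(WriteWithoutBom(path, "a\u00EEn")));
            // 61-C3-AE-6E
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With this approach the BOM lives only in the file, so the in-memory string stays unchanged and still compares equal to the original.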
Does the above code add a carriage return when processing?
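Assuming the question is about ConvertStringToUtf8Bom from the gist, one way to check is a small sketch like the following. The class name CarriageReturnCheck and the sample input are made up for illustration; the function body is copied from the gist.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical check class, not part of the gist.
public static class CarriageReturnCheck
{
    // Copy of the gist's function.
    public static string ConvertStringToUtf8Bom(string source)
    {
        var data = Encoding.UTF8.GetBytes(source);
        var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
        var encoder = new UTF8Encoding(true);
        return encoder.GetString(result);
    }

    public static void Main()
    {
        var source = "line1\nline2";
        var converted = ConvertStringToUtf8Bom(source);

        Console.WriteLine(converted[0] == '\uFEFF');          // True: first char is the BOM
        Console.WriteLine(converted.Length - source.Length);  // 1: exactly one char was added
        Console.WriteLine(converted.IndexOf('\r') >= 0);      // False: no carriage return
        Console.WriteLine(converted.Substring(1) == source);  // True: the rest is unchanged
    }
}
```

In this sketch the function only prepends U+FEFF. Any carriage returns in a written file would have to come from the input string or from how the file is written, for example WriteLine on Windows.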
Thanks guys! This helped!