This article has moved to the official .NET Docs site.
See https://docs.microsoft.com/dotnet/standard/base-types/character-encoding-introduction.
```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

class Program
{
    static void Main(string[] args)
    {
        Stream inputStream = GetInputStream();
        // ...
    }
}
```
```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        {
            // the text below is meaningless
            // ...
        }
    }
}
```
```xml
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <!-- ... -->
  <Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
  <!-- This task adds a module initializer to {IL}.txt. -->
  <UsingTask TaskName="InjectModuleInitializer" TaskFactory="CodeTaskFactory" AssemblyFile="$(MSBuildToolsPath)\Microsoft.Build.Tasks.v4.0.dll">
    <ParameterGroup>
      <Path ParameterType="System.String" Required="true" />
      <InitializerMethod ParameterType="System.String" Required="true" />
    </ParameterGroup>
    <!-- ... -->
  </UsingTask>
</Project>
```
Utf8String and related concepts are meant for modern internet-facing applications that need to speak "the language of the web" (or, more generally, I/O). Today, applications spend time transcoding data into formats that aren't particularly useful to them, such as converting UTF-8 input into UTF-16 strings, which wastes CPU cycles and memory.
A naive way to accomplish this would be to represent UTF-8 data as byte[] / Span<byte>, but this leads to a usability pit of failure. Developers would then have to rely on situational awareness and code hygiene to know whether a particular byte[] instance represents binary data or UTF-8 text, which makes it very easy to write code like byte[] imageData = ...; imageData.ToUpperInvariant();. This defeats the purpose of using a typed language.
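To make the type-safety argument concrete, here is a minimal sketch of a dedicated text type over UTF-8 bytes. Utf8Text is a hypothetical name used only for illustration; it is not the Utf8String API, and validating eagerly with a strict UTF8Encoding is an assumption of this sketch. The point is that a variable of this type advertises that it holds text, while a byte[] such as imageData carries no such guarantee.

```csharp
using System;
using System.Text;

// Hypothetical wrapper for illustration only; this is NOT the Utf8String API.
// It validates once at construction, so any holder of a Utf8Text can assume
// the bytes are well-formed UTF-8 text rather than arbitrary binary data.
public readonly struct Utf8Text
{
    // throwOnInvalidBytes: true makes decoding throw DecoderFallbackException
    // on ill-formed input instead of substituting U+FFFD.
    private static readonly UTF8Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    private readonly byte[] _bytes;

    public Utf8Text(ReadOnlySpan<byte> utf8)
    {
        // Validation pass: throws if utf8 is not well-formed.
        StrictUtf8.GetCharCount(utf8);
        _bytes = utf8.ToArray();
    }

    // Length in UTF-8 code units (bytes), not in chars.
    public int Length => _bytes?.Length ?? 0;

    public ReadOnlySpan<byte> Bytes => _bytes;

    public override string ToString() =>
        _bytes is null ? string.Empty : StrictUtf8.GetString(_bytes);
}
```

With this sketch, new Utf8Text(new byte[] { 0xFF }) throws DecoderFallbackException at construction instead of letting ill-formed data flow further into the program.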
We want to expose enough functionality to make the Utf8String type usable by, and desirable to, our developer audience, but it's not intended to serve as a
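```csharp
// In a loop, try reading a natural word at a time.
const int CharsPerNuint = sizeof(nuint) / sizeof(char);
for (; inputLength >= CharsPerNuint; pInputBuffer += CharsPerNuint, inputLength -= CharsPerNuint)
{
    nuint utf16Data = Unsafe.ReadUnaligned<nuint>(pInputBuffer);

    // A char is ASCII only if it is <= 0x007F, so masking every 16-bit lane
    // with 0xFF80 leaves zero only when all chars in this word are ASCII.
    // (On 32-bit platforms the unchecked cast truncates the mask to 0xFF80_FF80.)
    utf16Data &= unchecked((nuint)0xFF80_FF80_FF80_FF80ul);
    if (utf16Data == 0)
    {
        // ...
    }
}
```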
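For readers who want to run the word-at-a-time idea outside of CoreLib, here is a safe-code sketch of the same ASCII check. It swaps the raw pointer loop and nuint for MemoryMarshal.Cast to 64-bit words, so each word always covers four chars regardless of platform; AsciiSketch and IsAllAscii are hypothetical names. Because the 0xFF80 mask is identical in every 16-bit lane, the check does not depend on byte order.

```csharp
using System;
using System.Runtime.InteropServices;

public static class AsciiSketch
{
    // Each ulong holds four UTF-16 code units. A code unit is ASCII iff it is
    // <= 0x007F, i.e. none of the bits in 0xFF80 are set in its lane.
    private const ulong NonAsciiMask = 0xFF80_FF80_FF80_FF80ul;

    public static bool IsAllAscii(ReadOnlySpan<char> text)
    {
        // Reinterpret whole groups of four chars as 64-bit words.
        // Cast drops any trailing chars that don't fill a whole word.
        ReadOnlySpan<ulong> words = MemoryMarshal.Cast<char, ulong>(text);
        foreach (ulong word in words)
        {
            if ((word & NonAsciiMask) != 0)
            {
                return false; // at least one char in this group is non-ASCII
            }
        }

        // Check the 0-3 leftover chars one at a time.
        for (int i = words.Length * 4; i < text.Length; i++)
        {
            if (text[i] > 0x7F)
            {
                return false;
            }
        }

        return true;
    }
}
```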
Utf8Char is to UTF-8 what Char is to UTF-16: Utf8Char represents a single UTF-8 code unit, and Char represents a single UTF-16 code unit. Both are distinct from the integral types Byte and UInt16 in that sequences of the UTF-* code unit types are meant to represent textual data, while sequences of the integral types are meant to represent binary data.
Drawing this distinction is important. With UTF-16 data (String, Char[]), this distinction historically hasn't been a source of confusion. Developers generally understand that, aside from RPC, most I/O involves some kind of transcoding step. Binary data doesn't arrive from disk or the network in a form that can be trivially projected as a textual string; it must go through validation, recombining, and substitution. Similarly, when writing a string to disk or the network, a trivial projection is again impossible: the transcoding step must run in reverse to get the text data into …
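As a small illustration of the transcoding described above, this sketch decodes UTF-8 bytes into a String and encodes it back using Encoding.UTF8. It relies on Encoding.UTF8's default behavior of replacing ill-formed input with U+FFFD rather than throwing; TranscodingSketch is a hypothetical name.

```csharp
using System;
using System.Text;

public static class TranscodingSketch
{
    // Input direction: validate UTF-8 bytes, recombine multi-byte sequences
    // into UTF-16 chars, and substitute U+FFFD for ill-formed sequences
    // (Encoding.UTF8 uses a replacement fallback by default).
    public static string Decode(byte[] utf8Bytes) => Encoding.UTF8.GetString(utf8Bytes);

    // Output direction: the reverse transcoding, UTF-16 chars back to UTF-8 bytes.
    // Encoding.GetBytes does not emit a byte order mark.
    public static byte[] Encode(string text) => Encoding.UTF8.GetBytes(text);
}
```

For example, Decode(new byte[] { 0x48, 0xC3, 0xA9, 0xFF }) yields "H\u00E9\uFFFD": the two-byte sequence C3 A9 is recombined into the single char é, and the invalid byte 0xFF is substituted. Encoding "H\u00E9" back produces three bytes for two chars, which is why neither direction is a trivial projection.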
This tests the performance of MemoryExtensions.ToUpperInvariant(this ReadOnlySpan<char>, Span<char>), String.GetHashCode(), and String.GetHashCode(StringComparison.OrdinalIgnoreCase).
In the table below:
| Method | Toolchain | StringLength | Mean | Error | StdDev | Scaled | ScaledSD |
|---|---|---|---|---|---|---|---|
| ToUpperInvariant | baseline coreclr | 0 | 27.112 ns | 0.7416 ns | 1.1763 ns | 1.00 | 0.00 |
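For context on what the benchmark exercises, this sketch calls the three APIs named above. The 256-char stackalloc threshold and the CaseSketch helper names are assumptions for illustration, not part of the benchmark code.

```csharp
using System;

public static class CaseSketch
{
    // Upper-cases into a caller-provided buffer via
    // MemoryExtensions.ToUpperInvariant(ReadOnlySpan<char>, Span<char>).
    public static string UpperInvariant(ReadOnlySpan<char> source)
    {
        // Small inputs use the stack; larger ones fall back to a heap array.
        Span<char> destination = source.Length <= 256
            ? stackalloc char[source.Length]
            : new char[source.Length];

        // Returns the number of chars written, or -1 if destination is too small.
        int written = source.ToUpperInvariant(destination);
        return new string(destination.Slice(0, written));
    }

    // Strings that are equal under OrdinalIgnoreCase must hash identically.
    public static bool SameHashIgnoringCase(string a, string b) =>
        a.GetHashCode(StringComparison.OrdinalIgnoreCase) ==
        b.GetHashCode(StringComparison.OrdinalIgnoreCase);
}
```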
```csharp
/*
 * !! WARNING !!
 *
 * COMPLETELY UNTESTED CODE
 */

using Microsoft.Win32.SafeHandles;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;
// ...
```
This document describes the APIs of Memory<T>, IMemoryOwner<T>, and MemoryManager<T> and their relationships to each other.
See also the Memory<T> usage guidelines document for background information.
Memory<T> is the basic type that represents a contiguous buffer. Because it is a struct, developers cannot subclass it and override its implementation. Out of the box, the type understands contiguous buffers backed by T[] and, in the case of ReadOnlyMemory<char>, System.String.
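A short sketch of the two backing stores mentioned above: a Memory<int> sliced over an int[], and a ReadOnlyMemory<char> produced from a string with AsMemory(). MemoryMarshal.TryGetArray and MemoryMarshal.TryGetString are used here only to observe which kind of object backs a given instance; MemorySketch is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;

public static class MemorySketch
{
    // Memory<T> over a T[]: slicing creates no copy; the slice views the same array.
    public static int SumSlice(int[] array, int start, int length)
    {
        Memory<int> memory = array.AsMemory(start, length);
        int sum = 0;
        foreach (int value in memory.Span)
        {
            sum += value;
        }
        return sum;
    }

    // True when the ReadOnlyMemory<char> wraps a System.String directly.
    public static bool IsBackedByString(ReadOnlyMemory<char> memory) =>
        MemoryMarshal.TryGetString(memory, out _, out _, out _);

    // True when the ReadOnlyMemory<T> wraps a T[].
    public static bool IsBackedByArray<T>(ReadOnlyMemory<T> memory) =>
        MemoryMarshal.TryGetArray(memory, out ArraySegment<T> _);
}
```

Because implicit conversion from Memory<T> to ReadOnlyMemory<T> is not considered during generic type inference, callers passing a Memory<int> write IsBackedByArray<int>(...) explicitly.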