@GrabYourPitchforks
Last active March 31, 2021 17:46
C# single-copy struct proposal

Problem statement and core scenario

We want to introduce the idea of a value type where the underlying data is only ever "live" in at most one place. The canonical example is the internal ValueStringBuilder struct type, which performs internal ArrayPool management.

ValueStringBuilder builder = new ValueStringBuilder(); // VSB is a struct type
builder.Append("foo");
builder.Append(obj);
HelperMethod(ref builder); // builder passed by ref to helper methods
return builder.ToString(); // ToString releases underlying rented arrays back to pool

This would also allow us to expose an allocation-free variant of Utf8JsonWriter. In .NET Core 3.x, we had originally exposed this type as a struct, but before RTM we backtracked and turned it into a proper class because consumers found it too easy to pass these structs by value and corrupt the internal state of the writer.

More generally, the problem we're trying to solve is that if a value type is responsible for manual resource management, we don't want the value type's internal state to be duplicated in such a manner that resource management becomes unreliable.
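To make the hazard concrete, here is a minimal sketch in today's C#. It assumes a hypothetical `PooledBuffer` struct (a stand-in for illustration, not the real ValueStringBuilder) that rents from `ArrayPool<char>`. Passing it by value duplicates the length field while both copies still share the rented array, so the by-value append is silently lost.

```csharp
using System;
using System.Buffers;

// Hypothetical type for illustration only; not the real ValueStringBuilder.
struct PooledBuffer
{
    private readonly char[] _rented;
    private int _length;

    public PooledBuffer(int capacity)
    {
        _rented = ArrayPool<char>.Shared.Rent(capacity);
        _length = 0;
    }

    public void Append(char c) => _rented[_length++] = c;

    // Returns the rented array to the pool; the instance must not be used afterwards.
    public string Release()
    {
        string result = new string(_rented, 0, _length);
        ArrayPool<char>.Shared.Return(_rented);
        return result;
    }
}

static class Demo
{
    // By-value parameter: the callee gets its own _length but shares the rented array.
    static void AppendByValue(PooledBuffer buffer) => buffer.Append('x');

    static void AppendByRef(ref PooledBuffer buffer) => buffer.Append('y');

    public static string Run()
    {
        var buffer = new PooledBuffer(16);
        buffer.Append('a');
        AppendByValue(buffer);   // writes 'x' at index 1; the caller's length stays 1
        AppendByRef(ref buffer); // overwrites index 1 with 'y'
        return buffer.Release(); // "ay": the 'x' was silently lost
    }

    static void Main() => Console.WriteLine(Run());
}
```

If the by-value copy had also called `Release`, the same array would have been returned to the pool twice, which is the unreliable resource management described above.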

Proposal

The proposal here is inspired by C++'s concept of move-only types like std::unique_ptr<T>. There may be other ways to solve this problem more generally which I have not considered.

This isn't trying to solve the issue of single ownership for all possible types; e.g., "lots of people have a reference to this Stream instance, but who is ultimately responsible for calling Stream.Dispose?"

We can introduce the concept of a "single-copy" type in C#. To be annotated as single-copy, the following attribute is applied to the type.

namespace System.Runtime.CompilerServices
{
    [AttributeUsage(AttributeTargets.Struct, AllowMultiple = false, Inherited = false)]
    public sealed class SingleCopyAttribute : Attribute { }
}

// an example of a single-copy struct
[SingleCopy]
public ref struct MySingleCopyStruct
{
    public MySingleCopyStruct(/* ... */) { }
    public int SomeMethod(/* ... */) { }
    // ...
}

Single-copy types have the following restrictions:

  1. Only ref structs may be marked single-copy. (This implies single-copy types cannot be boxed.)
  2. If a location of a single-copy type is assigned a value copied from another location, the source of that copy must afterward be considered uninitialized (unassigned).

The second restriction above is the most interesting, as it introduces several code patterns which are valid for normal structs and ref structs, but invalid for single-copy structs. Consider the following examples.

Examples

[SingleCopy]
public ref struct MySingleCopyStructFoo
{
    public MySingleCopyStructFoo(int foo)
    {
        DoSomethingWith(this); // ERROR: copy of 'this' is made, which means 'this' is now unassigned before ctor returns
    }

    public static void DoSomethingWith(MySingleCopyStructFoo value) { /* ... */ }
}

[SingleCopy]
public ref struct MySingleCopyStructBar
{
    public MySingleCopyStructBar(int foo)
    {
        DoSomethingWith(ref this); // OK: no copy of 'this' is made
        DoSomethingElseWith(this); // OK: copy of 'this' is made, 'this' now unassigned
        this = default;            //     but this line re-assigns before ctor exits
    }

    public static void DoSomethingWith(ref MySingleCopyStructBar value) { /* ... */ }
    public static void DoSomethingElseWith(MySingleCopyStructBar value) { /* ... */ }
}

[SingleCopy]
public ref struct MySingleCopyStructBaz
{
    private int _field;

    public void SomeInstanceMethod()
    {
        this._field = 42; // OK: no copy of 'this'
        Console.WriteLine(this._field); // OK: no copy of 'this'
        this.SomeOtherMutatingInstanceMethod(); // OK: 'this' implicitly passed by ref

        var copy = this; // ERROR: 'this' (passed by ref as arg0 to this method) now unassigned, nonsensical
        Console.WriteLine(copy._field); // OK: no copy of 'copy'
    }

    public void SomeOtherMutatingInstanceMethod()
    {
        this._field = 100; // OK: no copy of 'this'
    }

    public static void SomeStaticMethod(in MySingleCopyStructBaz value)
    {
        value.SomeOtherMutatingInstanceMethod(); // ERROR: implicit copy of 'value' due to readonly -> mutable semantics
    }
}
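The `in` case above mirrors behavior that exists in C# today: calling a non-readonly instance method through an `in` parameter operates on a hidden defensive copy. A small sketch, using a hypothetical `Counter` struct that is not part of the proposal:

```csharp
using System;

// Hypothetical struct for illustration only.
struct Counter
{
    public int Value;

    // Not marked 'readonly', so the compiler must assume it may mutate 'this'.
    public void Increment() => Value++;
}

static class InDemo
{
    static void Bump(in Counter counter)
    {
        // 'counter' is a readonly reference, so the compiler invokes
        // Increment on a defensive copy rather than on the caller's value.
        counter.Increment();
    }

    public static int Run()
    {
        var counter = new Counter();
        Bump(in counter);
        return counter.Value; // 0: only the hidden copy was incremented
    }

    static void Main() => Console.WriteLine(Run());
}
```

For an ordinary struct this copy is merely surprising; for a single-copy struct it would be a second live instance, which is why the proposal treats it as an error.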

For advanced scenarios, we could also introduce utility methods to allow power developers to bypass these restrictions as needed.

public static class SingleCopyUtility
{
    // assumes ref structs can be passed as the generic 'T'
    public static T DangerousCopy<T>(ref T value) where T : singlecopy
    { /* ... */ }

    // roughly equivalent to C++'s std::move<T>(T&&)
    public static T Move<T>(ref T value) where T : singlecopy
    {
        T copy = DangerousCopy(ref value);
        value = default;
        return copy;
    }
}
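The `singlecopy` constraint does not exist in C# today, so the utility above cannot be compiled as written. As a rough approximation of the intended semantics only, the sketch below substitutes `where T : struct` and a plain assignment for `DangerousCopy`; `MoveSketch` and `Handle` are hypothetical names, and nothing here enforces that the source is left unassigned beyond resetting it to `default`.

```csharp
using System;

static class MoveSketch
{
    // Approximation: 'where T : struct' stands in for the proposed 'singlecopy' constraint.
    public static T Move<T>(ref T value) where T : struct
    {
        T copy = value;   // plays the role of DangerousCopy(ref value)
        value = default;  // the source no longer holds the original state
        return copy;
    }

    // Hypothetical payload type for illustration only.
    struct Handle { public int Id; }

    public static (int source, int destination) Run()
    {
        var a = new Handle { Id = 7 };
        Handle b = Move(ref a);
        return (a.Id, b.Id); // (0, 7)
    }

    static void Main() => Console.WriteLine(Run());
}
```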

Assumptions

  • It is nonsensical (and thus forbidden) to consider the target of a ref unassigned. Therefore the following patterns would be illegal in all cases. (We can implement speciality helpers like Move<T> within the runtime.)

    void MyMethodFoo(ref MySingleCopyType a, ref MySingleCopyType b)
    {
        a = b; // ERROR: setting 'a' is ok, but marking 'b' unassigned is nonsensical, hence forbidden
        b = default; // definite assignment, but doesn't prevent the line above from erroring out
    }
    
    void MyMethodBar(ref MySingleCopyType a, ref MySingleCopyType b)
    {
        a = Move(ref b); // OK
    }
    
    /* following examples consider a struct with a field of a single-copy type */
    
    ref MySingleCopyType MyOtherMethod(ref MySingleCopyOuterType value)
    {
        return ref value._mySingleCopyType; // OK: ref manipulation, no copies made
    }
    
    MySingleCopyType MyOtherMethod(ref MySingleCopyOuterType value)
    {
        return value._mySingleCopyType; // ERROR: marking 'value._mySingleCopyType' as unassigned is nonsensical, hence forbidden
        // the permitted alternative to the line above:
        return Move(ref value._mySingleCopyType); // OK
    }
  • Struct instance methods are modeled as TReturn Method(ref TStruct @this, ...). This allows instance methods to be called over and over sequentially without a copy being made.

    MySingleCopyStruct val = new MySingleCopyStruct();
    val.InstanceMethod(); // no copy
    DoSomethingWithByRef(ref val); // no copy
    val.InstanceMethod(); // no copy
    DoSomethingWithByVal(val); // copy, 'val' is now unassigned
    val.InstanceMethod(); // ERROR: 'val' was marked unassigned per line above
  • Struct ctors are modeled as instance methods (void .ctor(ref TStruct @this, ...)) or as value-returning methods (TStruct .ctor(...)). This subtle distinction affects which operations are valid within the ctor. For example, if the ctor returns void and operates on an implicit ref this, then no copy of this may be made at all without going through Move<T>(ref T). If the ctor is instead struct-returning, then a single copy of this may be made as long as this is reassigned before the ctor returns.

  • The compiler and runtime will get support for passing ref structs as the generic T, required for Move<T>(ref T). If this doesn't come in, maybe we introduce a __move keyword or similar. (Ugh.)

@sharwell

Only ref structs may be marked single-copy

This is not a good use of ref struct for a few reasons:

  1. It prevents use of these types in state machines, where the semantics are otherwise sound
  2. It prevents moving the value into a box, which is a valid operation for single-copy types (the single instance lives in the box at that point)

Auto-properties are another interesting case (the get; implicitly copies the underlying value).

@GrabYourPitchforks
Author

I agree that ref struct might be overly restrictive. It was mainly a way to get things like "can't be put inside normal copyable structs, can't be boxed, can't be used as generics, etc." for free. If we're willing to enlighten the compiler, this restriction can go away.

@sharwell

sharwell commented Mar 31, 2021

If we're willing to enlighten the compiler, this restriction can go away.

The analyzer would be expected to handle all these cases. Currently RS0042 handles all except for use as a generic argument.

... can't be boxed ...

These types can be boxed. It's both useful in practice and not a semantic error. It's functionally identical to storing it in a StrongBox<T> or as a field of a reference type.
