@CyrusNajmabadi
Last active May 4, 2023 02:38
AnonymousList.md

Hey Mads! I wanted to give you a little sketch of one of the approaches we've been considering to help out in the var/natural-type space. We discussed it with Stephen yesterday. While not a "slam dunk", it def seemed to have some real upsides that made him feel better about some things (though it also came with downsides, as you'll see).

Core goals/thoughts of mine were:

  1. Perf-sensitive people should not feel bad using 'var' with the feature (or have to ban it).
  2. We should be able to tell people "you can trust we do a really good job" or, ideally, "we do the best job" when you use these.
  3. Limitations should be minimal. However, when present, they should be sensible, acceptable, and easy to explain. Most users will feel things "just work" (however, those who like to know how the sausage is made might discover a lot of complexity behind making that happen).
  4. We may be stretching our comfort zone, but hopefully the value makes that ok.

To that end, I'm starting with the idea that we have an anonymous-list type. Similar to an anonymous type, this has an unspeakable name and cannot be used in signatures. When you write:

var v = [1, 2, 3]; 

Then you get an anonymous-list, just like var v = new { A = 1 }; gives you an anonymous type.

This anonymous-list is list-like wrt what you can do with it. That includes an API that is probably very close to List<T>: getting the count, indexing into it (both get/set), and adding to/removing from it. However, this is not List<T>, and the compiler is 100% free to swap in whatever impl it wants to support the set of operations the user is performing on it within that method. Similar to anonymous types, you get no guarantees around things like the System.Type you would get if you called .GetType() on it.

Now, what happens if the value is passed out of the method, or is passed to something that needs an explicit type? There are a few ways this could happen:

  1. being passed to something that takes a constructible-collection-type as defined in the spec (e.g. arrays, spans, normal instantiable collections, etc.)
  2. passed to object/dynamic.
  3. passed to a naked generic type-parameter.

For '1', at the point this value is needed, it is produced (using the mechanisms already present in the spec for creating such a value). However, at that point, the anonymous-list becomes 'frozen' and cannot be mutated anymore. Freezing prevents confusing scenarios like:

void Foo(List<int> list);
void Bar(List<int> list);

var v = [1, 2, 3];
Foo(v);
v.Add(4);
Bar(v); // what happens here?  A fresh copy?  The original list?  Very confusing.

'2' and '3' don't have answers yet. But I would be ok with disallowing these and reporting an error.

Also, to prevent very confusing scenarios, the types in all the places where 'v' is passed/reified need to be identical. This prevents confusion in something like:

void Foo(List<int> list);
void Bar(ImmutableArray<int> list);

var v = [1, 2, 3];
Foo(v);
Bar(v); // what happens here?  A fresh copy?  What if Foo mutated the List it was given? Best to just disallow.

Effectively, the '[...]' syntax is giving you a lightweight builder. But once the final value is built, it's done; no more building is possible.
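
To make the "builder that freezes" idea concrete, here is a minimal runtime sketch in today's C#. Everything in it (storage, frozen, Add, Build) is a hypothetical stand-in I made up for illustration; in the actual idea the compiler would reject the late mutation at compile time rather than throwing at runtime.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the storage behind `var v = [1, 2, 3];`.
var storage = new List<int> { 1, 2, 3 };
bool frozen = false; // flips to true once the final value has been handed out

// Mutation is only allowed while we are still building.
void Add(int item)
{
    if (frozen)
        throw new InvalidOperationException("The value was already built; it can no longer be mutated.");
    storage.Add(item);
}

// Producing the final value freezes the builder and always returns the same instance.
List<int> Build()
{
    frozen = true;
    return storage;
}

Add(4);                      // fine: still building
List<int> first = Build();   // e.g. the Foo(v) call
List<int> second = Build();  // e.g. the Bar(v) call: same instance, no fresh copy

bool rejected = false;
try { Add(5); }              // the v.Add(4)-after-Foo(v) case
catch (InvalidOperationException) { rejected = true; }
```

With this shape, the "what happens here?" question from the earlier example goes away: every consumer sees the same value, and nothing can change it after the first hand-off.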

So why is this approach helpful? Well, first, it gives the compiler a lot of flexibility in terms of how it generates stuff. Importantly:

  1. If the value is used for Span/ReadOnlySpan clients, and is not mutated at all, it can avoid any memory overhead at all. Compiler can literally just emit Span/ReadOnlySpan.
  2. If the value is used for Span/ReadOnlySpan, and is mutated, we can emit something similar to ValueListBuilder (either CLR provided, or our own synthesized type) and only have the minimal overhead to support mutation.
  3. If the value is used for ImmutableArray/List/Array/etc., you get effectively optimal codegen (i.e. the minimal work to just get the values ready, then construct the final collection instance from them). It would generally be hard for users to beat without a lot of effort.
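
As a rough illustration of cases 1 and 3, this is the kind of code a perf-sensitive person might write by hand today, and roughly what one could imagine the compiler emitting. It is a sketch under my own assumptions, not a statement of what the compiler would actually generate. Case 2 is left out because ValueListBuilder is not a public type.

```csharp
using System;
using System.Collections.Immutable;

// Case 1: never mutated and only read as a span, so no heap allocation is needed.
ReadOnlySpan<int> span = stackalloc int[] { 1, 2, 3 };
int spanSum = 0;
foreach (int x in span)
    spanSum += x;

// Case 3: destined for ImmutableArray<int>. Size the builder exactly, fill it,
// then hand its buffer over without copying (MoveToImmutable requires Count == Capacity).
var builder = ImmutableArray.CreateBuilder<int>(3);
builder.Add(1);
builder.Add(2);
builder.Add(3);
ImmutableArray<int> array = builder.MoveToImmutable();
```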

Second, it also enables really interesting, potentially idiomatic forms that we see in other languages like JS/TS/Python, like so:

var v = []; // yes, this has no elements in it.
foreach (var x in DoSomething())
{
    if (x.HasWidgets)
    {
        v.AddRange(x.Widgets);
    }
    else if (v.Count > 0)
    {
        v.RemoveAt(^1);
    }
}

PassToSomethingWhichTakesImmutableArray(v);

Yes, this allows both an empty literal, and allows very disparate forms of mutation (including removal). You basically say you want a starting list, work with it, then send it off to whoever needs it, and only the minimal work is done. If you don't 'pass it' to anyone, this could also just stay in pure Span form for extreme efficiency, without feeling like you have to be a Span/ref-struct expert.
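
For comparison, here is roughly what that loop looks like when written by hand today against ImmutableArray<T>.Builder. The batches array is a hypothetical stand-in for DoSomething() (an empty inner array plays the role of an item without widgets); it is an assumption for illustration only.

```csharp
using System.Collections.Immutable;

// Hypothetical stand-in for DoSomething(): each entry is the widgets one item carries.
int[][] batches = { new[] { 1, 2 }, new int[0], new[] { 3 } };

// The user has to pick the builder type up front, based on where the value ends up.
var builder = ImmutableArray.CreateBuilder<int>();
foreach (int[] widgets in batches)
{
    if (widgets.Length > 0)
    {
        builder.AddRange(widgets);
    }
    else if (builder.Count > 0)
    {
        builder.RemoveAt(builder.Count - 1); // what RemoveAt(^1) stands for above
    }
}

ImmutableArray<int> result = builder.ToImmutable();
```

The point of the anonymous-list idea is that the user would not have to know about the builder at all; the final destination decides it for them.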

--

So what are the downsides? Well, as you have likely been seeing: this approach means that what is legal or not depends on how the value is used. Specifically: determining the 'element type' of the anonymous-list based on usage, determining what final type/value is created (if needed) based on usage, and erroring if we think it's not ok. This really is a new area with little precedent. That said, while we were all discussing this, we did note that there were areas of the language that felt a teeny bit similar. Specifically, places where we figure out what the user wants by looking at disparate statements to make a determination. For example:

void Foo<T>(Func<T> func);

Foo(() =>
{
    if (whatever)
        return "";
    else
        return (object)""; 
});

Here, all the return statements are interrogated to determine the return type of the Func delegate being created. So, in a real sense, we go look at what the user is doing to determine this. I fully admit this is not a direct analogy, as one is about lambda-inference, and the other is about var-inference. But it at least made me feel less oogy about the idea of looking at a potentially larger space to make determinations on what is going on.
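
A small runnable version of that lambda example shows the result of the inference. Infer is a hypothetical helper that just reports the T that was picked, and whatever is given an arbitrary runtime value so that both branches stay reachable.

```csharp
using System;

// Hypothetical helper: returns the type argument that inference chose for T.
Type Infer<T>(Func<T> func) => typeof(T);

bool whatever = Environment.TickCount % 2 == 0;

// One return yields a string, the other an object. The compiler looks at both
// and picks the best common type, so T is inferred as object.
Type inferred = Infer(() =>
{
    if (whatever)
        return "";
    else
        return (object)"";
});
```

Neither return statement alone determines T; it only falls out once both are considered together.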

--

Anyways, that was a lot. We'll go over it again with you tomorrow. I'm very interested to know if this feels revolting to you, or if this feels like a potentially interesting space that could bear fruit. Personally, I think from the user perspective this has a bit of magic (similar to how 'captures' just magically work in lambdas, or how state machines work for async/await/iterators). But, hopefully it's good magic people can accept, given the end benefit in terms of ease of use and perf.
