skids · February 8, 2015 21:04
 A deeper look at coercion.

 This is both an elaboration on a few points that were touched on
 in ab5tract's Day 4 advent post, and a deep look at the definition
 of "coerce", focusing on the soon-to-be-timely subjects of buffers,
 native data types, and "coercive types".

 The most important part of the specification when it comes to
 defining what it means to "coerce" is this (in S02):

    ...if you say:

         $fido = Dog.new($spot)

    it certainly creates a new C<Dog> object.  But if you say:

         $fido = Dog($spot)

    it might call C<Dog.new>, or it might pull a C<Dog> with Spot's
    identity from the dog cache, or it might do absolutely nothing if
    C<$spot> already knows how to be a C<Dog>. ...

 It is this last part that is critical, because a mutable object
 coerced to another type may be modifiable through the product
 of that coercion.

 Currently on Rakudo that does not seem to be the case for built-in types.
 We can look at the simple case of a type being coerced to itself:

    my @a = Array.new(0,1,2);
    my @b := Array(@a);
    @b[0] = 2;
    @a.say; # 0 1 2 ... this is not guaranteed on all implementations
    @b.say; # 2 1 2
    @b := @a.Array;
    @b[0] = 2;
    @a.say; # 0 1 2 ... neither is this guaranteed to be the same as above
    @b.say; # 2 1 2

 As an aside, the second comment merely draws attention to the fact that
 the rules for finding the right way to "coerce" are different for the
 method and sub forms, and there are effectively two different paths by
 which we can define coercion -- one from the coercer and the other
 from the coercee.

 Anyway, any module author can choose to make coercion write back, like this:

    class A {
        has $.foo is rw;
        method Int () is rw { $!foo }
    }
    my $a = A.new(:foo(4));
    my $b := Int($a);
    $b = 5;
    $a.foo.say; # 5

 Of course, in the majority of cases coercion is simply used to
 read from the object, and in a lot of cases there is no good
 way to implement writing back to the original object from the
 product of the coercion.

 Now take a look at the next sentence in the spec:

    As a fallback, if no method responds to a coercion request, the
    class will be asked to attempt to do C<Dog.new($spot)> instead.

 When this scenario happens, the product of the coercion may end
 up being a mutable copy of the thing we coerced.  So to recap,
 a coercion may produce:

 1) A mutable copy of the original object
 2) An immutable view into the original object that pretends to
   be the new type for all readable purposes
 3) A mutable view into the original object that allows
   modification of the original object
 4) Some mixture of 2 and 3 for complex classes
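 Outcomes 2 and 3 can be told apart with a small sketch.  The Celsius
 class and its methods below are hypothetical, made up for this note;
 Proxy is the core type for a container with custom FETCH/STORE logic.
 Both methods return a Proxy (hence "is raw", so the container is not
 stripped on return), but only the second one writes back:

```raku
# Hypothetical class, for illustration only.
class Celsius {
    has Numeric $.degrees is rw = 0;

    # Outcome 2: a read-only view.  Reads always see the current
    # temperature in Fahrenheit; writes are refused.
    method fahrenheit-ro() is raw {
        Proxy.new(
            FETCH => -> $ { $!degrees * 9 / 5 + 32 },
            STORE => -> $, $ { die "read-only view" },
        )
    }

    # Outcome 3: a mutable view.  Writes are converted back to Celsius
    # and stored in the original object.
    method fahrenheit() is raw {
        Proxy.new(
            FETCH => -> $ { $!degrees * 9 / 5 + 32 },
            STORE => -> $, $f { $!degrees = ($f - 32) * 5 / 9 },
        )
    }
}

my $t  = Celsius.new(degrees => 100);
my $ro := $t.fahrenheit-ro;   # bind, so the Proxy itself is kept
my $rw := $t.fahrenheit;

say $rw;          # 212
$rw = 32;         # writes back through the view
say $t.degrees;   # 0
say $ro;          # 32, the read-only view sees the change
```

 Binding with := keeps the Proxy; a plain assignment such as
 "my $x = $t.fahrenheit" would just copy out the current value,
 which is outcome 1.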

 TLDR: Coercion is at heart a way to fit square pegs in round
 holes, and those pegs can be mashed in with a wide selection
 of hammers.

 Modules, of course, can document their behavior in their manpage
 along with all the other particulars of the module.  For base types,
 however, a coder will have to know which types coerced to which
 other types result in a writeable value, which of those writeable
 results are copies as opposed to write-back views, and on which
 implementations that behavior changes.

 It is tempting to say "I don't want to memorize another table
 like the operator precedence chart; let's just spec all base type
 coercions to behave the same way."  In fact this is what has
 been done with "coercion types".  Oh but wait, we are
 getting ahead of ourselves for those not on IRC.  What's a
 coercion type?

 That's this:

    sub foo (Int(Rat) $f) { $f + 1 };
    foo(5/4).say;  # 2

 The above did not even parse until today, when 6pe was merged.  For
 those who don't take their code intravenously, this deprecated syntax
 does on Rakudo what the above is supposed to do:

    sub foo (Rat $f as Int) { $f + 1 };
    foo(5/4).say; # 2

 This is not merely a syntactic macro that converts the above to:

    sub foo (Rat $r) { my $f = Int($r); $f + 1 };

 ...as the "Int(Rat)" is actually going to be a type unto
 itself, apparently.  Which means you can introspect it.  Which,
 we will see later, might be a very good thing.
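 A sketch of that introspection, assuming a current Rakudo (the language
 has since been renamed Raku), where a coercion type is an object of the
 Metamodel::CoercionHOW metaclass with target_type and constraint_type
 methods.  This API postdates the discussion here, so take it as
 illustrative rather than as what the spec promises:

```raku
sub foo (Int(Rat) $f) { $f + 1 }
say foo(5/4);   # 2

# The parameter's declared type is the coercion type itself, not plain Int,
# so both halves of "Int(Rat)" can be asked for separately.
my $type = &foo.signature.params[0].type;
say $type.^target_type.^name;       # the type the argument is coerced to
say $type.^constraint_type.^name;   # the type the argument must have going in
```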

 So, where were we?  Oh yes, what the spec currently says about
 "coercive types" is this:

   This only works for one-way coercion, so you may not declare any C<rw>
   parameter with a coercive type.

 I suspect there might be some back-pressure from users over that clause
 in the spec, especially when it comes to native arrays and buffers, because
 it seems kind of arbitrary to force just those cases to do it longhand,
 and also it is less introspectable that way.

 So let's talk about bufs and native arrays for a bit.  First let's look
 at what the spec has to say about the relationship between buffers and
 native typed arrays a.k.a. "compact arrays":

    A compact array is for most purposes interchangeable with the
    corresponding buffer type.  For example, apart from the sigil,
    these are equivalent declarations:

        my uint8 @buffer;
        my buf8 $buffer;

    (Note: If you actually said both of those, you'd still get two
    different names, since the sigil is part of the name.)

    So given C<@buffer> you can say

        $piece = substr(@buffer, $beg, $end - $beg);

    and given C<$buffer> you can also say

        @pieces = $buffer[$n ..^ $end];

 I haven't been able to find it in the spec explicitly, but in addition to
 the above, this is currently implemented behavior:

    $ perl6 -e 'my @b := buf8.new(1,2,3); @b[1].say; @b.say'
    2
    Buf[uint8]:0x<01 02 03>
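 The difference between binding a buffer and copying it into a compact
 array can be shown in a few lines (a sketch against a current Rakudo;
 the variable names are arbitrary):

```raku
my @view := buf8.new(1, 2, 3);   # binding: @view *is* the Buf object
my uint8 @copy = @view.list;     # assignment: elements copied into a native array

@view[0] = 9;                    # Buf is mutable, so this writes into the buffer
@copy[1] = 7;                    # changes only the native array

say @view.^name;                 # Buf[uint8]
say @view[0], ' ', @copy[0];     # 9 1
say @copy[1];                    # 7
```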

 ...which makes sense since a buffer does Positional.  Another critical
 part of the spec is the following:

    the presence of a low-level type tells Perl that it is free to
    implement the array with "compact storage", that is, with a chunk
    of memory containing contiguous (or as contiguous as practical)
    elements of the specified type without any fancy object boxing that
    typically applies to undifferentiated scalars.

 ...but Buf is defined as:

    A mutable container for an array of integer values in contiguous
    memory.

 So Bufs are guaranteed to be stored contiguously in memory, while
 native typed arrays are only contiguous on the backend to the point
 that it is practical to do so.  NativeCall users take note: Buf/Blob is
 what is safe to pass to C functions.
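 A minimal NativeCall sketch of passing a Buf to C.  It assumes a
 POSIX-like system where libc's memset is already loaded in the running
 process, so "is native" needs no library name; memset is just a
 convenient function that writes through the pointer it is given:

```raku
use NativeCall;

# void *memset(void *s, int c, size_t n); the return value is ignored here.
sub memset(Blob, int32, size_t) is native {*}

my $buf = buf8.new(0 xx 4);
memset($buf, 0xAB, 4);   # C writes directly into the Buf's contiguous storage
say $buf.list;           # (171 171 171 171)
```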

 This tells us that we have to look really hard at what "is rw", which
 normally pertains only to the "container" part of an argument, means in
 the case of these objects when they are bound directly to an @-sigiled
 parameter.  I don't know the answer to that; I suspect it will be
 banged out in time (last I saw from outside "the loop", there was some
 chafing between implementation and spec as to how deep the default
 read-only protection is supposed to go even on undifferentiated Scalar
 arrays.)

 The arrival of native typed arrays will remove roadblocks to more
 sophisticated handling of buffers than we have had previously.  Also the
 new support for Buf in NativeCall is going to have module authors
 working with more buffers in ways that have not been thoroughly
 exercised before.  Buffers differ from most other objects in that they
 tend to be very large.  In the case of crypto functionality, they are
 also potential targets for a lot of iterative math, and could play a
 big role in the back end of hyperoperators.

 So, efficiency topics like this come up:

 grondilu: anyway so me, if I had to convert a Buf[uint8] to a Buf[uint16],
          I'd first get the list of bytes, group them by two and then
          create the corresponding 16-bits words.
 jnthn:    I'm sure we can provide a better way to do that :)
 grondilu: ideally there should be a constructor candidate that takes a
          buffer of an other type as argument.
          something like:  my Buf[uint16] $a .= new: Buf[uint8].new: ^10
 grondilu: That's not a bad idea.
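 For comparison, the longhand copy grondilu describes might look like
 this.  Little-endian word order is an arbitrary choice for the sketch;
 the result is a brand new buf16, so writes to it never reach the
 original buf8:

```raku
my $bytes = buf8.new(1, 0, 2, 0, 3, 1);

# Pair up the bytes and combine each pair (low byte first) into one word.
my $words = buf16.new: $bytes.list.rotor(2).map: -> ($lo, $hi) { $lo +| ($hi +< 8) };

say $words.list;      # (1 2 259)
$words[0] = 0xFFFF;   # modifies only the copy
say $bytes[0];        # 1
```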

 Now, it isn't clear exactly what the use case was here: there are indeed
 situations where you need to copy the contents of a buf8 into a buf16
 and then modify the buf16 while leaving the original buf8 untouched.
 There are many more situations where all you need to do is read values
 from the same memory area as the buf8 but read them as words, so there's
 no good reason to be copying all the values to a new memory location.
 And finally, there are some situations where you need to have writes to
 the buf16 not only alter the buf16, but also alter the corresponding
 values in the original buf8 view.
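 The third case can be approximated in pure Perl 6 today with one Proxy
 per element.  Buf16View is a hypothetical class, not part of Rakudo; it
 assumes little-endian words and an even number of bytes, and it copies
 nothing: every read and write goes straight to the underlying buf8.
 Making STORE die instead would turn it into the read-only view of the
 second case.

```raku
# Hypothetical class, not part of Rakudo: a no-copy, write-back view that
# presents a buf8 as little-endian 16-bit words.
class Buf16View does Positional {
    has buf8 $.bytes is required;

    method elems() { $!bytes.elems div 2 }

    # Each element is a Proxy, so reads and writes go straight to the bytes.
    method AT-POS(Int $i) is raw {
        my $bytes := $!bytes;
        Proxy.new(
            FETCH => -> $ { $bytes[2 * $i] +| ($bytes[2 * $i + 1] +< 8) },
            STORE => -> $, Int $word {
                $bytes[2 * $i]     = $word +& 0xFF;
                $bytes[2 * $i + 1] = ($word +> 8) +& 0xFF;
            },
        )
    }
}

my $b8   = buf8.new(1, 0, 2, 0);
my $view = Buf16View.new(bytes => $b8);

say $view[1];        # 2
$view[0] = 0x0102;   # written back into the original buffer
say $b8.list;        # (2 1 2 0)
```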

 And while construction can always be used to get copy semantics, the
 specification of "coercion" is (necessarily) too broad to allow us
 to ask explicitly for the other two behaviors.  Also, even when you
 want copy semantics, you may want copy-on-write (a.k.a. lazy copying,
 or "COW") semantics, so that large buffers are not copied until
 someone actually decides to write to them.  Or you may not, depending
 on when you want the performance hit to occur.
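 Copy-on-write can be sketched the same way.  CowBuf is hypothetical;
 it reads from the shared source buf8 until the first write, and only
 then pays for a private copy:

```raku
# Hypothetical copy-on-write wrapper around a buf8.
class CowBuf {
    has buf8 $.source is required;   # shared, never written by this class
    has buf8 $!copy;                 # private copy, created lazily

    method !current() { $!copy // $!source }

    method copied()       { $!copy.defined }
    method elems()        { self!current().elems }
    method AT-POS(Int $i) { self!current()[$i] }

    method ASSIGN-POS(Int $i, $value) {
        $!copy //= buf8.new($!source.list);   # the copy happens here, once
        $!copy[$i] = $value;
    }
}

my $src = buf8.new(1, 2, 3);
my $cow = CowBuf.new(source => $src);

say $cow[0], ' ', $cow.copied;   # 1 False
$cow[0] = 9;                     # first write triggers the copy
say $cow[0], ' ', $src[0];       # 9 1
```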

 Now, structured native data (CStruct in NativeCall, "compact structs" in
 the spec) is also supposed to behave as if it is packed, even if the
 implementation plays tricks on the back-end.  That behavior means you
 can pass it back to C (or whatever) as a properly serialized structure.
 Specifically the spec says:

    The packing serialization is performed by coercion to an appropriate
    buffer type.  The unpacking is performed by coercion of such a buffer
    type back to the type of the compact struct.

    Of course, a lazy implementation will probably find it easiest just
    to keep the object in its serialized form all the time.  In particular,
    an array of compact structs must be stored in their serialized form
    (see next section).
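 In pure Perl 6 terms, the packing and unpacking described there amount
 to a pair of conversions like the ones below.  Point16 and from-buf are
 hypothetical, the little-endian layout is an assumption, and this is not
 how NativeCall's CStruct REPR works internally; note that this "Buf
 coercion" always produces a copy:

```raku
# Hypothetical "compact struct" with two unsigned 16-bit fields.
class Point16 {
    has Int $.x;
    has Int $.y;

    # Packing: coerce to a buffer, low byte first.
    method Buf() {
        buf8.new($!x +& 0xFF, ($!x +> 8) +& 0xFF,
                 $!y +& 0xFF, ($!y +> 8) +& 0xFF)
    }

    # Unpacking: build a Point16 back from such a buffer.
    method from-buf(Point16:U: Blob $b) {
        self.new(x => $b[0] +| ($b[1] +< 8), y => $b[2] +| ($b[3] +< 8))
    }
}

my $p      = Point16.new(x => 258, y => 3);
my $packed = $p.Buf;
say $packed.list;                  # (2 1 3 0)
say Point16.from-buf($packed).x;   # 258
```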

 Again, Buf is what is safe to pass to NativeCall, though NativeCall's
 rules about its REPRs make this seamless by skipping a manual Buf
 coercion.  Also again, the definition of "coerce" when it comes to
 mutability, write-back, and COW behavior is left up to the implementation
 and also to individual modules.

 TLDR: There are 4 types of behavior C interfacers and pure-Perl6
 data acrobats will need to be able to explicitly ask Perl 6 for when
 working with native data aggregates in their serialized Buf forms:

 1) Read-only views with no copy performed when possible.
 2) Mutable copies that are copied when they are created.
 3) Mutable copies that copy-on-write ("COWercion"?)
 4) Mutable views that write mutations back to the originating object.

 ...and this is currently unspecced territory.