Created
February 8, 2015 21:04
A deeper look at coercion.

This is both an elaboration on a few points that were touched on
in ab5tract's Day 4 advent post, and a deep look at the definition
of "coerce" focusing on the soon-to-be-timely subject of buffers
and native data types, and "coercive types".

The most important part of the specification when it comes to
defining what it means to "coerce" is this (in S02):

    ...if you say:

        $fido = Dog.new($spot)

    it certainly creates a new C<Dog> object. But if you say:

        $fido = Dog($spot)

    it might call C<Dog.new>, or it might pull a C<Dog> with Spot's
    identity from the dog cache, or it might do absolutely nothing if
    C<$spot> already knows how to be a C<Dog>. ...

It is this last part that is critical, because a mutable object
coerced to another type may be modifiable through the product
of that coercion.

Currently on Rakudo that does not seem to be the case for built-in types.
We can look at the simple case of a type being cast to itself:

    my @a = Array.new(0,1,2);
    my @b := Array(@a);
    @b[0] = 2;
    @a.say; # 0 1 2 ... this is not guaranteed on all implementations
    @b.say; # 2 1 2

    @b := @a.Array;
    @b[0] = 2;
    @a.say; # 0 1 2 ... neither is this guaranteed to be same as above
    @b.say; # 2 1 2

As an aside, the second comment merely draws attention to the fact that
the rules for finding the right way to "coerce" are different for the
method and sub forms, and there are effectively two different paths by
which we can define coercion -- one from the coercer and the other
from the coercee.

Anyway, any module author can decide to do it this way:

    class A {
        has $.foo is rw;
        method Int () is rw { $!foo }
    }
    my $a = A.new(:foo(4));
    my $b := Int($a);
    $b = 5;
    $a.foo.say; # 5

Of course, in the majority of cases coercion is simply used to
read from the object, and in a lot of cases there is no good
way to implement writing back to the original object from the
product of the coercion.

Now take a look at the next sentence in the spec:

    As a fallback, if no method responds to a coercion request, the
    class will be asked to attempt to do C<Dog.new($spot)> instead.

When this scenario happens, the product of the coercion may end
up being a mutable copy of the thing we coerced. So to recap,
a coercion may produce:

    1) A mutable copy of the original object
    2) An immutable view into the original object that pretends to
       be the new type for all readable purposes
    3) A mutable view into the original object that allows
       modification of the original object
    4) Some mixture of 2 and 3 for complex classes

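To make cases 1 and 3 concrete, here is a minimal sketch using a made-up
class. The name Counter and its attribute are hypothetical, not anything
from the spec. It calls the coercion methods directly instead of going
through the sub form, since the sub form may take a different path.

```raku
# Hypothetical class for illustration only.
class Counter {
    has Int $.n is rw;

    # Case 1: the caller gets a freshly built value that is
    # detached from the attribute.
    method Str() { ~$!n }

    # Case 3: "is rw" hands back the attribute's own container,
    # so assigning through a binding writes back to the object.
    method Int() is rw { $!n }
}

my $c = Counter.new(:n(4));

my $copy = $c.Str;     # ordinary assignment into a new container
$copy = "99";
say $c.n;              # 4

my $view := $c.Int;    # bind, so the returned container is kept
$view = 5;
say $c.n;              # 5
```

Dropping "is rw" from the Int method would hand back a plain value
instead of the container, which is the usual read-only style of coercion.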

TLDR: Coercion is at heart a way to fit square pegs in round
holes, and those pegs can be mashed in with a wide selection
of hammers.

Modules, of course, can document what their behavior is
in their manpage along with all the other particulars of
a module. For base types we have a situation where a coder
will have to know which types coerced to which other types
result in a writeable view of the original value, which
of those views are copies as opposed to write-back, and on
which implementations that behavior changes.

It is tempting to say "I don't want to memorize another table
like the operator precedence chart, let's just spec all base type
coercions to behave the same way." In fact this is what has
been done with "coercion types". Oh but wait, we are
getting ahead of ourselves for those not on IRC. What's a
coercion type?

That's this:

    sub foo (Int(Rat) $f) { $f + 1 };
    foo(5/4).say; # 2

The above did not even parse until today, when the 6pe merged. For
those who don't take their code intravenously, this deprecated syntax
does what the above is supposed to do on Rakudo:

    sub foo (Rat $f as Int) { $f + 1 };
    foo(5/4).say; # 2

This is not merely a syntactic macro that converts the above to:

    sub foo (Rat $f) { $f = Int($f); $f + 1 };

...as the "Int(Rat)" is actually going to be a type unto
itself, apparently. Which means you can introspect it. Which,
we will see later, might be a very good thing.

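If the coercion type really is a type of its own, a signature should be
able to hand it back on request. A small sketch, assuming a Rakudo that
parses Int(Rat); how the type's name prints may differ between versions,
so no output is claimed for that line.

```raku
sub foo (Int(Rat) $f) { $f + 1 }

# Ask the signature for the declared type of the first parameter.
my $type = &foo.signature.params[0].type;
say $type.^name;    # version-dependent, printed only for inspection

say foo(5/4);       # 2
```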

So, where were we? Oh yes, what the spec currently says about
"coercive types" is this:

    This only works for one-way coercion, so you may not declare any C<rw>
    parameter with a coercive type.

I suspect there might be some back-pressure from users over that clause
in the spec, especially when it comes to native arrays and buffers, because
it seems kind of arbitrary to force just those cases to do it longhand,
and also it is less introspectable that way.

So let's talk about bufs and native arrays for a bit. First let's look
at what the spec has to say about the relationship between buffers and
native typed arrays, a.k.a. "compact arrays":

    A compact array is for most purposes interchangeable with the
    corresponding buffer type. For example, apart from the sigil,
    these are equivalent declarations:

        my uint8 @buffer;
        my buf8 $buffer;

    (Note: If you actually said both of those, you'd still get two
    different names, since the sigil is part of the name.)

    So given C<@buffer> you can say

        $piece = substr(@buffer, $beg, $end - $beg);

    and given C<$buffer> you can also say

        @pieces = $buffer[$n ..^ $end];

I haven't been able to find it in the spec explicitly, but in addition to
the above, this is currently implemented behavior:

    $ perl6 -e 'my @b := buf8.new(1,2,3); @b[1].say; @b.say'
    2
    Buf[uint8]:0x<01 02 03>

...which makes sense since a buffer does Positional. Another critical
part of the spec is the following:

    the presence of a low-level type tells Perl that it is free to
    implement the array with "compact storage", that is, with a chunk
    of memory containing contiguous (or as contiguous as practical)
    elements of the specified type without any fancy object boxing that
    typically applies to undifferentiated scalars.

...but Buf is defined as:

    A mutable container for an array of integer values in contiguous
    memory.

So Bufs are guaranteed to be stored contiguously in memory, while
native typed arrays are only contiguous on the backend to the point
that it is practical to do so. NativeCall users take note: Buf/Blob is
what is safe to pass to C functions.

This tells us that we have to look really hard at what "is rw", which
normally pertains only to the "container" part of an argument, means in
the case of these objects when they are bound directly to an @-sigiled
parameter. I don't know the answer to that; I suspect it will be
banged out in time (last I saw from outside "the loop", there was some
chafe between implementation and spec as to how deep the default
read-only protection is supposed to go even on undifferentiated Scalar
arrays.)

The arrival of native typed arrays will remove roadblocks to more
sophisticated handling of buffers than we have had previously. Also the
new support for Buf in NativeCall is going to have module authors
working with more buffers in ways that have not been thoroughly
exercised before. Buffers differ from most objects in that they
tend to be very large. In the case of crypto
functionality, they are also potential targets for a lot of iterative
math, and could play a big role on the back-end of hyperoperators.

So, efficiency topics like this come up:

    grondilu: anyway so me, if I had to convert a Buf[uint8] to a Buf[uint16],
              I'd first get the list of bytes, group them by two and then
              create the corresponding 16-bits words.
    jnthn: I'm sure we can provide a better way to do that :)
    grondilu: ideally there should be a constructor candidate that takes a
              buffer of an other type as argument.
              something like: my Buf[uint16] $a .= new: Buf[uint8].new: ^10
    grondilu: That's not a bad idea.

Now, it isn't clear exactly what the use case was here: there are indeed
situations where you need to copy the contents of a buf8 into a buf16
and then modify the buf16 while leaving the original buf8 untouched.
There are many more situations where all you need to do is read values
from the same memory area as the buf8 but read them as words, so there's
no good reason to be copying all the values to a new memory location.
And finally, there are some situations where you need to have writes to
the buf16 not only alter the buf16, but also alter the corresponding
values in the original buf8 view.

And while construction can always be used to get copy semantics, the
specification of "coercion" is (necessarily) too broad to allow us
to ask explicitly for the other two behaviors. Also, even when you
want copy semantics, you may want "copy on write" a.k.a. lazy mutation,
a.k.a. "COW" semantics, so that large buffers are not copied until
someone actually decides to write to them. Or you may not, depending
on when you want the performance hit to occur.

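For reference, copy semantics via construction, done the longhand way
described in the log above (take the bytes, group them by two, build the
words), might look like the sketch below. The little-endian pairing is an
assumption; the log does not say which byte order was wanted.

```raku
my $bytes = Buf[uint8].new(^10);

# Walk the bytes two at a time and build one 16-bit word per pair.
# ASSUMPTION: the low byte comes first (little-endian).
my $words = Buf[uint16].new(
    $bytes.list.map(-> $lo, $hi { $lo + $hi * 256 })
);

say $words.elems;    # 5
say $words[0];       # 256

# $words is an eager copy: writing to it leaves $bytes alone.
$words[0] = 0xFFFF;
say $bytes[0];       # 0
```

Every element gets touched up front here, which is exactly the cost a
view or a copy-on-write scheme would avoid for large buffers.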

Now, structured native data (CStruct in NativeCall, "compact structs" in
the spec) is also supposed to behave as if it is packed, even if the
implementation plays tricks on the back-end. That behavior means you
can pass it back to C (or whatever) as a properly serialized structure.
Specifically the spec says:

    The packing serialization is performed by coercion to an appropriate
    buffer type. The unpacking is performed by coercion of such a buffer
    type back to the type of the compact struct.

    Of course, a lazy implementation will probably find it easiest just
    to keep the object in its serialized form all the time. In particular,
    an array of compact structs must be stored in their serialized form
    (see next section).

Again, Buf is what is safe to pass to NativeCall, though NativeCall has
rules about its REPRs that make this seamless by skipping a manual Buf
coercion. Also again, the definition of "coerce" when it comes to
mutability, write-back, and COW behavior is left up to the implementation
and also to individual modules.

TLDR: There are four kinds of behavior that C interfacers and pure-Perl6
data acrobats will need to be able to ask Perl 6 for explicitly when
working with native data aggregates in their serialized Buf forms:

    1) Read-only views with no copy performed when possible.
    2) Mutable copies that are copied when they are created.
    3) Mutable copies that copy-on-write ("COWercion"?)
    4) Mutable views that write mutations back to the originating object.

...and this is currently unspecced territory.