Last active
January 9, 2022 04:29
Strongly Typed Raku (WIP)
=begin pod | |
Raku is a gradually typed language, allowing you to completely ignore typing for quick mock-ups,
or enforce very strict typing to ensure reliability in critical programs. This is a guide for | |
certain issues that can come up when writing strongly typed Raku code. | |
=head1 Basic typing | |
By default, Raku doesn't care what you store in variables. | |
my $foo; | |
$foo = 1; | |
$foo = "one"; | |
$foo = { $_ + 1 };
In a perfect world, we would always remember exactly what is being passed where | |
and make no mistakes whatsoever. But sometimes as programmers we can screw up: | |
my $number; | |
$number = "one"; | |
say $number + 2; | |
In this case, it's quite easy to ensure that C<$number> always contains something
numeric: declare it with a type. Compare which of the following assignments succeed.
my Numeric $number; | |
$number = 1; # ✓ | |
$number = 1.11; # ✓ | |
$number = 1 / 10; # ✓ | |
$number = "one"; # error | |
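As a rough sketch, here is the same idea as a complete script. The untyped C<$input> variable is a hypothetical stand-in for a value that arrives from elsewhere, and C<try> is there only so the script can carry on and inspect the failure.

=begin code :lang<raku>
my Numeric $number;
$number = 1;          # an Int does Numeric
$number = 1.11;       # so does a Rat

my $input = "one";    # hypothetical untyped input
try { $number = $input }

# The assignment was refused, so $number still holds its last good value.
say "assignment rejected" if $! ~~ X::TypeCheck::Assignment;
say $number;
=end code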
The type of a variable can be set as either a class or a role. In Raku, classes | |
and roles function fairly independently of each other thanks to mixins, so don't | |
worry too much about whether you're using a class or a role: just ensure it locks | |
values into what you want. For numbers, note the difference: | |
# Classes | |
my Int $a; # only accepts 1, 2, 3, etc. | |
my Rat $b; # only accepts 1.5, 1/3, 2.0, etc.
my Num $c; # only accepts 1e0, 2.5e0, 3e10, etc.
# Roles | |
my Numeric $d; # accepts an Int, Rat, or Num | |
Be advised that a subclass will always be considered a valid value for its parent
class, so if you ask for C<ParentClass>, you might also get C<ChildClass> unless | |
you specifically disallow it (which can be done using subsets, see below). | |
All typing information can I<also> be included on signatures: | |
sub foo (Num $float, Int $integer, Numeric $any-number) { … } | |
One extra feature of signatures that is currently not available elsewhere is the ability to autocoerce.
The following two subs are functionally equivalent:
sub foo (Str() $text) { … }
sub foo ($text is copy) {
$text = $text.Str unless $text ~~ Str;
…
}
Either version works fine and will fail if the value
passed cannot coerce itself into C<Str>. Autocoercion is great in situations like logging
functions that will ultimately use the string value anyway. Be very judicious with it,
however, because many objects may have unexpected or undesirable coercions (an C<Array>
coerces to a number as its number of elements, so doing math with that value, other
than to calculate offsets or loops, likely represents an error).
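A small sketch of both points, assuming a hypothetical C<log-line> sub: the coercion is handy for logging, while the numeric coercion of an array quietly turns it into its element count.

=begin code :lang<raku>
# Hypothetical logging helper: whatever is passed is coerced with .Str first.
sub log-line(Str() $text) { "[log] $text" }

say log-line(42);      # [log] 42

my @items = <a b c>;
say @items + 1;        # 4, because the array numifies to its element count
=end code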
=head2 Summary | |
Whenever you declare any variable, always ensure that you have the type that
best fits the variable. Ask yourself two or three times whether autocoercion is really useful;
when in doubt, have the calling code coerce manually.
=head1 Defined vs Types | |
Generally, it's okay for most variables to go undefined and still be passed around. | |
There are several subtle ways that undefined variables can accidentally come up | |
and not cause problems until sometime later. For instance, consider this code: | |
sub add(Int $a, Int $b) { | |
... | |
$a + $b | |
} | |
my Int @array = 1, 2, 3; | |
my Int $value = @array[3]; # oops, we forgot Raku is 0-based. | |
... | |
add $value, 1;
The code will complain, but only at the end of C<add()>, even though the
mistakes happened far earlier. This can make debugging more difficult. One | |
possibility could be to say: | |
sub add(Int $a, Int $b) { | |
die unless $a.defined; | |
die unless $b.defined; | |
... | |
$a + $b | |
} | |
That's an improvement: even though the sub doesn't use C<$a> and C<$b> until the end,
we'll catch the problem at the very beginning. But we can move the check into the sub's
signature:
sub add(Int $a where *.defined, Int $b where *.defined) { … }
By putting it in the signature, the error can be caught as soon as the
sub is called. Using C<where *.defined> is a bit line-noise-y, so there's
a syntactical shortcut for it:
sub add(Int:D $a, Int:D $b) { … }
The C<:D> constraint doesn't actually have to have a type defined (although | |
we're talking about strictly typing here so let's avoid that, mmkay?), in | |
which case it's implied to be C<Any:D>:
sub add(:D $a, :D $b) { … } | |
Sometimes, instead of desiring a defined object, you might want to | |
explicitly work with undefined variables. This is useful when | |
you want to pass types as types (for example, in parameterization). | |
Undefinedness can be enforced simply with the C<:U> constraint. | |
sub foo(:U $a) | |
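Putting the two constraints side by side, a minimal sketch (the C<describe> sub is a hypothetical example of a routine that wants a type object):

=begin code :lang<raku>
sub add(Int:D $a, Int:D $b) { $a + $b }
sub describe(Int:U $type)   { "got the type object {$type.^name}" }

my Int @array = 1, 2, 3;
my Int $value = @array[3];    # past the end, so $value holds the Int type object

say add(1, 2);                # 3

try add($value, 1);           # refused as soon as the call binds its arguments
say "add refused the undefined value" if $!.defined;

say describe(Int);            # a type object is exactly what :U asks for
=end code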
=head2 Summary | |
For strongly typed Raku, you should always ensure that every parameter
in a signature has a definedness constraint. Here are the interpretations:
Any:D $defined-only | |
Any:U $undefined-only | |
Any:_ $either-DANGER | |
Any $also-either-DANGER | |
=head1 Subsets and Constraints | |
When you want to restrict the values, rather than the type, you'll want to use a subset. | |
For instance, dividing a number can be done by anything but zero, so let's imagine a | |
C<divide> function that takes two numbers and returns the result: | |
sub divide(Numeric $dividend, Numeric $divisor) { | |
die "Division by zero is impossible" if $divisor == 0; | |
… | |
return $quotient; | |
} | |
There are better ways to handle this. One way is to explicitly state the limitations: | |
sub divide(Numeric $dividend, Numeric $divisor where * != 0) { … }
This is great for single ad-hoc constraints. But let's consider a function that wants | |
a CSS color value as a string. | |
sub pretty(Str $color where /^ <[0..9a..fA..F]> ** 6 $/) { … }
The problem here is twofold: (1) color is something we'll probably use over and over
again and we'd have to keep rewriting the constraint and (2) a CSS color is more than
just six hexadecimal digits. It could also be three hex digits, or a name, or use an
explicit color type format like C<rgba(123,45,67,.89)>. What we can do instead is
create a C<CSSColor> subset, which is a C<Str> by type, but whose values are limited to those
that are valid in CSS:
subset CSSColor of Str where {
$_ ~~ /^ <[0..9a..fA..F]> ** 6 $/
|| $_ ~~ /^ <[0..9a..fA..F]> ** 3 $/
|| $_ ∈ <red green blue yellow>
|| …
}
sub pretty(CSSColor $color) { … } | |
Now in the sub we can safely print out C<$color>, assured that it is a valid CSS color, as well
as reuse the subset in any other sub. If CSS changes its definition of colors, we can modify just
the subset and it will apply to every instance.
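Here is a cut-down version that actually runs, as a sketch only: it assumes a leading C<#> on hex colors and knows just four color names, so it is nowhere near the full CSS definition.

=begin code :lang<raku>
# Simplified on purpose: hex colors (with an assumed leading '#') and four names.
subset CSSColor of Str where {
       $_ ~~ /^ '#' <[0..9a..fA..F]> ** 6 $/
    || $_ ~~ /^ '#' <[0..9a..fA..F]> ** 3 $/
    || $_ ∈ <red green blue yellow>
};

sub pretty(CSSColor $color) { "color: $color;" }

say pretty('#ff8800');
say pretty('red');
say 'bluish' ~~ CSSColor;     # False, it matches none of the three branches
=end code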
Subsets don't have to work on just the values. For instance, imagine I want a sub
to accept objects that are I<both> Positional I<and> Associative. I can't include
that in the type information itself, only via where clauses:
sub double-duty($foo where Positional & Associative) { … } | |
But with a subset, I can: | |
subset TwoWays where Positional & Associative; | |
sub double-duty(TwoWays $foo) { … } | |
You can make very complex subsets: | |
subset Overkill where $_ ~~ Positional | |
&& .[0] == 0 | |
&& .[3] == 3 | |
&& .all ~~ Int | |
&& .elems < 10; | |
my Overkill $a = (0,1,2,3,4); # perfect | |
my Overkill $b = (1,1,2,3,4); # error: first element must be 0 | |
my Overkill $c = (0..10).List; # error: more than ten elements | |
A really cool thing about subsets is that, written in a particular way, you can give | |
some very useful error messages. | |
subset Overkill where ($_ ~~ Positional || die "Must give a Positional value") | |
&& (.[0] == 0 || die "The first element must be zero") | |
&& (.[3] == 3 || die "The fourth element must be 3, but it was ", .[3])
&& (.all ~~ Int || die "All elements must be Ints")
&& (.elems < 10 || die "Must have fewer than 10 elements but found ", .elems);
my Overkill $a = (0,1,2,2,4); # die output: "The fourth element must be 3, but it was 2"
my Overkill $b = (1,1,2,3,4); # die output: "The first element must be zero"
my Overkill $c = (0..10).List; # die output: "Must have fewer than 10 elements but found 11"
Usually C<die> is what you want, but you can also create specific exception types and throw them
instead if you intend to use C<CATCH> blocks regularly.
=head2 Summary | |
Define a subset by using a where clause followed by a single value (which will be smartmatched) or one
or more operations.
subset EightBit where 0..255; # ranges' smartmatch checks values
subset SmallOrBig where (0..10) | (100..1000); # junctions are allowed
subset UnderTen where * < 10; # whatevers are valid (only use one)
subset UnderTenEven where $_ < 10 # have to use topic variable
&& $_ %% 2; # to reference value more than once
subset ThreeItems where .elems == 3; # since it's topicalized, you can use .method
# without the $_ reference.
Use a subset whenever there are bad values or complicated restrictions for a type.
You can also use them for complicated types (that mix two or more). Subsets are fairly | |
fundamental to both strongly typed and safe Raku coding. | |
=head1 Arrays and Hashes | |
Arrays and hashes present some problems when trying to strongly type because of the way that | |
parameterization can be defined. Consider an array: | |
my @integers = 1,2,3,'4'; | |
This assignment is considered valid, but clearly contains a C<Str> that we don't want to | |
allow. Positional objects stored with a C<@> sigil can be quickly typed by simply placing | |
the type in front (remember that the C<@> sigil I<implies> the Positional type). | |
my Int @integers = … | |
And now it would catch accidentally putting in a Str. This is fairly straightforward. But | |
what if you want to have an array of arrays? The basic type definition is | |
my Array @arrays | |
But we can't say C<my Int Array @arrays>, because the C<@> sigil only lets us type the outer container, not the C<Array> inside it.
Instead, we can use another format (which also works for C<$>-sigiled positionals):
my Array[Int] @arrays; | |
my Array[Array[Int]] $arrays-in-scalar; | |
Strongly typed Raku is very strict: the above arrays require extra work to work with: | |
my Array[Int] @arrays; | |
@arrays[0] = (1, 2, 3); # error! (1, 2, 3) is an untyped List
@arrays[0] = Array[Int].new(1,2,3); # correct
@arrays = (1,2,3), (4,5,6), (7,8,9); # error! each value is an untyped List
@arrays = Array[Int].new(1,2,3), | |
Array[Int].new(4,5,6), | |
Array[Int].new(7,8,9); # correct, each value is an Array[Int] | |
Note the use of the C<Class[Type]> in the object constructor to enforce the correct | |
object type. | |
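A short sketch of what that looks like in practice. One convenience worth noting: an array declared with C<my Int @row> is already an C<Array[Int]>, so it can be stored without calling the constructor. The variable names here are just illustrative.

=begin code :lang<raku>
my Array[Int] @arrays;

my Int @row = 1, 2, 3;                   # a typed array is already an Array[Int]
@arrays[0] = @row;
@arrays[1] = Array[Int].new(4, 5, 6);

my @untyped = 7, 8, 9;                   # a plain Array, even though it only holds Ints
try { @arrays[2] = @untyped }
say "untyped array rejected" if $!.defined;

say @arrays[1][2];                       # 6
=end code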
The same principle exists for Associatives like Hashes or Maps. You can almost think of
Positionals as hashes whose keys are integers. By default, the keys of Associatives
are C<Str>, and the values are defined similarly to Positional value types:
my Int %integer-values = … | |
Sometimes, however, you may want to use a key type other than Str, for instance, | |
if you are creating a mapping between various objects. There is a special syntax | |
that uses curly braces to do this: | |
my Int %foo{Any} = … | |
This enables any kind of object to be used as a key. You don't actually need | |
the value to be defined to constrain the key, so to allow for any value, but | |
requiring integer keys, you can say: | |
my %any-value{Int} | |
For an Associative with keys | |
restricted to things like C<Rat> with values that are strings, you could say: | |
my Str %foo{Rat} = 1/2 => 'one half', 1.0 => 'one', 3/2 => 'one and a half';
Just like with positionals, it is possible to define type constraints along | |
with the main class when using a scalar value: | |
my Hash[Str,Rat] $foo = …
In this format, note the order of the values: C<Associative[Value,Key]>. If
you want to only define the key with this format, you will need to explicitly | |
use the value type C<Any>. | |
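A quick sketch of a key-constrained hash in use (the C<%names> hash and C<$key> variable are just for illustration). Note that the keys keep their type, so an C<Int> key is refused even though it is numerically a perfectly good fraction.

=begin code :lang<raku>
my Str %names{Rat} = 1/2 => 'one half', 1.0 => 'one', 3/2 => 'one and a half';

say %names{3/2};              # one and a half

my $key = 2;                  # an Int, not a Rat
try { %names{$key} = 'two' }
say "Int key rejected" if $!.defined;
=end code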
=head2 Summary | |
Here's a quick review of the different ways to define type/value relationships: | |
Positionals (remember, the @sigil implies Positional, NOT Array or List) | |
####################################### | |
my Str @array; # a Positional containing Str values | |
my List[Int] $array-in-scalar; # a List containing Int values, stored in a scalar | |
my Array[Rat] @array; # a Positional containing Arrays that contain Rat values | |
my Array[List[Str]] @array; # a Positional containing Arrays that contain Lists that contain Str values | |
Associatives (remember, the %sigil implies Associative, NOT Hash or Map, and defaults to Str keys) | |
####################################### | |
my Rat %hash; # an Associative containing Rat values with Str keys | |
my Rat %hash{Int}; # an Associative containing Rat values with Int keys | |
my %hash{Int}; # an Associative containing Any values with Int keys
my Hash[Rat] $hash-in-scalar; # a Hash containing Str keys and Rat values, stored in a scalar
my Hash[Rat,Int] $hash-in-scalar; # a Hash containing Int keys and Rat values, stored in a scalar
my Hash[Any,Int] $hash-in-scalar; # a Hash containing Int keys and Any values, stored in a scalar
While you can go crazy and define a value like C<my Hash[Array[Map[Callable]],Str] %hash{Str}>,
it's really overkill. If the structure is known in advance, you are almost I<always>
better off making small objects. In particular, attribute access is faster than hash access,
objects can help with code completion in IDEs, and, very importantly, accessing a non-existent
attribute is an error (both Hashes and Arrays will return type objects, which might not error immediately; see
Defined vs Types above).
=head1 Other Parameterization | |
To create your own parameterization, define your role with extra brackets | |
role Positional[::Value] { … } | |
It is possible to parameterize classes, but it is also a bit more difficult because it | |
involves making a special inner-role and is currently Rakudo-specific (at the moment | |
not a problem, as Rakudo is the only Raku compiler). For more information, see | |
L<this SO post|https://stackoverflow.com/questions/57554660>. | |
class DescriptiveName { | |
my role R[::T] { | |
has T $.value; # our parameterized value | |
# All "actual" methods begin here…
method new(DescriptiveName: T $value) { self.bless: :$value }
# … and end here | |
} | |
# This handles the mixin process that combines the outer class with its inner role | |
method ^parameterize(Mu:U \this, Mu \T) { | |
my $type := this.^mixin: R[T]; | |
$type.^set_name: this.^name ~ '[' ~ T.^name ~ ']'; | |
$type | |
} | |
} | |
The C<::Identifier> syntax defines a type capture. You can then use the captured type
anywhere else you would use a type. For example, if you have an array-like class, you | |
want to make sure you only add things of the same type. | |
role ArrayWrapper[::TypeCapture] { | |
has TypeCapture @internal; | |
method push(TypeCapture $elem) { @internal.push: $elem } | |
method pop( --> TypeCapture) { @internal.pop } | |
} | |
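To see a parameterized role in action, here is a simplified sketch. It keeps the storage untyped and lets the C<push> signature do the checking, and the C<IntStack> class is a hypothetical consumer invented for the example.

=begin code :lang<raku>
role ArrayWrapper[::T] {
    has @!items;                                   # storage left untyped in this sketch
    method push(T $elem) { @!items.push: $elem; self }
    method pop           { @!items.pop }
}

class IntStack does ArrayWrapper[Int] { }          # hypothetical consumer of the role

my $stack = IntStack.new;
$stack.push(1).push(2);
say $stack.pop;                                    # 2

my $bad = "three";
try { $stack.push($bad) }
say "Str rejected" if $!.defined;
=end code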
It's somewhat traditional to use a single letter for the type capture, but you're free to use | |
whatever name you want. You can also use type captures in signatures:
sub same-type-only(::T $foo, T $bar) { | |
# dies unless $foo and $bar have matching types | |
} | |
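A runnable sketch of that sub, with a body that just reports the captured type. The second call goes through untyped variables so that the mismatch shows up when the call is made.

=begin code :lang<raku>
sub same-type-only(::T $foo, T $bar) { "both are {T.^name}" }

say same-type-only(1, 2);        # both are Int

my $first  = 1;
my $second = "two";
try same-type-only($first, $second);
say "mismatch rejected" if $!.defined;
=end code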
Unfortunately, this isn't available for slurpies because they don't (currently) allow typing. | |
=head2 Summary | |
=head1 Slurpies | |
Slurpies present a major problem for strongly typed Raku programming: they don't allow typing
(yet, at least). In general, just don't use them. But if you need to, there are a few
workarounds. The easiest way is to add a where clause:
sub only-strings(*@slurpy where .all ~~ Str) { … } | |
Hashes are far more complicated because if you use .all, you can only match with a Pair, which | |
is not parameterizable. So we need to do two checks: | |
sub only-string-key-int-val( | |
*%slurpy where .keys.all ~~ Str | |
&& .values.all ~~ Int | |
) { … } | |
There is a I<huge> caveat with this method for both Positionals and Associatives. Your slurpy | |
will I<not> be typed. To ensure further type matching, you'll need to create an entirely
new variable first. | |
sub only-strings(*@slurpy where .all ~~ Str) { | |
my Str @typed-slurpy = Array.new: @slurpy; | |
} | |
sub only-string-key-int-val( | |
*%slurpy where .keys.all ~~ Str | |
&& .values.all ~~ Int | |
) { | |
my Int %typed-slurpy{Str} = Hash[Int,Str].new(%slurpy); | |
} | |
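For the positional case, a compact sketch of the whole round trip. It uses a block for the C<where> clause and a plain assignment for the typed copy, which is a slightly different spelling from the version above, and it returns the copy so the result can be inspected.

=begin code :lang<raku>
sub only-strings(*@slurpy where { .all ~~ Str }) {
    my Str @typed = @slurpy;      # copy into a genuinely typed array
    @typed
}

my $typed = only-strings('a', 'b');
say $typed.^name;                 # Array[Str]

my $number = 1;
try only-strings('a', $number);   # the where clause refuses the Int
say "non-Str argument rejected" if $!.defined;
=end code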
This is a complicated bit of boilerplate and fraught with many places where you can mess | |
up. While there may be a use case for it, it is probably much safer to just require a | |
typed array to be passed. Calling C<only-takes-str-array(Array[Str].new(…))> might be | |
annoying, but it helps to reinforce across the code base that we only want certain values. | |
Furthermore, IDEs and compilers are more likely to catch problems sooner with the typical | |
typed array/hash, than they are to catch C<where> clauses which by definition are checked | |
at runtime (maybe some day simple ones can be caught, but that's a long way off, and will | |
never be fully accurate). | |
=head2 Summary | |
Don't use slurpies in strongly typed Raku. Just. Don't. Do. It. | |
=end pod |
Thanks for this guide – even in draft form, it's very helpful.
One minor nit I noticed when reading over it: on line 85, you give the example of an `add` sub. But then you seem to call it `foo` on lines 95 and 97 (before switching back to `add` on line 101). You also have a few other examples of the same function with different signatures right after that where the sub is named `foo` (lines 112 & 118) – that's less of an issue since it's being redefined, but it still might be clearer to call it `add` all the way through.
Other than that (minor!) issue, it seems like a great addition to the Raku learning materials.
`$foo = { * + 1 };` throws an error.
Need to update this, but I need to discuss more specifically on classes, since they're a bit more complicated: (Linking to remind myself)
https://stackoverflow.com/questions/57554660/how-can-classes-be-made-parametric-in-perl-6