masak · March 21, 2009 13:10
diff --git a/things I know about Buf b/things I know about Buf
 $ cat things-I-know-about-Buf 
 A C<Buf> is a stringish view of an array of
 integers, and has no Unicode or character properties without explicit
 conversion to some kind of C<Str>.  (A C<buf> is the native counterpart.)
 Typically it's an array of bytes serving as a buffer.  Bitwise
 operations on a C<Buf> treat the entire buffer as a single large
 integer.  Bitwise operations on a C<Str> generally fail unless the
 C<Str> in question can provide an abstract C<Buf> interface somehow.
 Coercion to C<Buf> should generally invalidate the C<Str> interface.
 As a generic type C<Buf> may be instantiated as (or bound to) any
 of C<buf8>, C<buf16>, or C<buf32> (or to any type that provides the
 appropriate C<Buf> interface), but when used to create a buffer C<Buf>
 defaults to C<buf8>.

 Unlike C<Str> types, C<Buf> types prefer to deal with integer string
 positions, and map these directly to the underlying compact array
 as indices.  That is, these are not necessarily byte positions--an
 integer position just counts over the number of underlying positions,
 where one position means one cell of the underlying integer type.
 Builtin string operations on C<Buf> types return integers and expect
 integers when dealing with positions.  As a limiting case, C<buf8> is
 just an old-school byte string, and the positions are byte positions.
 Note, though, that if you remap a section of C<buf32> memory to be
 C<buf8>, you'll have to multiply all your positions by 4.
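
The cell-versus-byte arithmetic above can be sketched in a few lines. This is an illustration in Python rather than Perl 6, and the helper names `pack_cells` and `cell_to_byte_position` are made up for the example; little-endian cell layout is also an assumption, since the text does not specify one.

```python
import struct

def pack_cells(values, cell_bytes):
    """Pack unsigned integers into fixed-width little-endian cells."""
    fmt = {1: "B", 2: "H", 4: "I"}[cell_bytes]
    return struct.pack("<%d%s" % (len(values), fmt), *values)

def cell_to_byte_position(cell_pos, cell_bytes):
    """Convert a position counted in cells to a position counted in bytes."""
    return cell_pos * cell_bytes

# Three 32-bit cells, standing in for a buf32.
raw = pack_cells([0x11223344, 0x55667788, 0x99AABBCC], 4)

cell_count = len(raw) // 4   # number of positions when viewed as 32-bit cells
byte_count = len(raw)        # number of positions when viewed as 8-bit cells

# Cell position 2 becomes byte position 2 * 4 once the memory is viewed as bytes.
pos = cell_to_byte_position(2, 4)
third = struct.unpack_from("<I", raw, pos)[0]
```

The same memory has 3 positions as 32-bit cells and 12 positions as bytes, which is the factor of 4 mentioned above.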

 Bitwise string operators (those starting with C<~>) may only be
 applied to C<Buf> types or similar compact integer arrays, and treat
 the entire chunk of memory as a single huge integer.  They differ from
 the C<+> operators in that the C<+> operators would try to convert
 the string to a number first on the assumption that the string was an
 ASCII representation of a number.
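
To make the difference between the two operator families concrete, here is a rough Python analogy (not Perl 6). The function `buf_and` is hypothetical, and it only handles equal-length buffers because the text above does not say how unequal lengths are treated.

```python
def buf_and(a: bytes, b: bytes) -> bytes:
    """AND two equal-length buffers by treating each as one large integer."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length buffers")
    n = int.from_bytes(a, "big") & int.from_bytes(b, "big")
    return n.to_bytes(len(a), "big")

# Buffer-style AND: works on the raw bytes 0x31 0x32 and 0x31 0x30.
buffer_result = buf_and(b"12", b"10")

# Numeric-style AND: the ASCII digits are parsed as numbers first.
numeric_result = int("12") & int("10")
```

Here `buffer_result` is the two bytes `b"10"`, while `numeric_result` is the integer 8, so the same input text gives different answers depending on which family of operator is applied.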

    Actual type                 Use entries for
    ===========                 ===============
    Buf                         Str or Array of Int

 A C<Buf> type containing any bytes or integers outside the ASCII
 range may silently promote to a C<Str> type for pattern matching if
 and only if its relationship to Unicode is clearly declared or typed.
 This type information might come from an input filehandle, or the
 C<Buf> role may be a parametric type that allows you to instantiate
 buffers with various known encodings.  In the absence of such typing
 information, you may still do pattern matching against the buffer, but
 (apart from assuming the lowest 7 bits represent ASCII) any attempt
 to treat the buffer as other than a sequence of integers is erroneous,
 and warnings may be generously issued.
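
One possible reading of this rule, sketched in Python rather than Perl 6: a buffer becomes text only when an encoding has been declared, and otherwise only 7-bit ASCII content is accepted. The name `buffer_to_text` is invented for the example, and raising an exception is an assumption; the text above only says that such use is erroneous and that warnings may be issued.

```python
from typing import Optional

def buffer_to_text(buf: bytes, encoding: Optional[str] = None) -> str:
    """Turn a buffer into text only when that is well defined."""
    if encoding is not None:
        # The relationship to Unicode was declared explicitly.
        return buf.decode(encoding)
    # No declared encoding: accept nothing beyond 7-bit ASCII.
    if any(b > 0x7F for b in buf):
        raise ValueError("non-ASCII data with no declared encoding")
    return buf.decode("ascii")

ascii_text = buffer_to_text(b"abc")
declared_text = buffer_to_text("\u00e9".encode("utf-8"), "utf-8")
```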

    $_      X      Type of Match Wanted   What to use on the right
    ======  ===    ====================   ========================
    Buf     Int    buffer contains int    .match(X)

 C<Buf> types are based on fixed-width cells and can therefore
 handle integer positions just fine, and treat them as array indices.
 In particular, C<buf8> (also known as C<buf>) is just an old-school byte string.
 Matches against C<Buf> types are restricted to ASCII semantics in
 the absence of an I<explicit> modifier asking for the array's values
 to be treated as some particular encoding such as UTF-32.  (This is
 also true for those compact arrays that are considered isomorphic to
 C<Buf> types.)  Positions within C<Buf> types are always integers,
 counting one per unit cell of the underlying array.  Be aware that
 "from" and "to" positions are reported as being between elements.
 If matching against a compact array C<@foo>, a final position of 42
 indicates that C<@foo[42]> was the first element I<not> included.
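
The "between elements" convention is the same one Python slices use, so a small Python sketch may help. The function `find_cells` is hypothetical and does a naive search; it is not how a Perl 6 match would be implemented.

```python
def find_cells(haystack, needle):
    """Return (start, end) of the first occurrence of needle, or None.

    Both positions lie between elements: haystack[end] is the first
    element that is not part of the match.
    """
    n = len(needle)
    for start in range(len(haystack) - n + 1):
        if list(haystack[start:start + n]) == list(needle):
            return start, start + n
    return None

cells = list(range(100))
span = find_cells(cells, [40, 41])

# The final position indexes the first element after the match.
matched = cells[span[0]:span[1]]
first_excluded = cells[span[1]]
```

With this input the final position is 42, and `cells[42]` is the first element not included in the match.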

 =item open

    multi open (Str $name,
        Bool :$rw = False,
        Bool :$bin = False,
        Str  :$enc = "Unicode",
        Any  :$nl = "\n",
        Bool :$chomp = True,
        ...
        --> IO
    ) is export

 A convenience method/function that hides most of the OO complexity.
 It will only open normal files.  Text is the default.  Note that
 the "Unicode" encoding implies figuring out which actual UTF is
 in use, either from a BOM or other heuristics.  If heuristics are
 inconclusive, UTF-8 will be assumed.  (No 8-bit encoding will ever
 be picked implicitly.)  A file opened with C<:bin> may still be
 processed line-by-line, but IO will be in terms of C<Buf> rather
 than C<Str> types.
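
The BOM-then-fallback behaviour described for the "Unicode" encoding could look roughly like the following Python sketch. The names `guess_utf` and `decode_unicode` are invented here, and the "other heuristics" mentioned above are not implemented; only BOM sniffing and the UTF-8 default are shown.

```python
import codecs

# Longer BOMs go first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def guess_utf(head: bytes):
    """Return (encoding, bom_length), assuming UTF-8 when no BOM is found."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, len(bom)
    return "utf-8", 0

def decode_unicode(data: bytes) -> str:
    """Decode data with the guessed UTF, dropping the BOM if present."""
    name, skip = guess_utf(data)
    return data[skip:].decode(name)

from_utf16 = decode_unicode(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le"))
from_plain = decode_unicode(b"plain")
```

Note that no 8-bit encoding such as Latin-1 appears in the table, matching the statement that one is never picked implicitly.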

 =item slurp

    method slurp ($handle:
        Bool :$bin = False,
        Str  :$enc = "Unicode",
        --> Str|Buf
    ) is export
    multi slurp (Str $filename,
        Bool :$bin = False,
        Str  :$enc = "Unicode",
        --> Str|Buf
    )

 Slurps the entire file into a C<Str> (or C<Buf> if C<:bin>) regardless of context.
 (See also C<lines>.)
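
As a loose analogy for the text/binary split in `slurp`, here is a Python sketch whose return type depends on a binary flag. The function and parameter names are made up, and the `"utf-8"` default stands in for the "Unicode" encoding described above, which this sketch does not try to detect.

```python
import os
import tempfile

def slurp(filename, binary=False, enc="utf-8"):
    """Read a whole file: bytes if binary is set, decoded text otherwise."""
    if binary:
        with open(filename, "rb") as f:
            return f.read()
    # newline="" keeps line endings exactly as stored in the file.
    with open(filename, "r", encoding=enc, newline="") as f:
        return f.read()

# Usage: write a small file, then read it back both ways.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write("caf\u00e9\n".encode("utf-8"))

as_text = slurp(path)
as_bytes = slurp(path, binary=True)
os.remove(path)
```

The text read yields a string of 5 characters, while the binary read yields 6 bytes, since the accented letter takes two bytes in UTF-8.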