Read is intended to be a counterpart to the standard std::fmt library: it parses values out of a string instead of formatting a string out of values. It (mostly) shares the same formatting syntax as std::fmt.
Warning: This is a library design that has yet to be implemented.
Like Show, which is the customizable trait for std::fmt, Read has the Read trait. Its main interface is as follows:
enum Flags {
    FlagSignPlus,
    FlagSignMinus,
    FlagAlternate,
    FlagSignAwareZeroPad,
}
enum Alignment {
    AlignLeft,
    AlignRight,
    AlignCenter,
    AlignUnknown,
}
struct Scanner {
    flags: uint, // packed bit set of Flags
    fill: Option<char>, // None for every whitespace
    align: Alignment,
    width: Option<uint>,
    // more private fields exist
}
impl Scanner {
    fn skip_prepad<'a>(&mut self, buf: &'a str) -> &'a str { ... }
    fn skip_postpad<'a>(&mut self, buf: &'a str) -> &'a str { ... }
    fn request_more(&mut self, minchars: uint) { ... }
}
struct ScanBuffer<'a> {
    scanner: Scanner,
    priv buf: &'a mut std::io::Buffer,
    // more private fields exist
}
impl<'a> ScanBuffer<'a> {
    fn skip_prepad(&mut self) -> std::io::IoResult<()> { ... }
    fn skip_postpad(&mut self) -> std::io::IoResult<()> { ... }
    fn request_more(&mut self, minchars: uint) { ... }
}
impl<'a> std::io::Buffer for ScanBuffer<'a> {
    fn fill<'b>(&'b mut self) -> std::io::IoResult<&'b [u8]> { ... }
    fn consume(&mut self, amt: uint) { ... }
}
trait Read {
    // Ok(Some((value, rest))), Ok(None) to pause for more input, or Err to abort
    fn scan<'a>(s: &mut Scanner, buf: &'a str) -> Result<Option<(Self, &'a str)>, std::str::SendStr>;
    fn scan_buf(s: &mut ScanBuffer) -> std::io::IoResult<Option<Self>> { ... }
}
scan_buf provides an optional Buffer-based interface: with scan alone, the caller has to refill the buffer and restart the entire parse whenever more input is needed.
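To illustrate why a Buffer-based entry point helps, here is a minimal sketch in modern Rust rather than the pre-1.0 API above. It assumes std::io::BufRead's fill_buf/consume as a stand-in for the old std::io::Buffer's fill/consume, and scan_buf_u32 is a hypothetical scanner for a decimal u32. Because it keeps its partially built value across refills, nothing is rescanned. Returning Ok(None) when no digit is found is an assumption of this sketch.

```rust
use std::io::{self, BufRead};

// Hypothetical buffer-based scanner for a decimal u32 (BufRead stands in for
// the old std::io::Buffer). The partial value survives refills, so nothing is
// rescanned. Returns Ok(None) if no digit is found (an assumption).
fn scan_buf_u32<R: BufRead>(r: &mut R) -> io::Result<Option<u32>> {
    let mut value: Option<u32> = None;
    loop {
        let buf = r.fill_buf()?;
        if buf.is_empty() {
            return Ok(value); // end of input
        }
        // Number of leading digit bytes in the currently buffered data.
        let n = buf.iter().take_while(|b| b.is_ascii_digit()).count();
        for &b in &buf[..n] {
            let next = value
                .unwrap_or(0)
                .checked_mul(10)
                .and_then(|v| v.checked_add(u32::from(b - b'0')));
            value = Some(next.ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "u32 overflow"))?);
        }
        // A non-digit is buffered, so the number ends here.
        let done = n < buf.len();
        r.consume(n);
        if done {
            return Ok(value);
        }
    }
}
```

With a tiny BufReader capacity, the digits arrive over several refills, yet each byte is examined once and the non-digit remainder is left in the reader.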
The scan result has three states:
- Ok(Some((value, next))) indicates that value has been parsed and scanning should continue with the next slice.
- Ok(None) indicates that parsing has been paused and requires more characters. The implementation may optionally call s.request_more(n) to indicate that the caller should read at least n more characters before retrying. (s.request_more(0) is possible when parsing could stop at the current position but cannot determine whether there is more to read.)
- Err(err) indicates that parsing has been aborted.
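The three-state protocol can be sketched in modern Rust as follows. Everything here is an assumption for illustration: Cow<'static, str> stands in for std::str::SendStr, Scanner models only the request_more hint, and scan_u32 and scan_chunks are hypothetical names. scan_chunks also shows the cost of using scan alone: every pause means appending a chunk and rescanning from the start.

```rust
use std::borrow::Cow;

// Stand-in for the pre-1.0 std::str::SendStr (an assumption).
type ScanError = Cow<'static, str>;

// Hypothetical scanner state; only the request_more hint is modelled.
#[derive(Default)]
struct Scanner {
    more: Option<usize>,
}

impl Scanner {
    fn request_more(&mut self, minchars: usize) {
        self.more = Some(minchars);
    }
}

// Scans a decimal u32 from the front of `buf`, returning
// Ok(Some((value, rest))), Ok(None) to pause, or Err to abort.
fn scan_u32<'a>(s: &mut Scanner, buf: &'a str) -> Result<Option<(u32, &'a str)>, ScanError> {
    let end = buf.find(|c: char| !c.is_ascii_digit()).unwrap_or(buf.len());
    if end == buf.len() {
        // Empty input needs at least one more character; a run of digits
        // might already be complete, so request zero more.
        s.request_more(if end == 0 { 1 } else { 0 });
        return Ok(None);
    }
    if end == 0 {
        return Err("expected a decimal digit".into());
    }
    let value = buf[..end].parse::<u32>().map_err(|e| Cow::Owned(e.to_string()))?;
    Ok(Some((value, &buf[end..])))
}

// Feeds `chunks` one at a time; on every pause it appends the next chunk
// and rescans the accumulated text from the start.
fn scan_chunks(chunks: &[&str]) -> Result<(u32, String), ScanError> {
    let mut text = String::new();
    let mut pending = chunks.iter();
    loop {
        let mut s = Scanner::default();
        let step = scan_u32(&mut s, &text)?.map(|(v, rest)| (v, rest.to_string()));
        if let Some(done) = step {
            return Ok(done);
        }
        match pending.next() {
            Some(chunk) => text.push_str(chunk),
            // End of input after request_more(0): the digits seen so far are the value.
            None if s.more == Some(0) => {
                let v = text.parse::<u32>().map_err(|e| Cow::Owned(e.to_string()))?;
                return Ok((v, String::new()));
            }
            None => return Err("unexpected end of input".into()),
        }
    }
}
```

A trailing run of digits yields Ok(None) with request_more(0): the value could end there, but the scanner cannot know until it sees either more input or the end of input.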
There are a number of Read-like traits for non-default formatting specifications:
- Integer (i): A signed integer with an optional radix prefix.
- Signed (d): A decimal signed integer. Analogous to std::fmt::Signed.
- Unsigned (u): A decimal unsigned integer. It does accept signs (but errors on underflow). Analogous to std::fmt::Unsigned.
- Char (c): A single Unicode character. Analogous to std::fmt::Char.
- Octal (o): An octal signed integer with an optional radix prefix. Analogous to std::fmt::Octal.
- Hex (x/X): A case-insensitive hexadecimal signed integer with an optional radix prefix. Analogous to std::fmt::{Lower,Upper}Hex.
- String (s): A whitespace-separated string, or the entire remaining string in the alternate mode (#s).
- Binary (t): A binary signed integer with an optional radix prefix. Analogous to std::fmt::Binary.
- Float (f): A decimal real number without an exponent, or one of inf, +inf, -inf or nan (case-insensitive). Analogous to std::fmt::Float.
- Exp (e/E): A decimal real number with an optional exponent, or one of inf, +inf, -inf or nan (case-insensitive). Analogous to std::fmt::{Upper,Lower}Exp.

The ? type is reserved for future extension with reflection or auto-generated deserialization.
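One possible reading of the integer specifications, as a modern Rust sketch. parse_int is a hypothetical helper and the prefix rules are assumptions: i accepts any of the lowercase Rust prefixes 0x, 0o and 0b (defaulting to decimal), o, x/X and t accept only their own prefix, and d accepts none. Hex digits are case-insensitive because from_str_radix accepts both cases.

```rust
use std::convert::TryFrom;

// Hypothetical helper for the integer specifications i, d, o, x/X and t.
// Assumed rules: an optional sign, then a lowercase Rust radix prefix
// (0x, 0o, 0b) that `i` accepts in any form and o/x/X/t accept only for
// their own radix; `d` takes no prefix.
fn parse_int(text: &str, ty: char) -> Result<i64, String> {
    // Optional leading sign.
    let (negative, body) = match text.as_bytes().first() {
        Some(b'-') => (true, &text[1..]),
        Some(b'+') => (false, &text[1..]),
        _ => (false, text),
    };
    // Optional radix prefix, spelled as in Rust literals.
    let prefixed: Option<(u32, &str)> = match body.get(..2) {
        Some("0x") => Some((16, &body[2..])),
        Some("0o") => Some((8, &body[2..])),
        Some("0b") => Some((2, &body[2..])),
        _ => None,
    };
    let (radix, digits) = match ty {
        'i' => prefixed.unwrap_or((10, body)),
        'd' => (10, body),
        'o' | 'x' | 'X' | 't' => {
            let r = match ty {
                'o' => 8,
                't' => 2,
                _ => 16,
            };
            match prefixed {
                // A prefix for another radix is just digits here (e.g. "0b1" in hex).
                Some((p, rest)) if p == r => (r, rest),
                _ => (r, body),
            }
        }
        _ => return Err(format!("not an integer scan type: {:?}", ty)),
    };
    // u64::from_str_radix accepts a leading '+', so reject a second sign explicitly.
    if digits.starts_with(|c: char| c == '+' || c == '-') {
        return Err("unexpected second sign".to_string());
    }
    // Parse the magnitude, then apply the sign in i128 so that i64::MIN fits.
    let magnitude = u64::from_str_radix(digits, radix).map_err(|e| e.to_string())?;
    let value = if negative { -(magnitude as i128) } else { magnitude as i128 };
    i64::try_from(value).map_err(|_| "integer out of range".to_string())
}
```

Under this reading, 0XAB is rejected by i (Rust has no uppercase radix prefix), and "0b1" scanned with x is the hex number 0xb1, because 0b is not the hexadecimal prefix.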
<format> ::= <piece>*
<piece> ::= <literal>+
| <trim>
| <scan>
<literal> ::= <<any non-whitespace character except for backslash>>
| '\' <<any character>>
<trim> ::= <<any whitespace character>>+
<scan> ::= '{' <name>? (':' <spec>)? '}'
<name> ::= <<integer>> | <<identifier>> | '*'
<spec> ::= [[<fill>] <align>] [<sign>] ['#'] [<width>] <type>
<fill> ::= <<any character except for { or }>>
<align> ::= '<' | '>' | '^'
<sign> ::= '+' | '-'
<width> ::= <<integer>>
<type> ::= <<identifier>> | ''
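The <spec> production can be parsed left to right, as in this sketch. Spec, Align and parse_spec are hypothetical names; a fill character is recognized only when an alignment character follows it, and the type must be empty or an identifier, so the reserved ? type is rejected here.

```rust
// Hypothetical parsed form of
// <spec> ::= [[<fill>] <align>] [<sign>] ['#'] [<width>] <type>
#[derive(Debug, PartialEq)]
enum Align {
    Left,
    Right,
    Center,
}

#[derive(Debug, PartialEq)]
struct Spec {
    fill: Option<char>, // None: strip any whitespace
    align: Option<Align>,
    sign: Option<char>, // '+' or '-'
    alternate: bool,    // '#'
    width: Option<usize>,
    ty: String, // empty for the default type
}

fn align_of(c: char) -> Option<Align> {
    match c {
        '<' => Some(Align::Left),
        '>' => Some(Align::Right),
        '^' => Some(Align::Center),
        _ => None,
    }
}

fn parse_spec(spec: &str) -> Result<Spec, String> {
    let chars: Vec<char> = spec.chars().collect();
    let (mut fill, mut align, mut i) = (None, None, 0);
    // [[<fill>] <align>]: a fill only counts when an alignment follows it.
    if chars.len() >= 2 && align_of(chars[1]).is_some() {
        if chars[0] == '{' || chars[0] == '}' {
            return Err("fill cannot be { or }".to_string());
        }
        fill = Some(chars[0]);
        align = align_of(chars[1]);
        i = 2;
    } else if let Some(a) = chars.first().and_then(|&c| align_of(c)) {
        align = Some(a);
        i = 1;
    }
    // [<sign>]
    let mut sign = None;
    if i < chars.len() && (chars[i] == '+' || chars[i] == '-') {
        sign = Some(chars[i]);
        i += 1;
    }
    // ['#']
    let alternate = i < chars.len() && chars[i] == '#';
    if alternate {
        i += 1;
    }
    // [<width>]
    let start = i;
    while i < chars.len() && chars[i].is_ascii_digit() {
        i += 1;
    }
    let width = if i > start {
        let digits: String = chars[start..i].iter().collect();
        Some(digits.parse::<usize>().map_err(|e| e.to_string())?)
    } else {
        None
    };
    // <type> ::= <<identifier>> | ''
    let ty: String = chars[i..].iter().collect();
    let is_ident = ty.chars().enumerate().all(|(k, c)| {
        c == '_' || (k == 0 && c.is_alphabetic()) || (k > 0 && c.is_alphanumeric())
    });
    if !is_ident {
        return Err(format!("invalid type {:?}", ty));
    }
    Ok(Spec { fill, align, sign, alternate, width, ty })
}
```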
- A non-escaped sequence of whitespace matches zero or more whitespace characters, including newlines.
- * as the name indicates a value that is parsed but suppressed. A specification with a non-default type is required in that case.
- Alignment characters control trimming. E.g. {:_^s} strips surrounding underscores from the result. For non-string specifications, trimming is attempted before parsing.
- Alignment without a fill character strips any whitespace. E.g. {:^s} strips surrounding whitespace from the result.
- Unless otherwise specified, a scanning specification does not strip whitespace (unlike C scanf).
- The sign + indicates a mandatory sign for numbers, or a non-empty requirement for strings.
- The sign - is currently unused.
- The alternate form # depends on the type. The built-in implementation only specially recognizes #s, and ignores other combinations.
- The width is the maximum number of characters to parse, including padding characters.
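The trimming rules might be implemented as below. This sketch assumes that, as in std::fmt, the alignment describes where the value sits, so < strips trailing fill, > strips leading fill and ^ strips both; a fill of None means any whitespace, mirroring Scanner::fill. take_width applies the width limit to the raw field before trimming, since padding counts toward the width. All names are hypothetical.

```rust
#[derive(Clone, Copy)]
enum Align {
    Left,
    Right,
    Center,
}

// `fill == None` means "any whitespace", like Scanner::fill.
fn is_fill(c: char, fill: Option<char>) -> bool {
    match fill {
        Some(f) => c == f,
        None => c.is_whitespace(),
    }
}

// Assumption: alignment says where the value sits, as in std::fmt, so the
// padding to strip is on the opposite side(s).
fn trim_padding(s: &str, fill: Option<char>, align: Align) -> &str {
    let pad = |c: char| is_fill(c, fill);
    match align {
        Align::Left => s.trim_end_matches(pad),
        Align::Right => s.trim_start_matches(pad),
        Align::Center => s.trim_matches(pad),
    }
}

// The width counts padding too, so it limits the raw field before trimming.
fn take_width(s: &str, width: usize) -> &str {
    s.char_indices().nth(width).map_or(s, |(i, _)| &s[..i])
}
```

For example, a {:_^5s}-style field over "__7__xyz" would first take the five characters "__7__" and then strip the underscores, leaving "7".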
The scanning interface is analogous to the std::fmt interface, except for format_args! (which has no counterpart):
input!("{x:i}").unwrap().x <-> print!("{:i}", x)
inputln!("{x:i}").unwrap().x <-> println!("{:i}", x)
read!(reader, "{x:i}").unwrap().x <-> write!(writer, "{:i}", x)
lex!(string, "{x:i}").unwrap().x <-> let string = format!("{:i}", x);
The specification string can be followed by a set of modifiers (parsed as an identifier, e.g. lex!(s, "foo"i)):
- The i modifier makes literal matching case-insensitive. This does not affect the scanning specifications, in particular radix prefixes. (0XABCD is an invalid number in Rust anyway.)
- The p modifier indicates partial parsing. TODO
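A sketch of how the i modifier could affect literal matching only; match_literal is a hypothetical helper. It compares characters through char::to_lowercase, which is a simplification: it does not handle foldings that change the number of characters (for example ß versus SS).

```rust
// Hypothetical literal matcher: returns what follows `lit` at the start of
// `input`, comparing case-insensitively when the `i` modifier is given.
fn match_literal<'a>(input: &'a str, lit: &str, case_insensitive: bool) -> Option<&'a str> {
    let mut rest = input.chars();
    for expected in lit.chars() {
        let got = rest.next()?;
        let same = if case_insensitive {
            // Per-character lowercase comparison (not full case folding).
            got.to_lowercase().eq(expected.to_lowercase())
        } else {
            got == expected
        };
        if !same {
            return None;
        }
    }
    Some(rest.as_str())
}
```

The scanning specifications would still run on the returned remainder unchanged, so the modifier never reaches number parsing.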
The macros can take unnamed or named types (e.g. lex!(s, "{:i}", int) or lex!(s, "{x:i}", x: i32)) to force the types of the scanned values. Unlike std::fmt, unnamed and named values cannot be mixed, because of the way the values are returned. The types are optional (they can be omitted partially or entirely) and are inferred from context when not supplied. The simplest cases can still be ambiguous, though:
println!("{}", lex!(s, "{x:i}").unwrap().x)
cannot determine whether x is int or some other integral type.
The macros return one of these types, depending on the scanned values:
- Result<(), std::str::SendStr> when there are no non-suppressed values;
- Result<T, std::str::SendStr> for one unnamed value of type T;
- Result<(T1, ..., Tn), std::str::SendStr> for unnamed values of types T1, ..., Tn; and
- Result<S, std::str::SendStr> for named values x1: T1, ..., xn: Tn, where S is an invisible struct defined as struct S { x1: T1, ..., xn: Tn }.
Note that the inference is done through type parameters, so the actual types do not look exactly like these. E.g. the struct actually defined in the last case is struct S<T1, ..., Tn> { x1: T1, ..., xn: Tn }, with explicit type parameters supplied to the scanning calls, like scan::<~str>().
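To make the type-parameter inference concrete, here is a hand-written sketch of what lex!(s, "{x:i},{y:i}") could expand to in modern Rust. S, lex_xy and the use of FromStr in place of the proposed Read trait are all assumptions (so this version reads decimal integers only, without radix prefixes), and Cow<'static, str> stands in for SendStr.

```rust
use std::borrow::Cow;
use std::str::FromStr;

// Stand-in for the invisible struct generated for named values x and y;
// its field types are type parameters, so the caller's context picks them.
#[derive(Debug, PartialEq)]
struct S<T1, T2> {
    x: T1,
    y: T2,
}

// Hypothetical hand expansion of lex!(s, "{x:i},{y:i}"), with FromStr
// standing in for the proposed Read trait.
fn lex_xy<T1: FromStr, T2: FromStr>(s: &str) -> Result<S<T1, T2>, Cow<'static, str>> {
    let (a, b) = s.split_once(',').ok_or("expected ','")?;
    let x = a.parse::<T1>().map_err(|_| "cannot parse x")?;
    let y = b.parse::<T2>().map_err(|_| "cannot parse y")?;
    Ok(S { x, y })
}
```

Annotating the result (let p: S<i32, u8> = ...) or passing explicit type parameters (lex_xy::<i64, i64>(...)) fixes T1 and T2. Without either, println!("{}", lex_xy("1,2").unwrap().x) does not compile, which is the ambiguity described above.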
TODO: There is a concern about returning borrowed slices from lex! (which is a lot faster than copying the values around).