@lifthrasiir
Last active August 29, 2015 13:57
read.rs design

What the hell is this

Read is intended to be a counterpart to the standard std::fmt library: it parses values out of the string instead of formatting the string out of values. It (mostly) shares the same formatting syntax as std::fmt.

Warning: This is a library design that is yet to be implemented.

Read trait

Just as std::fmt has Show as its customizable trait, Read has the Read trait. Its main interface is as follows:

enum Flags {
    FlagSignPlus,
    FlagSignMinus,
    FlagAlternate,
    FlagSignAwareZeroPad,
}

enum Alignment {
    AlignLeft,
    AlignRight,
    AlignCenter,
    AlignUnknown,
}

struct Scanner {
    flags: uint, // packed
    fill: Option<char>, // None for every whitespace
    align: Alignment,
    width: Option<uint>,
    // more private fields exist
}

impl Scanner {
    fn skip_prepad<'a>(&mut self, buf: &'a str) -> &'a str { ... }
    fn skip_postpad<'a>(&mut self, buf: &'a str) -> &'a str { ... }
    fn request_more(&mut self, minchars: uint) { ... }
}

struct ScanBuffer<'a> {
    scanner: Scanner,
    priv buf: &'a mut std::io::Buffer,
    // more private fields exist
}

impl<'a> ScanBuffer<'a> {
    fn skip_prepad(&mut self) -> std::io::IoResult<()> { ... }
    fn skip_postpad(&mut self) -> std::io::IoResult<()> { ... }
    fn request_more(&mut self, minchars: uint) { ... }
}

impl<'a> std::io::Buffer for ScanBuffer<'a> {
    fn fill<'b>(&'b mut self) -> std::io::IoResult<&'b [u8]> { ... } // 'b avoids shadowing the impl's 'a
    fn consume(&mut self, amt: uint) { ... }
}

trait Read {
    fn scan<'a>(s: &mut Scanner, buf: &'a str) -> Result<Option<(Self, &'a str)>, std::str::SendStr>;

    fn scan_buf(s: &mut ScanBuffer) -> std::io::IoResult<Option<Self>> { ... }
}

scan_buf provides an optional Buffer-based interface. With scan alone, the caller would have to refill the buffer and restart the entire parse whenever the input runs out.

The scan result has three states:

  • Ok(Some((value, next))) indicates the value has been parsed and the scanning should continue with next slice.
  • Ok(None) indicates the parsing has been paused and requires more characters. The implementation may optionally call s.request_more(n) to indicate that the caller should read at least n more characters before retrying. (s.request_more(0) is possible when the parsing could stop at the current position but cannot determine whether there is more to read.)
  • Err(err) indicates the parsing has been aborted.
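
The three states can be illustrated with a rough sketch. It is written in present-day Rust rather than the dialect used in this design, and it leaves out Scanner entirely: scan_decimal, scan_from_chunks, the eof flag and the use of String for errors are all hypothetical stand-ins, not part of the proposed API. The driver shows the "refill and restart" loop that a caller needs when only scan is available.

```rust
/// Mirrors the three states: Ok(Some((value, rest))), Ok(None), Err(message).
/// String stands in for the design's SendStr (assumption).
type ScanResult<'a, T> = Result<Option<(T, &'a str)>, String>;

/// Hypothetical scanner for an unsigned decimal integer.
/// `eof` tells the scanner that no further input can arrive.
fn scan_decimal(buf: &str, eof: bool) -> ScanResult<'_, u32> {
    // Length of the leading run of ASCII digits.
    let digits = buf.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        return if buf.is_empty() && !eof {
            Ok(None) // nothing to inspect yet: pause and wait for input
        } else {
            Err(String::from("expected a digit"))
        };
    }
    if digits == buf.len() && !eof {
        // The number could stop here, but more digits may follow in the
        // next chunk. This is the request_more(0) situation.
        return Ok(None);
    }
    let value = buf[..digits].parse::<u32>().map_err(|e| e.to_string())?;
    Ok(Some((value, &buf[digits..])))
}

/// Hypothetical driver: append one chunk at a time and restart the whole
/// parse from the beginning of the buffer whenever the scanner pauses.
fn scan_from_chunks(chunks: &[&str]) -> Result<(u32, String), String> {
    let mut buf = String::new();
    let mut pending = chunks.iter();
    loop {
        let eof = match pending.next() {
            Some(chunk) => {
                buf.push_str(chunk);
                false
            }
            None => true,
        };
        if let Some((value, rest)) = scan_decimal(&buf, eof)? {
            return Ok((value, rest.to_string()));
        }
    }
}
```

Once eof is set, scan_decimal never returns Ok(None), so the loop always terminates. A Buffer-based scan_buf would avoid re-scanning the already consumed prefix on every retry.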

There are a number of Read-like traits for non-default formatting specifications:

  • Integer (i): A signed integer with optional radix prefix.
  • Signed (d): A decimal signed integer. Analogous to std::fmt::Signed.
  • Unsigned (u): A decimal unsigned integer. A sign is accepted, but a value that underflows is an error. Analogous to std::fmt::Unsigned.
  • Char (c): A single Unicode character. Analogous to std::fmt::Char.
  • Octal (o): An octal signed integer with optional radix prefix. Analogous to std::fmt::Octal.
  • Hex (x/X): A case-insensitive hexadecimal signed integer with optional radix prefix. Analogous to std::fmt::{Lower,Upper}Hex.
  • String (s): A whitespace-separated string, or the entire remaining string in the alternative form (#s).
  • Binary (t): A binary signed integer with optional radix prefix. Analogous to std::fmt::Binary.
  • Float (f): A decimal real number without an exponent, or one of inf, +inf, -inf or nan case-insensitively. Analogous to std::fmt::Float.
  • Exp (e/E): A decimal real number with an optional exponent, or one of inf, +inf, -inf or nan case-insensitively. Analogous to std::fmt::{Upper,Lower}Exp.
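
As an illustration of what "optional radix prefix" could mean for the Integer (i) type, here is a minimal sketch in present-day Rust. The function name parse_prefixed_int is hypothetical, and the prefixes 0x, 0o and 0b are an assumption borrowed from Rust's literal syntax; the design does not enumerate them.

```rust
/// Hypothetical helper: parse a signed integer with an optional radix prefix.
/// Assumes the Rust-style prefixes 0x, 0o and 0b (not spelled out in the design).
fn parse_prefixed_int(text: &str) -> Result<i64, String> {
    // Split off an optional leading sign.
    let (negative, body) = match text.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, text.strip_prefix('+').unwrap_or(text)),
    };
    // Pick the radix from the (case-sensitive) prefix, defaulting to decimal.
    let (radix, digits) = if let Some(rest) = body.strip_prefix("0x") {
        (16, rest)
    } else if let Some(rest) = body.strip_prefix("0o") {
        (8, rest)
    } else if let Some(rest) = body.strip_prefix("0b") {
        (2, rest)
    } else {
        (10, body)
    };
    // from_str_radix accepts a sign of its own, so reject a second one here.
    if digits.starts_with(|c: char| c == '+' || c == '-') {
        return Err(String::from("unexpected sign after prefix"));
    }
    // Re-attach the sign before parsing so that i64::MIN stays representable.
    let signed = if negative {
        format!("-{}", digits)
    } else {
        digits.to_string()
    };
    i64::from_str_radix(&signed, radix).map_err(|e| e.to_string())
}
```

Hexadecimal digits are accepted in either case, while the prefix itself is matched case-sensitively, so 0XABCD is rejected.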

The ? type is reserved for a future extension based on reflection or auto-generated deserialization.

The scanning specification

<format> ::= <piece>*
<piece> ::= <literal>+
          | <trim>
          | <scan>

<literal> ::= <<any non-whitespace character except for backslash>>
            | '\' <<any character>>
<trim> ::= <<any whitespace character>>+

<scan> ::= '{' <name>? (':' <spec>)? '}'
<name> ::= <<integer>> | <<identifier>> | '*'

<spec> ::= [[<fill>] <align>] [<sign>] ['#'] [<width>] <type>
<fill> ::= <<any character except for { or }>>
<align> ::= '<' | '>' | '^'
<sign> ::= '+' | '-'
<width> ::= <<integer>>
<type> ::= <<identifier>> | ''
  • An unescaped run of whitespace matches zero or more whitespace characters, including newlines.
  • * as the name means the value is parsed but suppressed (not returned). A specification with a non-default type is required in that case.
  • Alignment characters control the trimming. E.g. {:_^s} strips surrounding underscores from the result. For non-string specifications, the padding is trimmed before parsing.
  • Alignment without a fill character strips any whitespace. E.g. {:^s} strips surrounding whitespace from the result.
  • Unless otherwise specified, the scanning specification does not strip whitespace by default (unlike C scanf).
  • The sign + makes the sign mandatory for numbers, and requires strings to be non-empty.
  • The sign - is currently unused.
  • The alternative form # depends on the type. The built-in implementations only treat #s specially and ignore # for other types.
  • The width is the maximum number of characters to parse, including padding characters.
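
The trimming and width rules above can be sketched roughly as follows, in present-day Rust. The names Align, take_width and strip_padding are hypothetical. The sketch also assumes that < strips trailing padding and > strips leading padding, mirroring where std::fmt would have put the padding; the notes above only spell out the ^ case.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Align {
    Left,   // '<'
    Right,  // '>'
    Center, // '^'
}

/// Hypothetical helper: cut off at most `width` characters (not bytes),
/// returning the field and the remaining input. None means no limit.
fn take_width(text: &str, width: Option<usize>) -> (&str, &str) {
    match width {
        None => (text, ""),
        Some(w) => {
            // Byte offset of the w-th character, or the end of the text.
            let end = text.char_indices().nth(w).map_or(text.len(), |(i, _)| i);
            text.split_at(end)
        }
    }
}

/// Hypothetical helper: strip padding from a field.
/// `fill == None` means "any whitespace", as in Scanner::fill.
fn strip_padding(field: &str, fill: Option<char>, align: Align) -> &str {
    let is_pad = |c: char| match fill {
        Some(f) => c == f,
        None => c.is_whitespace(),
    };
    match align {
        // Assumption: a left-aligned value carries its padding on the right.
        Align::Left => field.trim_end_matches(is_pad),
        // Assumption: a right-aligned value carries its padding on the left.
        Align::Right => field.trim_start_matches(is_pad),
        Align::Center => field.trim_matches(is_pad),
    }
}
```

Under these assumptions, something like {:_^7s} applied to "__abc__tail" would first take the seven characters "__abc__" (the width includes padding) and then strip the underscores, leaving "abc" with "tail" still to be scanned.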

The scanning interface

The scanning interface is analogous to the std::fmt interface, except for format_args! (which does not have a counterpart):

input!("{x:i}").unwrap().x <-> print!("{:i}", x)
inputln!("{x:i}").unwrap().x <-> println!("{:i}", x)
read!(reader, "{x:i}").unwrap().x <-> write!(writer, "{:i}", x)
lex!(string, "{x:i}").unwrap().x <-> let string = format!("{:i}", x);

The specification string can be followed by a set of modifiers (parsed as an identifier, e.g. lex!(s, "foo"i)):

  • The i modifier makes literals case-insensitive. It does not affect the scanning specifications, in particular radix prefixes. (0XABCD is an invalid number in Rust, anyway.)
  • p modifier indicates the partial parsing. TODO
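
A minimal sketch of what the i modifier might do for a single literal piece, in present-day Rust. match_literal is a hypothetical helper, and comparing characters by their lowercase mapping is a simplification of real Unicode case folding.

```rust
/// Hypothetical helper: match `literal` at the start of `input` and return
/// the remaining input, or None if the literal does not match.
fn match_literal<'a>(input: &'a str, literal: &str, ignore_case: bool) -> Option<&'a str> {
    let mut rest = input.chars();
    for expected in literal.chars() {
        // Running out of input before the literal ends is a mismatch.
        let actual = rest.next()?;
        let same = if ignore_case {
            // Compare the lowercase mappings character by character.
            actual.to_lowercase().eq(expected.to_lowercase())
        } else {
            actual == expected
        };
        if !same {
            return None;
        }
    }
    Some(rest.as_str())
}
```

Only literal pieces would go through such a helper; a scanning specification such as {:x} keeps its own rules, which is why a prefix like 0X stays invalid even with the i modifier.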

The macro accepts unnamed or named types (e.g. lex!(s, "{:i}", int) or lex!(s, "{x:i}", x: i32)) to force the type of each value. Unlike in std::fmt, unnamed and named values cannot be mixed because of how the values are returned. The types are optional (they can be omitted partially or entirely) and are inferred from the context if not supplied. The simplest cases can be ambiguous, though: println!("{}", lex!(s, "{x:i}").unwrap().x) cannot determine whether x is int or another integral type.

The macros return one of these types, depending on the scanned values:

  • Result<(), std::str::SendStr> for no non-suppressed values;
  • Result<T, std::str::SendStr> for one unnamed value of type T;
  • Result<(T1, ..., Tn), std::str::SendStr> for unnamed values of type T1, ..., Tn; and
  • Result<S, std::str::SendStr> for named values of type x1: T1, ..., xn: Tn, where S is an invisible struct defined as struct S { x1: T1, ..., xn: Tn }.

Note that the inference is done through type parameters, so the actual types do not look exactly like these. E.g. the struct actually defined in the last case is struct S<T1,...,Tn> { x1: T1, ..., xn: Tn }, and the scanning call carries an explicit type parameter, like scan::<~str>().
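
To make the type-parameter trick concrete, here is a hand-written sketch of what a call such as lex!(s, "{x:i} {y:i}") might expand to. It is present-day Rust, not the macro's actual output: Scanned and lex_pair are hypothetical names, std::str::FromStr stands in for the Read trait, and String stands in for SendStr.

```rust
use std::str::FromStr;

/// Stand-in for the invisible struct S<T1, T2> { x: T1, y: T2 }.
#[derive(Debug, PartialEq)]
struct Scanned<T1, T2> {
    x: T1,
    y: T2,
}

/// Hypothetical expansion: the field types are type parameters, so they are
/// chosen by the caller's context or by an explicit turbofish.
fn lex_pair<T1: FromStr, T2: FromStr>(input: &str) -> Result<Scanned<T1, T2>, String> {
    let mut words = input.split_whitespace();
    let x = words
        .next()
        .ok_or_else(|| String::from("missing x"))?
        .parse::<T1>()
        .map_err(|_| String::from("cannot parse x"))?;
    let y = words
        .next()
        .ok_or_else(|| String::from("missing y"))?
        .parse::<T2>()
        .map_err(|_| String::from("cannot parse y"))?;
    Ok(Scanned { x, y })
}
```

With an annotation such as let r: Scanned<i32, u8> = lex_pair("12 7").unwrap(); the context fixes both field types. Passing lex_pair("12 7").unwrap().x straight to println! gives the compiler nothing to infer T1 from, which is the ambiguity mentioned earlier.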

TODO: There is a concern about the borrowed slice for lex! (which is a lot faster than copying the value around).
