utaal/Goals.md

Last active April 13, 2021 19:24

Star (0) You must be signed in to star a gist
Fork (0) You must be signed in to fork a gist

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/utaal/13bb6acf492dd9e7e6c4839019a93377.js"></script>
Save utaal/13bb6acf492dd9e7e6c4839019a93377 to your computer and use it in GitHub Desktop.

iron language design

Raw

This fork of Rust is for developing an experimental verification framework for Rust-like code. The main goal is to verify full functional correctness of low-level systems code, building on ideas from existing verification frameworks like Dafny, Boogie, F*, VCC, Prusti, Cogent, Coq, and Isabelle/HOL.

Goals:

provide a pure mathematical language for expressing specifications (like Dafny, F*, Coq, Isabelle/HOL)
provide a mathematical language for expressing proofs (like Dafny, F*, Coq, Isabelle/HOL) based exclusively on classical logic (like Dafny)
provide a low-level, imperative language for expressing executable code (like VCC), based on Rust (like Prusti)
generate small, simple verification conditions that an SMT solver like Z3 can solve efficiently, based on the following principles:
- keep the mathematical specification language close to the SMT solver's mathematical language (like Boogie)
- use lightweight linear type checking, rather than SMT solving, to reason about memory and aliasing (like Cogent and linear Dafny)

We do not intend to ...

... support all Rust features and libraries (instead, we will focus a subset that is easy to verify)
... verify unsafe Rust code (instead, verification will rely on Rust's type safety, as ensured by Rust's type checker)
... verify the verifier itself
... verify the Rust/LLVM compilers

Raw

Home.md

Design and project goals: [[Goals]]

Language Design

Google Doc

Ideas with Macros

Idiomatic Syntax

Distinct Mathematical Universe

State Machine (Refinement)

Early Mockup

Mypyvy-inspired and Rust-inspired Mockup

Raw

Language-Design,-Macros.md

Code samples with Macros

Prusti-isms

use prusti_contracts::*;

#[pure]
fn divides(v: u64, u: u64) -> bool {
    exists!(|k: u64| k * u == v)
}

#[ensures(exists(|k: u64| k * result == a))]
#[ensures(exists(|k: u64| k * result == b))]
#[ensures(forall(|d: u64| exists(|ka: u64, kb: u64| ka * d == a && kb * d == b) ==> d <= result))]
fn gcd(a: u64, b: u64) -> u64 {
    unimplemented!() 
}

fn main() {
    let mut a = 0; let mut i = 3;
    while i > 0 {
        body_invariant!(i > 0 && a == 3 - i);
        a += 1; i -= 1;
    }
    assert!(i == 0);
}

Ideas by Andrea

spec!(
    fn divides(v: nat, u: nat) -> bool {
        exists(|k: nat| k * u == v)
    }
)
#[spec(
    ensures(exists(|k: nat| k * return == a)),
    ensures(exists(|k: nat| k * return == b)),
    ensures(forall(|d: nat| (divides(a, d) && divides(b, d)) ==> d <= return)),
)]
fn gcd(a: u64, b: u64) -> u64 {
    unimplemented!()
}
fn main() {
    let mut a = 0; let mut i = 3;
    #[spec(
        invariant(i > 0 && a == 3 - i),
        decreases(3 - i),
    )]
    while i > 0 {
        a += 1; i -= 1;
    }
    assert!(i == 0);
}

Ideas by @jaybosamiya

This version stays within Rust's syntax by placing things into macros (in particular the v! macro: an example definition of it is given below; in practice, one would want to use a proc-macro rather than a macro_rules-based macro though, for extensibility and greater control over the syntax). The version below can be compiled via rustc and all the annotations and such will immediately be removed (so essentially we get extraction "for free").

Using `macro_rules! v {...}` (Click to expand)

macro_rules! v {
    (state) => { () };
    (assert $($a:tt)*) => {};
    (assume $($a:tt)*) => {};
    (invariant ($($i:tt)*)
     decreases ($($d:tt)*)) => {};
    ($ty:ty
     $(| requires ($($req:tt)*))?
     | ensures (|$res:tt| {$($ens:tt)*})
     $(| decreases ($($dec:tt)*))?
    ) => {
        $ty
    };
}

fn foo(x: u64) -> v!(
    u64
    | requires (x < u64::MAX)
    | ensures (|r| {r == x + 1 && r <= u64::MAX})
) {
    x + 1
}

fn bar(
    l: &mut Vec<u64>,
    n: usize,
    v: u64,
) -> v!(
    ()
    | ensures (|()| {
        forall i. if 0 <= i && i < l.len() {
            if i <= n {
                l[i] == v
            } else {
                l[i] == old(l)[i]
            }
        }
    })
    | decreases (n)
) {
    let _st1 = v!(state);
    if n < l.len() {
        l[n] = v;
    }
    v!(assert {
        forall i. if n <= i && i < l.len() {
            if i <= n {
                l[i] == v // Current version of `l`
            } else {
                l@_st1[i] == old(l)[i] // Can explicitly refer to a specific `l`
            }
        }
    });
    let _st2 = v!(state);
    if n > 0 {
        bar(l, n - 1, v);
    }
    v!(assume false); // Eg: I am too lazy to prove this :P
}

fn baz() {
    let mut a = 0; 
    let mut i = 3;

    while i > 0  {
        v!(invariant (i > 0 && a == 3 - i) decreases (3 - i));
        a += 1;
        i -= 1;
    }
    v!(assert i == 0); // verification/static assert
    assert!(i == 0); // runtime assert (already exists in Rust)

    println!("{}", a);
}

The above syntax has some warts due to implementing it via macro_rules rather than a proc-macro which would allow arbitrary token streams within the macro. Nonetheless, it is a reasonable approximation of what might be doable with macros.

Sidenote: The loop invariant within the body of the while loop is actually a feature, not a bug. For v1, we might want to restrict it to only be at the start of the body, but as a v2 idea, we might want to allow it to float anywhere in the body (and use weakest-preconditions to figure out the actual invariant at the head of the loop). Most people would want to use loop invariants only at the head, but in certain (admittedly rare) situations, a loop invariant in the middle of the loop is actually a more "natural" place to put it. By leaving it inside the loop, we allow for this sort of "floating" invariant in the future.

Another sidenote: Looks like these macros already confuse GitHub's syntax highlighting 🙃

Raw

Language-Design,-Mathematical-Universe.md

Travis:

I'm less focused on syntax here, and more focused on emphasizing a distinct mathematical universe and how the interactions with code can be made seamless.

In particular, I'd like to put forward the case that we could declare a world of mathematical types distinct from any native rust type and map the rust types onto that world. While this would mean using types that are less familiar to rust-experts (e.g., seq in the example) it would make it clear that the mathematical world is its own thing, not beholden to any quirks of a given implementation type (Vec, array, slice, etc.) and give us more conceptual freedom to build the mathematical types from the ground up to be convenient for proofs.

/*
 * In this example I use `seq<T>` to refer to a mathematical
 * sequence type. It's assumed to have operations
 * length: s.len()
 * index: s[i]
 * update: s[i := blah]
 * concat: s ++ t
 * slice: s[i..j]
 */


// seq<T> is only allowed in proof code
// so this is a compile error:
fn compiled_seq<T>(s: seq<T>) -> seq<T> {
  s
}

// This is fine:
ghost fn ghost_seq<T>(s: seq<T>) -> seq<T> {
}

fn do_stuff_with_seqs<T>() -> {
  // Valid: Vec<T> type is the same as seq<T>
  // from the verifier's point-of-view
  let vec = Vec::new();
  ghost let s = ghost_seq(vec); 

  // Valid: an array is a seq
  let ar = [0; 3];
  ghost let s = ghost_seq(ar); 

  // slice is a seq
  let slice = &vec[1..2];
  ghost let s = ghost_seq(slice); 

  // this is valid, it's just comparing two seqs
  assert slice == ar;

  // iter is a sequence
  let iter = vec.filter_map(|x| x); 
  ghost let s = ghost_seq(iter); 

  // iter.collect() gives a Vec
  // (which is, again, just a sequence)
  // iterator's 'collect' is just the identity
  assert iter == iter.collect();
}

fn mutate_vec(v: Vec<u8>)
requires v.len() == 2
{
  // Make a 'ghost' copy
  ghost let v_original = v;

  v[0] = 5;

  assert v == original_v[0 := 5];

  v.push(19);

  assert v == original_v[0 := 5] ++ [19];
}

// Let's define our own list type

// By default, an enum would have an interpretation
// as an algebraic datatype.

enum ListImpl<T> {
  EmptyList,
  Cons(head: T, tail: Box<List<T>>),
}

// Let's define a List which uses a private
// ListImpl<T> field but exposes a seq<T>
// interface.

#[implements(seq<T>)]
struct List<T> {
  list: ListImpl,
}

impl List<T> {
  // List<T> was marked as implementing seq<T>, so it has to
  // support the following:

  ghost fn WF(&self) -> bool { true }

  ghost fn I(&self) -> seq<T>
  requires WF()
  {
    match self {
      List::EmptyList => []
      List::Cons(head, tail) => [head] ++ tail.I()
    }
  }

  // Contracts have to be written in terms of self.I()

  pub fn push_to_front(&mut self, new_head: T)
  ensures self.I() == [new_head] ++ old(self.I())
  {
    self.list = ListImpl::Cons(new_head, self.list);
  }

  // List implements .len() which is a feature of seq
  // so we require that it have the same behavior.

  // We add an additional 'requires' clause, though, which must
  // be met if a client wants to call 'len()' in non-ghost code.

  pub fn len(&self) -> (result: u64)
  requires self.I().len() < 0x10000000000000000
  ensures result == self.I().len()
  {
    match self.list {
      ListImpl::EmptyList => 0
      ListImpl::Cons(head, tail) => 1 + tail.len()
    }
  }
}

fn do_stuff_with_list<T>(l: List<T>) {
  // As with vecs, arrays, etc., l is a seq<T>
  ghost let s = ghost_seq(l);

  // Calling 'len' will demand the precondition, l.len() < 2^64
  let the_len = l.len();

  // In this line, 'l' is the mathematical seq<T> so no precondition is needed
  ghost let the_len_ghost = l.len();

  // This line fails, as List didn't define any slice functionality
  let the_slice = &l[0..1];

  // This is fine, 'l' is the mathematical seq<T> here
  ghost let the_slice_ghost = l[0..1];
}

Raw

Language-Design,-State-Machine-Mockup.md

Defining a state machine:

module LogSpec {
  type Element
  type Log = seq<Element>

  datatype Index = Index(idx: int)

  state machine Constants() Variables(log: Log)

  init(c, v)
  {
    v == Variables([])
  }

  step Query(c, v, v', idx: Index, result: Element)
  {
    && 0 <= idx.idx < |s.log|
    && result == v.log[idx.idx]
    && v' == v
  }

  step Append(c, v, v', element: Element)
  {
    && v'.log == v.log + [element]
  }

  step Stutter()
  {
    && v' == v
  }
}

Combined state machine:

include "DiskLog.dfy"
include "Disk.dfy"

module DiskLogSystem {
  import DiskLog
  import Disk

  state machine
    Constants(disk: Disk.Constants, machine: DiskLog.Constants)
    Variables(disk: Disk.Variables<DiskLog.LBAType.LBA, DiskLog.Sector>, machine: DiskLog.Variables)

  init(c, v)
  {
    && DiskLog.Mkfs(c.machine, c.disk, v.disk)
    && DiskLog.Init(c.machine, v.machine)
  }

  step Machine(c, v, v', something: Something, diskOp: DiskLog.DiskOp)
  {
    && DiskLog.Next(c.machine, v.machine, v'.machine, diskOp)
    && Disk.Next(c.disk, v.disk, v'.disk, diskOp)
  }

  step Crash(c, v, v', something: Something)
  {
    && DiskLog.Init(c.machine, v'.machine)
    && v'.disk == v.disk
  }
}

Predicates (e.g. invariants):

module DiskLogSystemInv {
  import state machine DiskLogSystem

  // ...

  predicate MemoryMatchesDisk(c, v)
  requires DiskIsValidLog(c.disk, v.disk)
  {
    forall idx: Index :: 0 <= idx.idx < |v.machine.log| && (
      || idx.idx < v.machine.stagedLength
      || IndexValidForDiskLog(c.disk, v.disk, idx)) ==> (
        && IsALogSectorBlock(v.disc.blocks, idx)
        && v.machine.log[idx.idx] == ElementFromDiskLog(c.disk, v.disk, idx)
    )
  }

  predicate Inv(c, v)
  {
    && DiskIsValidLog(c.disk, v.disk)
    && MachinePersistentWhenReadyMatchesDiskSuperblock(k, s)
    && StagedTrailsLogLength(k, s)
    && MemoryMatchesDisk(k, s)
    && LogIsZeroLengthWhenUnready(k, s)
  }

  lemma InitImpliesInv(c, v)
    on init(c, v)
    ensures Inv(c, v)
  {
  }

  lemma NextPreservesInv(c, v, v')
    requires Inv(c, s)
    on step(c, v, v')
    ensures Inv(c, v')
  {
  }

In refinement proofs:

how to express interpretation functions?
how to express the refinement lemma? original example

Bryan questions:

It seems like many of the variables above don't have types. Is the expectation that they will be inferred?
- [Andrea:] we were taking a page out of Rust's book here, where self is required syntactically but doesn't name a type; the types here are implicitly the declared types Constants and Variables (thus, they don't need to be inferred, they're just implicit)
We had some preliminary discussion about this, but just to document it: I like Ivy's more imperative-style steps, since they make it (a) look more natural, and (b) makes it less likely that you accidentally leave all but one of the fields in the new state underspecified.

Nested updates syntax alternatives

Ivy inspired imperative update with fallback to two vocabulary formula:

r(a) := true

r := * suchthat forall X. r(X) <-> (old r(X) | X = a)

r :| r.x == old(r.x) && r.y == old(r.y) + 1

r := r' :| r'.x == r.x && r'.y == r.y + 1

Alternatives for field and nested field updates:

r.x.y := 7 ;
r.x := {y: 8, ..};

r[x.y := 7, z[a]:= 8, z[b] := 6]

r.x.y := 7;

r.z[a] := 8;
r.z[b] := 6;

z[a := 8][b := 6]

r' = r[
  x := @[
    y := 7
  ]
]

r[x := 1][x:= 2]

r.x := 7;
r.y := 8;

r.x := 7;
r.y := r.x + 1;

ML syntax:

{ r with x = { r.x with y = 5 }, z = { r.z with a = 9 }}

Rust syntax:

struct Inner {
    a: nat,
    b: nat,
}

struct State {
    x: Inner,
    y: nat,
    z: nat,
}

fn transition(before: State) -> State {
    State {
        y: before.y + 1,
        ..before
    }
}

fn transition(before: State) -> State {
    let x = before.x;
    State {
        x: Inner {
            a: x.a - 1,
            ..x
        },
        y: before.y + 1,
        ..before
    }
}

Raw

Language-Design,-State-Machine-Mypyvy-Rust.md

Defining a state machine:

type Element
type Log = seq<Element>

struct LogSpec {
  log: Log,
}

statemachine LogSpec {
  init predicate() // acts as a function `LogSpec -> bool`
  {
    log == []
  }

  step twostate query(idx: nat, result: Element) // acts as a function `(LogSpec, LogSpec) -> bool`
  // no "affects" = no state changes
  {
    && 0 <= idx < |log|
    && result == log[idx]
  }

  step twostate append(element: Element)
  affects log
  {
    && log' == log + [element]
  }

  step twostate stutter()
  {
  }
}

Crash-safe log spec:

use LogSpec;

struct CrashSafeLog {
  persistent: LogSpec,
  ephemeral: LogSpec, // two "copies" of the same state machine (state)
}

statemachine CrashSafeLog {
  init predicate()
  {
    && persistent.init()
    && ephemeral == persistent
  }

  step twostate ephemeral_move()
  affects ephemeral
  {
    // persistent' == persistent (implied)
    && next(ephemeral)
  }

  twostate partial_sync()
  affects ephemeral, persistent
  {
    && persistent' == ephemeral
  }

  step twostate sync()
  affects persistent
  {
    && partial_sync // && persistent' == ephemeral
    // ephemeral' == ephemeral (implied)
  }

  step twostate crash()
  affects ephemeral
  {
    && ephemeral' == persistent
  }
}

use log_spec::Log;

struct DiskLog {
  log: Log,
  cached_superblock: CachedSuperblock,
  staged_length: nat,
}

statemachine DiskLog {
  init predicate() {
    && log == []
    && cached_superblock == CachedSuperblock::Unready
    && staged_length == 0
  }

  predicate supersedes_disk()
  {
    && cached_superblock == CachedSuperblock::Ready
    && cached_superblock.superblock.length <= |log|
  }

  // ?? disk_op is repeated for each step
  step twostate query(disk_op: DiskOp, idx: nat, result: Element)
  // affects nothing
  {
    && 0 <= idx < |log|
    && result == log[idx]
    && disk_op == DiskOp::NoDiskOp
  }

  step twostate fetch_element(disk_op: DiskOp, idx: Index, element: Element)
  affects log, staged_length // but not cached_superblock
  {
    && cached_superblock == CachedSuperblock::Ready
    && idx < cached_superblock.superblock.length
    && |log| == idx
    && disk_op == DiskOp::ReadOp(LBAType::index_to_lBA(idx), Log_sector(element))
    && log' == log + [element]
    // && cached_superblock' == cached_superblock (implied)
    && staged_length' == |log'|
  }
  
  step twostate stutter(disk_op: DiskOp)
  {
    && disk_op == DiskOp::NoDiskOp
  }

  // ?? does the implicit next need disk_op?
}

Combined state machine:

use DiskLog;
use Disk;

struct DiskLogSystem {
  disk: Disk<DiskLog.LBAType.LBA, DiskLog.Sector>,
  machine: DiskLog,
}

statemachine DiskLogSystem {
  init predicate()
  {
    && machine.mkfs(disk)
    && machine.init()
  }

  step twostate machine()
  affects disk, machine
  {
    && next(disk) // invoke steps of combined state machines
    && next(machine)
  }

  step twostate crash()
  affects machine
  {
    && machine'.init()
  }
}

Alternative using :=:

  step sync
  {
    && x > 0  // precondition
    && x := x - 1 // update
    && y := * // havoc, means we're modifying y
    && y' > x' // two-vocabulary assumptions about modified symbols
    && a.b := 5 // nested update, change only a.b, but leave other fields of a intact
    && a.arr[5] := 6 // nested array update
    && z' = z + 1 // syntax error, only way to make z' available is to use z := ...
  }
  // questions: how do these compose? what about if-then-else, disjunctions, aliasing, etc

Predicates (e.g. invariants):

mod disk_log_system_inv {
  with statemachine DiskLogSystem {

    // ...

    predicate memory_matches_disk()
    requires disk.disk_is_valid_log
    {
      forall idx: Index :: 0 <= idx.idx < |machine.log| && (
        || idx.idx < machine.stagedLength
        || disk.index_valid_for_disk_log(idx)) ==> (
          && disk.is_a_log_sector_block(idx)
          && machine.log[idx.idx] == disk.element_from_disk_log(idx)
      )
    }

    inv predicate()
    {
      && disk.disk_is_valid_log()
      && machine_persistent_when_ready_matches_disk_superblock()
      && staged_trails_log_length()
      && memory_matches_disk()
      && log_is_zero_length_when_unready()
      // ?? what if I need to refer to "this" (DiskLogSystem), e.g. to pass it as a parameter?
    }

    inv on init()
    {
      // prove that init produces a state that satisfies the invariant
    }

    inv on next
    {
      // prove that the `crash` step preserves the invariant
    }

Refinement:

use CrashSafeLog;
use DiskLogSystem;
use LogSpec;

mod disk_log_system_refines_crash_safe_log {
  refine DiskLogSystem from CrashSafeLog {

    // ...

    pure fn abstract_disk(disk: DiskLog) -> LogSpec {
      let log = if disk.is_valid_log {
        extract_disk_prefix(disk, disk.blocks[DiskLog::SuperblockLBA].superblock.length)
      } else {
        []
      };

      LogSpec { log }
    }

    pure fn abstract_ephemeral(sys: DiskLogSystem) -> LogSpec {
      if sys.machine.supersedes_disk {
        LogSpec { log: sys.machine.log }
      } else {
        interpret_disk(sys.disk)
      }
    }

    abstract // function DiskLogSystem -> CrashSafeLog (interpretation function)
    {
      CrashSafeLog {
        persistent: abstract_disk(disk),
        persistent: abstract_ephemeral(this), // ??: how to refer to "this" (DiskLogSystem)?
      }
    }

    on next { // prove the refinement
      match next {
        case step machine() => {
          match machine.next {
            case step query(disk_op: DiskOp, idt: nat, result: Element) => {
              // use the interpretation function to get a CrashSafeLog, and invoke the query step on the `ephemeral` state machine
              assert abstract().ephemeral.query(); 
              // use the interpretation function to get a CrashSafeLog, and invoke the `ephemeral_move` step
              assert abstract().ephemeral_move(); 
            }
            case step flush(disk_op: DiskOp) => {
              assert abstract.sync();
            }
            case step stutter(disk_op: DiskOp) => {
              assert abstract.ephemeral.stutter();
              assert abstract.ephemeral_move();
            }
          }
        }
        case step crash => {
          // ...
        }
      }
    }

  }
}

Raw

Language-Design,-Syntax.md

Andrea: I wrote up a few examples of what rust extended with verification syntax may look like; the idea here was to not worry at all about what’s easier/harder to implement atop rust, but just what I felt would feel idiomatic for rustaceans (reasonable people may strongly disagree).

// --> various kinds of functions
#[pure] // deterministic (maybe implied when ghost?)
#[transparent]
ghost fn divides(v: nat, u: nat) -> bool { // 'ghost' == no implementation
    // --> quantifier syntax
    exists(|k: nat| k * u == v)
}

#[pure]
fn gcd(a: u64, b: u64) -> u64
    // --> spec syntax
    ensures exists(|k: nat| k * return == a),
            exists(|k: nat| k * return == b),
            forall(|d: nat| (divides(a, d) && divides(b, d)) ==> d <= return)
    // (this is consistent with the 'where' clause in vanilla rust)
{
    // --> ghost variables and blocks
    let max = ghost { math::max(a, b) }; // inspired by unsafe { ... } blocks

    for c in std::math::max(a, b)..0
        decreases c
        invariant forall(|cc: nat| cc > c ==> !divides(a, c) || !divides(b, c))
    {
        if a % c == 0 && b % c == 0 {
            // --> vc assertions
            assert divides(a, c) && divides(b, c); // assert isn't a function, like 'return'
            return c;
        }
    }
}

ghost struct Map<K, V>;

/// Linear probing Hashtable
struct Hashtable<T> {
    storage: Vec<Option<(u64, T)>>,
    pub ghost contents: Map<nat, T>,
}

impl<T> Hashtable<T> {
    #[pure]
    #[transparent]
    pub ghost fn contents_from_storage(storage: Vec<T>) -> Map<nat, T> { // return is implicitly ghost
        // --> map comprehension
        storage[..].filter_map(|x| x).collect()
    }
}

#[pure]
#[transparent]
pub ghost fn slice_of_doubles(length: nat) -> [nat] { // return is a 'slice' (dafny seq) of nats
    // --> list comprehension
    (0..length).map(|i| i * 2).collect()
}

ghost struct TwoThings {
    a: nat,
    b: nat,
}

pub ghost fn some_op(ghost two_things: TwoThings) -> ghost TwoThings
    ensures return.a == two_things.a + 1,
            return.b == two_things.b
{
    // --> (no) field update syntax? just use the constructor with some fancier (de)construction syntax?
    TwoThings { a: two_things.a + 1, ..two_things } // '..two_things' represents "the other fields of two_things"
}

// for the next example
#[pure]
// not transparent!
pub ghost fn something(a: TwoThings, b: TwoThings) {
    // ...
}

// --> lemmas
#[proof]
pub ghost fn something_transitive(a: TwoThings, b: TwoThings, c: TwoThings)
    requires something(a, b) && something(b, c)
    ensures something(a, c)
{
    // --> calc, reveal
    calc {
        a == by { reveal something; }
        b == by { reveal something; }
        c
    }

    // --> limit proof scope (proof goes before the assertion, contrary to dafny's assert .. by)
    using { reveal something; }
    assert something(a, c),
           potentially_something_else(a, c);
}