dlukes · May 6, 2021 06:38
diff --git a/multi_borrow.rs b/multi_borrow.rs
 // Some instances of E0499 ("cannot borrow X as mutable more than once
 // at a time") are straightforward to understand, but some can be
 // tricky. Among the latter ones (at least for me) are those related to
 // the way a function potentially extends the lifetime of a borrow by
 // tying it to another value.
 //
 // For instance, this function: https://github.com/benhoyt/countwords/blob/5318b1acdd5bd313039d480af535cf79565c2e62/rust/optimized-unsafe/main.rs#L72
 //
 // Try changing it so that it accepts a &'a mut Vec<u8> instead of a
 // &'a Cell<Vec<u8>>. You'll get the following error:
 //
 // error[E0499]: cannot borrow `keys` as mutable more than once at a time
 //   --> main.rs:32:27
 //    |
 // 32 |                 increment(&mut keys, &mut counts, &buf[..offset + 1]);
 //    |                           ^^^^^^^^^  ----------- first borrow later used here
 //    |                           |
 //    |                           second mutable borrow occurs here
 // ...
 // 45 |                     increment(&mut keys, &mut counts, &buf[start..i]);
 //    |                               --------- first mutable borrow occurs here
 //
 // error[E0499]: cannot borrow `keys` as mutable more than once at a time
 //   --> main.rs:45:31
 //    |
 // 45 |                     increment(&mut keys, &mut counts, &buf[start..i]);
 //    |                               ^^^^^^^^^ `keys` was mutably borrowed here in the previous iteration of the loop
 //
 // At first glance, you might be surprised: both the Vec and the HashMap
 // are repeatedly mutably borrowed, each time the function is called. So
 // why does only the Vec's borrow from the previous iteration of the
 // loop mysteriously stick around and cause problems later on? In what
 // sense does &mut counts on line 32 "use" the first mutable borrow of
 // keys? Let's find out!
 //
 // Additional resources:
 //
 // - https://stackoverflow.com/a/49929322
 // - https://stackoverflow.com/a/32300133
 // - https://stackoverflow.com/a/31067272
 use std::cell::{Cell, RefCell};

 fn main() {
    // ----------------------------------------- bind_lifetimes_together {{{1

    // Let's create a situation analogous to that in the code from the
    // GitHub link above, but trimmed down to the essentials, so that
    // it's clearer what causes (or doesn't cause) the problem we're
    // seeing.

    struct X(u8);
    struct Y;

    // bind_lifetimes_together is our analog to increment in the
    // original code; X corresponds to Vec<u8>, and Option<&Y>
    // corresponds to HashMap<&[u8], u64>.
    fn bind_lifetimes_together<'a>(_x: &'a mut X, _y: &mut Option<&'a Y>) {}

    let mut x = X(0);
    let mut y = None;

    bind_lifetimes_together(&mut x, &mut y);

    // When we first call this function, we create two mutable borrows,
    // one of x and one of y. But while the borrow of y is free to end
    // whenever convenient (it's not constrained by any explicit
    // lifetime, as per the function definition above), the mutable
    // borrow of x is now tied to the lifetime of y (or more precisely,
    // the reference inside y), via the explicit lifetime 'a.
    //
    // This happens even though they're clearly unrelated otherwise --
    // the function has no body, so it can't intertwine the two values
    // in any way, and even if it had a body, it couldn't, because it
    // looks like there is no potential for overlap between the types
    // (x: X and y: Option<&Y>). Except for the lifetimes, of course --
    // and that caveat is the crux of the biscuit (I can personally
    // attest it's hard to fully grok that lifetimes are part of the
    // type). In practice, though, this type of problem will often happen
    // in code which *does* somehow intertwine the values of x and y (as
    // in the introductory example), e.g. y could be an Option<&X> and
    // the function would store &x in there or something of the sort.
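
    // A minimal sketch of that intertwined case, where binding the
    // lifetimes together is genuinely required. It is an illustration,
    // not part of the original code: the module `intertwined`, the
    // struct XI and the function store_ref_to_x are hypothetical. If the
    // signature used separate lifetimes for x and the reference inside
    // y, the assignment would be rejected, because y could then outlive
    // the borrow of x it ends up holding.
    mod intertwined {
        pub struct XI(pub u8);

        pub fn store_ref_to_x<'a>(x: &'a mut XI, y: &mut Option<&'a XI>) {
            // Reborrow *x as shared for all of 'a and stash it in y.
            *y = Some(&*x);
        }
    }
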
    //
    // But it's important to realize that while Rust wouldn't allow us
    // to do those things without specifying the appropriate
    // constraints, the reverse -- i.e. overspecifying the constraints
    // when it's not necessary -- is entirely possible, and violating
    // them will still trigger a compilation error, even though the code
    // does nothing that would actually cause a problem if they were
    // relaxed.

    // To wit: a second call to the same function will fail to
    // compile...

    // bind_lifetimes_together(&mut x, &mut y);

    // ... with a message along the following lines:
    //
    //   |
    //   |     bind_lifetimes_together(&mut x, &mut y);
    //   |                             ------ first mutable borrow occurs here
    //   |     bind_lifetimes_together(&mut x, &mut y);
    //   |                             ^^^^^^  ------ first borrow later used here
    //   |                             |
    //   |                             second mutable borrow occurs here
    //
    // This can be confusing -- in what sense does the second &mut y
    // "use" the first &mut x? Well, again, in the sense that after the
    // first call, the lifetime of the mutable borrow of x is tied to
    // the lifetime of y (more precisely, of the reference inside y).
    // That means that when we call the function for the second time,
    // Rust can end the first mutable borrow of y and make a new one
    // (there's no constraint there), but it *cannot* do the same thing
    // with the first mutable borrow of x -- y is still in use at that
    // point (it's being passed to this very call), so the first &mut x
    // must be kept alive as well.
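
    // A minimal sketch of one nuance here: what keeps the first &mut x
    // alive is that y is *used* again later, not merely that y is still
    // in scope. Thanks to non-lexical lifetimes, once y's last use is
    // behind us, x can be mutated directly again. The module `last_use`
    // and everything in it are hypothetical stand-ins mirroring X, Y and
    // bind_lifetimes_together.
    mod last_use {
        pub struct XN(pub u8);
        pub struct YN;

        fn bind<'a>(_x: &'a mut XN, _y: &mut Option<&'a YN>) {}

        pub fn reuse_x_after_last_use_of_y() -> u8 {
            let mut x = XN(0);
            let mut y: Option<&YN> = None;
            bind(&mut x, &mut y);
            // y is not used past this point, so 'a (and with it the
            // mutable borrow of x) can end right after the call above.
            x.0 = 5;
            x.0
        }
    }
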
    //
    // Still, let's acknowledge that the choice of the word "use" to
    // describe this state of affairs can be somewhat misleading or
    // unintuitive, especially if you cause the error by accidentally
    // over-constraining the lifetimes of two otherwise unrelated
    // references. In our running example here, x and y don't (and
    // cannot) share access to any values, so it's hard for a budding
    // Rustacean to make sense of the claim that y somehow "uses" a
    // previous borrow of x.
    //
    // The same type of problem can easily happen when calling the
    // function in a loop, in which case you'll get an error message
    // stating that the previous mutable borrow happened in the previous
    // iteration of the loop.
    //
    // The moral of the story is: don't just automatically sprinkle 'a
    // on every reference. Wait for Rust to complain that you need to
    // specify lifetimes, and then use as many distinct ones as you can
    // get away with, so that the constraints stay as relaxed as possible.
    // You might still end up needing to bind lifetimes together, in
    // which case see the tips below on how to reconcile that with
    // mutability and repeated calls, but at least you'll know you're
    // not needlessly hamstringing yourself.
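
    // Applied to our running example, that advice might look like this
    // minimal sketch (hypothetical names: the module `relaxed`, XR, YR,
    // bind_separately and call_repeatedly). With elided lifetimes, each
    // reference gets its own lifetime parameter, so the borrow of x only
    // needs to last for the duration of each call, and repeated calls,
    // in a loop or not, compile fine. Incrementing x.0 only serves to
    // make the calls observable.
    mod relaxed {
        pub struct XR(pub u8);
        pub struct YR;

        // No lifetime is shared between x and the reference inside _y.
        pub fn bind_separately(x: &mut XR, _y: &mut Option<&YR>) {
            x.0 += 1;
        }

        pub fn call_repeatedly(n: u8) -> u8 {
            let mut x = XR(0);
            let mut y: Option<&YR> = None;
            bind_separately(&mut x, &mut y);
            for _ in 0..n {
                bind_separately(&mut x, &mut y);
            }
            x.0
        }
    }
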

    // ---------------------------------------------------- bind_and_ret {{{1

    // If you end up in a situation where Rust requires x and y
    // lifetimes to be bound together in this way, one way to make it
    // possible to call the function repeatedly is to sort of
    // "roundtrip" the mutable borrow of x through the function.
    fn bind_and_ret<'a>(x: &'a mut X, _y: &mut Option<&'a Y>) -> &'a mut X {
        x
    }

    let mut x = X(0);
    let mut y = None;

    // Using distinct variable names for clarity (though you could keep
    // reassigning the same binding):
    let mut_x1 = &mut x;
    let mut_x2 = bind_and_ret(mut_x1, &mut y);
    let mut_x3 = bind_and_ret(mut_x2, &mut y);

    // What happens here (I think): when mut_x1 is passed into the
    // function, it's reborrowed. That reborrow's lifetime is then bound
    // to the lifetime of y, as before. But unlike before, we then
    // return the reborrow and store it in mut_x2. That means that when
    // we next need to call the function and provide a mutable borrow of
    // x, we don't need to borrow x again (which would be an error, since
    // the first mutable borrow is still active): we can just pass on
    // the active borrow, which we now have a handle on (mut_x2).

    // This works in a loop as well, of course:
    let mut mut_x = mut_x3;
    for _ in 0..10 {
        mut_x = bind_and_ret(mut_x, &mut y);
    }
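
    // A minimal sketch checking that the roundtripped borrow is a fully
    // usable &mut: a hypothetical variant of bind_and_ret (bump_and_ret,
    // in a hypothetical module `roundtrip`) which also mutates x on each
    // call. Once neither mut_x nor y is used any more, x itself can be
    // read again.
    mod roundtrip {
        pub struct XT(pub u8);
        pub struct YT;

        pub fn bump_and_ret<'a>(
            x: &'a mut XT,
            _y: &mut Option<&'a YT>,
        ) -> &'a mut XT {
            x.0 += 1;
            x
        }

        pub fn bump_n_times(n: u8) -> u8 {
            let mut x = XT(0);
            let mut y: Option<&YT> = None;
            let mut mut_x = &mut x;
            for _ in 0..n {
                // Hand the active borrow in, get it back out.
                mut_x = bump_and_ret(mut_x, &mut y);
            }
            // The last uses of mut_x and y are in the loop above, so the
            // borrows of x have ended by now.
            x.0
        }
    }
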

    // I'm not sure whether this would be considered idiomatic or hacky,
    // or what its performance characteristics are. The alternative I've
    // seen in real-world code by experienced Rustaceans is to use a "get
    // out of immutability free" card, like a shared ref to a (Ref)Cell
    // wrapping x (instead of a mutable ref to x; see below).

    // -------------------------------------------- bind_shared_ref_cell {{{1

    // This is how you could achieve repeated calls to a function of the
    // same general shape and capabilities as those above by using a
    // RefCell.

    fn bind_shared_ref_cell<'a>(x: &'a RefCell<X>, _y: &mut Option<&'a Y>) {
        // proof that you can mutate x in this setup
        x.borrow_mut().0 = 42;
    }

    let x = RefCell::new(X(0));
    let mut y = None;
    for _ in 0..10 {
        bind_shared_ref_cell(&x, &mut y);
    }
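
    // bind_shared_ref_cell gets away with this because each borrow_mut()
    // guard is dropped at the end of its statement, before the next call.
    // A minimal sketch of the runtime check that replaces the
    // compile-time one (the module `refcell_check` and its function are
    // hypothetical): an overlapping mutable borrow is rejected while the
    // first guard is alive. RefCell::try_borrow_mut returns an Err in
    // that case; borrow_mut would panic instead.
    mod refcell_check {
        use std::cell::RefCell;

        pub fn overlap_rejected_then_allowed() -> (bool, bool) {
            let cell = RefCell::new(0u8);
            let guard = cell.borrow_mut();
            // A second mutable borrow while `guard` is alive fails.
            let rejected = cell.try_borrow_mut().is_err();
            drop(guard);
            // Once the first guard is gone, borrowing mutably works again.
            let allowed = cell.try_borrow_mut().is_ok();
            (rejected, allowed)
        }
    }
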

    // ------------------------------------------------ bind_shared_cell {{{1

    // And this is how you could do it with a Cell.
    //
    // In both cases, the borrows of x are still very much tied to the
    // lifetime of y (via the lifetime annotations), but since they're
    // now shared (&) rather than mutable (&mut), you can have as many
    // of them alive at once as you like.

    fn bind_shared_cell<'a>(x_outer: &'a Cell<X>, _y: &mut Option<&'a Y>) {
        // proof that you can mutate x in this setup; X(0) is a dummy
        // value which won't be used; if the type wrapped by the Cell
        // implements Default, you can use .take() here instead
        let mut x = x_outer.replace(X(0));
        x.0 = 42;
        x_outer.set(x);
    }

    let x = Cell::new(X(0));
    let mut y = None;
    for _ in 0..10 {
        bind_shared_cell(&x, &mut y);
    }
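
    // As noted in bind_shared_cell, if the wrapped type implements
    // Default, Cell::take can replace the dummy value. A minimal sketch,
    // assuming a hypothetical X-like type XD that derives Default (the
    // module `cell_take`, XD, YD and both functions are illustrative
    // only).
    mod cell_take {
        use std::cell::Cell;

        #[derive(Default)]
        pub struct XD(pub u8);
        pub struct YD;

        pub fn bind_shared_cell_take<'a>(x_outer: &'a Cell<XD>, _y: &mut Option<&'a YD>) {
            // take() leaves XD::default(), i.e. XD(0), in the cell and
            // hands us the previous value.
            let mut x = x_outer.take();
            x.0 += 1;
            x_outer.set(x);
        }

        pub fn call_repeatedly(n: u8) -> u8 {
            let x = Cell::new(XD(0));
            let mut y: Option<&YD> = None;
            for _ in 0..n {
                bind_shared_cell_take(&x, &mut y);
            }
            // y is not used past the loop, so x can be moved out.
            x.into_inner().0
        }
    }
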

    // ----------------------------------------------------- Performance {{{1

    // Of the three functions that can be called repeatedly, which
    // would be the most idiomatic solution? And performance-wise? Are
    // there significant differences, or is it a wash? The overheads
    // that come to mind:
    //
    // - bind_and_ret: This is the only function that returns a value,
    //   though that value is just a pointer-sized reference, so the cost
    //   is likely small (and may vanish if the call is inlined). On the
    //   other hand, it doesn't have to do any additional bookkeeping; it
    //   just uses plain mutable references.
    // - bind_shared_ref_cell: Requires wrapping in a RefCell and
    //   dynamically checking borrow rules.
    // - bind_shared_cell: Requires wrapping in a Cell and fiddling with
    //   its contents, including instantiating a dummy value to swap
    //   into the cell so that we can take out the "real" one.
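
    // A minimal sketch that adds one measurable data point to the
    // questions above, covering memory footprint only (it says nothing
    // about runtime cost). Cell<T> adds no space over T, RefCell<T>
    // additionally stores a borrow flag, and a &mut to a sized type is a
    // single pointer. The module `layout`, the struct XL and its
    // functions are hypothetical; exact sizes are platform-dependent.
    mod layout {
        use std::cell::{Cell, RefCell};
        use std::mem::size_of;

        pub struct XL(pub u8);

        pub fn plain_size() -> usize { size_of::<XL>() }
        pub fn cell_size() -> usize { size_of::<Cell<XL>>() }
        pub fn refcell_size() -> usize { size_of::<RefCell<XL>>() }
        pub fn mut_ref_size() -> usize { size_of::<&mut XL>() }
    }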
 }

 // vi: set foldmethod=marker