-
-
Save eloff/6e0c236ae6388744a7d5b76a5e0e9b6c to your computer and use it in GitHub Desktop.
// Public Domain | |
use bytes; | |
struct BytesAlike { | |
data: *const u8, | |
len: usize, | |
_1: usize, | |
_2: usize, | |
} | |
/// unsplit_bytes checks if b2 follows directly after b1 in memory, and if so merges it | |
/// into b1 and returns b1. Otherwise returns b1 and b2 unmodified. | |
/// Safety: because of pointer provenance https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance | |
/// this may invoke undefined behavior if used to merge two Bytes not originally part of | |
/// the same allocation (e.g. not split from the same BytesMut or similar.) | |
pub unsafe fn unsplit_bytes(mut b1: Bytes, b2: Bytes) -> (Option<Bytes>, Option<Bytes>) { | |
if are_contiguous(&b1, &b2) { | |
assert_eq!(std::mem::size_of::<Bytes>(), std::mem::size_of::<BytesAlike>()); | |
// Safety: this is pretty unsafe, we assume the length is a usize stored after a pointer. | |
// So we check that both the pointer and length fields appear to be where expect them | |
// and if they're not, we simply treat them as if they cannot be merged. | |
let p = b1.as_ptr(); | |
let len = b1.len(); | |
let bytes_ref = unsafe { &mut *(&mut b1 as *mut Bytes as *mut BytesAlike) }; | |
if bytes_ref.data == p && bytes_ref.len == len { | |
bytes_ref.len += b2.len(); | |
return (Some(b1), None); | |
} | |
} | |
(Some(b1), Some(b2)) | |
} | |
/// Returns true if b2 immediately follows b1 in memory | |
fn are_contiguous(b1: &Bytes, b2: &Bytes) -> bool { | |
// Safety: we don't dereference end | |
let end = unsafe { b1.as_ptr().add(b1.len()) }; | |
end == b2.as_ptr() | |
} | |
#[cfg(test)] | |
mod tests { | |
use super::*; | |
use bytes::Buf; | |
#[test] | |
fn test_unsplit_bytes_success() { | |
let s = "foobar".as_bytes(); | |
let mut b1 = Bytes::from(s); | |
let b2 = b1.split_off(3); | |
assert!(are_contiguous(&b1, &b2)); | |
let (r1, r2) = unsafe { unsplit_bytes(b1, b2) }; | |
assert!(r1.is_some()); | |
assert!(r2.is_none()); | |
assert_eq!(r1.unwrap().chunk(), s); | |
} | |
#[test] | |
fn test_unsplit_bytes_fail() { | |
let foo = "foopad".as_bytes(); | |
let bar = "bar".as_bytes(); | |
let b1 = Bytes::from(&foo[..3]); | |
let b2 = Bytes::from(bar); | |
assert!(!are_contiguous(&b1, &b2)); | |
let (r1, r2) = unsafe { unsplit_bytes(b1, b2) }; | |
assert_eq!(r1.is_some(), r2.is_some()); | |
assert_eq!(r1.unwrap().chunk(), &foo[..3]); | |
assert_eq!(r2.unwrap().chunk(), bar); | |
} | |
} |
I don't see what Rust safety rule specifically is being violated here. Could this not just be an issue with miri?
I'm mutating memory marked as mutable through a non-aliased mutable reference.
The unsafe bit is, as you point out, is assuming the order of the fields in bytes::Bytes. It's made a little safer by checking at runtime that the length field matches the actual length. Given the other three fields are pointers, this is unlikely to occur by chance (the Bytes buffer would have to be very large for that to be possible, and because the numbers are then very large, they're unlikely to match by chance.)
It's not great. But I think it's the best that can be done barring this being included in the bytes library itself: tokio-rs/bytes#503
I think for the purposes of this hack, changing the assert so instead of panicking it just does not merge the Bytes would be an improvement. It would also make it a little safer (probabilistically) to check the data pointer is where it's expected to be as well.
I think the current miri error is due to the borrow in the assert before that line. A mutable borrow is created to b1, then it's invalidated by the shared borrow when calling b1.len()
, then we try to use the mutable borrow again in the last line. to work around that, can take the len before creating the mutable borrow.
let len = b1.len();
let bytes_ref = unsafe { &mut *(&mut b1 as *mut Bytes as *mut BytesAlike) };
assert_eq!(bytes_ref.len, len);
bytes_ref.len += b2.len();
but I do not think that makes us in an okay place yet. if we do something like the following,
fn main() {
let s = "foobar".as_bytes();
let b1 = Bytes::from(s);
let b2 = Bytes::from("aaaaaaaa");
assert!(are_contiguous(&b1, &b2));
let (r1, r2) = unsplit_bytes(b1, b2);
assert!(r1.is_some());
assert!(r2.is_none());
println!("{:?}", r1);
}
and compile in release and run it (because debug/miri do not end up making it contiguous), it will print
Some(b"foobaraaaaaaaa")
and I wonder if a case could be created that causes the issue mentioned in slice's docs / these links,
- https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance
https://github.com/rust-lang/rust/issues/62765
I think the current miri error is due to the borrow in the assert before that line. A mutable borrow is created to b1, then it's invalidated by the shared borrow when calling b1.len(), then we try to use the mutable borrow again in the last line.
Yes, you're right. Good catch! I'll update the code.
About the panic, that seems expected. The two allocations end up being contiguous (yes that surprised me too!) so they're joined and the strings are not equal because now it's "foobaraaaaaaaa" as expected.
However, you have a point about pointer provenance. This may be undefined behavior to combine two adjacent memory regions that weren't originally part of the same buffer. In practice I have difficulty imagining how the compiler could miscompile a program in this specific case. It's possible though. One can avoid that by just documenting that it's undefined behavior to use unsplit in cases like that (and add unsafe to the function.)
One can avoid that by just documenting that it's undefined behavior to use unsplit in cases like that (and add unsafe to the function.)
sounds reasonable ^^
Hey, thank you for your feedback. Your review improved the code. Are you affiliated with miri at all?
Glad it could help! I'm not affiliated, but have tried to help on a few issues on miri's repo in the past. Noticed your issue earlier while checking byte
's repo (not affiliated with it either) and was curious what it was about :)
miri errors with the following:
click
tools > miri
to run it https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=2cc2a5b8e1bc0e4783c4ef6a88ee5986this appears to be trying to do a similar thing that slice's docs warn against
https://doc.rust-lang.org/nightly/std/slice/fn.from_raw_parts.html#incorrect-usage
(the code also assumes the layout of
bytes::Bytes
in the cast.Bytes
not only has private fields, which means you shouldn't assume anything as an user of the library, it is alsorepr(rust)
which means rustc is free to re-order the fields)