use std::env;
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::os::raw::c_double;

use chrono::{Duration, NaiveDateTime};

struct State {
    x0: Option<(NaiveDateTime, c_double)>,
    x1: Option<(NaiveDateTime, c_double)>,
    y: Option<(NaiveDateTime, c_double)>,
}

impl State {
    fn empty() -> State {
        State {
            x0: None,
            x1: None,
            y: None,
        }
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let filename = &args[1];

    let file = File::open(filename).unwrap();
    let reader = BufReader::new(file);

    reader
        .lines()
        .enumerate()
        .fold(State::empty(), |last_state, (_, line)| {
            let line = line.unwrap();
            let cols: Vec<&str> = line.split(',').collect();

            let timestamp_components: Vec<_> = cols[0].split('.').collect();
            let time = NaiveDateTime::from_timestamp(
                timestamp_components[0].parse::<i64>().unwrap(),
                timestamp_components[1].parse::<u32>().unwrap() * 1000,
            );
            let x: c_double = cols[1].parse().unwrap();
            let y: c_double = cols[2].parse().unwrap();

            let new_state = match last_state {
                // ... Omitted as this is an algorithm I don't want to share, but I don't believe that it is the cause
            };

            if let State {
                x0: Some((new_x0_time, new_x0)),
                x1: Some((_, new_x1)),
                y: Some((_, new_y)),
            } = new_state
            {
                println!("{}, {}, {}, {}", new_x0_time, new_x0, new_x1, new_y);
                State::empty()
            } else {
                new_state
            }
        });
}
That's a lot of allocating you're doing there, with a collect on each iteration. Also check out the lexical crate:
let mut cols = line.split(',');
let mut timestamp = cols.next().unwrap().split('.'); // split the timestamp on '.', not ','
let time = NaiveDateTime::from_timestamp(
    lexical::parse(timestamp.next().unwrap()).unwrap(),
    lexical::parse(timestamp.next().unwrap()).unwrap() * 1000,
);
let x: c_double = lexical::parse(cols.next().unwrap()).unwrap();
let y: c_double = lexical::parse(cols.next().unwrap()).unwrap();
Thanks! I’ll give it a go.
If you really do want to collect, you can convert your closure into a move closure (move || {}), and above it create some vectors that you reuse between iterations; just make sure to clear them. That said, there's really no need to allocate here.
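A minimal sketch of that reuse idea (using for_each rather than your fold to keep it short, and collecting parsed numbers rather than &str slices, since a vector of borrowed slices can't be carried across iterations past the borrow checker):

let mut vals: Vec<c_double> = Vec::new();
reader.lines().for_each(move |line| {
    let line = line.unwrap();
    vals.clear(); // keeps the capacity, so the allocation is reused
    vals.extend(line.split(',').skip(1).map(|s| s.parse::<c_double>().unwrap()));
    // ... use &vals ...
});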
Actually, my colleague @longshorej also came up with:
let line = line.unwrap();
let mut cols = line.split(',');
let col_0 = cols.next().unwrap();
let col_1 = cols.next().unwrap();
let col_2 = cols.next().unwrap();
let mut timestamp_components = col_0.split('.');
let ts_0 = timestamp_components.next().unwrap();
let ts_1 = timestamp_components.next().unwrap();
let time = NaiveDateTime::from_timestamp(
    ts_0.parse::<i64>().unwrap(),
    ts_1.parse::<u32>().unwrap() * 1000,
);
let x: c_double = col_1.parse().unwrap();
let y: c_double = col_2.parse().unwrap();
...which should reduce the allocations. I didn't notice much of a saving here though either.
If you're doing any sort of float parsing (c_double), Rust's default parser is optimized for precision rather than speed. Hence you will see a significant improvement in parsing times using lexical's speed-optimized parsers.
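For instance, a side-by-side sketch (assuming the Result-returning lexical::parse used above):

let precise: c_double = "1234.5678".parse().unwrap();       // core parser: precision-optimized
let fast: c_double = lexical::parse("1234.5678").unwrap();  // lexical: speed-optimized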
Additionally, unfortunately, the .lines() method on the BufReader does allocate a new string with each line read. You can eliminate the allocations with:
let mut line = String::new();
loop {
    let read = reader.read_line(&mut line).unwrap();
    if read == 0 {
        break;
    }
    {
        let line = line.as_str();
        // Code here
    }
    line.clear(); // read_line appends, so clear the buffer to reuse its allocation
}
linereader does offer an alternative line reader with a .for_each() method for interacting with lines in an allocation-free way.
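A sketch of that (going by linereader's README: the closure receives each line as a &[u8] borrowed from the reader's internal buffer and returns io::Result<bool>, with false stopping early):

use linereader::LineReader;

let mut reader = LineReader::new(File::open(filename).unwrap());
reader
    .for_each(|line| {
        // `line` is a byte slice into the reader's buffer: no per-line String
        let line = std::str::from_utf8(line).unwrap();
        // ... parse `line` as before ...
        Ok(true) // keep reading
    })
    .unwrap();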
> Additionally, unfortunately, the .lines() method on the BufReader does allocate a new string with each line read. You can eliminate the allocations with …
Ah, I had wondered about that. I'll bet that's my problem.
> linereader does offer an alternative line reader with a .for_each() method for interacting with lines in an allocation-free way.
Interesting. Again, thanks. Having to seek alternatives to core Rust is starting to put me off though...
So, this was posted on Reddit, and I hope you don't mind the intrusion. I just have two ideas:

- I assume you're running this in release mode, so I'd be curious to see what adding -C target-cpu=native as the RUSTFLAGS environment variable when compiling would do. You'd have to compile it on the same machine that will be running it, though.
- The println! macro needs to lock stdout (which is behind a mutex, I think?). You can lock it at the top and use writeln! instead; a sketch follows this list. I'm not as familiar with the performance implications here as in the JVM, but I'm guessing that it gets optimized out in a single-threaded context.
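A minimal sketch of the second idea, taking the lock once and writing to it:

use std::io::Write;

let stdout = std::io::stdout();
let mut out = stdout.lock(); // lock once, outside the loop
for i in 0..10_000_000 {
    writeln!(out, "{}", i).unwrap();
}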
I still say that relying on Rust's native float parser, as opposed to using lexical, is going to be responsible for significant CPU costs, since Rust's float parser is optimized for precision, not speed.
As mentioned above, try to reduce the number of allocations. Two easy changes:

- Use ArrayVec for stack allocation of cols. Instead of:

let cols: Vec<&str> = line.split(',').collect();

use:

let cols: ArrayVec<[&str; 3]> = line.split(',').take(3).collect();
- Use LineReader to not allocate a String for each line:

let mut reader = LineReader::new(File::open(filename).unwrap());
while let Some(line) = reader.next_line() {
    ...
}
Or you could use itertools's .collect_tuple() for a nicer API (note it returns an Option, hence the unwraps):

use itertools::Itertools;

let (timestamp_str, x_str, y_str) = line.split(',').take(3).collect_tuple().unwrap();
let (secs, nanos) = timestamp_str.split('.').take(2).collect_tuple().unwrap();
> Interesting. Again, thanks. Having to seek alternatives to core Rust is starting to put me off though...
It should be noted that, when doing things at the level of Rust, there are often many ways to do things, and the "right" way depends on the tradeoffs you want to make. One of those examples is the float parser, as already mentioned (do you want the most precise parsing of the float value, or a good fast approximation?). Rust's core is, by design, relatively thin. It provides a good many things, done in a way that is acceptable for a good number of use cases. Depending on what you want to optimize for, there are ways to help you do that (lexical, linereader, and holding a lock on stdout, as examples).
Also, if you're worried about speed, you definitely want release mode (cargo run --release). And, if you're only going to use the binary on your own machine, setting RUSTFLAGS="-C target-cpu=native" is a pretty good way to eke out a bit more performance from your SIMD registers and instructions.
I think println! locks and unlocks stdout at each call; try locking once and writing to the lock. Not sure how much speed improvement you can get, though.
Another possible improvement is to parallelize reading the file in chunks with rayon. If the source file can fit at least twice in memory, you can buffer the parsed chunks, return them from each parallel iteration, and then join them for one single write (rayon guarantees the order of the returns). If the file is too large, you can write each parsed chunk to an individual temp file and then join them into your final output.
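A sketch of the in-memory variant of that idea (the chunk size and the per-line work are placeholders):

use rayon::prelude::*;

let contents = std::fs::read_to_string(filename).unwrap();
let lines: Vec<&str> = contents.lines().collect();
// par_chunks hands slices of lines to worker threads; collect preserves
// chunk order, so a single ordered write at the end is safe
let chunks: Vec<String> = lines
    .par_chunks(10_000)
    .map(|chunk| {
        chunk
            .iter()
            .map(|line| format!("parsed: {}", line)) // stand-in for the real parsing
            .collect::<Vec<_>>()
            .join("\n")
    })
    .collect();
println!("{}", chunks.join("\n"));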
> I think println! locks and unlocks stdout at each call; try locking once and writing to the lock. Not sure how much speed improvement you can get, though.
Possibly some. The major issue is likely that in Rust stdout is line-buffered (always, IIRC).
I'm assuming the JVM uses the C runtime behaviour of line-buffering stdout when it's connected to a tty but fully buffering it when it's connected to a pipe (we don't have a benchmark script, but this would usually be piped to /dev/null). So it's only going to flush stdout every 4 kB or so, whereas Rust is going to flush on every iteration.
Here's a loop just println-ing 10 million integers to stdout on my machine:
cargo run --release > /dev/null 9.00s user 5.56s system 99% cpu 14.583 total
locking and writeln-ing:
cargo run --release > /dev/null 8.42s user 6.20s system 99% cpu 14.733 total
wrapping StdoutLock into a BufWriter:
cargo run --release > /dev/null 1.05s user 0.03s system 99% cpu 1.089 total
Of course, the issue there is that you lose line buffering in "interactive" mode, so ideally you'd need something like atty to check whether your stdout is a tty, and probably box your writer as a trait object, e.g.:

use std::io::{self, BufWriter, Write};
use atty::Stream;

let stdout = io::stdout();
let mut out: Box<dyn Write> = if atty::is(Stream::Stdout) {
    Box::new(stdout.lock()) // interactive: keep line buffering
} else {
    Box::new(BufWriter::new(stdout.lock())) // piped: buffer the writes
};
Here's using socat to connect stdout to a tty without being slowed down by the actual terminal:
cargo run --release > /dev/null 1.16s user 0.04s system 91% cpu 1.322 total
socat - EXEC:'cargo run --release',pty,setsid,ctty > /dev/null 17.15s user 24.01s system 194% cpu 21.205 total
Relevant Scala comparison: