Add rustfmt and use target-cpu=native Add option for libc malloc, basic thread pinning, touch-free allocation Split modules |
||
|---|---|---|
| .. | ||
| .cargo | ||
| src | ||
| .gitignore | ||
| Cargo.lock | ||
| Cargo.toml | ||
| README.md | ||
| rustfmt.toml | ||
rust-stream
This is an implementation of BabelStream in Rust.
Currently, we support three CPU threading API as devices:
- Plain - basic single-threaded
forversion, see plain_stream.rs - Rayon - Parallel with high level API, see rayon_stream.rs
- Crossbeam - Parallel with partitions per thread, see crossbeam_stream.rs
In addition, this implementation also supports the following extra flags:
--init Initialise each benchmark array at allocation time on the main thread
--malloc Use libc malloc instead of the Rust's allocator for benchmark array allocation
--pin Pin threads to distinct cores, this has NO effect in Rayon devices
There is an ongoing investigation on potential performance issues under NUMA situations. As part of the experiment, this implementation made use of the provisional Allocator traits which requires rust unstable. We hope a NUMA aware allocator will be available once the allocator API reaches rust stable.
Build & Run
Prerequisites:
Once the toolchain is installed, enable the nightly channel:
> rustup install nightly
> rustup default nightly # optional, this sets `+nightly` automatically for cargo calls later
With cargo on path, compile and run the benchmark with:
> cd rust-stream/
> cargo +nightly build --release # or simply `cargo build --release` if nightly channel is the default
> ./target/release/rust-stream --help
rust-stream 3.4.0
USAGE:
rust-stream [FLAGS] [OPTIONS]
FLAGS:
--csv Output as csv table
--float Use floats (rather than doubles)
-h, --help Prints help information
--init Initialise each benchmark array at allocation time on the main thread
--list List available devices
--malloc Use libc malloc instead of the Rust's allocator for benchmark array allocation
--mibibytes Use MiB=2^20 for bandwidth calculation (default MB=10^6)
--nstream-only Only run nstream
--pin Pin threads to distinct cores, this has NO effect in Rayon devices
--triad-only Only run triad
-V, --version Prints version information
OPTIONS:
-s, --arraysize <arraysize> Use <arraysize> elements in the array [default: 33554432]
--device <device> Select device at <device> [default: 0]
-n, --numtimes <numtimes> Run the test <numtimes> times (NUM >= 2) [default: 100]