BabelStream/rust-stream/README.md
Tom Lin fdb2c181cc Add Crossbeam implementation
Add rustfmt and use target-cpu=native
Add option for libc malloc, basic thread pinning, touch-free allocation
Split modules
2021-06-15 23:13:14 +01:00

72 lines
2.7 KiB
Markdown

rust-stream
===========
This is an implementation of BabelStream in Rust.
Currently, we support three CPU threading API as devices:
* Plain - basic single-threaded `for` version, see [plain_stream.rs](src/plain_stream.rs)
* [Rayon](https://github.com/rayon-rs/rayon) - Parallel with high level API,
see [rayon_stream.rs](src/rayon_stream.rs)
* [Crossbeam](https://github.com/crossbeam-rs/crossbeam) - Parallel with partitions per thread,
see [crossbeam_stream.rs](src/crossbeam_stream.rs)
In addition, this implementation also supports the following extra flags:
```
--init Initialise each benchmark array at allocation time on the main thread
--malloc Use libc malloc instead of the Rust's allocator for benchmark array allocation
--pin Pin threads to distinct cores, this has NO effect in Rayon devices
```
There is an ongoing investigation on potential performance issues under NUMA situations. As part of
the experiment, this implementation made use of the
provisional [Allocator traits](https://github.com/rust-lang/rust/issues/32838) which requires rust
unstable. We hope a NUMA aware allocator will be available once the allocator API reaches rust
stable.
### Build & Run
Prerequisites:
* [Rust toolchain](https://www.rust-lang.org/tools/install)
Once the toolchain is installed, enable the nightly channel:
```shell
> rustup install nightly
> rustup default nightly # optional, this sets `+nightly` automatically for cargo calls later
```
With `cargo` on path, compile and run the benchmark with:
```shell
> cd rust-stream/
> cargo +nightly build --release # or simply `cargo build --release` if nightly channel is the default
> ./target/release/rust-stream --help
rust-stream 3.4.0
USAGE:
rust-stream [FLAGS] [OPTIONS]
FLAGS:
--csv Output as csv table
--float Use floats (rather than doubles)
-h, --help Prints help information
--init Initialise each benchmark array at allocation time on the main thread
--list List available devices
--malloc Use libc malloc instead of the Rust's allocator for benchmark array allocation
--mibibytes Use MiB=2^20 for bandwidth calculation (default MB=10^6)
--nstream-only Only run nstream
--pin Pin threads to distinct cores, this has NO effect in Rayon devices
--triad-only Only run triad
-V, --version Prints version information
OPTIONS:
-s, --arraysize <arraysize> Use <arraysize> elements in the array [default: 33554432]
--device <device> Select device at <device> [default: 0]
-n, --numtimes <numtimes> Run the test <numtimes> times (NUM >= 2) [default: 100]
```