BabelStream/rust-stream
Tom Lin fdb2c181cc Add Crossbeam implementation
Add rustfmt and use target-cpu=native
Add option for libc malloc, basic thread pinning, touch-free allocation
Split modules
2021-06-15 23:13:14 +01:00

rust-stream

This is an implementation of BabelStream in Rust.
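BabelStream measures memory bandwidth by timing a handful of simple array kernels. For readers new to the benchmark, here is a minimal serial sketch of those kernels in plain Rust. The start values (0.1, 0.2, 0.0) and the scalar (0.4) are assumptions chosen for illustration, and this is not the project's own device code.

```rust
// Illustrative start values and scalar (assumed, not taken from this project).
const START_A: f64 = 0.1;
const START_B: f64 = 0.2;
const START_C: f64 = 0.0;
const SCALAR: f64 = 0.4;

// c[i] = a[i]
fn copy(a: &[f64], c: &mut [f64]) {
    c.copy_from_slice(a);
}

// b[i] = scalar * c[i]
fn mul(b: &mut [f64], c: &[f64]) {
    for (bi, ci) in b.iter_mut().zip(c) {
        *bi = SCALAR * *ci;
    }
}

// c[i] = a[i] + b[i]
fn add(a: &[f64], b: &[f64], c: &mut [f64]) {
    for ((ci, ai), bi) in c.iter_mut().zip(a).zip(b) {
        *ci = *ai + *bi;
    }
}

// a[i] = b[i] + scalar * c[i]
fn triad(a: &mut [f64], b: &[f64], c: &[f64]) {
    for ((ai, bi), ci) in a.iter_mut().zip(b).zip(c) {
        *ai = *bi + SCALAR * *ci;
    }
}

// a[i] += b[i] + scalar * c[i]
fn nstream(a: &mut [f64], b: &[f64], c: &[f64]) {
    for ((ai, bi), ci) in a.iter_mut().zip(b).zip(c) {
        *ai += *bi + SCALAR * *ci;
    }
}

// sum of a[i] * b[i]
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let n = 1024;
    let mut a = vec![START_A; n];
    let mut b = vec![START_B; n];
    let mut c = vec![START_C; n];
    // One pass in the usual order: copy, mul, add, triad, then dot.
    copy(&a, &mut c);
    mul(&mut b, &c);
    add(&a, &b, &mut c);
    triad(&mut a, &b, &c);
    let sum = dot(&a, &b);
    println!("a[0] = {}, dot = {}", a[0], sum);
    // nstream is timed separately (see --nstream-only).
    nstream(&mut a, &b, &c);
    println!("after nstream, a[0] = {}", a[0]);
}
```

Each kernel touches every element once per array it reads or writes, which is what makes it a bandwidth rather than a compute test.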

Currently, we support three CPU threading APIs as devices:

In addition, this implementation supports the following extra flags:

--init    Initialise each benchmark array at allocation time on the main thread
--malloc  Use libc malloc instead of Rust's allocator for benchmark array allocation
--pin     Pin threads to distinct cores; this has NO effect on Rayon devices
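To make `--malloc` concrete, here is a minimal sketch of an array whose storage comes from libc `malloc` rather than from Rust's allocator (which a `Vec` would use). It assumes a hosted target where std already links the C library, declares `malloc`/`free` by hand to avoid depending on the `libc` crate, and uses a hypothetical `MallocArray` type; it is not the project's actual code.

```rust
use std::mem::size_of;
use std::ops::{Deref, DerefMut};
use std::os::raw::c_void;
use std::slice;

// Declared by hand so no external crate is needed; std already links libc.
extern "C" {
    fn malloc(size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}

/// Hypothetical owner of an f64 array allocated with libc malloc.
struct MallocArray {
    ptr: *mut f64,
    len: usize,
}

impl MallocArray {
    /// Allocates `len` elements and writes `value` into every one of them,
    /// so the memory is fully initialised before it is ever read.
    fn new(len: usize, value: f64) -> MallocArray {
        let bytes = len
            .checked_mul(size_of::<f64>())
            .expect("allocation size overflow");
        // malloc(0) may return null, so always request at least one byte.
        let ptr = unsafe { malloc(bytes.max(1)) } as *mut f64;
        assert!(!ptr.is_null(), "malloc failed");
        for i in 0..len {
            unsafe { ptr.add(i).write(value) };
        }
        MallocArray { ptr, len }
    }
}

impl Deref for MallocArray {
    type Target = [f64];
    fn deref(&self) -> &[f64] {
        // `ptr` points to `len` initialised f64 values; malloc's alignment suffices.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl DerefMut for MallocArray {
    fn deref_mut(&mut self) -> &mut [f64] {
        unsafe { slice::from_raw_parts_mut(self.ptr, self.len) }
    }
}

impl Drop for MallocArray {
    fn drop(&mut self) {
        unsafe { free(self.ptr as *mut c_void) };
    }
}

fn main() {
    let mut a = MallocArray::new(1 << 20, 0.1);
    a[0] = 0.5; // indexing works through DerefMut to [f64]
    println!("len = {}, a[0] = {}, a[1] = {}", a.len(), a[0], a[1]);
}
```

Because `MallocArray` dereferences to `[f64]`, kernels written against slices can use it unchanged; only the allocation and deallocation paths differ from a `Vec<f64>`.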

There is an ongoing investigation into potential performance issues on NUMA systems. As part of this experiment, the implementation makes use of the provisional `Allocator` trait, which requires unstable (nightly) Rust. We hope a NUMA-aware allocator will become available once the allocator API reaches stable Rust.
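A related, allocator-independent way to reason about NUMA placement is the first-touch policy that Linux uses by default: a page is physically placed on the NUMA node of the thread that first writes to it. The sketch below contrasts writing every element on the main thread (roughly what `--init` describes) with letting each worker first-touch its own contiguous chunk, using only `std::thread::scope` (Rust 1.63+). `first_touch_vec` is a hypothetical helper, this is not the project's allocator-based approach, and without pinning the OS may still move workers between nodes.

```rust
use std::mem::MaybeUninit;
use std::thread;

/// Hypothetical helper: allocates `n` f64s and writes `value` into them.
/// If `on_main` is true, every element is written by the calling thread;
/// otherwise the array is split into `nthreads` contiguous chunks and each
/// chunk is written (first-touched) by its own scoped thread.
fn first_touch_vec(n: usize, value: f64, nthreads: usize, on_main: bool) -> Vec<f64> {
    // Reserves capacity without writing to the elements, so for large
    // allocations the OS typically has not yet backed them with physical pages.
    let mut v: Vec<f64> = Vec::with_capacity(n);
    {
        let spare: &mut [MaybeUninit<f64>] = &mut v.spare_capacity_mut()[..n];
        if on_main || nthreads <= 1 {
            for x in spare.iter_mut() {
                x.write(value);
            }
        } else {
            // Ceiling division; max(1) avoids chunks_mut(0), which panics.
            let chunk = ((n + nthreads - 1) / nthreads).max(1);
            thread::scope(|s| {
                for part in spare.chunks_mut(chunk) {
                    s.spawn(move || {
                        for x in part.iter_mut() {
                            x.write(value);
                        }
                    });
                }
            });
        }
    }
    // Safety: all `n` elements were initialised above.
    unsafe { v.set_len(n) };
    v
}

fn main() {
    let n = 1 << 22;
    let threads = thread::available_parallelism()
        .map(|p| p.get())
        .unwrap_or(1);
    let a = first_touch_vec(n, 0.1, threads, false); // parallel first touch
    let b = first_touch_vec(n, 0.2, threads, true); // main-thread init, like --init
    println!("{} threads, a[0] = {}, b[n - 1] = {}", threads, a[0], b[n - 1]);
}
```

The idea is that a worker which later streams through the same chunk finds its pages on the local node; whether this helps in practice depends on the kernel, the allocator, and whether threads stay on their cores.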

Build & Run

Prerequisites: a Rust toolchain, installed via rustup.

Once the toolchain is installed, set up the nightly channel:

> rustup install nightly
> rustup default nightly # optional; makes nightly the default so later cargo calls do not need `+nightly`

With cargo on your PATH, compile and run the benchmark with:

> cd rust-stream/
> cargo +nightly build --release # or simply `cargo build --release` if the nightly channel is the default
> ./target/release/rust-stream --help
rust-stream 3.4.0

USAGE:
    rust-stream [FLAGS] [OPTIONS]

FLAGS:
        --csv             Output as csv table
        --float           Use floats (rather than doubles)
    -h, --help            Prints help information
        --init            Initialise each benchmark array at allocation time on the main thread
        --list            List available devices
        --malloc          Use libc malloc instead of the Rust's allocator for benchmark array allocation
        --mibibytes       Use MiB=2^20 for bandwidth calculation (default MB=10^6)
        --nstream-only    Only run nstream
        --pin             Pin threads to distinct cores, this has NO effect in Rayon devices
        --triad-only      Only run triad
    -V, --version         Prints version information

OPTIONS:
    -s, --arraysize <arraysize>    Use <arraysize> elements in the array [default: 33554432]
        --device <device>          Select device at <device> [default: 0]
    -n, --numtimes <numtimes>      Run the test <numtimes> times (NUM >= 2) [default: 100]
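The reported bandwidth follows from the bytes each kernel moves and the measured time. As a worked illustration, assuming the usual BabelStream accounting (copy, mul and dot touch two arrays per element, add and triad three, nstream four) and that the first of the `--numtimes` runs is treated as warm-up (which would explain the NUM >= 2 requirement), the default `--arraysize` of 33554432 doubles gives 33554432 × 8 = 268435456 bytes per array: 268.435456 MB by default, or exactly 256 MiB with `--mibibytes`. One triad run at that size therefore moves 3 × 268435456 = 805306368 bytes. The helper names below are hypothetical.

```rust
use std::mem::size_of;
use std::time::Instant;

/// Bandwidth for `arrays` arrays of `n` elements of `elem_size` bytes moved in
/// `seconds`. Uses MiB (2^20 bytes) when `mibibytes` is set, else MB (10^6 bytes).
fn bandwidth(arrays: usize, n: usize, elem_size: usize, seconds: f64, mibibytes: bool) -> f64 {
    let bytes = (arrays * n * elem_size) as f64;
    let unit = if mibibytes { (1u64 << 20) as f64 } else { 1e6 };
    bytes / unit / seconds
}

/// Min, max and average over all runs except the first (assumed warm-up).
fn summarize(times: &[f64]) -> (f64, f64, f64) {
    assert!(times.len() >= 2, "need at least two runs");
    let rest = &times[1..];
    let min = rest.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = rest.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let avg = rest.iter().sum::<f64>() / rest.len() as f64;
    (min, max, avg)
}

fn main() {
    let n = 1 << 20;
    let scalar = 0.4;
    let (mut a, b, c) = (vec![0.1f64; n], vec![0.2f64; n], vec![0.0f64; n]);
    let mut times = Vec::new();
    for _ in 0..10 {
        let t = Instant::now();
        // Triad: a[i] = b[i] + scalar * c[i]
        for ((ai, bi), ci) in a.iter_mut().zip(&b).zip(&c) {
            *ai = *bi + scalar * *ci;
        }
        times.push(t.elapsed().as_secs_f64());
    }
    let (min, max, avg) = summarize(&times);
    // Peak bandwidth is taken from the fastest run; printing a[0] keeps the work observable.
    println!(
        "Triad: {:.1} MB/s (min {:.6}s, max {:.6}s, avg {:.6}s), a[0] = {}",
        bandwidth(3, n, size_of::<f64>(), min, false),
        min, max, avg, a[0]
    );
}
```

With `--float` the element size halves to 4 bytes, so the same array size moves half the data per run.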