
JuliaStream.jl

This is an implementation of BabelStream in Julia, which contains the following variants:

  • PlainStream.jl - Single-threaded for-loop implementation
  • ThreadedStream.jl - Threaded implementation using the Threads.@threads macro
  • DistributedStream.jl - Process-based parallelism using the @distributed macro
  • CUDAStream.jl - Direct port of BabelStream's native CUDA implementation using CUDA.jl
  • AMDGPUStream.jl - Direct port of BabelStream's native HIP implementation using AMDGPU.jl
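To illustrate the style of the threaded variant, here is a minimal sketch of the STREAM triad kernel using Threads.@threads; the function and variable names are illustrative, not the repository's actual code:

```julia
using Base.Threads

# Hypothetical sketch of a STREAM triad kernel in the style of
# ThreadedStream.jl: a[i] = b[i] + scalar * c[i], parallelised over threads.
function triad!(a::Vector{T}, b::Vector{T}, c::Vector{T}, scalar::T) where {T}
    @threads for i in eachindex(a)
        @inbounds a[i] = b[i] + scalar * c[i]
    end
end

n = 1024
b = fill(2.0, n)
c = fill(3.0, n)
a = zeros(n)
triad!(a, b, c, 0.4)  # each a[i] becomes 2.0 + 0.4 * 3.0
```

The number of threads the loop actually uses is fixed at startup by JULIA_NUM_THREADS, as noted in the run instructions below.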

Build & Run

Prerequisites

  • Julia 1.6+

With Julia on the PATH, run the benchmark with:

> cd JuliaStream.jl
> julia --project -e 'import Pkg; Pkg.instantiate()' # only required on first run
> julia --project src/<IMPL>Stream.jl

Important:

  • Julia is 1-indexed, so device indices start at 1 (i.e. N ≥ 1 in --device N)
  • Thread count for ThreadedStream must be set via the JULIA_NUM_THREADS environment variable (e.g. export JULIA_NUM_THREADS=$(nproc)), otherwise it defaults to 1
  • DistributedStream calls addprocs() directly, which defaults to $(nproc) workers; do not use the -p <N> flag, as per the documentation.
  • Certain implementations, such as CUDAStream and AMDGPUStream, perform hardware detection at runtime and may download and/or compile additional software packages for the platform.
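For example, the thread-count and device-index notes above translate into invocations like the following (illustrative; adjust the device index for your system):

```shell
# Threaded variant: thread count is taken from JULIA_NUM_THREADS (defaults to 1)
export JULIA_NUM_THREADS=$(nproc)
julia --project src/ThreadedStream.jl

# GPU variant: device indices are 1-based, so the first GPU is --device 1
julia --project src/CUDAStream.jl --device 1
```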