
JuliaStream.jl

This is an implementation of BabelStream in Julia, which contains the following variants:

  • PlainStream.jl - Single-threaded for-loop implementation
  • ThreadedStream.jl - Threaded implementation using the Threads.@threads macro
  • DistributedStream.jl - Process-based parallelism using the @distributed macro
  • CUDAStream.jl - Direct port of BabelStream's native CUDA implementation using CUDA.jl
  • AMDGPUStream.jl - Direct port of BabelStream's native HIP implementation using AMDGPU.jl
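To illustrate the style of the threaded variant, here is a minimal sketch of the STREAM triad kernel using Threads.@threads; the function and variable names are illustrative, not the repository's actual code:

```julia
using Base.Threads

# Hypothetical sketch of a STREAM triad kernel in the style of
# ThreadedStream.jl: a[i] = b[i] + scalar * c[i], parallelised over threads.
function triad!(a::Vector{T}, b::Vector{T}, c::Vector{T}, scalar::T) where {T}
    @threads for i in eachindex(a)
        @inbounds a[i] = b[i] + scalar * c[i]
    end
end

n = 1024
b = fill(2.0, n)
c = fill(3.0, n)
a = zeros(n)
triad!(a, b, c, 0.4)  # each a[i] becomes 2.0 + 0.4 * 3.0
```

The number of threads the loop actually uses is fixed at startup by JULIA_NUM_THREADS, as noted in the run instructions below.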

Build & Run

Prerequisites

  • Julia 1.6+

With Julia on the PATH, run the benchmark with:

> cd JuliaStream.jl
> julia --project -e 'import Pkg; Pkg.instantiate()' # only required on first run
> julia --project src/<IMPL>Stream.jl

Important:

  • Julia is 1-indexed, so device indices start at 1 (i.e. N ≥ 1 in --device N)
  • Thread count for ThreadedStream must be set via the JULIA_NUM_THREADS environment variable (e.g. export JULIA_NUM_THREADS=$(nproc)), otherwise it defaults to 1
  • DistributedStream calls addprocs() directly, which defaults to $(nproc) workers; do not use the -p <N> flag, as per the documentation.
  • Certain implementations, such as CUDAStream and AMDGPUStream, perform hardware detection at runtime and may download and/or compile additional software packages for the platform.
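For example, the thread-count and device-index notes above translate into invocations like the following (illustrative; adjust the device index for your system):

```shell
# Threaded variant: thread count is taken from JULIA_NUM_THREADS (defaults to 1)
export JULIA_NUM_THREADS=$(nproc)
julia --project src/ThreadedStream.jl

# GPU variant: device indices are 1-based, so the first GPU is --device 1
julia --project src/CUDAStream.jl --device 1
```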