JuliaStream.jl

This is an implementation of BabelStream in Julia, containing the following variants (a sketch of the kernel they all share follows the list):

  • PlainStream.jl - Single-threaded implementation using plain for loops
  • ThreadedStream.jl - Threaded implementation using the Threads.@threads macro
  • DistributedStream.jl - Process-based parallelism using the @distributed macro
  • CUDAStream.jl - Direct port of BabelStream's native CUDA implementation using CUDA.jl
  • AMDGPUStream.jl - Direct port of BabelStream's native HIP implementation using AMDGPU.jl
  • oneAPIStream.jl - Direct port of BabelStream's native SYCL implementation using oneAPI.jl
  • KernelAbstractionsStream.jl - Direct port of BabelStream's native CUDA implementation using KernelAbstractions.jl

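All variants implement the same set of STREAM-style kernels over large arrays; they differ only in how the loop is expressed and parallelised. As a minimal sketch (the function names below are illustrative, not taken from the source), the triad kernel a[i] = b[i] + scalar * c[i] looks roughly like this in the plain and threaded variants:

  # Plain, single-threaded triad kernel: a[i] = b[i] + scalar * c[i]
  function triad_plain!(a, b, c, scalar)
      for i in eachindex(a, b, c)
          @inbounds a[i] = b[i] + scalar * c[i]
      end
  end

  # Threaded variant: same loop body, parallelised with Threads.@threads
  function triad_threaded!(a, b, c, scalar)
      Threads.@threads for i in eachindex(a, b, c)
          @inbounds a[i] = b[i] + scalar * c[i]
      end
  end

The GPU variants express the same arithmetic as device kernels launched over the array index space via CUDA.jl, AMDGPU.jl, oneAPI.jl, or KernelAbstractions.jl.
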
Build & Run

Prerequisites

  • Julia >= 1.6

A set of reduced-dependency projects is available for the following backends and implementations:

  • AMDGPU supports:
    • AMDGPUStream.jl
  • CUDA supports:
    • CUDAStream.jl
  • oneAPI supports:
    • oneAPIStream.jl
  • KernelAbstractions supports:
    • KernelAbstractionsStream.jl
  • Threaded supports:
    • PlainStream.jl
    • ThreadedStream.jl
    • DistributedStream.jl

With Julia on your PATH, run your selected benchmark with:

> cd JuliaStream.jl
> julia --project=<BACKEND> -e 'import Pkg; Pkg.instantiate()' # only required on first run
> julia --project=<BACKEND> src/<IMPL>Stream.jl

For example, to run the CUDA implementation:

> cd JuliaStream.jl
> julia --project=CUDA -e 'import Pkg; Pkg.instantiate()' 
> julia --project=CUDA src/CUDAStream.jl

Important:

  • Julia is 1-indexed, so N >= 1 in --device N.
  • Thread count for ThreadedStream must be set via the JULIA_NUM_THREADS environment variable (e.g. export JULIA_NUM_THREADS=$(nproc)); otherwise it defaults to 1.
  • Worker count for DistributedStream is set with -p <N>, as per the Julia documentation; a combined example follows this list.
  • Certain implementations, such as CUDA and AMDGPU, perform hardware detection at runtime and may download and/or compile further software packages for the platform.

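For instance, a threaded run using all cores, a distributed run with four workers, and a GPU run pinned to the second device could look like the following (the worker count and device index are illustrative):

> export JULIA_NUM_THREADS=$(nproc)
> julia --project=Threaded src/ThreadedStream.jl
> julia --project=Threaded -p 4 src/DistributedStream.jl
> julia --project=CUDA src/CUDAStream.jl --device 2
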
Alternatively, the top-level project Project.toml contains all dependencies needed to run every implementation in src. Note that some packages may be locked to older versions because of transitive dependency requirements.

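To check which versions the top-level project actually resolves to, the standard Pkg status query can be used (purely a convenience; it is not required to run the benchmarks):

> julia --project -e 'import Pkg; Pkg.status()'
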
To run a benchmark using the top-level project:

> cd JuliaStream.jl
> julia --project -e 'import Pkg; Pkg.instantiate()'  
> julia --project src/<IMPL>Stream.jl