diff --git a/README.md b/README.md
index e4c2a15..7df3397 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@ This code was previously called GPU-STREAM.
 - [How is this different to STREAM?](#how-is-this-different-to-stream)
 - [Building](#building)
   - [CMake](#cmake)
+  - [Spack](#spack)
   - [GNU Make (removed)](#gnu-make)
 - [Results](#results)
 - [Contributing](#contributing)
@@ -138,6 +139,35 @@ Alternatively, refer to the [CI script](./src/ci-test-compile.sh), which test-co
 
 *It is recommended that you delete the `build` directory when you change any of the build flags.*
 
+### Spack
+
+The project supports building with Spack >= 0.19.0, which can be installed without root via the [official GitHub repo](https://github.com/spack/spack).
+The BabelStream Spack package source can be found [here](https://github.com/spack/spack/tree/develop/var/spack/repos/builtin/packages/babelstream/package.py).
+Each BabelStream implementation (programming model) is built as follows:
+
+```shell
+# Installation starts with `spack install babelstream` for all programming models.
+# The programming model to build is selected with a `+<model>` variant,
+# and any model-specific flags are given after the model.
+$ spack install babelstream@<version>%<compiler> +<model> <model-specific-flags>
+
+# The executables are generated in:
+# SPACK_INSTALL_DIRECTORY/opt/spack/system-name/compiler-name/babelstream-version-identifier/bin/
+# This path is printed at the end of the installation and can be copied from there.
+$ cd SPACK_INSTALL_DIRECTORY/opt/spack/system-name/compiler-name/babelstream-version-identifier/bin/
+$ ./<model>-stream
+```
+More detailed examples are provided in the [Spack README file](./docs/spack_instructions.md).
+The model variant selects one implementation of BabelStream to build.
+
+Currently available models are:
+```
+omp;ocl;std;std20;hip;cuda;kokkos;sycl;sycl2020;acc;raja;tbb;thrust
+```
+
 ### GNU Make
 
 Support for Make has been removed from 4.0 onwards.
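The install-then-run flow above follows a simple naming convention: one `+<model>` variant selects the implementation, and the resulting executable is named `<model>-stream`. A minimal sketch of scripting that convention (the `omp` default and the `spack location -i` shortcut for resolving the install prefix are illustrative assumptions, not part of the package itself):

```shell
# Compose the Spack spec and expected binary name for a chosen model.
model="${MODEL:-omp}"            # e.g. omp, cuda, kokkos (assumed default: omp)
spec="babelstream +${model}"     # the +<model> variant selects the implementation
binary="${model}-stream"         # executables follow the <model>-stream convention
echo "install with: spack install ${spec}"
# On a system with Spack, the install prefix can be resolved without the long
# hash path via: cd "$(spack location -i babelstream)/bin" && ./${binary}
echo "then run: ./${binary}"
```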
diff --git a/docs/spack_instructions.md b/docs/spack_instructions.md
new file mode 100644
index 0000000..708deed
--- /dev/null
+++ b/docs/spack_instructions.md
@@ -0,0 +1,239 @@
+# Spack Instructions
+
+## Table of contents
+* [OpenMP](#openmp)
+* [OpenCL](#opencl)
+* [STD](#std)
+* [HIP(ROCM)](#hiprocm)
+* [CUDA](#cuda)
+* [Kokkos](#kokkos)
+* [SYCL](#sycl)
+* [SYCL2020](#sycl2020)
+* [ACC](#acc)
+* [RAJA](#raja)
+* [TBB](#tbb)
+* [THRUST](#thrust)
+
+## OpenMP
+
+* There are 3 offloading options for OpenMP: NVIDIA, AMD and Intel.
+* If a user provides a value for `cuda_arch`, execution is offloaded to an NVIDIA GPU.
+* If a user provides a value for `amdgpu_target`, execution is offloaded to an AMD GPU.
+* In the absence of both `cuda_arch` and `amdgpu_target`, execution is offloaded to an Intel device.
+
+| Flag | Definition |
+|-----------| ----------------------------------|
+| cuda_arch | - The list of supported compute capabilities is provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/cuda.py#LL19C1-L47C6)<br>- A useful [link](https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/) for matching CUDA gencodes with NVIDIA architectures |
+| amdgpu_target | The list of supported architectures is provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/rocm.py#LL93C1-L125C19) |
+
+```shell
+# Example 1: Intel offload
+$ spack install babelstream%oneapi +omp
+
+# Example 2: NVIDIA GPU, Volta (sm_70)
+$ spack install babelstream +omp cuda_arch=70
+
+# Example 3: AMD GPU gfx701
+$ spack install babelstream +omp amdgpu_target=gfx701
+```
+
+## OpenCL
+
+* There is no need to specify `amdgpu_target` or `cuda_arch` here, since the AMD and CUDA runtimes are used directly as backends.
+
+| Flag | Definition |
+|-----------| ----------------------------------|
+| backend | 4 backend options:<br>- cuda<br>- amd<br>- intel<br>- pocl |
+
+```shell
+# Example 1: CUDA backend
+$ spack install babelstream%gcc +ocl backend=cuda
+
+# Example 2: AMD backend
+$ spack install babelstream%gcc +ocl backend=amd
+
+# Example 3: Intel backend
+$ spack install babelstream%gcc +ocl backend=intel
+
+# Example 4: POCL backend
+$ spack install babelstream%gcc +ocl backend=pocl
+```
+
+## STD
+* Minimum GCC version requirement: `10.1.0`
+* NVHPC offload will be added in a future release
+
+```shell
+# Example 1: data
+$ spack install babelstream +stddata
+
+# Example 2: ranges
+$ spack install babelstream +stdranges
+
+# Example 3: indices
+$ spack install babelstream +stdindices
+```
+
+## HIP(ROCM)
+
+* `amdgpu_target` and `flags` are optional here.
+
+| Flag | Definition |
+|-----------| ----------------------------------|
+| amdgpu_target | The list of supported architectures is provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/rocm.py#LL93C1-L125C19) |
+| flags | Extra flags to pass |
+
+```shell
+# Example 1: ROCm default
+$ spack install babelstream +rocm
+
+# Example 2: ROCm with a GPU target
+$ spack install babelstream +rocm amdgpu_target=<arch>
+
+# Example 3: ROCm with extra flags
+$ spack install babelstream +rocm flags=<flags>
+
+# Example 4: ROCm with a GPU target and extra flags
+$ spack install babelstream +rocm amdgpu_target=<arch> flags=<flags>
+```
+
+## CUDA
+
+* The `cuda_arch` value is mandatory here.
+* If a user provides a value for `mem`, the device memory mode is chosen accordingly.
+* If a user provides a value for `flags`, additional CUDA flags are passed to NVCC.
+* In the absence of `mem` and `flags`, the **DEFAULT** device memory mode is used and no additional flags are passed.
+
+| Flag | Definition |
+|-----------| ----------------------------------|
+| cuda_arch | - The list of supported compute capabilities is provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/cuda.py#LL19C1-L47C6)<br>- A useful [link](https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/) for matching CUDA gencodes with NVIDIA architectures |
+| mem | Device memory mode:<br>- **DEFAULT** allocate host and device memory pointers<br>- **MANAGED** use CUDA Managed Memory<br>- **PAGEFAULT** shared memory, only host pointers allocated |
+| flags | Extra flags to pass |
+
+```shell
+# Example 1: CUDA with no mem or flags specified
+$ spack install babelstream +cuda cuda_arch=70
+
+# Example 2: CUDA with a memory mode specified
+$ spack install babelstream +cuda cuda_arch=70 mem=<mode>
+
+# Example 3: CUDA with mem and flags specified
+$ spack install babelstream +cuda cuda_arch=70 mem=<mode> flags=<flags>
+```
+
+## Kokkos
+
+* The Kokkos implementation requires the Kokkos source folder to be provided, because Kokkos is built from scratch.
+
+| Flag | Definition |
+|-----------| ----------------------------------|
+| dir | Download a Kokkos release from the [GitHub repository](https://github.com/kokkos/kokkos), extract the archive to a directory of your choice, and point the `dir` flag at that directory |
+| backend | 2 backend options:<br>- cuda<br>- omp |
+| cuda_arch | - The list of supported compute capabilities is provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/cuda.py#LL19C1-L47C6)<br>- A useful [link](https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/) for matching CUDA gencodes with NVIDIA architectures |
+
+```shell
+# Example 1: no backend option specified
+$ spack install babelstream +kokkos dir=<path>
+
+# Example 2: CUDA backend
+$ spack install babelstream +kokkos backend=cuda cuda_arch=70 dir=<path>
+
+# Example 3: OMP backend
+$ spack install babelstream +kokkos backend=omp dir=<path>
+```
+
+## SYCL2020
+* Instructions for installing the Intel compilers are provided [here](https://spack.readthedocs.io/en/latest/build_systems/inteloneapipackage.html#building-a-package-with-icx)
+
+| Flag | Definition |
+|-----------| ----------------------------------|
+| implementation | 3 implementation options:<br>- OneAPI-ICPX<br>- OneAPI-DPCPP<br>- Compute-CPP |
+
+```shell
+# Example 1: no implementation option specified (builds for OneAPI-ICPX)
+$ spack install babelstream%oneapi +sycl2020
+
+# Example 2: OneAPI-DPCPP implementation
+$ spack install babelstream +sycl2020 implementation=ONEAPI-DPCPP
+```
+
+## SYCL
+
+| Flag | Definition |
+|-----------| ----------------------------------|
+| implementation | 2 implementation options:<br>- OneAPI-DPCPP<br>- Compute-CPP |
+
+```shell
+# Example 1: OneAPI-DPCPP implementation
+$ spack install babelstream +sycl implementation=ONEAPI-DPCPP
+```
+## ACC
+* Target device selection is automatic, with 2 options:
+  * **gpu**: the target device is set to an NVIDIA GPU automatically if `cuda_arch` is specified
+  * **multicore**: the target device is set to the host CPU automatically if `cpu_arch` is specified
+
+| Flag | Definition |
+|-----------| ----------------------------------|
+| cuda_arch | - The list of supported compute capabilities is provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/cuda.py#LL19C1-L47C6)<br>- A useful [link](https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/) for matching CUDA gencodes with NVIDIA architectures |
+| cpu_arch | This sets the `-tp` (target processor) flag; possible values are:<br>`px` - Generic x86 processor<br>`bulldozer` - AMD Bulldozer processor<br>`piledriver` - AMD Piledriver processor<br>`zen` - AMD Zen architecture (Epyc, Ryzen)<br>`zen2` - AMD Zen 2 architecture (Ryzen 2)<br>`sandybridge` - Intel Sandy Bridge processor<br>`haswell` - Intel Haswell processor<br>`knl` - Intel Knights Landing processor<br>`skylake` - Intel Skylake Xeon processor<br>`host` - Link the native version of the HPC SDK CPU math library<br>`native` - Alias for `-tp host`<br>Example: `cpu_arch=skylake` |
+
+```shell
+# Example 1: GPU run
+$ spack install babelstream +acc cuda_arch=70
+
+# Example 2: multicore CPU run
+$ spack install babelstream +acc cpu_arch=<arch>
+```
+
+## RAJA
+* The RAJA implementation requires the RAJA source folder to be provided, because RAJA is built from scratch.
+
+| Flag | Definition |
+|-----------| ----------------------------------|
+| dir | Download a RAJA release from the GitHub repository, extract the archive to a directory of your choice, and point the `dir` flag at that directory |
+| backend | 2 backend options:<br>- cuda<br>- omp |
+| offload | Choose the offload platform: `offload=[cpu]/[nvidia]` |
+
+```shell
+# Example 1: CPU offload with the OMP backend
+$ spack install babelstream +raja offload=cpu backend=omp dir=/home/dir/raja
+```
+
+## TBB
+```shell
+# Example:
+$ spack install babelstream +tbb
+```
+
+## THRUST
+
+| Flag | Definition |
+|-----------| ----------------------------------|
+| implementation | Choose one of the Thrust implementations: `implementation=[cuda]/[rocm]` |
+| backend | CUDA's Thrust implementation supports the following backends:<br>- cuda<br>- omp<br>- tbb |
+| cuda_arch | - The list of supported compute capabilities is provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/cuda.py#LL19C1-L47C6)<br>- A useful [link](https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/) for matching CUDA gencodes with NVIDIA architectures |
+| flags | Additional CUDA flags passed to nvcc; appended after `CUDA_ARCH` |
+
+```shell
+# Example 1: CUDA implementation
+$ spack install babelstream +thrust implementation=cuda backend=cuda cuda_arch=70 flags=<flags>
+```