Merge pull request #155 from illuhad/patch-1
std-indices: Fix infinite recursion in ranged::operator!=
commit a0c5c00c9c

README.md
@@ -18,6 +18,7 @@ This code was previously called GPU-STREAM.

- [How is this different to STREAM?](#how-is-this-different-to-stream)
- [Building](#building)
- [CMake](#cmake)
- [Spack](#spack)
- [GNU Make (removed)](#gnu-make)
- [Results](#results)
- [Contributing](#contributing)
@@ -138,6 +139,35 @@ Alternatively, refer to the [CI script](./src/ci-test-compile.sh), which test-co

*It is recommended that you delete the `build` directory when you change any of the build flags.*

### Spack

The project supports building with Spack >= 0.19.0, which can be installed without root via the [official GitHub repo](https://github.com/spack/spack).
The BabelStream Spack package source can be found [here](https://github.com/spack/spack/tree/develop/var/spack/repos/builtin/packages/babelstream/package.py).
Each BabelStream implementation (programming model) is built as follows:

```shell
# Spack package installation starts with `spack install babelstream` for all programming models.
# The programming model to build is selected with the `+<model>` variant,
# and any model-specific flags follow the model:
$ spack install babelstream@<version>%<compiler> +<model> <model specific flags>

# The executables are generated in:
#   SPACK_INSTALL_DIRECTORY/opt/spack/system-name/compiler-name/babelstream-version-identifier/bin/
# This path is printed at the end of the installation, so it can be copied from there.
$ cd SPACK_INSTALL_DIRECTORY/opt/spack/system-name/compiler-name/babelstream-version-identifier/bin/
$ ./<model>-stream
```

More detailed examples are provided in the [Spack instructions](./docs/spack_instructions.md).

The `MODEL` variant selects one implementation of BabelStream to build.

Currently available models are:

```
omp;ocl;std;std20;hip;cuda;kokkos;sycl;sycl2020;acc;raja;tbb;thrust
```

### GNU Make

Support for Make has been removed from 4.0 onwards.

docs/spack_instructions.md (new file, 239 lines)

@@ -0,0 +1,239 @@
# Spack Instructions

## Table of contents

* [OpenMP](#openmp)
* [OpenCL](#opencl)
* [STD](#std)
* [HIP(ROCM)](#hiprocm)
* [CUDA](#cuda)
* [Kokkos](#kokkos)
* [SYCL](#sycl)
* [SYCL2020](#sycl2020)
* [ACC](#acc)
* [RAJA](#raja)
* [TBB](#tbb)
* [THRUST](#thrust)

## OpenMP

* There are 3 offloading options for OpenMP: NVIDIA, AMD and Intel.
* If a user provides a value for `cuda_arch`, execution is automatically offloaded to an NVIDIA GPU.
* If a user provides a value for `amdgpu_target`, execution is offloaded to an AMD GPU.
* In the absence of both `cuda_arch` and `amdgpu_target`, execution is offloaded to Intel.

| Flag | Definition |
|-----------|----------------------------------|
| cuda_arch | - List of supported compute capabilities are provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/cuda.py#LL19C1-L47C6) <br />- Useful [link](https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/) for matching CUDA gencodes with NVIDIA architectures |
| amdgpu_target | List of supported architectures are provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/rocm.py#LL93C1-L125C19) |

```shell
# Example 1: Intel offload
$ spack install babelstream%oneapi +omp

# Example 2: NVIDIA GPU, Volta (sm_70)
$ spack install babelstream +omp cuda_arch=70

# Example 3: AMD GPU gfx701
$ spack install babelstream +omp amdgpu_target=gfx701
```

## OpenCL

* There is no need to specify `amdgpu_target` or `cuda_arch` here, since the AMD and CUDA backends are selected directly via the `backend` option.

| Flag | Definition |
|-----------|----------------------------------|
| backend | 4 different backend options: <br />- cuda <br />- amd <br />- intel <br />- pocl |

```shell
# Example 1: CUDA backend
$ spack install babelstream%gcc +ocl backend=cuda

# Example 2: AMD backend
$ spack install babelstream%gcc +ocl backend=amd

# Example 3: Intel backend
$ spack install babelstream%gcc +ocl backend=intel

# Example 4: POCL backend
$ spack install babelstream%gcc +ocl backend=pocl
```

## STD

* Minimum GCC version requirement: `10.1.0`
* NVHPC offload will be added in a future release

```shell
# Example 1: data
$ spack install babelstream +stddata

# Example 2: ranges
$ spack install babelstream +stdranges

# Example 3: indices
$ spack install babelstream +stdindices
```

## HIP(ROCM)

* `amdgpu_target` and `flags` are optional here.

| Flag | Definition |
|-----------|----------------------------------|
| amdgpu_target | List of supported architectures are provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/rocm.py#LL93C1-L125C19) |
| flags | Extra flags to pass |

```shell
# Example 1: ROCM default
$ spack install babelstream +rocm

# Example 2: ROCM with GPU target
$ spack install babelstream +rocm amdgpu_target=<gfx701>

# Example 3: ROCM with extra flags
$ spack install babelstream +rocm flags=<xxx>

# Example 4: ROCM with GPU target and extra flags
$ spack install babelstream +rocm amdgpu_target=<gfx701> flags=<xxx>
```

## CUDA

* The `cuda_arch` value is mandatory here.
* If a user provides a value for `mem`, the device memory mode is chosen accordingly.
* If a user provides a value for `flags`, additional CUDA flags are passed to NVCC.
* In the absence of `mem` and `flags`, the **DEFAULT** device memory mode is used and no additional flags are passed.

| Flag | Definition |
|-----------|----------------------------------|
| cuda_arch | - List of supported compute capabilities are provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/cuda.py#LL19C1-L47C6) <br />- Useful [link](https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/) for matching CUDA gencodes with NVIDIA architectures |
| mem | Device memory mode: <br />- **DEFAULT** allocate host and device memory pointers <br />- **MANAGED** use CUDA Managed Memory <br />- **PAGEFAULT** shared memory, only host pointers allocated |
| flags | Extra flags to pass |

```shell
# Example 1: no mem or flags specified
$ spack install babelstream +cuda cuda_arch=<70>

# Example 2: managed memory mode
$ spack install babelstream +cuda cuda_arch=<70> mem=<managed>

# Example 3: with mem and flags specified
$ spack install babelstream +cuda cuda_arch=<70> mem=<managed> flags=<CUDA_EXTRA_FLAGS>
```

## Kokkos

* The Kokkos implementation requires the Kokkos source directory to be provided, because it builds Kokkos from scratch.

| Flag | Definition |
|-----------|----------------------------------|
| dir | Download a Kokkos release from the [GitHub repository](https://github.com/kokkos/kokkos), extract the archive to a directory of your choice, and point the `dir` flag at that directory |
| backend | 2 different backend options: <br />- cuda <br />- omp |
| cuda_arch | - List of supported compute capabilities are provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/cuda.py#LL19C1-L47C6) <br />- Useful [link](https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/) for matching CUDA gencodes with NVIDIA architectures |

```shell
# Example 1: no backend option specified
$ spack install babelstream +kokkos dir=</home/user/Downloads/kokkos-x.x.xx>

# Example 2: CUDA backend
$ spack install babelstream +kokkos backend=cuda cuda_arch=70 dir=</home/user/Downloads/kokkos-x.x.xx>

# Example 3: OMP backend
$ spack install babelstream +kokkos backend=omp dir=</home/user/Downloads/kokkos-x.x.xx>
```

## SYCL2020

* Instructions for installing the Intel compilers are provided [here](https://spack.readthedocs.io/en/latest/build_systems/inteloneapipackage.html#building-a-package-with-icx)

| Flag | Definition |
|-----------|----------------------------------|
| implementation | 3 different implementation options: <br />- OneAPI-ICPX <br />- OneAPI-DPCPP <br />- Compute-CPP |

```shell
# Example 1: no implementation option specified (builds for OneAPI-ICPX)
$ spack install babelstream%oneapi +sycl2020

# Example 2: OneAPI-DPCPP implementation
$ spack install babelstream +sycl2020 implementation=ONEAPI-DPCPP
```

## SYCL

| Flag | Definition |
|-----------|----------------------------------|
| implementation | 2 different implementation options: <br />- OneAPI-DPCPP <br />- Compute-CPP |

```shell
# Example 1: OneAPI-DPCPP implementation
$ spack install babelstream +sycl implementation=ONEAPI-DPCPP
```

## ACC

* Target device selection is automatic, with 2 options:
  * **gpu**: globally sets the target device to an NVIDIA GPU if `cuda_arch` is specified
  * **multicore**: globally sets the target device to the host CPU if `cpu_arch` is specified

| Flag | Definition |
|-----------|----------------------------------|
| cuda_arch | - List of supported compute capabilities are provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/cuda.py#LL19C1-L47C6) <br />- Useful [link](https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/) for matching CUDA gencodes with NVIDIA architectures |
| cpu_arch | This sets the `-tp` (target processor) flag, e.g. `cpu_arch=skylake`. Possible values are: <br /> `px` - Generic x86 Processor <br /> `bulldozer` - AMD Bulldozer processor <br /> `piledriver` - AMD Piledriver processor <br /> `zen` - AMD Zen architecture (Epyc, Ryzen) <br /> `zen2` - AMD Zen 2 architecture (Ryzen 2) <br /> `sandybridge` - Intel SandyBridge processor <br /> `haswell` - Intel Haswell processor <br /> `knl` - Intel Knights Landing processor <br /> `skylake` - Intel Skylake Xeon processor <br /> `host` - Link native version of HPC SDK cpu math library <br /> `native` - Alias for `-tp host` |

```shell
# Example 1: GPU run
$ spack install babelstream +acc cuda_arch=<70>

# Example 2: multicore CPU run
$ spack install babelstream +acc cpu_arch=<bulldozer>
```

## RAJA

* The RAJA implementation requires the RAJA source directory to be provided, because it builds RAJA from scratch.

| Flag | Definition |
|-----------|----------------------------------|
| dir | Download a RAJA release from the GitHub repository, extract the archive to a directory of your choice, and point the `dir` flag at that directory |
| backend | 2 different backend options: <br />- cuda <br />- omp |
| offload | Choose the offloading platform: `offload=[cpu]/[nvidia]` |

```shell
# Example 1: CPU offload with the OMP backend
$ spack install babelstream +raja offload=cpu backend=omp dir=/home/dir/raja
```

## TBB

```shell
# Example:
$ spack install babelstream +tbb
```

## THRUST

| Flag | Definition |
|-----------|----------------------------------|
| implementation | Choose a Thrust implementation: `implementation=[cuda]/[rocm]` |
| backend | CUDA's Thrust implementation supports the following backends: <br />- cuda <br />- omp <br />- tbb |
| cuda_arch | - List of supported compute capabilities are provided [here](https://github.com/spack/spack/blob/0f271883831bec6da3fc64c92eb1805c39a9f09a/lib/spack/spack/build_systems/cuda.py#LL19C1-L47C6) <br />- Useful [link](https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/) for matching CUDA gencodes with NVIDIA architectures |
| flags | Additional CUDA flags passed to nvcc, appended after `CUDA_ARCH` |

```shell
# Example 1: CUDA implementation
$ spack install babelstream +thrust implementation=cuda backend=cuda cuda_arch=<70> flags=<option>
$ spack install babelstream +thrust implementation=cuda backend=omp cuda_arch=<70> flags=<option>
$ spack install babelstream +thrust implementation=cuda backend=tbb cuda_arch=<70> flags=<option>

# Example 2: ROCM implementation
$ spack install babelstream +thrust implementation=rocm backend=<option>
```

@@ -37,7 +37,7 @@ public:

   iterator operator+=(int x) { iterator retval = *this; this->num+=x; return retval; }

   bool operator==(iterator other) const { return num == other.num; }
-  bool operator!=(iterator other) const { return *this != other; }
+  bool operator!=(iterator other) const { return num != other.num; }
   bool operator<(iterator other) const { return num < other.num; }

   reference operator*() const { return num;}
