Tobias Burnus 75a4394830 Include stdlib.h for aligned_alloc Silence "error: there are no arguments to 'aligned_alloc' that depend on a template parameter, so a declaration of 'aligned_alloc' must be available" * OMPStream.cpp: #include <cstdlib>. * RAJAStream.cpp: Likewise.		2021-04-22 09:00:55 +02:00
.github/workflows	Re-add all compile and arch dependent flags	2021-03-11 15:46:23 +00:00
CL	Initial CMake+CI integration	2021-03-05 13:41:35 +00:00
cmake	Initial CMake+CI integration	2021-03-05 13:41:35 +00:00
legacy	Initial CMake+CI integration	2021-03-05 13:41:35 +00:00
results	Add Titan Xp numbers	2018-05-07 11:42:11 -04:00
.gitignore	Initial CMake+CI integration	2021-03-05 13:41:35 +00:00
ACC.cmake	Re-add all compile and arch dependent flags	2021-03-11 15:46:23 +00:00
ACCStream.cpp	Initial CMake+CI integration	2021-03-05 13:41:35 +00:00
ACCStream.h	Initial CMake+CI integration	2021-03-05 13:41:35 +00:00
CHANGELOG.md	Disable CI for RAJA on gcc-10+CUDA due to ICE	2021-04-21 16:28:12 +01:00
ci-prepare-bionic.sh	CMake: Update CI rocm to 4.1.0	2021-03-30 13:02:48 +01:00
ci-test-compile.sh	Disable CI for RAJA on gcc-10+CUDA due to ICE	2021-04-21 16:28:12 +01:00
CMakeLists.txt	Improve CMake messages	2021-03-30 17:08:03 +03:00
CUDA.cmake	Re-add all compile and arch dependent flags	2021-03-11 15:46:23 +00:00
CUDA.make	Tidy CUDA memory mode Makefile	2021-02-02 12:33:18 +00:00
CUDAStream.cu	Add CUDA nstream kernel	2021-02-02 12:32:33 +00:00
CUDAStream.h	Add CUDA nstream kernel	2021-02-02 12:32:33 +00:00
HIP.cmake	Initial CMake+CI integration	2021-03-05 13:41:35 +00:00
HIP.make	Add -O3 flat to HIP.make to fix segmentation fault	2020-08-12 14:09:22 +01:00
HIPStream.cpp	Add nstream kernel to HIP	2021-02-03 11:25:26 +00:00
HIPStream.h	Add nstream kernel to HIP	2021-02-03 11:25:26 +00:00
KOKKOS.cmake	Default to C++11	2021-03-24 17:20:11 +00:00
Kokkos.make	Fixed a bug where ComputeCpp's flags is omitted	2020-08-07 11:00:56 +01:00
KokkosStream.cpp	Add nstream to Kokkos	2021-02-02 15:58:00 +00:00
KokkosStream.hpp	Add nstream to Kokkos	2021-02-02 15:58:00 +00:00
LICENSE	Rename to BabelStream	2017-04-08 12:16:29 +01:00
main.cpp	Add option to run nstream in isolation	2021-02-18 13:32:35 +00:00
OCL.cmake	Initial CMake+CI integration	2021-03-05 13:41:35 +00:00
OCLStream.cpp	Add nstream kernel to OpenCL	2021-02-02 15:46:53 +00:00
OCLStream.h	Add nstream kernel to OpenCL	2021-02-02 15:46:53 +00:00
OMP.cmake	Use model name as exe prefix	2021-03-23 18:16:42 +00:00
OMPStream.cpp	Include stdlib.h for aligned_alloc	2021-04-22 09:00:55 +02:00
OMPStream.h	Add OpenMP nstream kernel	2021-02-02 11:44:37 +00:00
OpenACC.make	Remove Cray flags for OpenACC following removal of support in latest compiler	2020-07-10 13:27:21 +01:00
OpenCL.make	Allow user to override CXX in OpenCL.make	2017-02-24 09:33:59 -06:00
OpenMP.make	Add missing OpenMP flag to Intel CPU builds	2021-02-02 11:49:16 +00:00
RAJA.cmake	Re-add all compile and arch dependent flags	2021-03-11 15:46:23 +00:00
RAJA.make	[RAJA] Use xHost and streaming stores with the Intel compiler	2017-04-06 10:02:25 +01:00
RAJAStream.cpp	Include stdlib.h for aligned_alloc	2021-04-22 09:00:55 +02:00
RAJAStream.hpp	Initial CMake+CI integration	2021-03-05 13:41:35 +00:00
README.android	Move android instructions to seperate file	2017-02-23 16:45:55 +00:00
README.md	Fix some README typos	2021-03-30 16:54:48 +03:00
register_models.cmake	Improve CMake messages	2021-03-30 17:08:03 +03:00
STD20.cmake	Default to C++11	2021-03-24 17:20:11 +00:00
STD20.make	Add C++20 version using for_each_n and range factories	2020-12-07 14:55:54 +00:00
STD20Stream.cpp	Add nstream to C++ STD version -- untested as compilers not ready	2021-02-03 10:54:33 +00:00
STD20Stream.hpp	Add nstream to C++ STD version -- untested as compilers not ready	2021-02-03 10:54:33 +00:00
STD.cmake	Initial CMake+CI integration	2021-03-05 13:41:35 +00:00
STD.make	Add NVIDIA HPC SDK C++ parallel STL implementation	2020-11-23 03:08:44 -08:00
STDStream.cpp	Add nstream to C++ STD version -- untested as compilers not ready	2021-02-03 10:54:33 +00:00
STDStream.h	Initial CMake+CI integration	2021-03-05 13:41:35 +00:00
Stream.h	Revert "Update initial starting values"	2021-02-18 11:06:14 +00:00
SYCL.cmake	Re-add all compile and arch dependent flags	2021-03-11 15:46:23 +00:00
SYCL.make	Clean up SYCL.make with unified build target	2021-02-17 17:17:20 +00:00
SYCLStream.cpp	Add SYCL 1.2.1 nstream kernel	2021-02-02 12:29:03 +00:00
SYCLStream.h	Fix int to size_t narrowing for SYCL, closes #92	2021-03-10 15:48:41 +00:00

README.md

BabelStream

Measure memory transfer rates to/from global device memory on GPUs. This benchmark is similar in spirit to, and based on, the STREAM benchmark [1] for CPUs.

Unlike other GPU memory bandwidth benchmarks, this does not include the PCIe transfer time.

There are multiple implementations of this benchmark in a variety of programming models. Currently implemented are:

OpenCL
CUDA
OpenACC
OpenMP 3 and 4.5
C++ Parallel STL
Kokkos
RAJA
SYCL

This code was previously called GPU-STREAM.

How is this different to STREAM?

BabelStream implements the four main kernels of the STREAM benchmark (along with a dot product), but by utilising different programming models it expands the platforms on which the code can run beyond CPUs.
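As a rough illustration of the kernels mentioned above, the following is a minimal serial sketch in plain C++. The function names (stream_copy, stream_mul, stream_add, stream_triad, stream_dot) are hypothetical and chosen for this example only; they are not taken from the BabelStream sources, and the real implementations run these loops in parallel through each programming model.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial versions of the STREAM-style kernels, for illustration only.

// Copy: c[i] = a[i]
template <typename T>
void stream_copy(const std::vector<T>& a, std::vector<T>& c) {
  for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i];
}

// Mul (called "scale" in STREAM): b[i] = scalar * c[i]
template <typename T>
void stream_mul(std::vector<T>& b, const std::vector<T>& c, T scalar) {
  for (std::size_t i = 0; i < c.size(); ++i) b[i] = scalar * c[i];
}

// Add: c[i] = a[i] + b[i]
template <typename T>
void stream_add(const std::vector<T>& a, const std::vector<T>& b, std::vector<T>& c) {
  for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
}

// Triad: a[i] = b[i] + scalar * c[i]
template <typename T>
void stream_triad(std::vector<T>& a, const std::vector<T>& b, const std::vector<T>& c, T scalar) {
  for (std::size_t i = 0; i < b.size(); ++i) a[i] = b[i] + scalar * c[i];
}

// Dot: returns the sum over i of a[i] * b[i]
template <typename T>
T stream_dot(const std::vector<T>& a, const std::vector<T>& b) {
  T sum = T(0);
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}
```

Each kernel performs very little arithmetic per element, so its run time is dominated by how quickly the arrays can be read from and written to memory.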

The key differences from STREAM are that:

the arrays are allocated on the heap
the problem size is unknown at compile time
a wider range of platforms and programming models is supported

With stack arrays of known size at compile time, the compiler is able to align data and issue optimal instructions (such as using non-temporal stores, removing peel/remainder vectorisation loops, etc.). But this information is not typically available in real HPC codes today, where the problem size is read from the user at runtime.

BabelStream therefore provides a measure of what memory bandwidth performance can be attained (by a particular programming model) if you follow today's parallel programming best practice.
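To make the measurement idea concrete, here is a minimal sketch, assuming double-precision arrays held in std::vector (heap storage whose length is chosen at run time) and the usual STREAM convention of counting one read and one write per element for a copy. The helper names copy_bytes, gigabytes_per_second and time_copy_seconds are hypothetical and not part of BabelStream.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Bytes moved by one copy pass (c[i] = a[i]): one read and one write per element.
double copy_bytes(std::size_t n) {
  return 2.0 * static_cast<double>(sizeof(double)) * static_cast<double>(n);
}

// Convert a byte count and an elapsed time in seconds into GB/s (10^9 bytes per second).
double gigabytes_per_second(double bytes, double seconds) {
  return bytes / seconds * 1.0e-9;
}

// Time one copy pass over heap arrays whose length is only known at run time.
// Only the kernel itself is timed; allocation and initialisation are excluded.
double time_copy_seconds(const std::vector<double>& a, std::vector<double>& c) {
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i];
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

A caller would typically read the array length from the command line, construct the vectors, time several passes, and report gigabytes_per_second(copy_bytes(n), fastest_time) for the fastest pass.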

BabelStream also includes the nstream kernel from the Parallel Research Kernels (PRK) project, available on GitHub. Details about PRK can be found in the following references:

Van der Wijngaart, Rob F., and Timothy G. Mattson. The parallel research kernels. IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2014.

R. F. Van der Wijngaart, A. Kayi, J. R. Hammond, G. Jost, T. St. John, S. Sridharan, T. G. Mattson, J. Abercrombie, and J. Nelson. Comparing runtime systems with exascale ambitions using the Parallel Research Kernels. ISC 2016, DOI: 10.1007/978-3-319-41321-1_17.

Jeff R. Hammond and Timothy G. Mattson. Evaluating data parallelism in C++ using the Parallel Research Kernels. IWOCL 2019, DOI: 10.1145/3318170.3318192.
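For orientation, the nstream kernel resembles triad but accumulates into its output array, so it touches three input arrays and one output array per element. The sketch below is a serial illustration only; the function name nstream_sketch is hypothetical and is not the name used in the BabelStream or PRK sources.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial nstream: a[i] += b[i] + scalar * c[i]
template <typename T>
void nstream_sketch(std::vector<T>& a, const std::vector<T>& b, const std::vector<T>& c, T scalar) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] += b[i] + scalar * c[i];
  }
}
```

Because a is both read and written, each element moves more data than triad does for the same amount of arithmetic.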

Website

uob-hpc.github.io/BabelStream/

Usage

Drivers, compilers and software applicable to whichever implementation you would like to build against are required.

CMake

The project supports building with CMake >= 3.13.0, which can be installed without root via the official script. As with any CMake project, first configure the project:

> cd babelstream
> cmake -Bbuild -H. -DMODEL=<model> <model specific flags prefixed with -D...> # configure the build, build type defaults to Release 
> cmake --build build # compile it 
> ./build/babelstream # executable available at ./build/

By default, we have defined a set of optimal flags for known HPC compilers. These are assigned to RELEASE_FLAGS, and you can override them if required.

To find out what flag each model supports or requires, simply configure while only specifying the model. For example:

> cd babelstream
> cmake -Bbuild -H. -DMODEL=OCL 
...
- Common Release flags are `-O3`, set RELEASE_FLAGS to override
-- CXX_EXTRA_FLAGS: 
        Appends to common compile flags. These will be used at link phase at well.
        To use separate flags at link time, set `CXX_EXTRA_LINKER_FLAGS`
-- CXX_EXTRA_LINK_FLAGS: 
        Appends to link flags which appear *before* the objects.
        Do not use this for linking libraries, as the link line is order-dependent
-- CXX_EXTRA_LIBRARIES: 
        Append to link flags which appears *after* the objects.
        Use this for linking extra libraries (e.g `-lmylib`, or simply `mylib`) 
-- CXX_EXTRA_LINKER_FLAGS: 
        Append to linker flags (i.e GCC's `-Wl` or equivalent)
-- Available models:  OMP;OCL;STD;STD20;HIP;CUDA;KOKKOS;SYCL;ACC;RAJA
-- Selected model  :  OCL
-- Supported flags:

   CMAKE_CXX_COMPILER (optional, default=c++): Any CXX compiler that is supported by CMake detection
   OpenCL_LIBRARY (optional, default=): Path to OpenCL library, usually called libOpenCL.so
...

Alternatively, refer to the CI script, which test-compiles most of the models, and see which flags are used there.

It is recommended that you delete the build directory when you change any of the build flags.

GNU Make

We have supplied a series of Makefiles, one for each programming model, to assist with building. The Makefiles contain common build options, and should be simple to customise for your needs too.

General usage is make -f <Model>.make. Common compiler flags and names can be set by passing a COMPILER option to Make, e.g. make COMPILER=GNU. Some models allow specifying a CPU or GPU style target, and this can be set by passing a TARGET option to Make, e.g. make TARGET=GPU.

Pass in extra flags via the EXTRA_FLAGS option.

The binaries are named in the form <model>-stream.

Building Kokkos for Make

Kokkos version >= 3 requires setting the KOKKOS_PATH flag to the source directory of a distribution. For example:

cd 
wget https://github.com/kokkos/kokkos/archive/3.1.01.tar.gz
tar -xvf 3.1.01.tar.gz # should end up with ~/kokkos-3.1.01
cd BabelStream
make -f Kokkos.make KOKKOS_PATH=~/kokkos-3.1.01

See make output for more information on supported flags.

Building RAJA for Make

We use the following command to build RAJA using the Intel Compiler.

cmake .. -DCMAKE_INSTALL_PREFIX=<prefix> -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DRAJA_PTR="RAJA_USE_RESTRICT_ALIGNED_PTR" -DCMAKE_BUILD_TYPE=ICCBuild -DRAJA_ENABLE_TESTS=Off

For building with CUDA support, we use the following command.

cmake .. -DCMAKE_INSTALL_PREFIX=<prefix> -DRAJA_PTR="RAJA_USE_RESTRICT_ALIGNED_PTR" -DRAJA_ENABLE_CUDA=1 -DRAJA_ENABLE_TESTS=Off

Results

Sample results can be found in the results subdirectory. If you would like to submit updated results, please submit a Pull Request.

Contributing

As of v4.0, the main branch of this repository will hold the latest released version.

The develop branch will contain unreleased features due for the next (major and/or minor) release of BabelStream. Pull Requests should be made against the develop branch.

Citing

Please cite BabelStream via this reference:

Deakin T, Price J, Martineau M, McIntosh-Smith S. GPU-STREAM v2.0: Benchmarking the achievable memory bandwidth of many-core processors across diverse parallel programming models. 2016. Paper presented at P^3MA Workshop at ISC High Performance, Frankfurt, Germany.

Other BabelStream publications:

Deakin T, McIntosh-Smith S. GPU-STREAM: Benchmarking the achievable memory bandwidth of Graphics Processing Units. 2015. Poster session presented at IEEE/ACM SuperComputing, Austin, United States. You can view the Poster and Extended Abstract.

Deakin T, Price J, Martineau M, McIntosh-Smith S. GPU-STREAM: Now in 2D!. 2016. Poster session presented at IEEE/ACM SuperComputing, Salt Lake City, United States. You can view the Poster and Extended Abstract.

Raman K, Deakin T, Price J, McIntosh-Smith S. Improving achieved memory bandwidth from C++ codes on Intel Xeon Phi Processor (Knights Landing). IXPUG Spring Meeting, Cambridge, UK, 2017.

Deakin T, Price J, Martineau M, McIntosh-Smith S. Evaluating attainable memory bandwidth of parallel programming models via BabelStream. International Journal of Computational Science and Engineering. Special issue (in press). 2017.

Deakin T, Price J, McIntosh-Smith S. Portable methods for measuring cache hierarchy performance. 2017. Poster session presented at IEEE/ACM SuperComputing, Denver, United States. You can view the Poster and Extended Abstract.

[1]: McCalpin, John D., 1995: "Memory Bandwidth and Machine Balance in Current High Performance Computers", IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995.