
BabelStream

Measure memory transfer rates to/from global device memory on GPUs. This benchmark is similar in spirit to, and based on, the STREAM benchmark [1] for CPUs.

Unlike other GPU memory bandwidth benchmarks, this does not include the PCIe transfer time.
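
As a rough illustration of what such a measurement involves, the sketch below times a STREAM-style triad kernel on the host and derives a bandwidth figure from the bytes moved. It is not taken from the BabelStream sources: the function names (`triad_bytes`, `bandwidth_gbs`, `triad_bandwidth_gbs`), the initial array values and the choice to report the fastest run are all assumptions made for this example. Only the kernel loop is timed, so allocation and initialisation are excluded, in the same way that the benchmark excludes PCIe transfers.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Bytes moved by one triad pass a[i] = b[i] + scalar * c[i]:
// two arrays are read and one is written, each holding n doubles.
inline double triad_bytes(std::size_t n) {
    return 3.0 * static_cast<double>(sizeof(double)) * static_cast<double>(n);
}

// Convert "bytes moved in a number of seconds" into GB/s (10^9 bytes per second).
inline double bandwidth_gbs(double bytes, double seconds) {
    return bytes / (seconds * 1.0e9);
}

// Hypothetical helper: run the triad kernel `repeats` times over arrays of
// n doubles and return the bandwidth of the fastest run in GB/s.
double triad_bandwidth_gbs(std::size_t n, int repeats) {
    assert(n > 0 && repeats > 0);

    // Allocation and initialisation happen outside the timed region.
    std::vector<double> a(n, 0.0), b(n, 1.0), c(n, 2.0);
    const double scalar = 0.5;

    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < n; ++i) {
            a[i] = b[i] + scalar * c[i];
        }
        const auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }

    // Read one result through a volatile so the kernel is not optimised away.
    volatile double sink = a[n / 2];
    (void)sink;

    return bandwidth_gbs(triad_bytes(n), best);
}
```

Calling, for example, `triad_bandwidth_gbs(1 << 25, 100)` gives a host-side figure; the arrays should be much larger than the last-level cache for the number to reflect main memory bandwidth rather than cache bandwidth.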

There are multiple implementations of this benchmark in a variety of programming models. Currently implemented are:

  • OpenCL
  • CUDA
  • OpenACC
  • OpenMP 3 and 4.5
  • Kokkos
  • RAJA
  • SYCL

This code was previously called GPU-STREAM.

Website

uob-hpc.github.io/BabelStream/

Usage

The drivers, compiler and software for whichever implementation you would like to build are required.

We have supplied a series of Makefiles, one for each programming model, to assist with building. The Makefiles contain common build options, and should be simple to customise for your needs too.

General usage is make -f <Model>.make. Common compiler flags and names can be set by passing a COMPILER option to Make, e.g. make COMPILER=GNU. Some models allow specifying a CPU- or GPU-style target, which can be set by passing a TARGET option to Make, e.g. make TARGET=GPU.

Pass in extra flags via the EXTRA_FLAGS option.

The binaries are named in the form <model>-stream.

Building Kokkos

We use the following command to build Kokkos using the Intel Compiler, specifying the arch appropriately, e.g. KNL.

../generate_makefile.bash --prefix=<prefix> --with-openmp --with-pthread --arch=<arch> --compiler=icpc --cxxflags=-DKOKKOS_MEMORY_ALIGNMENT=2097152

For building with CUDA support, we use the following command, specifying the arch appropriately, e.g. Kepler35.

../generate_makefile.bash --prefix=<prefix> --with-cuda --with-openmp --with-pthread --arch=<arch> --with-cuda-options=enable_lambda --compiler=<path_to_kokkos_src>/bin/nvcc_wrapper

Building RAJA

We use the following command to build RAJA using the Intel Compiler.

cmake .. -DCMAKE_INSTALL_PREFIX=<prefix> -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DRAJA_PTR="RAJA_USE_RESTRICT_ALIGNED_PTR" -DCMAKE_BUILD_TYPE=ICCBuild -DRAJA_ENABLE_TESTS=Off

For building with CUDA support, we use the following command.

cmake .. -DCMAKE_INSTALL_PREFIX=<prefix> -DRAJA_PTR="RAJA_USE_RESTRICT_ALIGNED_PTR" -DRAJA_ENABLE_CUDA=1 -DRAJA_ENABLE_TESTS=Off

Results

Sample results can be found in the results subdirectory. If you would like to contribute updated results, please open a Pull Request.

Citing

Please cite BabelStream via this reference:

Deakin T, Price J, Martineau M, McIntosh-Smith S. GPU-STREAM v2.0: Benchmarking the achievable memory bandwidth of many-core processors across diverse parallel programming models. 2016. Paper presented at P^3MA Workshop at ISC High Performance, Frankfurt, Germany.

Other BabelStream publications:

Deakin T, McIntosh-Smith S. GPU-STREAM: Benchmarking the achievable memory bandwidth of Graphics Processing Units. 2015. Poster session presented at IEEE/ACM SuperComputing, Austin, United States. You can view the Poster and Extended Abstract.

Deakin T, Price J, Martineau M, McIntosh-Smith S. GPU-STREAM: Now in 2D!. 2016. Poster session presented at IEEE/ACM SuperComputing, Salt Lake City, United States. You can view the Poster and Extended Abstract.

Raman K, Deakin T, Price J, McIntosh-Smith S. Improving achieved memory bandwidth from C++ codes on Intel Xeon Phi Processor (Knights Landing). IXPUG Spring Meeting, Cambridge, UK, 2017.

Deakin T, Price J, Martineau M, McIntosh-Smith S. Evaluating attainable memory bandwidth of parallel programming models via BabelStream. International Journal of Computational Science and Engineering. Special issue (in press). 2017.

Deakin T, Price J, McIntosh-Smith S. Portable methods for measuring cache hierarchy performance. 2017. Poster session presented at IEEE/ACM SuperComputing, Denver, United States. You can view the Poster and Extended Abstract.

[1]: McCalpin, John D., 1995: "Memory Bandwidth and Machine Balance in Current High Performance Computers", IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995.