
James Price c9b3d07b84 Fix OpenCL host code for dot kernel Wrong number of blocks was being copied and summed, and the host sums vector didn't have the correct size.		2016-10-24 12:49:58 +01:00
CL	Update cl2.hpp	2016-05-03 11:41:00 +01:00
results	Move HIP results into new directory structure	2016-10-21 12:57:31 +01:00
.gitignore	Add binary names to gitignore	2016-05-11 17:53:33 +01:00
ACCStream.cpp	Implement the reduction in OpenACC	2016-10-14 14:40:08 +01:00
ACCStream.h	Implement the reduction in OpenACC	2016-10-14 14:40:08 +01:00
CMakeLists.txt	Add support of HIP version of GPU-STREAM.	2016-09-05 23:41:01 -05:00
common.h.in	Add License text to all files	2016-05-03 12:32:03 +01:00
CUDAStream.cu	Fix CUDA host code for dot kernel	2016-10-24 12:47:25 +01:00
CUDAStream.h	Add a CUDA dot kernel	2016-10-14 17:51:40 +01:00
HIPStream.cu	move hip_runtime.h after copyright info	2016-10-12 10:41:50 -05:00
HIPStream.h	Add support of HIP version of GPU-STREAM.	2016-09-05 23:41:01 -05:00
KokkosMakefile	Fix Kokkos CMake so it works..	2016-05-12 12:35:47 +01:00
KOKKOSStream.cpp	Add dot kernel to Kokkos	2016-10-21 10:58:26 +01:00
KOKKOSStream.hpp	Add dot kernel to Kokkos	2016-10-21 10:58:26 +01:00
LICENSE	Add License text to all files	2016-05-03 12:32:03 +01:00
main.cpp	Fix verification of dot kernel	2016-10-24 12:47:01 +01:00
OCLStream.cpp	Fix OpenCL host code for dot kernel	2016-10-24 12:49:58 +01:00
OCLStream.h	Add an OpenCL dot kernel	2016-10-14 17:07:55 +01:00
OMP3Stream.cpp	Implement dot kernel in OpenMP 3	2016-10-14 15:05:06 +01:00
OMP3Stream.h	Implement dot kernel in OpenMP 3	2016-10-14 15:05:06 +01:00
OMP45Stream.cpp	Add dot kernel to OpenMP 4.5 - tested with clang-ykt	2016-10-14 15:19:25 +01:00
OMP45Stream.h	Add dot kernel to OpenMP 4.5 - tested with clang-ykt	2016-10-14 15:19:25 +01:00
RAJAStream.cpp	Add RAJA dot kernel	2016-10-24 11:34:40 +01:00
RAJAStream.hpp	Add RAJA dot kernel	2016-10-24 11:34:40 +01:00
README.md	Update citation	2016-07-19 15:46:08 +01:00
Stream.h	Add the dot routine to the abstract class	2016-10-14 14:39:48 +01:00
SYCLStream.cpp	[SYCL] Set WGSIZE to more sensible value for AMD Fiji	2016-07-07 09:40:16 +01:00
SYCLStream.h	Require SYCL array size to be multiple of WGSIZE	2016-05-11 12:23:21 +01:00

README.md

GPU-STREAM

Measure memory transfer rates to/from global device memory on GPUs. This benchmark is similar in spirit to, and based on, the STREAM benchmark [1] for CPUs.

Unlike other GPU memory bandwidth benchmarks, this one does not include the PCIe transfer time.
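To illustrate what such a measurement involves, here is a minimal host-only C++ sketch of a STREAM-style triad timing. It is not taken from the GPU-STREAM sources: the names triad, triad_bytes, bandwidth_mbytes_per_sec and best_triad_seconds are hypothetical, the triad form a[i] = b[i] + scalar * c[i] follows the original STREAM benchmark, and 1 MB is assumed to be 10^6 bytes. Only the kernel loop sits between the two timer calls, which mirrors the idea of excluding data transfer time from the result.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: bytes moved by one triad pass
// (read b, read c, write a), for n elements of type T.
template <typename T>
double triad_bytes(std::size_t n)
{
  return 3.0 * sizeof(T) * static_cast<double>(n);
}

// Convert a byte count and a time in seconds to MB/s (assumes 1 MB = 10^6 bytes).
inline double bandwidth_mbytes_per_sec(double bytes, double seconds)
{
  return bytes / (seconds * 1.0e6);
}

// Host-only stand-in for a device triad kernel: a[i] = b[i] + scalar * c[i].
template <typename T>
void triad(std::vector<T>& a, const std::vector<T>& b,
           const std::vector<T>& c, T scalar)
{
  assert(a.size() == b.size() && a.size() == c.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] = b[i] + scalar * c[i];
}

// Run the kernel ntimes and return the fastest time in seconds.
// Only the kernel call is timed; allocation and initialisation are not.
template <typename T>
double best_triad_seconds(std::vector<T>& a, const std::vector<T>& b,
                          const std::vector<T>& c, T scalar, int ntimes)
{
  double best = std::numeric_limits<double>::max();
  for (int k = 0; k < ntimes; ++k)
  {
    const auto t0 = std::chrono::steady_clock::now();
    triad(a, b, c, scalar);
    const auto t1 = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    if (seconds < best) best = seconds;
  }
  return best;
}
```

A caller would pass triad_bytes<T>(n) and the value returned by best_triad_seconds to bandwidth_mbytes_per_sec to obtain a bandwidth figure. Reporting the best of several runs is the convention used by the original STREAM benchmark; whether GPU-STREAM reports exactly this figure is an assumption here.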

There are multiple implementations of this benchmark in a variety of programming models. Currently implemented are:

OpenCL
CUDA
OpenACC
OpenMP 3 and 4.5
Kokkos
RAJA
SYCL

Usage

CMake 3.2 or above is required. You also need the drivers, compiler and software applicable to whichever implementation you would like to build. Our build system is designed to only build implementations in programming models that your system supports.

Generate the Makefile with cmake .

Android (outdated instructions)

Assuming you have a recent Android NDK available, you can use the toolchain that it provides to build GPU-STREAM. You should first use the NDK to generate a standalone toolchain:

# Select a directory to install the toolchain to
ANDROID_NATIVE_TOOLCHAIN=/path/to/toolchain

${NDK}/build/tools/make-standalone-toolchain.sh \
  --platform=android-14 \
  --toolchain=arm-linux-androideabi-4.8 \
  --install-dir=${ANDROID_NATIVE_TOOLCHAIN}

Make sure that the OpenCL headers and library (libOpenCL.so) are available in ${ANDROID_NATIVE_TOOLCHAIN}/sysroot/usr/.

You should then be able to build GPU-STREAM:

make CXX=${ANDROID_NATIVE_TOOLCHAIN}/bin/arm-linux-androideabi-g++

Copy the executable and OpenCL kernels to the device:

adb push gpu-stream-ocl /data/local/tmp
adb push ocl-stream-kernels.cl /data/local/tmp

Run GPU-STREAM from an adb shell:

adb shell
cd /data/local/tmp

# Use float if device doesn't support double, and reduce array size
./gpu-stream-ocl --float -n 6 -s 10000000
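
As a rough guide to choosing an array size for a memory-constrained device, the C++ sketch below estimates the memory needed by the working arrays. It assumes, which this README does not state, that -s gives the number of elements per array and that the benchmark allocates three arrays of that length, as STREAM does. The function name arrays_footprint_bytes is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: total bytes for three arrays of num_elements
// values of type T (assumes three working arrays, as in STREAM).
template <typename T>
std::size_t arrays_footprint_bytes(std::size_t num_elements)
{
  const std::size_t bytes_per_index = 3 * sizeof(T);
  // Guard against overflow of the multiplication below.
  assert(num_elements <= std::numeric_limits<std::size_t>::max() / bytes_per_index);
  return num_elements * bytes_per_index;
}
```

Under these assumptions, and on platforms where float is 4 bytes and double is 8 bytes, arrays_footprint_bytes<float>(10000000) gives 120000000 bytes, half of what the same element count needs in double precision.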

Results

Sample results can be found in the results subdirectory. If you would like to submit updated results, please submit a Pull Request.

Citing

You can view the Poster and Extended Abstract on GPU-STREAM presented at SC'15. Please cite GPU-STREAM via this reference:

Deakin T, Price J, Martineau M, McIntosh-Smith S. GPU-STREAM v2.0: Benchmarking the achievable memory bandwidth of many-core processors across diverse parallel programming models. 2016. Paper presented at P^3MA Workshop at ISC High Performance, Frankfurt, Germany.

Other GPU-STREAM publications:

Deakin T, McIntosh-Smith S. GPU-STREAM: Benchmarking the achievable memory bandwidth of Graphics Processing Units. 2015. Poster session presented at IEEE/ACM SuperComputing, Austin, United States.

[1]: McCalpin, John D., 1995: "Memory Bandwidth and Machine Balance in Current High Performance Computers", IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995.