
Tom Deakin 31cb567e21 Switch data from 1.0, 2.0 and 3.0 to 0.1, 0.2, and 0.3 resp. Using integers for maths gets unstable past 38 iterations even in double precision. Using the original values/10 is safe up to the default 100 iterations.		2016-05-11 15:51:19 +01:00
CL	Update cl2.hpp	2016-05-03 11:41:00 +01:00
results	Add Fury X result of csv file (also fix line endings here)	2015-09-21 15:38:52 +01:00
.gitignore	Add CMake things to gitignore	2016-05-03 12:18:41 +01:00
ACCStream.cpp	Switch data from 1.0, 2.0 and 3.0 to 0.1, 0.2, and 0.3 resp.	2016-05-11 15:51:19 +01:00
ACCStream.h	Implement the OpenACC device string functions, and device selector	2016-05-03 14:50:09 +01:00
CMakeLists.txt	Remove ugly CMake endif text in parenthesis	2016-05-11 13:37:12 +01:00
common.h.in	Add License text to all files	2016-05-03 12:32:03 +01:00
CUDAStream.cu	Switch data from 1.0, 2.0 and 3.0 to 0.1, 0.2, and 0.3 resp.	2016-05-11 15:51:19 +01:00
CUDAStream.h	Set thread block size in CUDA with a #define, and check that array size is multiple of it	2016-05-11 12:21:29 +01:00
KOKKOSStream.cpp	Switch data from 1.0, 2.0 and 3.0 to 0.1, 0.2, and 0.3 resp.	2016-05-11 15:51:19 +01:00
KOKKOSStream.hpp	Adjusted the Kokkos implementation to fix view initialisation, and store local copies of views for lambda scoping	2016-05-06 21:02:44 +01:00
LICENSE	Add License text to all files	2016-05-03 12:32:03 +01:00
main.cpp	Switch data from 1.0, 2.0 and 3.0 to 0.1, 0.2, and 0.3 resp.	2016-05-11 15:51:19 +01:00
OCLStream.cpp	Switch data from 1.0, 2.0 and 3.0 to 0.1, 0.2, and 0.3 resp.	2016-05-11 15:51:19 +01:00
OCLStream.h	Add License text to all files	2016-05-03 12:32:03 +01:00
OMP3Stream.cpp	Switch data from 1.0, 2.0 and 3.0 to 0.1, 0.2, and 0.3 resp.	2016-05-11 15:51:19 +01:00
OMP3Stream.h	Add reference OpenMP 3.0 version	2016-05-04 10:41:41 +01:00
OMP45Stream.cpp	Switch data from 1.0, 2.0 and 3.0 to 0.1, 0.2, and 0.3 resp.	2016-05-11 15:51:19 +01:00
OMP45Stream.h	First attempt at OpenMP 4.5	2016-05-11 15:08:08 +01:00
RAJAStream.cpp	Switch data from 1.0, 2.0 and 3.0 to 0.1, 0.2, and 0.3 resp.	2016-05-11 15:51:19 +01:00
RAJAStream.hpp	Fixed memory management for GPU, now working with OpenMP and CUDA	2016-05-06 13:17:04 +01:00
README.md	Add citation information to README	2016-03-15 09:17:46 +00:00
Stream.h	Add License text to all files	2016-05-03 12:32:03 +01:00
SYCLStream.cpp	Switch data from 1.0, 2.0 and 3.0 to 0.1, 0.2, and 0.3 resp.	2016-05-11 15:51:19 +01:00
SYCLStream.h	Require SYCL array size to be multiple of WGSIZE	2016-05-11 12:23:21 +01:00

README.md

GPU-STREAM

Measure memory transfer rates to/from global device memory on GPUs. This benchmark is similar in spirit to, and based on, the STREAM benchmark [1] for CPUs.

Unlike other GPU memory bandwidth benchmarks, this one does not include the PCIe transfer time.

Usage

Build the OpenCL and CUDA binaries with make (the CUDA version requires CUDA >= v6.5).

Run the OpenCL version with ./gpu-stream-ocl and the CUDA version with ./gpu-stream-cuda

Android

Assuming you have a recent Android NDK available, you can use the toolchain that it provides to build GPU-STREAM. You should first use the NDK to generate a standalone toolchain:

# Select a directory to install the toolchain to
ANDROID_NATIVE_TOOLCHAIN=/path/to/toolchain

${NDK}/build/tools/make-standalone-toolchain.sh \
  --platform=android-14 \
  --toolchain=arm-linux-androideabi-4.8 \
  --install-dir=${ANDROID_NATIVE_TOOLCHAIN}

Make sure that the OpenCL headers and library (libOpenCL.so) are available in ${ANDROID_NATIVE_TOOLCHAIN}/sysroot/usr/.
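
A quick check before building can catch a missing OpenCL installation early. The sketch below is not part of GPU-STREAM: the function name check_opencl_sysroot is hypothetical, and the include/CL/cl.h and lib/libOpenCL.so locations under sysroot/usr are assumed from the usual toolchain layout, so adjust them if your files live elsewhere.

```shell
#!/bin/sh
# Hypothetical helper, not part of GPU-STREAM.
# Assumption: headers are in <toolchain>/sysroot/usr/include/CL and the
# library is in <toolchain>/sysroot/usr/lib.
check_opencl_sysroot() {
  usr_dir="$1/sysroot/usr"
  status=0
  for f in include/CL/cl.h lib/libOpenCL.so; do
    if [ ! -e "${usr_dir}/${f}" ]; then
      # Report every missing file rather than stopping at the first one
      echo "missing: ${usr_dir}/${f}" >&2
      status=1
    fi
  done
  return "${status}"
}
```

For example, check_opencl_sysroot "${ANDROID_NATIVE_TOOLCHAIN}" && echo "OpenCL files found" prints the confirmation only when both files are present.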

You should then be able to build GPU-STREAM:

make CXX=${ANDROID_NATIVE_TOOLCHAIN}/bin/arm-linux-androideabi-g++

Copy the executable and OpenCL kernels to the device:

adb push gpu-stream-ocl /data/local/tmp
adb push ocl-stream-kernels.cl /data/local/tmp

Run GPU-STREAM from an adb shell:

adb shell
cd /data/local/tmp

# Use float if device doesn't support double, and reduce array size
./gpu-stream-ocl --float -n 6 -s 10000000
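
To pick a value for -s that fits in device memory, it can help to estimate the footprint on the host first. This sketch assumes that -s is the number of elements per array and that the benchmark allocates three arrays, as the original STREAM does. stream_footprint_mb is a hypothetical helper, not a GPU-STREAM option.

```shell
# Hypothetical helper: estimate the device memory (in MB, 10^6 bytes)
# needed for a given array size.
# Assumption: three arrays of N elements each, with 4-byte float or
# 8-byte double elements.
stream_footprint_mb() {
  elements="$1"
  bytes_per_element="$2"
  echo $(( 3 * elements * bytes_per_element / 1000000 ))
}

stream_footprint_mb 10000000 4   # float:  prints 120
stream_footprint_mb 10000000 8   # double: prints 240
```

Under these assumptions, the command above needs roughly 120 MB of device memory in single precision.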

Results

Sample results can be found in the results subdirectory. If you would like to contribute updated results, please open a pull request.

Citing

You can view the Poster and Extended Abstract on GPU-STREAM presented at SC'15. Please cite GPU-STREAM via this reference:

Deakin T, McIntosh-Smith S. GPU-STREAM: Benchmarking the achievable memory bandwidth of Graphics Processing Units. 2015. Poster session presented at IEEE/ACM SuperComputing, Austin, United States.

[1]: McCalpin, John D., 1995: "Memory Bandwidth and Machine Balance in Current High Performance Computers", IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995.