GPU-STREAM

Measure memory transfer rates to and from global device memory on GPUs. This benchmark is similar in spirit to, and based on, the STREAM benchmark [1] for CPUs.

Unlike other GPU memory bandwidth benchmarks, this one does not include the PCIe transfer time.
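As a rough illustration of what a STREAM-style measurement counts, the sketch below shows the Triad kernel from STREAM [1] and the usual bandwidth arithmetic: bytes moved by the kernel divided by kernel time. It runs on the host for simplicity and is not GPU-STREAM's actual implementation; the names `triad`, `triad_bytes` and `bandwidth_gbs` are hypothetical. In the GPU case the arrays would live in device memory and only the kernel would be timed, which is why PCIe transfers do not enter the figure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// STREAM Triad: a[i] = b[i] + scalar * c[i].
// Host-side stand-in for a device kernel (assumption for illustration).
template <typename T>
void triad(std::vector<T>& a, const std::vector<T>& b,
           const std::vector<T>& c, T scalar)
{
  assert(a.size() == b.size() && b.size() == c.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] = b[i] + scalar * c[i];
}

// Bytes moved by one Triad pass over n elements:
// two arrays are read (b, c) and one is written (a).
template <typename T>
std::size_t triad_bytes(std::size_t n)
{
  return 3 * sizeof(T) * n;
}

// Bandwidth in GB/s (10^9 bytes per second) for a kernel that moved
// `bytes` bytes in `seconds` seconds. Host-to-device copies are
// deliberately not part of `seconds`.
inline double bandwidth_gbs(std::size_t bytes, double seconds)
{
  assert(seconds > 0.0);
  return static_cast<double>(bytes) / seconds * 1.0e-9;
}
```

For example, a Triad pass over 1000 doubles moves `triad_bytes<double>(1000)` = 24000 bytes; dividing that by the measured kernel time gives the reported rate.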

Usage

Build the OpenCL and CUDA binaries with make (the CUDA version requires CUDA 6.5 or later).

Run the OpenCL version with ./gpu-stream-ocl and the CUDA version with ./gpu-stream-cuda.

Android

Assuming you have a recent Android NDK available, you can use the toolchain that it provides to build GPU-STREAM. You should first use the NDK to generate a standalone toolchain:

# Select a directory to install the toolchain to
ANDROID_NATIVE_TOOLCHAIN=/path/to/toolchain

${NDK}/build/tools/make-standalone-toolchain.sh \
  --platform=android-14 \
  --toolchain=arm-linux-androideabi-4.8 \
  --install-dir=${ANDROID_NATIVE_TOOLCHAIN}

Make sure that the OpenCL headers and library (libOpenCL.so) are available in ${ANDROID_NATIVE_TOOLCHAIN}/sysroot/usr/.

You should then be able to build GPU-STREAM:

make CXX=${ANDROID_NATIVE_TOOLCHAIN}/bin/arm-linux-androideabi-g++

Copy the executable and OpenCL kernels to the device:

adb push gpu-stream-ocl /data/local/tmp
adb push ocl-stream-kernels.cl /data/local/tmp

Run GPU-STREAM from an adb shell:

adb shell
cd /data/local/tmp

# Use float if device doesn't support double, and reduce array size
./gpu-stream-ocl --float -n 6 -s 10000000
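To judge whether a given array size will fit on a small device, it can help to estimate the memory footprint first. The sketch below assumes the array size is counted in elements and that the benchmark allocates three arrays of that length, as STREAM does; both are assumptions here, and `footprint_bytes` is a hypothetical helper, not part of GPU-STREAM.

```cpp
#include <cstddef>

// Estimated device memory needed for the benchmark arrays.
// Assumes `num_arrays` arrays of `elements` items each (three by
// default, as in STREAM); this is an illustration, not GPU-STREAM code.
inline std::size_t footprint_bytes(std::size_t elements,
                                   std::size_t element_size,
                                   std::size_t num_arrays = 3)
{
  return num_arrays * elements * element_size;
}
```

Under these assumptions, 10000000 float elements need `footprint_bytes(10000000, sizeof(float))` bytes, which is 120 MB with a 4-byte float, and twice that with double.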

Results

Sample results can be found in the results subdirectory. If you would like to contribute updated results, please open a Pull Request.

Citing

You can view the Poster and Extended Abstract on GPU-STREAM presented at SC'15. Please cite GPU-STREAM via this reference:

Deakin T, McIntosh-Smith S. GPU-STREAM: Benchmarking the achievable memory bandwidth of Graphics Processing Units. 2015. Poster session presented at IEEE/ACM SuperComputing, Austin, United States.

[1]: McCalpin, John D., 1995: "Memory Bandwidth and Machine Balance in Current High Performance Computers", IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995.