BabelStream/README.md
sunway513 11053798ff Improved GPU-STREAM benchmark for HIP version:
1. Add optional looper kernels to take command line input for the number of groups and groupSize
2. Add GEOMEAN value calculation of the kernels
3. Instructions on configure HIP environment in the README.md
4. Add results for HIP on FIJI Nano, TITAN X; CUDA on TITAN X
5. Run script to optionally run HIP version with groups and groupSize options
2016-03-15 07:56:32 -05:00

66 lines
2.3 KiB
Markdown

GPU-STREAM
==========
Measure memory transfer rates to/from global device memory on GPUs.
This benchmark is similar in spirit, and based on, the STREAM benchmark [1] for CPUs.
Unlike other GPU memory bandwidth benchmarks this does *not* include the PCIe transfer time.
Usage
-----
Build the OpenCL and CUDA binaries with `make` (CUDA version requires CUDA >= v6.5)
Run the OpenCL version with `./gpu-stream-ocl` and the CUDA version with `./gpu-stream-cuda`
For HIP version, follow the instructions on the following blog to properly install ROCK and ROCR drivers:
http://gpuopen.com/getting-started-with-boltzmann-components-platforms-installation/
Install the HCC compiler:
https://bitbucket.org/multicoreware/hcc/wiki/Home
Install HIP:
https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP
Build the HIP binaries with make gpu-stream-hip, run it with './gpu-stream-hip'
Android
-------
Assuming you have a recent Android NDK available, you can use the
toolchain that it provides to build GPU-STREAM. You should first
use the NDK to generate a standalone toolchain:
# Select a directory to install the toolchain to
ANDROID_NATIVE_TOOLCHAIN=/path/to/toolchain
${NDK}/build/tools/make-standalone-toolchain.sh \
--platform=android-14 \
--toolchain=arm-linux-androideabi-4.8 \
--install-dir=${ANDROID_NATIVE_TOOLCHAIN}
Make sure that the OpenCL headers and library (libOpenCL.so) are
available in `${ANDROID_NATIVE_TOOLCHAIN}/sysroot/usr/`.
You should then be able to build GPU-STREAM:
make CXX=${ANDROID_NATIVE_TOOLCHAIN}/bin/arm-linux-androideabi-g++
Copy the executable and OpenCL kernels to the device:
adb push gpu-stream-ocl /data/local/tmp
adb push ocl-stream-kernels.cl /data/local/tmp
Run GPU-STREAM from an adb shell:
adb shell
cd /data/local/tmp
# Use float if device doesn't support double, and reduce array size
./gpu-stream-ocl --float -n 6 -s 10000000
Results
-------
Sample results can be found in the `results` subdirectory. If you would like to submit updated results, please submit a Pull Request.
[1]: McCalpin, John D., 1995: "Memory Bandwidth and Machine Balance in Current High Performance Computers", IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995.