
Jeff Hammond 66491909e4 BabelStream Fortran

This is a new implementation of BabelStream using Fortran. The code uses a Fortran driver that is largely equivalent to the C++ one, with a few exceptions. First, it does not use a C++ class for the stream object, since that doesn't seem like a useful way to do things in Fortran. Instead, I use a module that contains the same methods, and which has alloc and dealloc that act like CTOR and DTOR.

The current implementations are:

- DO CONCURRENT
- Fortran array notation
- Sequential DO loops
- OpenACC parallel loop
- OpenACC kernels on Fortran array notation
- OpenMP parallel do
- OpenMP taskloop
- OpenMP target teams distribute parallel do simd
- OpenMP target teams loop
- CUDA Fortran (handwritten CUDA Fortran kernels, except DOT)
- CUDA Fortran kernels (!$cuf kernel do <<<,>>>)

I have tested with GCC, Intel (ifort and ifx), and NVHPC compilers on AArch64, x86_64 and NVIDIA GPU targets, although not exhaustively. Cray and Fujitsu have been tested as well. The only untested compiler of significance is IBM XLF.

The current build system is GNU Make, and requires the user to manually specify the compiler and implementation. CSV printing is supported.

Squashed commit of the following: commit 15f13ef9d326102cc003b2fdfe1b31c4aea55373 Author: Jeff Hammond <email> Date: Tue Nov 15 06:42:46 2022 +0200 8 cores unless user changes commit 62ca680546ff89a1987b6fb797273038f767bf7b Author: Jeff Hammond <email> Date: Tue Nov 15 06:42:09 2022 +0200 hoist and disable orin flags commit 76495509abcdb0686f293a72f7ded7c8ed7bb882 Author: Jeff Hammond <email> Date: Tue Nov 15 06:40:13 2022 +0200 cleanup scripts commit 5b45df87954282cbb6b0f7eb2dcb3570d08bb5c2 Author: Jeff Hammond <email> Date: Tue Nov 15 06:39:31 2022 +0200 add autopar flag for GCC commit 87eb07e4a8c3e8d6247ab5f72e14bf90002733ce Merge: a732e7c 270644e Author: Jeff Hammond <email> Date: Wed Nov 9 15:53:41 2022 +0200 Merge remote-tracking branch 'origin/fortran_compiler_details' into fortran-ports commit a732e7c49e12ce8aff15e9d4bcbd215fa4a05d82 Merge: cfafd99 5697d94 Author: Jeff Hammond <email> Date: Wed Nov 9 15:53:36 2022 +0200 Merge remote-tracking branch 'origin/fortran_int32_option' into fortran-ports commit cfafd993b646d5f5a90eb6d37d347cc545ab36d4 Merge: de5ff67 26a9707 Author: Jeff Hammond <email> Date: Wed Nov 9 15:53:25 2022 +0200 Merge remote-tracking branch 'origin/fortran_csv' into fortran-ports commit de5ff6772b2036ad259a6a9c331ff5408146b54c Merge: 3109653 `1d0755f` Author: Jeff Hammond <email> Date: Wed Nov 9 15:51:40 2022 +0200 Merge branch 'UoB-HPC:main' into fortran-ports commit 310965399a9b518122ff610b61419cdaab75ecd0 Author: Jeff Hammond <jehammond@nvidia.com> Date: Mon Sep 26 03:39:01 2022 -0700 because gomp so confict commit 270644e6fb89e8f3c3bfe4d73c9896fc3094d761 Author: Jeff Hammond <email> Date: Fri Sep 16 11:46:49 2022 +0300 add compiler info flag commit 5697d94a9ce5162de9445f5fde76f8020eae8b83 Author: Jeff Hammond <email> Date: Sun Sep 4 13:59:57 2022 +0300 implement INT32 indexing commit 830ad58dd2c985b9a2425093c0eed9ec1c7887dd Author: Jeff Hammond <email> Date: Sun Sep 4 13:49:17 2022 +0300 remove swear words from debugging commit 
26a9707a1f09249d04206adf647587e42cf5fab5 Author: Jeff Hammond <email> Date: Sun Sep 4 13:47:18 2022 +0300 add an option for giga/gibi-bytes commit 4f6d693c03ca1b092d3bf003cdfcc367b8ad86ac Author: Jeff Hammond <email> Date: Sun Sep 4 13:41:32 2022 +0300 CSV output seems done Signed-off-by: Jeff Hammond <email> commit 94e62be05c11b9ef208f7ad09402ddf26e4586ae Merge: ad52adc 772c183 Author: Jeff Hammond <email> Date: Sun Sep 4 12:59:01 2022 +0300 Merge branch 'fortran_nan_check' into fortran_csv commit 772c183de2fb1a8ea72ae7ef3c45c17895c4fdc9 Author: Jeff Hammond <email> Date: Sun Sep 4 10:44:26 2022 +0300 fixed NaN check commit ad52adc9ba6eb702c0fefdf1d9a8d1830b74830b Author: Jeff Hammond <email> Date: Sun Sep 4 10:28:00 2022 +0300 CSV WIP commit 6f7cefc42ca286ae3b698d827fd7c9ee14984ecb Author: Jeff Hammond <email> Date: Sun Sep 4 10:08:14 2022 +0300 update help output commit 208207597d150fafa059ca593ac30bc9a2e6d1a7 Author: Jeff Hammond <email> Date: Sun Sep 4 10:02:24 2022 +0300 add option for cpu_time intrinsic timer also adjust use statements and rename macro for OpenMP timer Signed-off-by: Jeff Hammond <email> commit 78fa2fcb1087f00efd94dd911000dc0d485da406 Author: Jeff Hammond <email> Date: Tue Aug 30 17:19:36 2022 +0300 add check for normal (not NaN, not Inf, not denormal) the previous error check failed to detect garbage results because comparisons against NaN always return true. i flipped the logical comparison and added a check for IEEE normal to prevent this. it works on the case that was missed previously. 
Signed-off-by: Jeff Hammond <email> commit 22fc9fe918a378f47c88dbad3ce91a4a6688789b Author: Jeff Hammond <email> Date: Tue Aug 30 17:19:30 2022 +0300 move commit d2d8c8555d2665fc553f9263a6767843ec14def8 Author: Jeff Hammond <email> Date: Tue Aug 30 16:29:15 2022 +0300 so far so good commit ffe181536b78ef845f861a09ca0dc72d4fffcbe8 Author: Jeff Hammond <email> Date: Tue Aug 30 16:29:09 2022 +0300 so far so good commit aa72b46a8187792ca819f9720c032e802525413a Author: Jeff Hammond <email> Date: Tue Aug 30 16:28:52 2022 +0300 GPU on by default commit 0fc9e4acdd0fbb5b6d9399962fc6a1daaa4a84da Author: Jeff Hammond <jehammond@nvidia.com> Date: Thu Aug 25 16:38:08 2022 +0300 better commit b1cbd6f5b6a7534502d29e14d1c09fa6be378dd8 Merge: bf14601 5fe03c6 Author: Jeff Hammond <email> Date: Thu Aug 25 16:35:22 2022 +0300 Merge branch 'fortran-ports' of https://github.com/jeffhammond/BabelStream into fortran-ports commit bf146011d6ee1ac9dd0cb6d43bb4e60b8cc37acf Author: Jeff Hammond <email> Date: Thu Aug 25 16:35:07 2022 +0300 autodetect GPU arch in build (who needs CMake?) 
commit 5fe03c664e318a33bd0d383fddf8e76a2266a4e0 Author: Jeff Hammond <jehammond@nvidia.com> Date: Thu Aug 25 15:57:41 2022 +0300 be smarter and check for compilers in path commit a187612a68447302fbd036d717df53b2780df3b4 Author: Jeff Hammond <email> Date: Thu Aug 25 15:35:58 2022 +0300 remove samsung paths commit 82af886943a67980dda1724edae7686c6d280e1e Merge: a46bf6b 0f59b50 Author: Jeff Hammond <jehammond@nvidia.com> Date: Wed Aug 24 13:22:13 2022 +0300 merge fix plus build updates commit 0f59b5014477c9a3da5eeb97328e6c55554a8c24 Author: Jeff Hammond <jehammond@nvidia.com> Date: Wed Aug 24 08:43:19 2022 +0000 typo in USE_OPENMP_TIMERS commit 4a9a0019585b0f03c42f151042ad592cba03d8b3 Author: Jeff Hammond <jehammond@nvidia.com> Date: Wed Aug 24 08:42:59 2022 +0000 logic fix commit 74d8123864fdb603b409112f5b9c0e92c2a93071 Author: Jeff Hammond <jehammond@nvidia.com> Date: Wed Aug 24 03:05:58 2022 -0500 no-gpu option commit dc1e39ff34e384ae66f50ab787e9ca8c92701c3b Author: Jeff Hammond <jehammond@nvidia.com> Date: Wed Aug 24 03:05:17 2022 -0500 fix default case commit 0b2b0e0bb754b0ac86dd16eeb30db092a3b3e658 Author: Jeff Hammond <jehammond@nvidia.com> Date: Wed Aug 24 02:57:02 2022 -0500 fix tp for aarch64 commit 1e213bec76d2e7f5f161a18eb365f2948563c925 Author: Jeff Hammond <jehammond@nvidia.com> Date: Wed Aug 24 07:46:41 2022 +0000 fix MARCH and build.sh elif commit a46bf6b48eb730a2fa08ccd8dddd04725fe25371 Author: Jeff Hammond <jehammond@nvidia.com> Date: Tue Aug 23 16:43:22 2022 +0300 orin updates commit a9fe9c028c08b9f0d468ee56f24970817087099d Merge: 2ab14de 9f4bee4 Author: Jeff Hammond <jehammond@nvidia.com> Date: Tue Aug 23 06:32:01 2022 -0700 more CPU specialization fixes commit 2ab14de1535f71fd1b548a10585b035ed88daa26 Author: Jeff Hammond <jehammond@nvidia.com> Date: Tue Aug 23 06:30:37 2022 -0700 more CPU specialization fixes commit 9f4bee439c36b592321f4af38235450cfb23cdf2 Author: Jeff Hammond <email> Date: Tue Aug 23 16:12:13 2022 +0300 build and run updates 
commit aeff0854478e5f16536b11034f459ea387a222a2 Author: Jeff Hammond <email> Date: Tue Aug 23 15:56:25 2022 +0300 aesthetics commit 89b1ab01369cd71d5bbb837474799c75eabd64b5 Author: Jeff Hammond <email> Date: Tue Aug 23 15:56:08 2022 +0300 handle march flag better commit a284bfa6da9bbb1aa9de5e8d40b74c316e90f3c6 Author: Jeff Hammond <email> Date: Tue Aug 23 15:56:04 2022 +0300 handle march flag better commit c18c3945eb053581f2cdf528961f158c4aa66271 Author: Jeff Hammond <email> Date: Tue Aug 23 15:53:11 2022 +0300 handle march flag better commit a3a8ccf453a2ff7cc99a774b5a6262648690f7c8 Author: Jeff Hammond <jehammond@nvidia.com> Date: Tue Aug 23 05:29:41 2022 -0700 brewster updates commit 1364c4100f4bb6241e2db5805a64625a66c9d2fa Author: Tom Deakin <thomasdeakin@gmail.com> Date: Sun Aug 21 17:16:20 2022 +0100 Add Fujitsu compiler flags commit b82fe2cb38cab940d0bebf613e22ea9685a21d06 Author: Jeff Hammond <email> Date: Sun Aug 21 15:40:28 2022 +0300 FJ timer workaround commit c1b2fa81155c4d6a3717793c5670b1b0d4cf6101 Author: Jeff Hammond <email> Date: Sun Aug 21 15:29:13 2022 +0300 intel update/fix commit 063ef879d9c3a3010a0be3b9baad7600f62e52bf Author: Jeff Hammond <email> Date: Sun Aug 21 04:43:29 2022 -0700 NERSC AMD compiler commit 2c68292667b62f3428fc8cf4dfa874a5b44e625d Merge: 2bdbbe8 ca98948 Author: Jeff Hammond <jehammond@nvidia.com> Date: Sun Aug 21 02:12:12 2022 -0700 Merge branch 'fortran-ports' of https://github.com/jeffhammond/BabelStream into fortran-ports commit 2bdbbe81d782268fd7f48889fd6eeea32d5f1f58 Author: Jeff Hammond <jehammond@nvidia.com> Date: Sun Aug 21 02:11:27 2022 -0700 AMD ROCM buikd commit ca9894801fdcca705e5d06c703af3a0f4e888c01 Author: Jeff Hammond <jehammond@nvidia.com> Date: Sun Aug 21 09:10:16 2022 +0000 AWS stuff commit 4c539efda9522810dadc64c65339ce22ea6822b4 Author: Jeff Hammond <jehammond@nvidia.com> Date: Sun Aug 21 09:09:59 2022 +0000 merge commit c3830658f8d403f602f3270b8f34b6ebd405c3e3 Author: Jeff Hammond <email> Date: Sun Aug 21 
02:08:46 2022 -0700 NERSC stuff commit 7d7f746206e1ace8753778fcd2416d5ae30b7470 Merge: 1fefb8e d929852 Author: Jeff Hammond <email> Date: Sat Aug 20 20:56:09 2022 -0700 Merge branch 'fortran-ports' of https://github.com/jeffhammond/BabelStream into fortran-ports commit 1fefb8e657764b43cbcaf63278e051ead53bd29a Author: Jeff Hammond <email> Date: Sat Aug 20 20:55:16 2022 -0700 Cray temp stuff commit d92985239b31e16d478ca3a8a740baba2c35c164 Author: Jeff Hammond <jehammond@nvidia.com> Date: Fri Aug 19 02:11:07 2022 -0700 Xeon stuff commit 3f19e451bbc856ed6aa221077e51bd0578e48426 Merge: 38f28e1 c8dd609 Author: Jeff Hammond <jehammond@nvidia.com> Date: Thu Aug 18 13:56:37 2022 +0000 Merge branch 'fortran-ports' of https://github.com/jeffhammond/BabelStream into fortran-ports commit 38f28e193c76970e5b6f641b437c6faefb9c608b Author: Jeff Hammond <jehammond@nvidia.com> Date: Thu Aug 18 13:54:12 2022 +0000 TARGET for cpu too commit 6be181a07a93281a51cb897edf703404ead2c83e Author: Jeff Hammond <jehammond@nvidia.com> Date: Thu Aug 18 13:52:58 2022 +0000 AWS flags commit e88479e09176510f707e410a4e69ea5290b2619e Author: Jeff Hammond <jehammond@nvidia.com> Date: Thu Aug 18 13:52:42 2022 +0000 ARM stuff for AWS commit 1ee26cb3675b5e2739ddc21f56a1a864ff681950 Author: Jeff Hammond <jehammond@nvidia.com> Date: Thu Aug 18 13:52:24 2022 +0000 disable shared for portability commit c8dd6099d95792b17abbcb025f771c3ae0ed773e Merge: 8bda56d `1b67999` Author: Jeff Hammond <email> Date: Thu Aug 18 15:23:16 2022 +0300 Merge branch 'UoB-HPC:main' into fortran-ports commit 8bda56dd9053fdacc77aac572401bc4c7806efa0 Author: Jeff Hammond <email> Date: Wed Aug 17 03:07:13 2022 -0700 add Cray compiler to build system - ignore temp files generated by Cray Fortran - workaround Cray not having reduce commit 3a0fec620d7ce5317a3260826087a26e0faee36c Author: Jeff Hammond <jehammond@nvidia.com> Date: Wed Aug 17 02:09:19 2022 -0700 remove LOCAL, which causes problems commit 
e5a70ddbd995567c28a4c74373481c01a7489c88 Author: Jeff Hammond <email> Date: Wed Aug 10 22:26:50 2022 +0300 add a way to use managed/device for everything DC uses managed by default. no way to not use it and be strictly standard right now. managed affects performance in some cases, so we want to compare apples-to-apples. thanks to Jeff Larkin for helping with this. Signed-off-by: Jeff Hammond <email> commit 8fe956ab62737aecdec1ce7785a659587d814653 Author: Jeff Hammond <email> Date: Wed Aug 10 22:26:41 2022 +0300 only do GPU flag for IFX commit de49723a7ae864847a2136353a45a49502291373 Author: Jeff Hammond <email> Date: Wed Aug 10 22:26:23 2022 +0300 helper scripts commit e0971aa15d6fac2bc1de6e5080b53f7288975fe9 Author: Jeff Hammond <email> Date: Wed Aug 10 22:26:21 2022 +0300 helper scripts commit a7ba50a60d321cab8e0f63d841b893c01a7df6b6 Author: Jeff Hammond <email> Date: Wed Aug 10 12:29:28 2022 +0300 remove all the compiled intermediates with wildcard commit 31a594e82ec7b75d626639948eba532d503c4d81 Author: Jeff Hammond <jehammond@nvidia.com> Date: Fri Aug 5 03:31:32 2022 -0700 build stuff commit 2cd3acd0f3cee82b60e5b05ac8dc01da3452bd1f Author: Jeff Hammond <jehammond@nvidia.com> Date: Fri Aug 5 02:09:17 2022 -0700 build all with unique names commit ac230d127e15bdc9e56450862f9627d55da37f59 Author: Jeff Hammond <email> Date: Fri Aug 5 09:28:03 2022 +0300 fix make clean commit bd0ef7736a43e26864167eb61345704731acbefa Author: Jeff Hammond <email> Date: Fri Aug 5 09:24:12 2022 +0300 build check update commit 662520c4e443b841a88f1a4fe833bdb77b7cfd45 Author: Jeff Hammond <email> Date: Fri Aug 5 09:21:48 2022 +0300 CUDA kernel version commit 25c321987b349f85a13f0140ae316382aa71e601 Author: Jeff Hammond <email> Date: Fri Aug 5 09:15:32 2022 +0300 fixed CUDA Fortran dot commit 64612d2604401c2f200a3689b4248ecf7c93adaf Author: Jeff Hammond <email> Date: Fri Aug 5 09:10:49 2022 +0300 CUDA Fortran working except DOT commit 4d35fe51a22978cc77bdd6311b7d15654856c564 Author: Jeff 
Hammond <email> Date: Fri Aug 5 08:48:17 2022 +0300 CUDA Fortran is not compiling yet commit 0967c36695518a0c7bf7ee4c62a412f51338708e Author: Jeff Hammond <email> Date: Fri Aug 5 07:50:40 2022 +0300 workshare commit 3ed69ea9ea655c364181144f21f0bfc0d3afa13c Author: Jeff Hammond <email> Date: Fri Aug 5 07:42:49 2022 +0300 target loop commit 30dfb574c0c4435f09fc5a6e53644f9ab7fd95f3 Author: Jeff Hammond <email> Date: Fri Aug 5 07:31:41 2022 +0300 OpenMP target commit a5306ce5c1144f38223074b786240db07a66b6bf Author: Jeff Hammond <email> Date: Fri Aug 5 07:17:58 2022 +0300 makefile errors on non support commit 854c8135f5d80d5cecce22042d761a3f75a5ee13 Author: Jeff Hammond <email> Date: Fri Aug 5 07:15:12 2022 +0300 fix taskloop commit f2894c583346410e14461988d68012d8469e583c Author: Jeff Hammond <email> Date: Fri Aug 5 07:11:26 2022 +0300 add taskloop part 1 commit b7c0a43e9b49eed7ee54a4b4a8470118c092a922 Author: Jeff Hammond <email> Date: Fri Aug 5 07:07:54 2022 +0300 add OpenMP traditional commit 7dafcc385f547738b9972f98b9e93f87e22d468c Author: Jeff Hammond <email> Date: Fri Aug 5 07:02:36 2022 +0300 add OpenACC kernels + Array implementation commit 096e7d281015b09e5a099e5a1eb8b9b3e46cea5f Author: Jeff Hammond <email> Date: Fri Aug 5 06:53:13 2022 +0300 formatting commit 284b62b47e508799dc49c85ea3d7a8d1f34f87a9 Author: Jeff Hammond <email> Date: Thu Aug 4 19:41:27 2022 +0300 add placeholder for CSV commit 516bdd5929a13c17348040b031931485ca32e40e Author: Jeff Hammond <email> Date: Thu Aug 4 19:14:00 2022 +0300 add --float commit d4e0ccaf6c00e6109e6130b3fd7c604df6feaa28 Author: Jeff Hammond <email> Date: Thu Aug 4 19:13:23 2022 +0300 default message updates commit e8452f1c2e30fb84533b75a43ac9f5f265c96f60 Author: Jeff Hammond <email> Date: Thu Aug 4 17:58:48 2022 +0300 list devices etc commit a80e82c323a5b0d1bffc524a8219de51cbdba8d2 Author: Jeff Hammond <email> Date: Thu Aug 4 14:07:02 2022 +0300 better build system commit c3b090cf1f28641a9e34e331ab37cb055e82eec4 Author: 
Jeff Hammond <email> Date: Thu Aug 4 14:03:27 2022 +0300 refactor build system commit 096cd43b7bc49751c17d686519620a7a4b1e5677 Author: Jeff Hammond <email> Date: Thu Aug 4 13:43:17 2022 +0300 cleanup the rest commit 1e4fb8125e0729b32e8ec6d87f30d935310f55ca Author: Jeff Hammond <email> Date: Thu Aug 4 13:40:38 2022 +0300 add Intel build and fix syntax issuse commit db3a9307b57bbc82456f9d52a6ff20d6e37b4083 Author: Jeff Hammond <email> Date: Thu Aug 4 13:34:43 2022 +0300 use modern character syntax commit b66bd707d64a1823a3a7bd8a2c6acf30ce9043be Author: Jeff Hammond <email> Date: Thu Aug 4 12:10:59 2022 +0300 printing commit ff842f62b952b5a61decfac80fd9b51dc56546d3 Author: Jeff Hammond <jehammond@nvidia.com> Date: Thu Aug 4 11:06:43 2022 +0300 build stuff commit 05791085dd4cdde8b07f5b33a78ce051f4c8dd1d Author: Jeff Hammond <email> Date: Wed Aug 3 20:10:33 2022 +0300 add OpenACC commit bb76b757a2765640b4a7bfb8d2d4850f96c478f7 Author: Jeff Hammond <email> Date: Wed Aug 3 20:04:12 2022 +0300 better clean commit 2f53530d0f7f3d0cb4e138e2d76c325d81bbab8d Author: Jeff Hammond <email> Date: Wed Aug 3 20:03:04 2022 +0300 Sequential loop Stream commit f5c0eaee60b04dfeabd96750c8b34694d2757f54 Author: Jeff Hammond <email> Date: Wed Aug 3 19:56:54 2022 +0300 add array notation commit 76f836b1836b83006285ef69b4457abea39b400d Author: Jeff Hammond <email> Date: Wed Aug 3 10:05:46 2022 +0300 implement BabelStream in Fortran 1. only DO CONCURRENT is supported right now. 2. the structure mostly matches C++ except we do not make a stream class. 3. there is no option for float versus double right now. it will be a compile-time choice later. Signed-off-by: Jeff Hammond <email>		2022-11-15 14:29:56 +02:00
.github/workflows	Fix rust CI path	2021-12-09 16:22:52 +00:00
cmake	Move CMakeList.txt to top level	2021-12-03 13:26:09 +00:00
results	Add Titan Xp numbers	2018-05-07 11:42:11 -04:00
src	BabelStream Fortran	2022-11-15 14:29:56 +02:00
.gitignore	BabelStream Fortran	2022-11-15 14:29:56 +02:00
CHANGELOG.md	update changelog	2022-08-16 15:45:11 +00:00
CITATION.cff	CITATION cannot yet handle external references	2021-07-28 10:31:39 +01:00
CMakeLists.txt	Fix a bug in the CMake script where override flags are ignored	2022-02-14 13:37:50 +09:00
LICENSE	Rename to BabelStream	2017-04-08 12:16:29 +01:00
README.md	Update preferred Citation in README	2022-04-27 12:20:10 +01:00

README.md

BabelStream

Measure memory transfer rates to/from global device memory on GPUs. This benchmark is similar in spirit to, and based on, the STREAM benchmark [1] for CPUs.

Unlike other GPU memory bandwidth benchmarks, this does not include the PCIe transfer time.

There are multiple implementations of this benchmark in a variety of programming models.

This code was previously called GPU-STREAM.

Programming Models
How is this different to STREAM?
Building
- CMake
- GNU Make (removed)
Results
Contributing
Citing
- Other BabelStream publications

Programming Models

BabelStream is currently implemented in the following parallel programming models, listed in no particular order:

OpenCL
CUDA
HIP
OpenACC
OpenMP 3 and 4.5
C++ Parallel STL
Kokkos
RAJA
SYCL and SYCL 2020
TBB
Thrust (via CUDA or HIP)

This project also contains implementations in alternative languages with different build systems:

Julia - JuliaStream.jl
Java - java-stream
Scala - scala-stream
Rust - rust-stream

How is this different to STREAM?

BabelStream implements the four main kernels of the STREAM benchmark (along with a dot product) but, by utilising different programming models, expands the range of platforms on which the code can run beyond CPUs.
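The four STREAM kernels are conventionally named copy, mul (scale), add and triad; together with the dot product they can be sketched in sequential C++ as follows. This is a minimal illustration, not the code from any BabelStream model: the `stream_` function names and the use of `std::vector` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential versions of the five kernels.
// All arrays are assumed to have the same length.

// copy: c[i] = a[i]
template <typename T>
void stream_copy(const std::vector<T>& a, std::vector<T>& c) {
  assert(a.size() == c.size());
  for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i];
}

// mul: b[i] = scalar * c[i]
template <typename T>
void stream_mul(std::vector<T>& b, const std::vector<T>& c, T scalar) {
  assert(b.size() == c.size());
  for (std::size_t i = 0; i < b.size(); ++i) b[i] = scalar * c[i];
}

// add: c[i] = a[i] + b[i]
template <typename T>
void stream_add(const std::vector<T>& a, const std::vector<T>& b,
                std::vector<T>& c) {
  assert(a.size() == b.size() && b.size() == c.size());
  for (std::size_t i = 0; i < c.size(); ++i) c[i] = a[i] + b[i];
}

// triad: a[i] = b[i] + scalar * c[i]
template <typename T>
void stream_triad(std::vector<T>& a, const std::vector<T>& b,
                  const std::vector<T>& c, T scalar) {
  assert(a.size() == b.size() && b.size() == c.size());
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = b[i] + scalar * c[i];
}

// dot: returns the sum over i of a[i] * b[i]
template <typename T>
T stream_dot(const std::vector<T>& a, const std::vector<T>& b) {
  assert(a.size() == b.size());
  T sum{};
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}
```

Each programming model expresses these same loops with its own parallel constructs; only the loop bodies are common to all of them.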

The key differences from STREAM are that:

the arrays are allocated on the heap
the problem size is unknown at compile time
a wider range of platforms and programming models is supported

With stack arrays of known size at compile time, the compiler is able to align data and issue optimal instructions (for example, using non-temporal stores and removing peel/remainder vectorisation loops). But this information is not typically available in real HPC codes today, where the problem size is read from the user at runtime.
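The contrast described above can be sketched in C++ as two versions of the same loop. This is only an illustration under stated assumptions: `kFixedSize`, `scale_fixed` and `scale_runtime` are hypothetical names, and whether a given compiler actually generates different code for the two depends on the compiler and flags.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical compile-time problem size.
constexpr std::size_t kFixedSize = 1024;

// Fixed-size version: the trip count is a compile-time constant,
// so the compiler can reason about alignment and remainder loops.
void scale_fixed(std::array<double, kFixedSize>& a, double s) {
  for (std::size_t i = 0; i < kFixedSize; ++i) a[i] *= s;
}

// Run-time version: the storage lives on the heap and the trip count
// is only known when the program runs, as in BabelStream.
void scale_runtime(std::vector<double>& a, double s) {
  for (std::size_t i = 0; i < a.size(); ++i) a[i] *= s;
}
```

Both functions compute the same result; the difference lies only in how much the compiler knows about the array when it generates code.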

BabelStream therefore provides a measure of what memory bandwidth performance can be attained (by a particular programming model) if you follow today's parallel programming best practice.
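As a rough sketch of how a bandwidth figure can be derived from kernel timings, the function below follows the usual STREAM convention of counting each array read or written once and taking the fastest run. The function name, its parameters and the choice of MBytes/sec (10^6 bytes) are assumptions for illustration, not a description of BabelStream's exact reporting code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: bandwidth in MBytes/sec (1 MByte = 1e6 bytes).
//   arrays_touched: arrays read or written once per kernel call,
//                   e.g. 2 for copy (one read, one write), 3 for triad.
//   n:              number of elements per array.
//   elem_size:      size of one element in bytes, e.g. sizeof(double).
//   times_sec:      measured run times of the kernel, in seconds.
double bandwidth_mbytes_per_sec(std::size_t arrays_touched, std::size_t n,
                                std::size_t elem_size,
                                const std::vector<double>& times_sec) {
  assert(!times_sec.empty());
  const double bytes = static_cast<double>(arrays_touched) *
                       static_cast<double>(n) *
                       static_cast<double>(elem_size);
  // Use the fastest run as the best attainable time.
  const double best = *std::min_element(times_sec.begin(), times_sec.end());
  return 1.0e-6 * bytes / best;
}
```

For example, a triad over three arrays of 1,000,000 doubles whose fastest run takes 0.001 s moves 24,000,000 bytes, which corresponds to roughly 24,000 MBytes/sec.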

BabelStream also includes the nstream kernel from the Parallel Research Kernels (PRK) project, available on GitHub. Details about PRK can be found in the following references:

Van der Wijngaart, Rob F., and Timothy G. Mattson. The parallel research kernels. IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2014.
R. F. Van der Wijngaart, A. Kayi, J. R. Hammond, G. Jost, T. St. John, S. Sridharan, T. G. Mattson, J. Abercrombie, and J. Nelson. Comparing runtime systems with exascale ambitions using the Parallel Research Kernels. ISC 2016, DOI: 10.1007/978-3-319-41321-1_17.
Jeff R. Hammond and Timothy G. Mattson. Evaluating data parallelism in C++ using the Parallel Research Kernels. IWOCL 2019, DOI: 10.1145/3318170.3318192.
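For orientation, the nstream kernel resembles triad but accumulates into its output array. A minimal sequential C++ sketch is given below; it uses a hypothetical free function over `std::vector`, which is an assumption for illustration and not the signature used by any particular BabelStream model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// nstream: a[i] += b[i] + scalar * c[i]
// Hypothetical sequential version; all arrays must have the same length.
template <typename T>
void nstream(std::vector<T>& a, const std::vector<T>& b,
             const std::vector<T>& c, T scalar) {
  assert(a.size() == b.size() && b.size() == c.size());
  for (std::size_t i = 0; i < a.size(); ++i) a[i] += b[i] + scalar * c[i];
}
```

Because `a` is both read and written, each call touches four array streams (three reads and one write) rather than the three of triad.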

Building

The drivers, compilers and software required by whichever implementation you would like to build must be installed.

CMake

The project supports building with CMake >= 3.13.0, which can be installed without root via the official script.

Each BabelStream implementation (programming model) is built as follows:

$ cd babelstream

# configure the build, build type defaults to Release
# The -DMODEL flag is required
$ cmake -Bbuild -H. -DMODEL=<model> <model specific flags prefixed with -D...>

# compile
$ cmake --build build

# run executables in ./build
$ ./build/<model>-stream

The MODEL option selects one implementation of BabelStream to build. The source for each model's implementation is located in ./src/<model>.

Currently available models are:

omp;ocl;std;std20;hip;cuda;kokkos;sycl;sycl2020;acc;raja;tbb;thrust

Overriding default flags

By default, we have defined a set of optimal flags for known HPC compilers. These are assigned to RELEASE_FLAGS, and you can override them if required.

To find out what flag each model supports or requires, simply configure while only specifying the model. For example:

> cd babelstream
> cmake -Bbuild -H. -DMODEL=ocl 
...
- Common Release flags are `-O3`, set RELEASE_FLAGS to override
-- CXX_EXTRA_FLAGS: 
        Appends to common compile flags. These will be used at link phase at well.
        To use separate flags at link time, set `CXX_EXTRA_LINKER_FLAGS`
-- CXX_EXTRA_LINK_FLAGS: 
        Appends to link flags which appear *before* the objects.
        Do not use this for linking libraries, as the link line is order-dependent
-- CXX_EXTRA_LIBRARIES: 
        Append to link flags which appears *after* the objects.
        Use this for linking extra libraries (e.g `-lmylib`, or simply `mylib`) 
-- CXX_EXTRA_LINKER_FLAGS: 
        Append to linker flags (i.e GCC's `-Wl` or equivalent)
-- Available models:  omp;ocl;std;std20;hip;cuda;kokkos;sycl;acc;raja;tbb
-- Selected model  :  ocl
-- Supported flags:

   CMAKE_CXX_COMPILER (optional, default=c++): Any CXX compiler that is supported by CMake detection
   OpenCL_LIBRARY (optional, default=): Path to OpenCL library, usually called libOpenCL.so
...

Alternatively, refer to the CI script, which test-compiles most of the models, and see which flags are used there.

It is recommended that you delete the build directory when you change any of the build flags.

GNU Make

Support for Make has been removed from 4.0 onwards. However, as the build process only involves a few source files, the required compile commands can be extracted from the CI output.

Results

Sample results can be found in the results subdirectory. Newer results are found in our Performance Portability repository.

Contributing

As of v4.0, the main branch of this repository will hold the latest released version.

The develop branch will contain unreleased features due for the next (major and/or minor) release of BabelStream. Pull Requests should be made against the develop branch.

Citing

Please cite BabelStream via this reference:

Deakin T, Price J, Martineau M, McIntosh-Smith S. Evaluating attainable memory bandwidth of parallel programming models via BabelStream. International Journal of Computational Science and Engineering. Special issue. Vol. 17, No. 3, pp. 247–262. 2018. DOI: 10.1504/IJCSE.2018.095847

Other BabelStream publications

Deakin T, Price J, Martineau M, McIntosh-Smith S. GPU-STREAM v2.0: Benchmarking the achievable memory bandwidth of many-core processors across diverse parallel programming models. 2016. Paper presented at P^3MA Workshop at ISC High Performance, Frankfurt, Germany. DOI: 10.1007/978-3-319-46079-6_34
Deakin T, McIntosh-Smith S. GPU-STREAM: Benchmarking the achievable memory bandwidth of Graphics Processing Units. 2015. Poster session presented at IEEE/ACM SuperComputing, Austin, United States. You can view the Poster and Extended Abstract.
Deakin T, Price J, Martineau M, McIntosh-Smith S. GPU-STREAM: Now in 2D!. 2016. Poster session presented at IEEE/ACM SuperComputing, Salt Lake City, United States. You can view the Poster and Extended Abstract.
Raman K, Deakin T, Price J, McIntosh-Smith S. Improving achieved memory bandwidth from C++ codes on Intel Xeon Phi Processor (Knights Landing). IXPUG Spring Meeting, Cambridge, UK, 2017.
Deakin T, Price J, McIntosh-Smith S. Portable methods for measuring cache hierarchy performance. 2017. Poster session presented at IEEE/ACM SuperComputing, Denver, United States. You can view the Poster and Extended Abstract.

[1]: McCalpin, John D., 1995: "Memory Bandwidth and Machine Balance in Current High Performance Computers", IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995.
