
79 Commits

Author SHA1 Message Date
Tom Deakin
487e59c6a9 Don't run nstream in the main benchmark 2021-02-18 11:37:49 +00:00
Tom Deakin
579247dc06 Normalise sum result to mitigate errors with large iteration counts 2021-02-03 10:16:13 +00:00
Tom Deakin
bd04e6db3c Add nstream kernel from PRK
PRK has an nstream kernel, which is Triad with a += update.
This means there are 3 reads and a write, which is a higher
read/write ratio. In addition, non-temporal stores for the
write on CPUs will not be beneficial, because the array being
written is also read, so compilers should take care to emit
them for the other kernels but not for this one.
2021-02-02 11:25:42 +00:00
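A minimal sketch of the difference between the two kernels, assuming BabelStream-style arrays `a`, `b`, `c` and a `scalar` constant. The function names and the byte-counting helpers are illustrative, not the repository's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Triad: 2 reads (b, c) and 1 write (a) per element.
template <typename T>
void triad(std::vector<T>& a, const std::vector<T>& b,
           const std::vector<T>& c, T scalar) {
  assert(b.size() == a.size() && c.size() == a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] = b[i] + scalar * c[i];
}

// nstream: Triad with a += update, so 3 reads (a, b, c) and 1 write (a).
// Since a is read before it is written, a non-temporal store to a
// (one that bypasses the cache) would not be beneficial.
template <typename T>
void nstream(std::vector<T>& a, const std::vector<T>& b,
             const std::vector<T>& c, T scalar) {
  assert(b.size() == a.size() && c.size() == a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] += b[i] + scalar * c[i];
}

// Bytes of memory traffic per call, counting each array read or write once.
template <typename T>
constexpr std::size_t triad_bytes(std::size_t n) { return 3 * sizeof(T) * n; }

template <typename T>
constexpr std::size_t nstream_bytes(std::size_t n) { return 4 * sizeof(T) * n; }
```

Counting traffic this way, nstream moves a third more bytes than Triad for the same array size.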
Tom Deakin
f99f8d35d9 Revert "Add nstream kernel from PRK"
This reverts commit 1e94a41f3c.
2021-02-02 11:25:27 +00:00
Tom Deakin
1e94a41f3c Add nstream kernel from PRK
PRK has an nstream kernel, which is Triad with a += update.
This means there are 3 reads and a write, which is a higher
read/write ratio. In addition, non-temporal stores for the
write on CPUs will not be beneficial, because the array being
written is also read, so compilers should take care to emit
them for the other kernels but not for this one.
2021-02-01 17:41:30 +00:00
Tom Deakin
435a104f6e Check input array size is positive 2021-01-12 15:30:41 +00:00
Tom Deakin
903eb40d19 Add parseInt function for parsing CLI arguments for array size 2021-01-12 10:28:01 +00:00
Tom Deakin
00de932454 Save array size argument as signed integer 2021-01-12 10:09:55 +00:00
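The three commits above (signed array size, a parsing helper, a positivity check) fit together roughly as in this sketch. `parse_int` and `parse_array_size` are hypothetical names chosen for illustration; this is not the repository's `parseInt`.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <optional>

// Hypothetical helper: parse a whole C string as a base-10 int.
// Returns std::nullopt on empty input, trailing characters, or overflow.
std::optional<int> parse_int(const char* s) {
  if (s == nullptr || *s == '\0') return std::nullopt;
  errno = 0;
  char* end = nullptr;
  long value = std::strtol(s, &end, 10);
  if (*end != '\0' || errno == ERANGE) return std::nullopt;
  if (value < INT_MIN || value > INT_MAX) return std::nullopt;
  return static_cast<int>(value);
}

// Keeping the size signed means an argument such as "-5" parses as -5 and
// can be rejected, instead of wrapping around to a huge unsigned value.
std::optional<int> parse_array_size(const char* s) {
  std::optional<int> n = parse_int(s);
  if (!n || *n <= 0) return std::nullopt;
  return n;
}
```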
Tom Deakin
e8fb3a6be4 Add C++20 version using for_each_n and range factories
Closes #85
2020-12-07 14:55:54 +00:00
Tom Deakin
5a93022fc1 Update OpenACC for Issue #80 2020-12-07 11:50:20 +00:00
Tom Deakin
b00120d346 Update STD C++17 for Issue #80 2020-12-07 11:32:22 +00:00
Tom Deakin
74f705cac9 Update OpenMP for Issue #80 2020-12-07 10:41:48 +00:00
Tom Deakin
829aa15da0 Allocate driver solution check vectors *after* the main computation
Each Stream implementation owns its own data, so the driver code
shouldn't allocate a large array just before the implementation
allocates its own. On processors with strong NUMA effects and
smaller memory capacities per NUMA domain, these checking vectors
can use up the local domain's memory, so the main arrays end up
allocated in the wrong NUMA domain.

The fix is to simply move the driver allocation until after the
computation has finished and we want to check the answers.

This commit only changes the driver; each model will be updated
in subsequent commits.

Fixes #80.
2020-12-07 10:39:37 +00:00
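The reordering can be sketched as follows. `DemoStream` and `run_and_check` are hypothetical names, and the class is only a stand-in for a model implementation; the point is the order of the allocations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a model implementation: it allocates and
// initialises its own arrays in the constructor.
class DemoStream {
 public:
  explicit DemoStream(std::size_t n) : a_(n, 0.1), b_(n, 0.2), c_(n, 0.0) {}
  void copy() { std::copy(a_.begin(), a_.end(), c_.begin()); }  // c = a
  void read_arrays(std::vector<double>& a, std::vector<double>& b,
                   std::vector<double>& c) const {
    std::copy(a_.begin(), a_.end(), a.begin());
    std::copy(b_.begin(), b_.end(), b.begin());
    std::copy(c_.begin(), c_.end(), c.begin());
  }

 private:
  std::vector<double> a_, b_, c_;
};

// Driver order after the fix: the model's arrays are allocated and the
// kernels run first; the host-side check vectors are allocated afterwards.
bool run_and_check(std::size_t n) {
  DemoStream stream(n);  // model allocates its own arrays
  stream.copy();         // main computation

  std::vector<double> a(n), b(n), c(n);  // allocated only now, for checking
  stream.read_arrays(a, b, c);
  for (std::size_t i = 0; i < n; ++i)
    if (c[i] != a[i] || b[i] != 0.2) return false;
  return true;
}
```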
Gonzalo Brito Gadeschi
0855805ce2 Add NVIDIA HPC SDK C++ parallel STL implementation
This commit adds an implementation using the C++ parallel STL.
The Makefile uses the NVIDIA HPC SDK `nvc++` compiler with the `-stdpar` flag.
Tested using the NVIDIA HPC SDK 20.9.
2020-11-23 03:08:44 -08:00
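A minimal sketch of two kernels written with C++17 parallel algorithms, assuming `std::vector` storage. The function names are hypothetical. When built with `nvc++ -stdpar`, calls like these may be offloaded to an NVIDIA GPU; with other toolchains they run on the host, possibly serially, depending on the standard library's parallel backend.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <numeric>
#include <vector>

// Triad: a = b + scalar * c, as a parallel binary transform.
void triad_stdpar(std::vector<double>& a, const std::vector<double>& b,
                  const std::vector<double>& c, double scalar) {
  assert(b.size() == a.size() && c.size() == a.size());
  std::transform(std::execution::par_unseq, b.begin(), b.end(), c.begin(),
                 a.begin(),
                 [scalar](double bi, double ci) { return bi + scalar * ci; });
}

// Dot: sum of a[i] * b[i]; this overload defaults to plus and multiplies.
double dot_stdpar(const std::vector<double>& a, const std::vector<double>& b) {
  assert(b.size() == a.size());
  return std::transform_reduce(std::execution::par_unseq, a.begin(), a.end(),
                               b.begin(), 0.0);
}
```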
Tom Deakin
289a2c204c Version bump 2019-04-10 14:12:00 +01:00
Tom Deakin
08348d1f0f Use ternary operator for simpler base 2 output checks 2019-04-10 14:06:05 +01:00
Patrick Atkinson
c50eba9caf Fix for --mibibytes in printing 2019-04-10 11:04:29 +00:00
Tom Deakin
5a1396671e Add a --mibibytes flag to output bandwidth and array sizes in base 2
This sets MiB = 2^20, GiB = 2^30 rather than the default of
MB = 10^6 and GB = 10^9.
2019-04-09 09:50:44 +01:00
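The base 2 option, together with the ternary-operator style from commit 08348d1f0f, can be sketched like this. `Scaled`, `scale_bandwidth`, `scale_size` and the unit label strings are illustrative assumptions, not the repository's output format.

```cpp
#include <cassert>
#include <string>

// A value converted for printing, plus its unit label.
struct Scaled {
  double value;
  std::string unit;
};

// Bandwidth: divide by 2^20 (MiB) with --mibibytes, otherwise by 10^6 (MB).
Scaled scale_bandwidth(double bytes_per_sec, bool mibibytes) {
  const double mega = mibibytes ? 1024.0 * 1024.0 : 1.0e6;
  return {bytes_per_sec / mega, mibibytes ? "MiB/s" : "MB/s"};
}

// Array sizes: divide by 2^30 (GiB) with --mibibytes, otherwise by 10^9 (GB).
Scaled scale_size(double bytes, bool mibibytes) {
  const double giga = mibibytes ? 1024.0 * 1024.0 * 1024.0 : 1.0e9;
  return {bytes / giga, mibibytes ? "GiB" : "GB"};
}
```

Choosing both the divisor and the label from the same flag keeps the two from drifting out of sync.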
Tom Deakin
02bcd9b762 Fix trailing comma in CSV output 2018-10-04 14:37:27 +01:00
Tom Deakin
a1f7b94820 Support CSV output for triad only running mode
Fixes #54
2018-10-04 14:36:59 +01:00
Tom Deakin
cc5ceb76f2 [Kokkos] Remove test for Kokkos around the now-fixed multiple template specializations 2018-02-15 03:40:36 +00:00
Tom Deakin
dead6d0d44 [Kokkos] Use template type throughout instead of double
Fixes #44. Also requires the typedef keyword in a few places.
2018-02-15 03:32:27 +00:00
Tom Deakin
b93ac5d7cf [Kokkos] Rename files to match Kokkos case conventions 2018-02-14 22:05:50 +00:00
Tom Deakin
87eb4361b4 Version bump 2017-08-02 16:35:40 +01:00
James Price
6a2da4c862 Implement --triad-only switch 2017-08-02 15:43:56 +01:00
Tom Deakin
5ad8341b39 Merge pull request #35 from psteinb/adding_csv_output
Adding csv output
2017-07-31 15:03:00 +01:00
Peter Steinbach
01d4eea7b7 removed obsolete spaces 2017-07-31 14:52:18 +02:00
Peter Steinbach
f9ffa712cf removed duplicate spaces 2017-07-31 14:46:50 +02:00
Peter Steinbach
df6fff1d2e added missing space for consistency 2017-07-31 14:30:08 +02:00
Peter Steinbach
2dbb693761 renamed nreps to be more consistent with the naming scheme 2017-07-31 14:23:39 +02:00
Peter Steinbach
7ed0308cb7 code formatting fixed 2017-07-31 14:14:52 +02:00
Peter Steinbach
2415bdc7c0 fixed if-clause formatting 2017-07-31 14:00:44 +02:00
Peter Steinbach
7911e6a0ae fixed compilation error due to unpropagated typo fix 2017-07-26 17:28:41 +02:00
Peter Steinbach
add9973b67 fixed typo 2017-07-26 17:21:17 +02:00
Peter Steinbach
99fad100c6 added csv-output-sentinels and output 2017-07-26 14:22:24 +02:00
Peter Steinbach
ee8ab08eaf added csv flag 2017-07-26 14:02:32 +02:00
Peter Steinbach
26279688d1 Merge branch 'master' of https://github.com/UoB-HPC/BabelStream into rocm_hc_support 2017-07-25 17:05:31 +02:00
Tom Deakin
dafc63030f Rename to BabelStream 2017-04-08 12:16:29 +01:00
Tom Deakin
9c08fdd184 Minor version bump 2017-04-06 10:38:48 +01:00
Peter Steinbach
62ea5e3ed6 Merge remote-tracking branch 'upstream/master' into bare_hc
Conflicts:
	CMakeLists.txt
2017-02-27 14:35:11 +01:00
Tom Deakin
cc90cefeeb Minor version bump to signal build system update 2017-02-25 14:14:59 +00:00
Peter Steinbach
c9a45600c8 Merge branch 'master' into bare_hc 2017-01-30 16:06:34 +01:00
Tom Deakin
ec2bf50e75 Version bump 2017-01-30 13:52:45 +00:00
Peter Steinbach
7621f86701 added pure HC gpu stream implementation 2017-01-03 11:43:12 +01:00
Tom Deakin
d0dd48406c Move version string to main removing common dependency 2016-12-09 12:36:25 +00:00
Tom Deakin
e6615944f4 Use a compiler switch to select OpenMP directives (target or parallel for) 2016-12-09 12:24:08 +00:00
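One way a compile-time switch between the two OpenMP directive styles could look. `OMP_TARGET_GPU` is a hypothetical macro name chosen for this sketch, and the `map` clauses are one possible data-movement choice, not necessarily the repository's.

```cpp
#include <cassert>
#include <cstddef>

// Define OMP_TARGET_GPU (e.g. -DOMP_TARGET_GPU) to use the offload directive
// instead of the host one. Compiled without OpenMP enabled, both pragmas are
// ignored and the loop runs serially.
void triad_omp(double* a, const double* b, const double* c, double scalar,
               std::size_t n) {
#ifdef OMP_TARGET_GPU
  #pragma omp target teams distribute parallel for simd map(tofrom: a[0:n]) map(to: b[0:n], c[0:n])
#else
  #pragma omp parallel for
#endif
  for (std::size_t i = 0; i < n; ++i)
    a[i] = b[i] + scalar * c[i];
}
```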
James Price
1e976ff150 [SYCL] Fix multiple template specializations 2016-11-18 00:14:46 +00:00
Tom Deakin
d42bcd4675 Merge remote-tracking branch 'origin/init-arrays' into devel 2016-11-04 09:17:54 +00:00
James Price
7f4761ae52 Replace write_arrays with init_arrays
This allows each model to initialise its arrays in parallel,
which provides the first-touch page placement required for good
performance on NUMA architectures.
2016-11-02 11:22:01 +00:00
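A free-function sketch of parallel first-touch initialisation with OpenMP; it is not the repository's member-function `init_arrays`, and `make_array` is a hypothetical helper. On Linux, a page is typically placed in the NUMA domain of the thread that first writes it, so initialising with the same static loop schedule as the kernels tends to keep each thread's chunk local.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// new double[n] default-initialises, so the elements are not written here;
// std::make_unique<double[]>(n) would zero them serially instead.
std::unique_ptr<double[]> make_array(std::size_t n) {
  return std::unique_ptr<double[]>(new double[n]);
}

// First write happens inside the parallel loop, one static chunk per thread.
void init_arrays(double* a, double* b, double* c, std::size_t n,
                 double init_a, double init_b, double init_c) {
  #pragma omp parallel for schedule(static)
  for (std::size_t i = 0; i < n; ++i) {
    a[i] = init_a;
    b[i] = init_b;
    c[i] = init_c;
  }
}
```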
Tom Deakin
644ebc40ef Verify reduction result to 8 decimal places 2016-10-24 16:22:35 +01:00
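A sketch of checking the Dot reduction against its expected value. Reading "8 decimal places" as a relative tolerance of 1e-8 is an assumption here, as is the function name `check_sum`. With every element of `a` equal to `gold_a` and every element of `b` equal to `gold_b`, the exact sum is `gold_a * gold_b * n`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Relative error against the analytic sum; assumes gold_sum is nonzero.
// Dividing by gold_sum keeps the check's strictness independent of n.
bool check_sum(double sum, double gold_a, double gold_b, std::size_t n,
               double tolerance = 1.0e-8) {
  const double gold_sum = gold_a * gold_b * static_cast<double>(n);
  const double rel_err = std::fabs((sum - gold_sum) / gold_sum);
  return rel_err <= tolerance;
}
```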