BabelStream

Author	SHA1	Message	Date
Tom Deakin	579247dc06	Normalise sum result to mitigate errors with large iteration counts	2021-02-03 10:16:13 +00:00
Tom Deakin	210cfb7520	Revert "Update initial array values to ensure dot product works with the nstream kernel". This reverts commit `5346e1226d`. Conflicts: CHANGELOG.md	2021-02-03 10:14:58 +00:00
Tom Deakin	fdac285110	Merge branch 'main' into nstream. Conflicts: CHANGELOG.md	2021-02-02 15:46:39 +00:00
Tom Deakin	018d8a4510	[OpenCL] Remove dot kernel object in destructor	2021-02-02 15:45:54 +00:00
Tom Deakin	84406024cf	Update CHANGELOG	2021-02-02 11:28:33 +00:00
Tom Deakin	5346e1226d	Update initial array values to ensure dot product works with the nstream kernel	2021-02-02 11:27:54 +00:00
Tom Deakin	20c3284629	Update CHANGELOG with signed int change	2021-01-12 10:23:21 +00:00
Tom Deakin	9c211bca96	Update changelog for CUDA memory mode	2020-12-07 15:13:06 +00:00
Tom Deakin	e8fb3a6be4	Add C++20 version using for_each_n and range factories. Closes #85	2020-12-07 14:55:54 +00:00
Tom Deakin	ffa221fd35	Fix OpenMP Clang NVIDIA Target flags (missing sm architecture) with new NVARCH option. Example usage: `make -f OpenMP.make COMPILER=CLANG TARGET=NVIDIA NVARCH=sm_61`. Fixes #61	2020-12-07 12:23:11 +00:00
Tom Deakin	829aa15da0	Allocate driver solution check vectors after the main computation. Each Stream implementation owns its own data, so the driver code shouldn't allocate a large array just before. On processors with strong NUMA effects and smaller memory capacities per NUMA domain, these checking vectors can result in the main arrays being allocated in the wrong NUMA domain. The fix is to simply move the driver allocation until after the computation has finished and we want to check the answers. This commit only changes the driver; each model will be updated in subsequent commits. Fixes #80.	2020-12-07 10:39:37 +00:00
Tom Deakin	f373927ce8	Rename branch name	2020-12-07 10:23:27 +00:00
Gonzalo Brito Gadeschi	0855805ce2	Add NVIDIA HPC SDK C++ parallel STL implementation. This commit adds an implementation using the C++ parallel STL. The Makefile uses the NVIDIA HPC SDK `nvc++` compiler with the `-stdpar` flag. Tested using the NVIDIA HPC SDK 20.9.	2020-11-23 03:08:44 -08:00
Tom Deakin	5182342403	Update CHANGELOG.md	2020-10-26 09:58:57 +00:00
Tom Deakin	0ff841bbf5	Update CHANGELOG.md	2020-08-07 12:29:28 +01:00
Tom Deakin	8ece4079fd	Update CHANGELOG.md	2020-07-14 14:03:04 +01:00
Tom Deakin	6c57b6305e	Update CHANGELOG.md. Summarise move of build system to Kokkos 3.	2020-07-13 09:35:55 +01:00
Tom Deakin	64617c6dee	Update OpenMP Cray flags. Fixes #68	2020-07-10 13:28:23 +01:00
Tom Deakin	5d0ee99de6	Remove Cray flags for OpenACC following removal of support in latest compiler	2020-07-10 13:27:21 +01:00
Andrew Stubbs	09271eda17	Add GNU OpenACC support for AMD GCN. Autodetect the device type, rather than hard-code NVIDIA. Add GNU command line options to the makefile, and adjust the "restrict" extension usage. For now, we assume the toolchain is only configured for one accelerator.	2020-05-21 20:54:04 +01:00
Tom Deakin	b792c422f7	[OpenMP] Add build flags for OpenMP offload to AMD and NVIDIA with GCC 10.1. Closes #65	2020-05-12 11:24:29 +01:00
Tom Deakin	0919d95aa4	[SYCL] Use SYCL runtime device discovery. Fixes #63	2020-05-11 17:16:47 +01:00
Tom Deakin	1d6da069b3	[SYCL] Pass explicit async_handler to queue constructor	2020-05-11 17:13:36 +01:00
Tom Deakin	1bc4395f48	Update local copy of OpenCL C++ header file. This closes #62	2020-03-16 16:43:55 +00:00
Tom Deakin	8776901733	[SYCL] Use the cl::sycl::id parameter in the parallel_for kernels. The cl::sycl::item provides extra features for extracting global/local ids which aren't required by the kernels. This also means the kernels don't need to extract the id from the item.	2019-11-01 15:19:01 +00:00
Tom Deakin	63cc964847	Update CHANGELOG with updates from #58	2019-06-26 12:06:06 +01:00
Tom Deakin	289a2c204c	Version bump	2019-04-10 14:12:00 +01:00
Tom Deakin	dd6f3af98b	Update changelog	2019-04-10 14:06:50 +01:00
Tom Deakin	db2a4c40d8	[OpenACC] Add PGI support for Power 9	2019-03-14 15:56:51 +00:00
Tom Deakin	7ec2108896	[OpenMP] Use -qarch=auto with XL compiler	2019-03-14 15:39:45 +00:00
Tom Deakin	c8098a5cc0	[OpenACC] Add KNL support	2019-03-14 09:11:16 -05:00
Tom Deakin	f1f31d2a9b	[OpenACC] Add PGI compiler support for Skylake	2019-03-13 04:14:11 -05:00
Tom Deakin	db9bf78530	[OpenMP] Add PGI compiler support	2019-03-13 04:13:38 -05:00
Tom Deakin	6229b83e62	update changelog	2019-03-11 17:40:33 +00:00
Tom Deakin	a1f7b94820	Support CSV output for triad only running mode. Fixes #54	2018-10-04 14:36:59 +01:00
Tom Deakin	96216628bf	Update CHANGELOG.md	2018-09-14 12:57:14 +01:00
Tom Deakin	e5d54dd521	Use parallel loop for OpenACC instead of kernels. Closes #53.	2018-07-25 15:53:50 +00:00
Tom Deakin	54fc326097	Add mcpu=native flag to GNU OpenMP builds	2018-04-27 13:21:30 +01:00
Tom Deakin	54b8a549c1	Update CHANGELOG.md. Add notice to changelog about #49.	2018-03-19 11:07:32 +00:00
Tom Deakin	dead6d0d44	[Kokkos] Use template type throughout instead of double. Fixes #44. Also requires the typedef keyword in a few places.	2018-02-15 03:32:27 +00:00
Tom Deakin	5f20c119bc	[Kokkos] Set some meaningful output with --list argument. The string is mangled by the linker, but should say something useful.	2018-02-14 22:22:57 +00:00
Tom Deakin	0092d23461	[Kokkos] Remove defining View layout as Kokkos does it correctly by default. This fixes #43.	2018-02-14 22:14:47 +00:00
Tom Deakin	b93ac5d7cf	[Kokkos] Rename files to match Kokkos case conventions	2018-02-14 22:05:50 +00:00
Tom Deakin	3925c71851	[Kokkos] Remove global use of global namespace	2018-02-14 22:00:21 +00:00
Tom Deakin	1d84002cb6	Fix GitHub formatting in CHANGELOG	2018-02-07 16:54:18 +00:00
Tom Deakin	53e8f408ad	Fix GitHub formatting in CHANGELOG	2018-02-07 16:53:00 +00:00
Tom Deakin	88c8854a54	Add unreleased changes to CHANGELOG	2018-02-07 16:51:57 +00:00
Tom Deakin	710a18916c	Add a Changelog file to document project changes	2018-02-07 16:46:18 +00:00

48 Commits
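
Commit 579247dc06 above normalises the sum result to mitigate errors with large iteration counts. The sketch below is not taken from the repository; it only illustrates, in Python, why a normalised (relative) error check can behave better than an absolute one once the expected value grows large. The function names, the tolerance `EPSILON`, and the sample values are all assumptions made for illustration.

```python
# Illustrative sketch only: names, tolerance and values are hypothetical,
# not BabelStream's actual verification code.

EPSILON = 1.0e-8  # assumed tolerance


def absolute_error(computed: float, expected: float) -> float:
    """Plain difference between the computed and expected sums."""
    return abs(computed - expected)


def normalised_error(computed: float, expected: float) -> float:
    """Difference scaled by the expected value (a relative error)."""
    return abs((computed - expected) / expected)


# Two expected sums of very different magnitude, each computed with the
# same small relative rounding error.
relative_rounding = 1.0e-12
expected_small = 1.0e3
expected_large = 1.0e30
computed_small = expected_small * (1.0 + relative_rounding)
computed_large = expected_large * (1.0 + relative_rounding)

for label, computed, expected in [
    ("small", computed_small, expected_small),
    ("large", computed_large, expected_large),
]:
    abs_ok = absolute_error(computed, expected) < EPSILON
    norm_ok = normalised_error(computed, expected) < EPSILON
    print(f"{label}: absolute check passes={abs_ok}, normalised check passes={norm_ok}")
```

With these assumed values, the absolute check rejects the large sum even though its relative rounding error is the same as for the small one, while the normalised check treats both alike.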