Commit Graph

722 Commits

Author SHA1 Message Date
Tom Deakin
5a93022fc1 Update OpenACC for Issue #80 2020-12-07 11:50:20 +00:00
Tom Deakin
b00120d346 Update STD C++17 for Issue #80 2020-12-07 11:32:22 +00:00
Tom Deakin
74f705cac9 Update OpenMP for Issue #80 2020-12-07 10:41:48 +00:00
Tom Deakin
829aa15da0 Allocate driver solution check vectors *after* the main computation
Each Stream implementation owns its own data, so the driver code
shouldn't allocate a large array just before. On processors with
strong NUMA effects and smaller memory capacities per NUMA domain,
these checking vectors can result in the main arrays being
allocated in the wrong NUMA domain.

The fix is to simply move the driver allocation until after the
computation has finished and we want to check the answers.

This commit only changes the driver; each model will be updated
in subsequent commits.

Fixes #80.
2020-12-07 10:39:37 +00:00
Tom Deakin
f373927ce8 Rename branch name 2020-12-07 10:23:27 +00:00
Tom Deakin
f271d5563d Merge pull request #84 from gonzalobg/cxx_parallel_stl
Add NVIDIA HPC SDK C++ parallel STL implementation
2020-12-03 14:15:45 +00:00
Gonzalo Brito Gadeschi
0855805ce2 Add NVIDIA HPC SDK C++ parallel STL implementation
This commit adds an implementation using the C++ parallel STL.
The Makefile uses the NVIDIA HPC SDK `nvc++` compiler with the `-stdpar` flag.
Tested using the NVIDIA HPC SDK 20.9.
2020-11-23 03:08:44 -08:00
Tom Deakin
5182342403 Update CHANGELOG.md 2020-10-26 09:58:57 +00:00
Tom Deakin
8ae8c70188 Merge pull request #81 from Kerilk/master
Ensure OpenCL destructors are called in the "correct" order.
2020-10-26 09:58:05 +00:00
Brice Videau
e92d034f64 Ensure OpenCL destructors are called in the correct order. 2020-10-16 18:05:23 -05:00
Tom Deakin
6f46267e6c Add AOMP build options 2020-08-13 17:46:45 +01:00
Tom Deakin
66d915fa2e [OpenMP] Fix ARMCLANG Makefile bug where it didn't set the flags 2020-08-12 15:39:13 +01:00
Tom Deakin
f31181dedb Add -O3 flag to HIP.make to fix segmentation fault 2020-08-12 14:09:22 +01:00
Tom Deakin
da3946a7d5 Add missing O3 flag for OpenMP ARMCLANG 2020-08-07 17:09:46 +01:00
Tom Deakin
0ff841bbf5 Update CHANGELOG.md 2020-08-07 12:29:28 +01:00
Tom Deakin
17f057c38a Merge pull request #79 from tom91136/master
Update build flags for SYCL, Kokkos, and OpenMP, tracking newest versions of each compiler
2020-08-07 12:27:43 +01:00
Tom Lin
cdaf6cb88e Fixed a bug where ComputeCpp's flags were omitted
Renamed INTEL_GT -> INTEL_GPU
Only use NVCC with Kokkos if not using HIPCC
2020-08-07 11:00:56 +01:00
Tom Lin
59274d6a91 Add NVIDIA as target for dpcpp 2020-08-05 08:54:40 +01:00
Tom Lin
603dc7d136 Add HIP compilers for Kokkos 2020-08-05 08:49:18 +01:00
Tom Lin
98b0939669 Add Intel GT OMP offloading support for icpc 2020-08-04 23:52:08 +01:00
Tom Lin
09458ef866 Fixed hipSYCL flag propagation 2020-08-03 17:22:39 +01:00
Tom Lin
f0403d2b09 Add CXX support for hipSYCL, dpcpp, and ComputeCpp 2020-08-03 16:45:06 +01:00
Tom Deakin
2f9f533890 Merge pull request #76 from tom91136/master
Add PPC+GNU combination
2020-07-29 14:10:27 +01:00
Tom Lin
0cb6b3d421 Add PPC+GNU combination 2020-07-16 11:07:01 +01:00
Tom Deakin
8ece4079fd Update CHANGELOG.md 2020-07-14 14:03:04 +01:00
Tom Deakin
d28df3b71e Merge pull request #73 from tom91136/master
Added support for Cray, PGI, and Armclang for Kokkos and OpenMP
2020-07-14 14:02:18 +01:00
Tom Lin
a22bb92516 Add armclang-cpu flags for OpenMP 2020-07-14 13:57:40 +01:00
Tom Lin
1fcd062d6c Add Cray, PGI, Armclang support for Kokkos/OpenMP
Fixed GCC's missing openmp flag in Kokkos
2020-07-14 13:56:55 +01:00
Tom Deakin
6c57b6305e Update CHANGELOG.md
Summarise move of build system to Kokkos 3.
2020-07-13 09:35:55 +01:00
Tom Deakin
a8b85e71bd Merge pull request #72 from tom91136/master
Update Kokkos to support version 3+
2020-07-13 09:34:46 +01:00
Tom Lin
1ffd069e80 Update Kokkos to support version 3+ 2020-07-13 03:02:34 +01:00
Tom Deakin
64617c6dee Update OpenMP Cray flags
Fixes #68
2020-07-10 13:28:23 +01:00
Tom Deakin
5d0ee99de6 Remove Cray flags for OpenACC following removal of support in latest compiler 2020-07-10 13:27:21 +01:00
Tom Deakin
d6520daf11 Update README with differentiation from STREAM 2020-06-02 15:41:00 +01:00
Tom Deakin
272c73a622 Merge pull request #66 from ams-cs/master
Add GNU OpenACC support for AMD GCN
2020-05-22 13:00:15 +01:00
Andrew Stubbs
09271eda17 Add GNU OpenACC support for AMD GCN
Autodetect the device type, rather than hard-code NVidia.

Add GNU command line options to the makefile, and adjust the "restrict"
extension usage. For now, we assume the toolchain is only configured for one
accelerator.
2020-05-21 20:54:04 +01:00
Tom Deakin
d410c65c97 [OpenMP] Change GNU -mcpu=native to -march=native as former is deprecated 2020-05-12 11:48:26 +01:00
Tom Deakin
b792c422f7 [OpenMP] Add build flags for OpenMP offload to AMD and NVIDIA with GCC 10.1
Closes #65
2020-05-12 11:24:29 +01:00
Tom Deakin
87b126f5ea Merge branch 'local'
Conflicts:
	SYCLStream.cpp
2020-05-11 17:20:01 +01:00
Tom Deakin
0919d95aa4 [SYCL] Use SYCL runtime device discovery
Fixes #63
2020-05-11 17:16:47 +01:00
Tom Deakin
1d6da069b3 [SYCL] Pass explicit async_handler to queue constructor 2020-05-11 17:13:36 +01:00
Tom Deakin
7f1637d679 [SYCL] Remove unused program variable 2020-05-11 17:10:48 +01:00
Tom Deakin
6db2c7a0ec [SYCL] Remove unused program variable 2020-05-11 17:09:21 +01:00
Tom Deakin
1bc4395f48 Update local copy of OpenCL C++ header file.
This closes #62
2020-03-16 16:43:55 +00:00
Tom Deakin
8776901733 [SYCL] Use the cl::sycl::id parameter in the parallel_for kernels
The cl::sycl::item provides extra features for extracting global/local
ids which aren't required by the kernels.
This also means the kernels don't need to extract the id from the item.
2019-11-01 15:19:01 +00:00
Tom Deakin
4bcb777100 Add Zen target for OpenACC 2019-08-08 14:36:20 +00:00
Tom Deakin
63cc964847 Update CHANGELOG with updates from #58 2019-06-26 12:06:06 +01:00
Tom Deakin
022793bdd6 Merge pull request #58 from GeorgeWeb/sycl-compliant
Making BabelStream's SYCL code compliant
2019-06-26 12:03:47 +01:00
GeorgeWeb
e657bfa897 based on perf comparison and discussions, the use of pre-built kernels is unnecessary in this case 2019-06-20 14:24:46 +01:00
GeorgeWeb
54737d87cb enclosing ComputeCpp-specific code in macros, rather than removing it 2019-06-20 10:13:39 +01:00