James Price
1e976ff150
[SYCL] Fix multiple template specializations
2016-11-18 00:14:46 +00:00
Tom Deakin
d42bcd4675
Merge remote-tracking branch 'origin/init-arrays' into devel
2016-11-04 09:17:54 +00:00
James Price
7f4761ae52
Replace write_arrays with init_arrays
...
This allows each model to initialise its arrays in parallel, which
provides the first-touch page placement required for good performance
on NUMA architectures.
2016-11-02 11:22:01 +00:00
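
The first-touch idea described in the commit above can be sketched as follows. This is a minimal illustration, assuming an OpenMP model on a system with a first-touch page placement policy; it is not the project's code. The names init_arrays, triad and allocate_untouched, along with their signatures, are hypothetical and chosen only for this example.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a model's init_arrays (not the project's code).
// Each thread writes the elements it will later process, so under a
// first-touch policy those pages are placed close to that thread.
template <typename T>
void init_arrays(T* a, T* b, T* c, std::ptrdiff_t n, T initA, T initB, T initC)
{
  #pragma omp parallel for schedule(static)
  for (std::ptrdiff_t i = 0; i < n; i++)
  {
    a[i] = initA;
    b[i] = initB;
    c[i] = initC;
  }
}

// Example kernel using the same static schedule as the initialisation,
// so each thread works on the index range it touched first.
template <typename T>
void triad(T* a, const T* b, const T* c, std::ptrdiff_t n, T scalar)
{
  #pragma omp parallel for schedule(static)
  for (std::ptrdiff_t i = 0; i < n; i++)
  {
    a[i] = b[i] + scalar * c[i];
  }
}

// Allocate without value-initialising: for trivial T, new T[n] leaves the
// elements unwritten, so the first write happens inside init_arrays.
// (A std::vector<T>(n) would instead write every element serially.)
template <typename T>
std::unique_ptr<T[]> allocate_untouched(std::ptrdiff_t n)
{
  return std::unique_ptr<T[]>(new T[static_cast<std::size_t>(n)]);
}
```

Built without OpenMP enabled, the pragmas are ignored and the loops run serially with the same results. The placement benefit only appears when the initialisation and the kernels share the same thread-to-index mapping, which is why both loops use a static schedule here.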
Tom Deakin
644ebc40ef
Verify reduction result to 8 decimal places
2016-10-24 16:22:35 +01:00
Tom Deakin
f32cf3bad3
Merge branch 'master' into kernel-dot
...
Conflicts:
main.cpp
2016-10-24 13:53:58 +01:00
Tom Deakin
5ae613519d
Change the value of scalar, and specify in a #define
2016-10-24 13:19:31 +01:00
James Price
1e94870859
Fix verification of dot kernel
2016-10-24 12:47:01 +01:00
Tom Deakin
28c2660b52
Merge branch 'master' into kernel-dot
2016-10-24 12:21:16 +01:00
Tom Deakin
08fe695d51
Fix typo in main file
2016-10-14 15:04:04 +01:00
Tom Deakin
275bfb2066
Check result of the final reduction
2016-10-14 14:45:28 +01:00
Tom Deakin
04ca357159
Call the Dot kernel and print out results
2016-10-14 14:40:28 +01:00
pensun
a1f9d9ece7
Add support for the HIP version of GPU-STREAM.
...
This commit was tested with the HIP developer preview branch.
2016-09-05 23:41:01 -05:00
Tom Deakin
d420032c66
Remove warning about iteration count when using floats, as the new data values work for 100 iterations
2016-05-11 17:15:43 +01:00
Tom Deakin
31cb567e21
Switch data from 1.0, 2.0 and 3.0 to 0.1, 0.2 and 0.3 respectively
...
Using integer values for the maths gets unstable past 38 iterations, even
in double precision. Using the original values divided by 10 is safe up to
the default 100 iterations.
2016-05-11 15:51:19 +01:00
Tom Deakin
55a858e0c0
Use 2^25 as default size because 2^26 gives too many thread blocks for CUDA
2016-05-11 15:43:52 +01:00
Tom Deakin
eb10c716f2
First attempt at OpenMP 4.5
2016-05-11 15:08:08 +01:00
Tom Deakin
207fd8f784
Default to power of two array size
2016-05-11 12:04:19 +01:00
Tom Deakin
0f8f191d0e
Require number of iterations to be at least 2
2016-05-11 11:55:33 +01:00
Tom Deakin
75ef78495c
Add print out of number of iterations
2016-05-11 11:53:51 +01:00
Tom Deakin
3227e5dbf0
Print out data type for float or double
2016-05-11 11:52:17 +01:00
Tom Deakin
5c8b07262b
Default to 100 iterations to get over any warm up times
2016-05-11 11:49:44 +01:00
Matthew Martineau
894829cb05
Adjusted the Kokkos implementation to fix view initialisation, and store local copies of views for lambda scoping
2016-05-06 21:02:44 +01:00
Matthew Martineau
57189e7ca5
Merge branch 'refactor' of https://github.com/UoB-hpc/gpu-stream into refactor
2016-05-06 10:54:18 +01:00
Matthew Martineau
3b266b8266
Fix for namespace collision with #define RAJA
2016-05-06 10:53:12 +01:00
James Price
d4b3b3533c
Update SYCL version to work with ComputeCpp
...
Still needs proper CMake rules and kernel names need to be fixed for
multiple template instantiations.
2016-05-06 00:38:30 +01:00
Matthew Martineau
0a738efa54
Merging in changes from trunk
2016-05-05 17:23:47 +01:00
Matthew Martineau
7c28a6386b
Added the Kokkos and RAJA implementations
2016-05-05 17:22:29 +01:00
Tom Deakin
f0afa0c1e4
Add reference OpenMP 3.0 version
2016-05-04 10:41:41 +01:00
Tom Deakin
0b0de4e0c3
Implement the OpenACC device string functions, and device selector
2016-05-03 14:50:09 +01:00
James Price
da4f918788
Add initial SYCL implementation
2016-05-03 14:45:13 +01:00
Tom Deakin
1a38b18954
Add OpenACC version
2016-05-03 14:36:08 +01:00
Tom Deakin
530b2adda2
Add License text to all files
2016-05-03 12:32:03 +01:00
Tom Deakin
a355acf2ee
Move source files to top level directory
2016-05-03 11:43:25 +01:00