Tom Deakin
1d4a5dc346
Make OpenMP string name without version number
2016-12-09 12:24:08 +00:00
Tom Deakin
469d8d5634
Remove old OpenMP 3 code
2016-12-09 12:24:08 +00:00
Tom Deakin
e6615944f4
Use a compiler switch to select OpenMP directives (target or parallel for)
2016-12-09 12:24:08 +00:00
Tom Deakin
edd65dacb1
Add Kokkos Makefile for CPU
2016-11-22 20:06:54 +00:00
James Price
db01715806
[SYCL] Explicitly use first dimension of ranges
2016-11-18 00:35:36 +00:00
James Price
1e976ff150
[SYCL] Fix multiple template specializations
2016-11-18 00:14:46 +00:00
James Price
66776d5839
[SYCL] Use consistent syntax for indexing
2016-11-17 23:52:13 +00:00
James Price
02bff60870
[SYCL] Fix start index in reduction loop
2016-11-17 21:01:30 +00:00
Tom Deakin
ffac9fc352
[OMP45] Use alloc instead of to to allocate device memory
...
This fixes #11
2016-11-16 12:50:20 -06:00
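A minimal sketch of the difference this commit message refers to, assuming the arrays are initialised on the device so no host data needs copying in. The function name `init_on_device` is hypothetical and this is not the repository's code; it only illustrates the OpenMP 4.5 `map(alloc:)` map type, which reserves device storage without the host-to-device copy that `map(to:)` performs.

```cpp
// Hypothetical helper, not taken from the repository.
// Compiled without OpenMP support, the pragmas are ignored and the
// loop simply runs on the host.
void init_on_device(double* a, int n, double value)
{
  // map(alloc:) creates device storage for a[0:n] without copying the
  // (still uninitialised) host contents; map(to:) would copy them.
  #pragma omp target enter data map(alloc: a[0:n])

  // Initialise the array where it lives.
  #pragma omp target teams distribute parallel for
  for (int i = 0; i < n; i++)
    a[i] = value;

  // Copy the result back and release the device storage.
  #pragma omp target exit data map(from: a[0:n])
}
```

When the device writes the data before anything reads it, the initial copy implied by `map(to:)` is wasted transfer time, which is one plausible motivation for the change.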
Tom Deakin
cb2221a64a
Add a common.h file
2016-11-16 08:29:54 -07:00
Simon McIntosh-Smith
b3cf9992bb
Fixed broken link to new GPU-STREAM webpage
2016-11-07 23:35:00 +00:00
Tom Deakin
d42bcd4675
Merge remote-tracking branch 'origin/init-arrays' into devel
2016-11-04 09:17:54 +00:00
James Price
7f4761ae52
Replace write_arrays with init_arrays
...
This allows each model to initialise their arrays with a parallel
approach, which yields the first touch required for good performance
on NUMA architectures.
2016-11-02 11:22:01 +00:00
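The first-touch idea described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the repository's `init_arrays`: the names `alloc_first_touch` and `sum` are hypothetical, and the sketch assumes an operating system that places a memory page on the NUMA node of the thread that first writes to it.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: allocate without writing, then initialise in
// parallel so each thread first-touches the pages it will use later.
std::unique_ptr<double[]> alloc_first_touch(std::size_t n, double value)
{
  // new double[n] leaves the elements unwritten, so the pages are
  // typically not yet bound to any NUMA node.
  std::unique_ptr<double[]> data(new double[n]);
  double* p = data.get();
  const long long count = static_cast<long long>(n);

  // A static schedule gives each thread a fixed contiguous chunk.
  #pragma omp parallel for schedule(static)
  for (long long i = 0; i < count; i++)
    p[i] = value;

  return data;
}

// A later kernel using the same static schedule, so threads mostly
// read the pages they touched first.
double sum(const double* p, std::size_t n)
{
  const long long count = static_cast<long long>(n);
  double total = 0.0;
  #pragma omp parallel for schedule(static) reduction(+:total)
  for (long long i = 0; i < count; i++)
    total += p[i];
  return total;
}
```

Initialising with a serial loop, or with a container that zero-fills on construction, would place every page near the single initialising thread, which is the situation the commit message says the parallel approach avoids.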
Tom Deakin
6f9512e5b5
Merge branch 'kernel-dot' of github.com:uob-hpc/gpu-stream into kernel-dot
2016-11-01 16:25:23 +00:00
James Price
829d21c1d6
Merge branch 'master' into kernel-dot
2016-11-01 16:20:27 +00:00
James Price
3045208aae
[RAJA] Parallel first touch
2016-11-01 16:18:43 +00:00
Tom Deakin
54a966f99a
Merge branch 'master' into kernel-dot
2016-10-31 19:04:59 +00:00
Tom Deakin
9acdba8b76
Add link to website
2016-10-31 18:31:33 +00:00
Tom Deakin
7150e047dd
Add link to website
2016-10-31 18:31:07 +00:00
James Price
dd296d2231
[SYCL] Prebuild dot kernel like the others
2016-10-28 21:15:12 +01:00
James Price
b09b90f6fc
Merge remote-tracking branch 'origin' into kernel-dot
2016-10-28 21:07:57 +01:00
James Price
cce8e78cae
Merge pull request #12 from Ruyk/master
...
Updated the SYCL Stream benchmark with latest ComputeCpp CE 0.1.1 Edition
2016-10-28 21:06:34 +01:00
Tom Deakin
4e39572966
Add O3 to Kokkos CPU
2016-10-27 15:41:23 +01:00
Tom Deakin
98962c4aee
Add Kokkos CPU Makefile
2016-10-27 15:19:16 +01:00
James Price
d7c48c5063
Slight tweak to dot config output to fix parsing scripts
2016-10-26 15:47:10 +01:00
James Price
cbf97dc7d9
[SYCL] Automatically determine dot NDRange config
2016-10-26 15:19:14 +01:00
James Price
21556af500
[OCL] Automatically determine dot NDRange config
2016-10-26 15:19:14 +01:00
Tom Deakin
bd0ba4dee9
Merge branch 'master' into kernel-dot
2016-10-26 14:37:15 +01:00
Tom Deakin
188165d729
[Kokkos] Add -O3 flag to Kokkos makefile
2016-10-26 14:37:00 +01:00
James Price
ed630e7dbc
[SYCL] Implement dot kernel
2016-10-25 16:39:23 +01:00
Tom Deakin
e5b67ac969
Version bump
2016-10-25 12:22:01 +01:00
James Price
dfc79eeb4d
Improve performance of CUDA dot implementation
2016-10-24 21:42:39 +01:00
James Price
d5482b74f4
Improve performance of OpenCL dot implementation
2016-10-24 21:26:09 +01:00
Tom Deakin
644ebc40ef
Verify reduction result to 8 decimal places
2016-10-24 16:22:35 +01:00
Tom Deakin
f32cf3bad3
Merge branch 'master' into kernel-dot
...
Conflicts:
main.cpp
2016-10-24 13:53:58 +01:00
Tom Deakin
963f3abfa0
Version bump
2016-10-24 13:52:03 +01:00
Tom Deakin
0bed614734
[OpenCL] Use global defined scalar value
2016-10-24 13:51:47 +01:00
Tom Deakin
ce5152fefd
[HIP] Use global defined scalar value
2016-10-24 13:45:07 +01:00
Tom Deakin
47128d47c0
[SYCL] Use global defined scalar value
2016-10-24 13:37:43 +01:00
Tom Deakin
b54d94b82d
[RAJA] Use global defined scalar value
2016-10-24 13:30:55 +01:00
Tom Deakin
d1bebf12d9
[Kokkos] Use global defined scalar value
2016-10-24 13:25:54 +01:00
Tom Deakin
ac6158fa31
[OpenACC] Use global defined scalar value
2016-10-24 13:24:11 +01:00
Tom Deakin
b120acaf87
[OMP45] Use global defined scalar value
2016-10-24 13:23:20 +01:00
Tom Deakin
7a81b63fbf
[OMP3] Use global defined scalar value
2016-10-24 13:21:47 +01:00
Tom Deakin
5b1e67f666
[CUDA] Use new value of scalar
2016-10-24 13:19:54 +01:00
Tom Deakin
5ae613519d
Change the value of scalar, and specify in a #define
2016-10-24 13:19:31 +01:00
James Price
cfc1aba2c0
Use WGSIZE=256 for dot for compatibility with AMD
2016-10-24 12:51:01 +01:00
James Price
c9b3d07b84
Fix OpenCL host code for dot kernel
...
Wrong number of blocks was being copied and summed, and the host sums
vector didn't have the correct size.
2016-10-24 12:49:58 +01:00
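The kind of bug described here concerns the host-side finish of a two-stage reduction: the device writes one partial sum per block (work-group), and the host must size its buffer to, and sum over, exactly that many entries. The sketch below emulates this in plain C++ with no OpenCL calls; `block_partial_dots` and `finish_dot` are hypothetical names and the strided loop only stands in for a device kernel.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the device kernel: each "block" accumulates a strided
// share of the products and writes one partial sum.
std::vector<double> block_partial_dots(const std::vector<double>& a,
                                       const std::vector<double>& b,
                                       std::size_t num_blocks)
{
  // The host-side buffer holds exactly one entry per block.
  std::vector<double> sums(num_blocks, 0.0);
  for (std::size_t blk = 0; blk < num_blocks; blk++)
    for (std::size_t i = blk; i < a.size(); i += num_blocks)
      sums[blk] += a[i] * b[i];
  return sums;
}

// Host-side finish: add up every partial sum, no more and no fewer.
double finish_dot(const std::vector<double>& sums)
{
  double total = 0.0;
  for (double s : sums)
    total += s;
  return total;
}
```

Summing fewer entries than there are blocks drops part of the result, while summing more reads values no block ever wrote; either way the dot product fails verification.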
James Price
8a8f44b4ce
Fix CUDA host code for dot kernel
...
Wrong number of blocks was being copied and summed.
2016-10-24 12:47:25 +01:00
James Price
1e94870859
Fix verification of dot kernel
2016-10-24 12:47:01 +01:00