10 Commits

Author SHA1 Message Date
Tom Lin
145e2a0649 Merge branch 'fix_num_type' into develop 2023-10-07 15:09:44 +01:00
Tom Lin
f2f7f3a3de Fix bad dot group initialiser in HIP and CUDA 2023-10-07 11:12:08 +01:00
Tom Lin
ffae3ba83f Fix CMAKE_CUDA_FLAGS, resolves #166 2023-10-07 09:45:16 +01:00
Tom Deakin
9954b7d38c Set CUDA dot kernel to use number of blocks relative to device property
This aligns with the approach implemented in other models (SYCL 1.2.1 and HIP)

Cherry-picks the CUDA updates from lmeadows in #122
2023-10-06 17:56:42 +01:00
Tom Lin
bd6bb09b5d Fix MEM flag for CUDA, resolves #163 2023-09-25 01:39:23 +01:00
Tom Lin
3dcafd1af1 Fix max element guard overflow for CUDA, resolves #136 2023-09-22 02:31:14 +01:00
Tom Deakin
092ee67764 Change CUDA DOT thread-blocks to 1024
This improves the performance on Ampere (A100) GPUs.

Fixes #137.
2023-06-12 15:51:13 +01:00
Tom Deakin
a35c7b4bea Fix CUDA memory check for large array sizes
Closes #123
2022-02-16 14:33:17 +00:00
Tom Lin
f5fe55c204 [WIP] Drop CL headers and Makefiles
Update README
Move new models to /src
2021-11-30 18:22:55 +00:00
Tom Lin
5318404249 Use ./src instead of ./cpp
Create subdir for each cpp-based implementation
2021-05-26 17:46:07 +01:00