| Author | Commit | Message | Date |
|---|---|---|---|
| Tom Lin | 145e2a0649 | Merge branch 'fix_num_type' into develop | 2023-10-07 15:09:44 +01:00 |
| Tom Lin | f2f7f3a3de | Fix bad dot group initialiser in HIP and CUDA | 2023-10-07 11:12:08 +01:00 |
| Tom Lin | ffae3ba83f | Fix CMAKE_CUDA_FLAGS, resolves #166 | 2023-10-07 09:45:16 +01:00 |
| Tom Deakin | 9954b7d38c | Set CUDA dot kernel to use number of blocks relative to device property. This aligns with the approach implemented in other models (SYCL 1.2.1 and HIP). Cherry-picks the CUDA updates from lmeadows in #122. | 2023-10-06 17:56:42 +01:00 |
| Tom Lin | bd6bb09b5d | Fix MEM flag for CUDA, resolves #163 | 2023-09-25 01:39:23 +01:00 |
| Tom Lin | 3dcafd1af1 | Fix max element guard overflow for CUDA, resolves #136 | 2023-09-22 02:31:14 +01:00 |
| Tom Deakin | 092ee67764 | Change CUDA DOT thread-blocks to 1024. This improves the performance on Ampere (A100) GPUs. Fixes #137. | 2023-06-12 15:51:13 +01:00 |
| Tom Deakin | a35c7b4bea | Fix CUDA memory check for large array sizes. Closes #123. | 2022-02-16 14:33:17 +00:00 |
| Tom Lin | f5fe55c204 | [WIP] Drop CL headers and Makefiles. Update README. Move new models to /src. | 2021-11-30 18:22:55 +00:00 |
| Tom Lin | 5318404249 | Use ./src instead of ./cpp. Create subdir for each cpp-based implementation. | 2021-05-26 17:46:07 +01:00 |