| c470c2d9 | 27-Apr-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
Use CeedMallocArray for void pointers in CUDA/HIP QFunctionContext |
| f48ed27d | 25-Apr-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
Use backend functions for SyncArray in CUDA and HIP |
| 07b31e0e | 20-Apr-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - remove 'quoted' operator assembly kernels |
| ee5a26f2 | 04-Apr-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
jit - add interface for adding additional jit source dirs |
| a0154ade | 04-Apr-2022 |
Jed Brown <jed@jedbrown.org> |
move include/ceed-jit-source to include/ceed/jit-source |
| 6eb0d8b4 | 01-Apr-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
jit - use relpath from include/ceed-jit-source for jit source files |
| b216396c | 21-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
tidy - small tidy fixes |
| 4345bdd5 | 20-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #920 from CEED/jeremy/conversion
Explicit casting of vector sizes |
| 539ec17d | 20-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - fix error handling in size conversion |
| 2459f3f1 | 18-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #925 from CEED/gpu-assemble
Add some matrix assembly support to GPU backends |
| 59ad764a | 18-Mar-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
Add fallback kernel for larger element sizes in GPU assembly |
| cc132f9a | 17-Mar-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
Add LinearAssemble CUDA reference implementation for low-order elements |
| a835093f | 17-Mar-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
Add LinearAssemble HIP reference implementation for low-order elements |
| a11a3c55 | 17-Mar-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
MAGMA: simplify atomic add usage and reduce MAGMA header file usage |
| d2643443 | 17-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
minor - explicit casting of vector sizes |
| 3d8e8822 | 17-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
minor - update copyright headers |
| 16e5f7d7 | 14-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
opt - reduce memory usage in CeedOperatorLinearAssembleQFunctionCore_Opt |
| e79b91d9 | 11-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
rstr - use CeedSize for l_size |
| 1f9221fe | 11-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
vec - use CeedSize for vector lengths |
| f99981a3 | 25-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #893 from CEED/natalie/more-hip-launch-bounds
HIP: add atomics flag and more kernel launch bounds for performance improvements |
| 37c3b1cf | 24-Feb-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
Change to more specific name for hip-gen block size function |
| 441428df | 14-Feb-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
qf - extend ctx read/write feature to qf |
| 28bfd0b7 | 14-Feb-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
ctx - add read-only access for QFContext |
| c8b3a627 | 18-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
backends/magma: fix pinned vs unpinned memory free bug |
| e6f67ff7 | 18-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
backends cuda-shared: fix launch bounds to avoid invalid z dimension
The typical max z dimension size of a thread block is 64 and we were computing larger values (like 85) in some cases. |