| bf84744c | 22-Aug-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
leak - add missing CeedFree for string |
| 5a5594ff | 22-Aug-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
minor - fix CeedCall() vs CeedCallBackend() in backend code |
| f8a0df59 | 21-Aug-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Skip duplicate transpose restrictions (#1645)
* cpu - skip duplicate output rstr
* cuda - skip duplicate output rstr
* hip - skip duplicate output rstr |
| 4b3e95d5 | 21-Aug-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
GPU Gen Reorganize (#1637)
* cuda - pull out basis setup for gen
* cuda - functions for adding basis, rstr gen actions
* cuda - pull QFunction logic into separate fn for gen
* cuda - minor formatting
* cuda - fix basis error
* cuda - rename collograd_parallelization to 3d_slices
* cuda - another gen setup function separated
* hip - update gen source building to match cuda
* gpu - fix min size of QF inputs for gen
|
| db2becc9 | 13-Aug-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Add CeedBasisApplyAdd (#1644)
* basis - add CeedBasisApplyAdd + CPU impl
* basis - add ref GPU ApplyAdd
* basis - add shared GPU ApplyAdd
* basis - add MAGMA ApplyAdd
* basis - add CeedBasisApplyAddAtPoints + default impl
* basis - add GPU ApplyAddAtPoints
* tidy - add extra assert to fix clang-tidy
* Apply suggestions from code review
style - consistently use indexing over pointer arithmetic
Co-authored-by: Zach Atkins <zach.atkins@colorado.edu>
* style - more pointer fixes
---------
Co-authored-by: Zach Atkins <zach.atkins@colorado.edu>
|
| 3aab95c0 | 05-Aug-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
op - minor performance improvement for op with repeat input rstr |
| 13062808 | 02-Aug-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
atpoints - remove some extra operations |
| 86e10729 | 02-Aug-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
atpoints - fix diagonal bug with stale qvec data |
| 382e9c83 | 02-Aug-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
atPoints - fix diagonal assembly for mixed |
| 9b443e3b | 16-Jul-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - minimum input/output array size of 1 |
| 0a5597ce | 11-Jul-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
op - cast to CeedSize when creating rstr |
| afe3bc8a | 28-Jun-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
op - HIP diagonal assembly AtPoints |
| 349fb27d | 28-Jun-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
op - CUDA diagonal assembly AtPoints |
| 67d9480a | 20-Jun-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
hip - add AtPoints CeedOperator |
| 756ca9e9 | 20-Jun-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
cuda - add AtPoints CeedOperator |
| ad8059fc | 10-Jul-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - reduce write conflicts for AtPoints basis operations |
| 14950a8e | 21-Jun-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
magma - explicitly exclude BasisApplyAtPoints |
| f7c9815f | 20-Jun-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
AtPoints - ease memory requirement |
| 2d10e82c | 17-Jun-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
AtPoints - fix gpu thread usage |
| 1dda9c1a | 17-Jun-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - add initial AtPoints to shared mem backends, but using ref impl |
| 1c21e869 | 11-Jun-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
hip - add BasisApplyAtPoints |
| 34d14614 | 30-May-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
cuda - impl BasisApplyAtPoints |
| 958e607d | 28-Jun-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
ref - drop unused variables in OpAtPoints |
| c1222711 | 24-Jun-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - skip unneeded restrictions in OpApply |
| a7efc114 | 24-Jun-2024 |
Jeremy L Thompson <jeremy@jeremylt.org> |
vec - use min of 2 lengths for gpu impl of CopyStrided |