| a61b1c91 | 17-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gen - small fixes |
| efa41df3 | 14-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
fix - harmless warnings |
| 74398b5a | 14-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
hip - add mixed gen |
| 8014c5e7 | 14-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gen - set default dim to max |
| 259057ed | 14-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gen - fix flattened indexing |
| c8e372f0 | 13-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gen - add 3D mixed support |
| c433aabc | 11-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
cuda - fix 2D flattening |
| 412e5683 | 28-Feb-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - use 2d Flat variants in gen |
| 343e3094 | 26-Feb-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - isolate core 2D tensor logic to allow flat version |
| f725b54b | 26-Feb-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - add P_1D to template args for AtPoints |
| 90c30374 | 18-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gen - use blocksize of 1 elem AtPoints |
| 28c1f747 | 13-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - log error to debug on JiT try & fail |
| 6b92dc4b | 10-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
hip - use BASIS_T_1D in codegen |
| 99421279 | 10-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
cuda - use BASIS_T_1D in codegen |
| 826538b3 | 07-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gen - restrict input/output array pointers |
| 59fa3f92 | 06-Mar-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gen - use field names for clarity |
| 0c8fbeed | 26-Feb-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - gen should use GetArray over GetArrayWrite |
| 087855af | 24-Feb-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - gen put suboperators on separate streams |
| c99afcd8 | 24-Feb-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - gen ApplyAdd functions |
| e9c76bdd | 19-Feb-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - allow running shared kernels on stream |
| ea04d07f | 11-Feb-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - isolate gen ApplyAdd inner logic |
| a8772291 | 13-Feb-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
hip - fix bug, need to actually get kernels |
| af0e6e89 | 13-Feb-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - add Transpose/TransposeAdd variants for AtPoints |
| 5a05fad6 | 12-Feb-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #1750 from CEED/jeremy/no-handroll-blas
gpu - prefer cu/hipBlas over handrolls |
| e84c3ebc | 12-Feb-2025 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - prefer cu/hipBlas over handrolls |