| #
2b730f8b
|
| 17-Nov-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Switch to clang-format (#1051)
* style - switch to clang-format
* ci - use newer libxsmm
* action - update format action
* format - consistent use of {} for multi-line if/for
* make - remove stray newline
* make - simpler 'make format' target
* ci - use newer libxsmm
* doc - minor release note clarification
* minor - minor fix
* minor - minor fix
* minor - minor fix
* minor - minor fix
* make format
* format - less aggressive alignment rules
* tidy - check for argument name mismatches
* fix newline
* format - mirror Ratel update to .clang-format
* fix merge error
* fix merge conflict
* fix merge error
* drop style in .phony list
* Update .clang-format
Co-authored-by: Jed Brown <jed@jedbrown.org>
* apply updated format
Co-authored-by: Jed Brown <jed@jedbrown.org>
|
| #
9e201c85
|
| 23-Sep-2022 |
Yohann <dudouit1@llnl.gov> |
Refactor `cuda-gen` and `hip-gen` backends. (#1050)
* Add TODO items.
* rough, but something like this?
* wip - cleaning up some warnings, but more remain
* wip - reorganize
* wip - missing kernels
* wip - replace t1d
* fix some kernels
* another typo
* more
* another one
* closer
* define T_1D
* typosgit add .!
* WIP: changes to cuda-shared framework for new kernels
* fix output writing
* buffer fix
* buffer sizes
* WIP: fixes for 2 and 3D basis kernels
* minor
* fix weight kernel for 3d
* remove debugging output
* minor reorg
* fix includes
* enable collo grad for cuda-shared
* move quoted kernels
* renaming
* missed a rename
* small fix
* more naming consistency
* faster 'useCollograd=false' path in *-gen
* more style
* one last style fix
* clearer collograd condition
* Add gen basis kernels to hip-shared
* Try some changes to hip-shared basis block sizes for new kernels
* cuda - drop extra kernel arg
* cuda - fix collograd check logic
* update gen comment about parallelization
* tidy up fields struct definition
* tidy up structs even more
* Update hip-gen basis templates use and move other hip-gen device functions to jit-source
* Finish hip-gen basis template update; small style updates to match CUDA
* missing isStrided
* Update block size used in 3D weight for new shared kernels
* update release notes
Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org>
Co-authored-by: nbeams <246972+nbeams@users.noreply.github.com>
|
| #
32b31df9
|
| 17-Aug-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #1041 from CEED/jeremy/guard-hip-version
Guard hipblas header include for HIP_VERSION
|
| #
0df8cb37
|
| 16-Aug-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
hip - guard hipblas header include for HIP_VERSION
|
| #
428b7a12
|
| 06-Jun-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #977 from CEED/jeremy/fallback-yet-again
Add debugging output to fallback creation
|
| #
6aa95790
|
| 06-Jun-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
pc - fix fallback for composite assembly
|
| #
2459f3f1
|
| 18-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #925 from CEED/gpu-assemble
Add some matrix assembly support to GPU backends
|
| #
59ad764a
|
| 18-Mar-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
Add fallback kernel for larger element sizes in GPU assembly
|
| #
a835093f
|
| 17-Mar-2022 |
nbeams <246972+nbeams@users.noreply.github.com> |
Add LinearAssemble HIP reference implementation for low-order elements
|
| #
ce18bed9
|
| 17-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #858 from CEED/jeremy/dump-copy-stuff
Strip redundant/outdated license info duplication
|
| #
3d8e8822
|
| 17-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
minor - update copyright headers
|
| #
60224bc5
|
| 14-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #913 from CEED/jeremy/coo-ptrdiff
Create CeedSize as ptrdiff_t
|
| #
1f9221fe
|
| 11-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
vec - use CeedSize for vector lengths
|
| #
51d630a3
|
| 24-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #864 from CEED/jeremy/gpu-templates
GPU - pull quoted kernels into separate files
|
| #
437930d1
|
| 22-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - pull quoted kernels into separate files
|
| #
d92fedf5
|
| 22-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #863 from CEED/jeremy/gpu-jit-code
GPU - separate common code into separate folder
|
| #
0d0321e0
|
| 22-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
style - consistent naming and style for gpu backends
|