minor - update copyright headers
backend - use const for backend rstr data, should never change after set
backend - use new user data copy utility
fix - need C string for debug printing
fix - print actual source code, not the defs only
jit - write debug info when CUDA/HIP fails to compile
internal - more updates for const
style - fixes for CUDA backends
style - fix header guards
Whitespace, style, and formatting updates for consistency between CUDA and HIP backends
  Adds include guards in JiT header files, even if not strictly necessary, to match the precedent set in cuda-shared and hip-shared as well as sycl.
gpu - naming consistency fixes
ceed - move GetResourceRoot to backend interface
minor - clean up backend headers for const and argument names
Merge pull request #1197 from sebastiangrimberg/sjg/style-whitespace-fix
  Minor style consistency updates
internal - add CeedCheck macro to reduce repetition
Fix file endings inconsistency
IWYU fixes (#1182)
  * iwyu - include fixes
  * iwyu - silence some iwyu output
  * minor - clearer macro names
  * iwyu - fix suggestion of "ceed/ceed.h" externally
  * iwyu - lighter petsc headers
  * iwyu - ceed/ceed.h -> ceed.h
  * iwyu - cuda/hip include fixes
magma: non-tensor rtc (#1141)
  * some refactoring in magma's jit src
  * fix path
  * fix loading src
  * refactor magma nontensor backend
  * refactor magma nontensor backend
  * [WIP]: new nontensor basis kernels
  * [WIP]: new nontensor basis kernels
  * [WIP]: new nontensor basis kernels
  * call the new nontensor kernels for low order problems
  * multiple compilation for the same kernels but with different tuning parameters
  * magma: allow different nb's for different non-tensor kernels
  * tuning data for the non-tensor rtc kernels
  * remove no-longer used functions, add new one for tuning the nontensor kernels
  * constants for tuning
  * tuning functions
  * use the tuning functions in compiling/running the new kernels
  * bug fix
  * fixes
  * fixes
  * minor
  * switch tuning data
  * fix name
  * fix name
  * add function to run cuda kernels with opt-in shared memory feature
  * minor fix
  * minor fix
  * fix calls to batch api
  * allow more kernel instances
  * temporary timing function
  * temporary timing function
  * tuning data based on hiprtc
  * rollback tuning parameters
  * fixes
  * fixes
  * fix inconsistency in the parameters passed to nvrtc/hiprtc
  * minor
  * a fix to the nb selector
  * cleanup
  * merge the opt-in feature in CeedRunKernelDimSharedOptinCuda into CeedRunKernelDimSharedCuda
  * fix paths for hip-magma backends
  * style
  * fixes
  * running make format
  * undo changes from the last commit
  * change HIP_DIR to ROCM_DIR and adjust the paths for magma accordingly
  * replace HIP_DIR with ROCM_DIR
minor - assorted formatting fixes
Switch to clang-format (#1051)
  * style - switch to clang-format
  * ci - use newer libxsmm
  * action - update format action
  * format - consistent use of {} for multi-line if/for
  * make - remove stray newline
  * make - simpler 'make format' target
  * ci - use newer libxsmm
  * doc - minor release note clarification
  * minor - minor fix
  * minor - minor fix
  * minor - minor fix
  * minor - minor fix
  * make format
  * format - less aggressive alignment rules
  * tidy - check for argument name mismatches
  * fix newline
  * format - mirror Ratel update to .clang-format
  * fix merge error
  * fix merge conflict
  * fix merge error
  * drop style in .phony list
  * Update .clang-format
    Co-authored-by: Jed Brown <jed@jedbrown.org>
  * apply updated format
    Co-authored-by: Jed Brown <jed@jedbrown.org>
Refactor `cuda-gen` and `hip-gen` backends. (#1050)
  * Add TODO items.
  * rough, but something like this?
  * wip - cleaning up some warnings, but more remain
  * wip - reorganize
  * wip - missing kernels
  * wip - replace t1d
  * fix some kernels
  * another typo
  * more
  * another one
  * closer
  * define T_1D
  * typosgit add .!
  * WIP: changes to cuda-shared framework for new kernels
  * fix output writing
  * buffer fix
  * buffer sizes
  * WIP: fixes for 2 and 3D basis kernels
  * minor
  * fix weight kernel for 3d
  * remove debugging output
  * minor reorg
  * fix includes
  * enable collo grad for cuda-shared
  * move quoted kernels
  * renaming
  * missed a rename
  * small fix
  * more naming consistency
  * faster 'useCollograd=false' path in *-gen
  * more style
  * one last style fix
  * clearer collograd condition
  * Add gen basis kernels to hip-shared
  * Try some changes to hip-shared basis block sizes for new kernels
  * cuda - drop extra kernel arg
  * cuda - fix collograd check logic
  * update gen comment about parallelization
  * tidy up fields struct definition
  * tidy up structs even more
  * Update hip-gen basis templates use and move other hip-gen device functions to jit-source
  * Finish hip-gen basis template update; small style updates to match CUDA
  * missing isStrided
  * Update block size used in 3D weight for new shared kernels
  * update release notes
  Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org>
  Co-authored-by: nbeams <246972+nbeams@users.noreply.github.com>
QF headers for typedefs and macros (#1036)
  * jit - qf headers for typedefs and macros
  * jit - smaller list of permitted files
  * ceed - only include ceed.h in QF source
gpu - fix setting device id
backends/cuda: more informative error reporting