ceed-cuda-shared-basis.c - OpenGrok history log for /libCEED/backends/cuda-shared/ceed-cuda-shared-basis.c

Revision	Date	Author	Comments
# 2b730f8b	17-Nov-2022	Jeremy L Thompson <jeremy@jeremylt.org>	Switch to clang-format (#1051) * style - switch to clang-format * ci - use newer libxsmm * action - update format action * format - consistent use of {} for multi-line if/for * make - remove stray newline * make - simpler 'make format' target * ci - use newer libxsmm * doc - minor release note claification * minor - minor fix * minor - minor fix * minor - minor fix * minor - minor fix * make format * format - less aggressive alignment rules * tidy - check for argument name mismatches * fix newline * format - mirror Ratel update to .clang-format * fix merge error * fix merge conflict * fix merge error * drop style in .phony list * Update .clang-format Co-authored-by: Jed Brown <jed@jedbrown.org> * apply updated format Co-authored-by: Jed Brown <jed@jedbrown.org>
# 9e201c85	23-Sep-2022	Yohann <dudouit1@llnl.gov>	Refactor `cuda-gen` and `hip-gen` backends. (#1050) * Add TODO items. * rough, but something like this? * wip - cleaning up some warnings, but more remain * wip - reorganize * wip - missing kernels * wip - replace t1d * fix some kernels * another typo * more * another one * closer * define T_1D * typosgit add .! * WIP: changes to cuda-shared framework for new kernels * fix output writing * buffer fix * buffer sizes * WIP: fixes for 2 and 3D basis kernels * minor * fix weight kernel for 3d * remove debugging output * minor reorg * fix includes * enable collo grad for cuda-shared * move quoted kernels * renaming * missed a rename * small fix * more naming consistency * faster 'useCollograd=false' path in -gen more style * one last style fix * clearer collograd condition * Add gen basis kernels to hip-shared * Try some changes to hip-shared basis block sizes for new kernels * cuda - drop extra kernel arg * cuda - fix collograd check logic * update gen comment about parallelization * tidy up fields struct definition * tidy up structs even more * Update hip-gen basis templates use and move other hip-gen device functions to jit-source * Finish hip-gen basis template update; small style updates to match CUDA * missing isStrided * Update block size used in 3D weight for new shared kernels * update release notes Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org> Co-authored-by: nbeams <246972+nbeams@users.noreply.github.com>
# 18562a3a	08-Apr-2022	Jeremy L Thompson <jeremy@jeremylt.org>	Merge pull request #935 from CEED/jeremy/install-folder Install qf and kernels
# ee5a26f2	04-Apr-2022	Jeremy L Thompson <jeremy@jeremylt.org>	jit - add interface for adding additional jit source dirs
# a0154ade	04-Apr-2022	Jed Brown <jed@jedbrown.org>	move include/ceed-jit-source to include/ceed/jit-source
# 6eb0d8b4	01-Apr-2022	Jeremy L Thompson <jeremy@jeremylt.org>	jit - use relpath from include/ceed-jit-source for jit source files
# ce18bed9	17-Mar-2022	Jeremy L Thompson <jeremy@jeremylt.org>	Merge pull request #858 from CEED/jeremy/dump-copy-stuff Strip redundant/outdated license info duplication
# 3d8e8822	17-Mar-2022	Jeremy L Thompson <jeremy@jeremylt.org>	minor - update copyright headers
# 60224bc5	14-Mar-2022	Jeremy L Thompson <jeremy@jeremylt.org>	Merge pull request #913 from CEED/jeremy/coo-ptrdiff Create CeedSize as ptrdiff_t
# 1f9221fe	11-Mar-2022	Jeremy L Thompson <jeremy@jeremylt.org>	vec - use CeedSize for vector lengths
# e2cfdb03	18-Feb-2022	Jed Brown <jed@jedbrown.org>	Merge pull request #902 from CEED/jed/cuda-block-sizes backends cuda-shared: fix launch bounds to avoid invalid z dimension
# e6f67ff7	18-Feb-2022	Jed Brown <jed@jedbrown.org>	backends cuda-shared: fix launch bounds to avoid invalid z dimension The typical max z dimension size of a thread block is 64 and we were computing larger values (like 85) in some cases.
# 8b036261	16-Feb-2022	Jeremy L Thompson <jeremy@jeremylt.org>	Merge pull request #899 from CEED/jed/cu-lv-cuda-11.6 CI: update lv for cuda-11.6
# c47bfe2b	16-Feb-2022	Jed Brown <jed@jedbrown.org>	backends/cuda-shared: limit 1D thread counts We need to avoid this error: CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: max_threads_per_block 512 on block size (24,1,32), shared_size 0, num_regs 106 A proper solution is to use cuOccupancyMaxPotentialBlockSize to place a number of elements per block that stays within resource limits. This would involve a bit more refactoring to do cleanly.
# 51d630a3	24-Dec-2021	Jeremy L Thompson <jeremy@jeremylt.org>	Merge pull request #864 from CEED/jeremy/gpu-templates GPU - pull quoted kernels into separate files
# d7d111ec	23-Dec-2021	Jeremy L Thompson <jeremy@jeremylt.org>	gpu - style consistency
# 46dc0734	23-Dec-2021	Jeremy L Thompson <jeremy@jeremylt.org>	gpu - improved human-readability of debugging output
# 437930d1	22-Dec-2021	Jeremy L Thompson <jeremy@jeremylt.org>	gpu - pull quoted kernels into separate files
# d92fedf5	22-Dec-2021	Jeremy L Thompson <jeremy@jeremylt.org>	Merge pull request #863 from CEED/jeremy/gpu-jit-code GPU - separate common code into separate folder
# 6d69246a	21-Dec-2021	Jeremy L Thompson <jeremy@jeremylt.org>	cuda - separate compile functionality into new header
# 9c774edd	17-Dec-2021	Jeremy L Thompson <jeremy@jeremylt.org>	vec/qf - initial valid/borrowed/owned split for data (#853) * vec/qf - initial valid/borrowed/owned split for data * vec/qf - tidy logic for checking active/stale data * minor - add missing NULL * doc - explain VectorTakeArray update * minor - update error messages * test - update error message in junit/tap * gpu - fix stray CeedScalar vs void for QFunctionContext * vec/qf - clarify/simplify access logic * vec - calloc host arrays when no value set to make empty * style - minor * style - minor * minor - fix error messages * vec/qf - move data validity checking to backend interface * gpu - add missing sync error checking for qfcontext * gpu - homogonize use of impl for backend data to reduce confusion * vec - clarify access conditions * python - update test for stricter vector access * vec - minor fixes * minor - fix ipython change * vec - add missing declarations in ceed/backend.h * ctx - mirror vector borrowed data check in ctx interface * vec - add CeedVectorGetArrayWrite * vec - consistent use of CeedVectorGetArray vs CeedVectorGetArrayWrite * python - small vec fixes * doc - describe vector data semantics * magma - update restriction * gpu - fix restr bug I added, need to sum into target * magma - fix restriction bug * cpu - fix restriction bug here too * op - fix evec allocations * julia - fix ElemRestriction for new vector access rules * op - double check GetArray vs Read vs Write usage * doc - small fix * restr - clean up read/write logic for restr * python - add vec.array_write * magma - typo fix
# 80a9ef05	02-Sep-2021	Natalie Beams <246972+nbeams@users.noreply.github.com>	Allow CeedScalar to be single precision (#788) One can modify `ceed.h` to include `ceed-f32.h` and then use single precision. This is tested for C in CI and has been tested by developers with Rust, Julia, and Python. This interface is evolving and should be considered experimental at this time (thus lack of automated build support). * Introduce CeedScalarType enum * WIP changes to allow different definitions of CeedScalar * Introduce new header files for float and double * Only use avx tensor contract and MAGMA non-tensor basis if CeedScalar is double * WIP changes to allow CeedScalar to be float * WIP start trying to adjust test tolerances for float or double * fix typos in comments * install ceed-f32/64 headers * Fix missing casts for hipMAGMA element restrictions * make CeedQFunctionContextGetContextSize available for Python bindings * Changes to Python bindings to allow CeedScalar to be float * WIP adjust Python tests for float or double * make style * remove QFunctionContextGetContextSize from backend header * Use quotes instead of <> in include statement * Remove unncessary includes * Update tolerances for tests * [Julia] allow CeedScalar to be Float32 * [Julia] Use Preferences instead of custom build configuration # Conflicts: # julia/LibCEED.jl/src/C.jl * [Makefile] Change definition of CC_VENDOR so it works with cross-compilation * [Julia] Use Preferences in CI # Conflicts: # .github/workflows/julia-test-with-style.yml * [Julia] Update docs about preferences * [Julia] Add test/Project.toml workaround for Preferences * Add CeedGetScalarType to get the type of CeedScalar at runtime * [Julia] Move functions from Ceed.jl to LibCEED.jl * [Julia] Add support for getting library path and scalar type at runtime * [Julia] Minor change to checking if CUDA is loaded * [Julia] Check correct CeedScalar types in basis functions * [Julia] Fix tests comparing with output file * [Julia] Change devtests to use CeedScalar instead of Float64 * Update test 402 so context will be same size in double or float * Update tolerances for ceed examples * [Julia] CUDA fixes * remove unused variable in t208 * SchurDecomposition: do not compute tau on final iteration * Update tolerances for some basis tests (for single precision) * Make style * Python style fixes for basis test * Add single precision output for t300 and t320 and adjust checks; skip t541 in single * Add LCOV exclusions after moving to new line * fix spacing * Python: make CEED_EPSILON available as libceed.EPSILON * Python: optional parameter to specify different output file for test comparison * Python: update tests' use of EPSILON and change test_300 output file for single precision * Python: add convenience function for getting dtype corresponding to CeedScalar * rust - add single precision support * [Julia] Fall back on Float64 if CeedGetScalarType is not available * [Julia] style * Adjust tolerance for t301 * xsmm - add single precision support * avx - add single precision support * Add initial single precision support for MAGMA non-tensor basis * Skip t300 and t320 in single precision; revert Python t300 changes * Revert output changes for t300 and t320 in junit * [Julia] Changes to autogenerated bindings for mixed precision * [Julia] style * [Julia] Check scalar type when changing libceed library path The check is also performed when the package is loaded. This prevents having to restart the Julia session twice * [Julia] Require JLLWrappers version 1.3 This is needed to use Preferences to change the library path * Add documentation page for precision development Co-authored-by: Will Pazner <will.e.p@gmail.com> * Cleanup from merge: remove old README * Return CEED_ALIGN to backend.h * Make Fortran compiler (FC) optional; empty skips Fortran tests Use in Python and Rust builds, which may not have a Fortran compiler installed and thus would produce confusing output. * Add single precision CI test for Noether Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org> Co-authored-by: Will Pazner <will.e.p@gmail.com> Co-authored-by: Jeremy L Thompson <jeremy@jeremylt.org> Co-authored-by: Jed Brown <jed@jedbrown.org>
# ebc204c0	15-Apr-2021	Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>	Merge pull request #740 from CEED/natalie/device-id Update device ID selection for HIP/CUDA/MAGMA backends
# 6dbfb411	05-Apr-2021	nbeams <246972+nbeams@users.noreply.github.com>	Update device ID selection for HIP/CUDA backends; add for MAGMA backends
# 874019bc	31-Mar-2021	Jed Brown <jed@jedbrown.org>	Merge pull request #716 from CEED/jed/install-backend.h Jed/install backend.h