ceed-cuda-shared.h - OpenGrok history log for /libCEED/backends/cuda-shared/ceed-cuda-shared.h

Revision	Date	Author	Comments
# ce18bed9	17-Mar-2022	Jeremy L Thompson <jeremy@jeremylt.org>	Merge pull request #858 from CEED/jeremy/dump-copy-stuff Strip redundant/outdated license info duplication
# 3d8e8822	17-Mar-2022	Jeremy L Thompson <jeremy@jeremylt.org>	minor - update copyright headers
# 51d630a3	24-Dec-2021	Jeremy L Thompson <jeremy@jeremylt.org>	Merge pull request #864 from CEED/jeremy/gpu-templates GPU - pull quoted kernels into separate files
# 437930d1	22-Dec-2021	Jeremy L Thompson <jeremy@jeremylt.org>	gpu - pull quoted kernels into separate files
# d92fedf5	22-Dec-2021	Jeremy L Thompson <jeremy@jeremylt.org>	Merge pull request #863 from CEED/jeremy/gpu-jit-code GPU - separate common code into separate folder
# 7fcac036	22-Dec-2021	Jeremy L Thompson <jeremy@jeremylt.org>	gpu - split common cuda/hip data into separate folder
# ebc204c0	15-Apr-2021	Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>	Merge pull request #740 from CEED/natalie/device-id Update device ID selection for HIP/CUDA/MAGMA backends
# 6dbfb411	05-Apr-2021	nbeams <246972+nbeams@users.noreply.github.com>	Update device ID selection for HIP/CUDA backends; add for MAGMA backends
# 874019bc	31-Mar-2021	Jed Brown <jed@jedbrown.org>	Merge pull request #716 from CEED/jed/install-backend.h Jed/install backend.h
# ec3da8bc	26-Mar-2021	Jed Brown <jed@jedbrown.org>	Install backend headers under include/ceed/ This makes it possible to distribute source plugins that provide additional backends. It's also used in MFEM, perhaps temporarily. Deprecate ceed-backend.h, which was not previously installed, but some users accessed it from an in-place build. Also install CUDA and HIP headers that allow users to provide CUfunction and hipFunction_t. Co-authored-by: Jeremy L. Thompson <jeremy.thompson@colorado.edu> Requested-by: Andrew T. Barker <barker29@llnl.gov>
# 3d576824	29-Jan-2021	Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>	headers - clearify includes to not rely on transitive includes (#701) * headers - clearify includes to not rely on transitive includes * style - add header recommendations from 'include-what-you-use' * style - apply 'include-what-you-use' changes to CUDA backends * style - 'include-what-you-use' for hip backends * style - drop ceed.h includes in gallery qf source * docs - add dev notes for header files * style - header style and alphabetize
# 621cd461	16-Mar-2020	Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>	Merge pull request #421 from SanderA/sanderarens/fix_ceed_cuda_subclasses Add Ceed_Cuda struct to Ceed_Cuda_ref/shared/gen.
# abfaacbb	17-Nov-2019	Sander Arens <sanderarens@gmail.com>	Add Ceed_Cuda struct to Ceed_Cuda_ref/shared/gen. Now Ceed_Cuda_ref/shared/gen act like subclasses and can be properly cast to Ceed_Cuda.
# ac421f39	17-Sep-2019	Yohann <dudouit1@llnl.gov>	Improved performance of cuda-gen backend (#341) Thanks-to: Tim Warburton Some of these optimizations are the results of the knowledge and experience gathered by Tim Warburton and his team in libParanumal and then ported to libCEED. * Add colocated gradient in 3D. * Treat the qFunction by slice in 3d to avoid using too many registers. * Minor fix * Minor fix. * Minor fix * Compute the colocated gradient slice by slice. * Add synchthreads after initialization of the matrices. * Remove code print. * Add a critical #pragma unroll * Fix typo on "collocated". * Remove dead code. * Use ColloGrad3d functions. * Fix cuda-gen backend when collocated gradient is not available. * make style * make style * Add some comments. * Replace int by CeedInt.
# a62270dd	27-Aug-2019	Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>	Merge pull request #314 from CEED/jeremy/dof-to-node Update DoF to Node and Style Changes
# 8795c945	22-Aug-2019	jeremylt <jeremy.thompson@colorado.edu>	Rename NDoF to NNodes and style updates
# 1226057f	27-Jun-2019	Yohann Dudouit <yohann.dudouit@gmail.com>	Merge branch 'master' into yohann/cuda-restr-opt Conflicts: backends/cuda-reg/ceed-cuda-reg-restriction.c backends/cuda-shared/ceed-cuda-shared-basis.c
# 9d77422e	26-Jun-2019	Jed Brown <jed@jedbrown.org>	Merge branch 'yohann/cuda-non-tensor' [PR #249] * yohann/cuda-non-tensor: ceed-cuda: resolve -Wsign-compare for CUresult (unsigned enum) in CeedError make style. namespace cuda backends functions. Minor: styling Add CUDA_LIB_DIR_STUBS for systems that don't have CUDA drivers installed make style Remove useless function declaration. Add a reference non-tensor BasisApply for cuda backends.
# df4cfd6d	04-Jun-2019	Yohann Dudouit <yohann.dudouit@gmail.com>	Remove dead or unnecessary code.
# 074be161	03-Jun-2019	Yohann Dudouit <yohann.dudouit@gmail.com>	Optimization of weight kernel and dynamic allocation of shared memory. - First optimization of weight kernel, kernels are now coalesce but might not be fully using SMs (need to batch elements per block) - Switch to dynamic shared memory allocation in order to batch elements for interpolation and gradient in cuda-shared backend. - Add GetPreferedMemoryType for cuda-reg and cuda-shared backends. (Can be removed in the future with delegation of this function)
# 469f0220	16-May-2019	Yohann Dudouit <yohann.dudouit@gmail.com>	Remove useless function declaration.
# c532df63	16-May-2019	Yohann <dudouit1@llnl.gov>	Cuda backend using shared memory (#247) Add a GPU backend based on Cuda using shared memory. * Draft of a shared memory backend * New basis apply passes all tests. * Add the possibility to treat several elements in one block of threads. * Fix an error in 2D and 3D gradient. * Put the cuda-shared backend in its own folder. * Minor cleaning. * Replace <ceed-impl.h> with <ceed-backend.h> * make style * Add a few CeedChk_Cu ...