History log of /libCEED/backends/cuda-shared/ceed-cuda-shared.h (Results 26 – 47 of 47)
Revision Date Author Comments
# ce18bed9 17-Mar-2022 Jeremy L Thompson <jeremy@jeremylt.org>

Merge pull request #858 from CEED/jeremy/dump-copy-stuff

Strip redundant/outdated license info duplication


# 3d8e8822 17-Mar-2022 Jeremy L Thompson <jeremy@jeremylt.org>

minor - update copyright headers


# 51d630a3 24-Dec-2021 Jeremy L Thompson <jeremy@jeremylt.org>

Merge pull request #864 from CEED/jeremy/gpu-templates

GPU - pull quoted kernels into separate files


# 437930d1 22-Dec-2021 Jeremy L Thompson <jeremy@jeremylt.org>

gpu - pull quoted kernels into separate files


# d92fedf5 22-Dec-2021 Jeremy L Thompson <jeremy@jeremylt.org>

Merge pull request #863 from CEED/jeremy/gpu-jit-code

GPU - separate common code into separate folder


# 7fcac036 22-Dec-2021 Jeremy L Thompson <jeremy@jeremylt.org>

gpu - split common cuda/hip data into separate folder


# ebc204c0 15-Apr-2021 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Merge pull request #740 from CEED/natalie/device-id

Update device ID selection for HIP/CUDA/MAGMA backends


# 6dbfb411 05-Apr-2021 nbeams <246972+nbeams@users.noreply.github.com>

Update device ID selection for HIP/CUDA backends; add for MAGMA backends


# 874019bc 31-Mar-2021 Jed Brown <jed@jedbrown.org>

Merge pull request #716 from CEED/jed/install-backend.h

Jed/install backend.h


# ec3da8bc 26-Mar-2021 Jed Brown <jed@jedbrown.org>

Install backend headers under include/ceed/

This makes it possible to distribute source plugins that provide
additional backends. It's also used in MFEM, perhaps temporarily.

Deprecate ceed-backend.h, which was not previously installed, but some
users accessed it from an in-place build.

Also install CUDA and HIP headers that allow users to provide CUfunction
and hipFunction_t.

Co-authored-by: Jeremy L. Thompson <jeremy.thompson@colorado.edu>
Requested-by: Andrew T. Barker <barker29@llnl.gov>


# 3d576824 29-Jan-2021 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

headers - clarify includes to not rely on transitive includes (#701)

* headers - clarify includes to not rely on transitive includes

* style - add header recommendations from 'include-what-you-use'

* style - apply 'include-what-you-use' changes to CUDA backends

* style - 'include-what-you-use' for hip backends

* style - drop ceed.h includes in gallery qf source

* docs - add dev notes for header files

* style - header style and alphabetize


# 621cd461 16-Mar-2020 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Merge pull request #421 from SanderA/sanderarens/fix_ceed_cuda_subclasses

Add Ceed_Cuda struct to Ceed_Cuda_ref/shared/gen.


# abfaacbb 17-Nov-2019 Sander Arens <sanderarens@gmail.com>

Add Ceed_Cuda struct to Ceed_Cuda_ref/shared/gen.

Now Ceed_Cuda_ref/shared/gen act like subclasses and can be properly cast to Ceed_Cuda.


# ac421f39 17-Sep-2019 Yohann <dudouit1@llnl.gov>

Improved performance of cuda-gen backend (#341)

Thanks-to: Tim Warburton
Some of these optimizations are the results of the knowledge and experience gathered by Tim Warburton and his team in libParanumal and then ported to libCEED.

* Add colocated gradient in 3D.

* Treat the qFunction by slice in 3d to avoid using too many registers.

* Minor fix

* Minor fix.

* Minor fix

* Compute the colocated gradient slice by slice.

* Add synchthreads after initialization of the matrices.

* Remove code print.

* Add a critical #pragma unroll

* Fix typo on "collocated".

* Remove dead code.

* Use ColloGrad3d functions.

* Fix cuda-gen backend when collocated gradient is not available.

* make style

* make style

* Add some comments.

* Replace int by CeedInt.


# a62270dd 27-Aug-2019 Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com>

Merge pull request #314 from CEED/jeremy/dof-to-node

Update DoF to Node and Style Changes


# 8795c945 22-Aug-2019 jeremylt <jeremy.thompson@colorado.edu>

Rename NDoF to NNodes and style updates


# 1226057f 27-Jun-2019 Yohann Dudouit <yohann.dudouit@gmail.com>

Merge branch 'master' into yohann/cuda-restr-opt

Conflicts:
backends/cuda-reg/ceed-cuda-reg-restriction.c
backends/cuda-shared/ceed-cuda-shared-basis.c


# 9d77422e 26-Jun-2019 Jed Brown <jed@jedbrown.org>

Merge branch 'yohann/cuda-non-tensor' [PR #249]

* yohann/cuda-non-tensor:
ceed-cuda: resolve -Wsign-compare for CUresult (unsigned enum) in CeedError
make style.
namespace cuda backends functions.
Minor: styling
Add CUDA_LIB_DIR_STUBS for systems that don't have CUDA drivers installed
make style
Remove useless function declaration.
Add a reference non-tensor BasisApply for cuda backends.


# df4cfd6d 04-Jun-2019 Yohann Dudouit <yohann.dudouit@gmail.com>

Remove dead or unnecessary code.


# 074be161 03-Jun-2019 Yohann Dudouit <yohann.dudouit@gmail.com>

Optimization of weight kernel and dynamic allocation of shared memory.

- First optimization of weight kernel, kernels are now coalesced but
might not be fully using SMs (need to batch elements per block)
- Switch to dynamic shared memory allocation in order to batch elements
for interpolation and gradient in cuda-shared backend.
- Add GetPreferedMemoryType for cuda-reg and cuda-shared backends.
(Can be removed in the future with delegation of this function)


# 469f0220 16-May-2019 Yohann Dudouit <yohann.dudouit@gmail.com>

Remove useless function declaration.


# c532df63 16-May-2019 Yohann <dudouit1@llnl.gov>

Cuda backend using shared memory (#247)

Add a GPU backend based on Cuda using shared memory.

* Draft of a shared memory backend

* New basis apply passes all tests.

* Add the possibility to treat several elements in one block of threads.

* Fix an error in 2D and 3D gradient.

* Put the cuda-shared backend in its own folder.

* Minor cleaning.

* Replace <ceed-impl.h> with <ceed-backend.h>

* make style

* Add a few CeedChk_Cu
