| #
ce18bed9
|
| 17-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #858 from CEED/jeremy/dump-copy-stuff
Strip redundant/outdated license info duplication
|
| #
3d8e8822
|
| 17-Mar-2022 |
Jeremy L Thompson <jeremy@jeremylt.org> |
minor - update copyright headers
|
| #
51d630a3
|
| 24-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #864 from CEED/jeremy/gpu-templates
GPU - pull quoted kernels into separate files
|
| #
437930d1
|
| 22-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - pull quoted kernels into separate files
|
| #
d92fedf5
|
| 22-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
Merge pull request #863 from CEED/jeremy/gpu-jit-code
GPU - separate common code into separate folder
|
| #
7fcac036
|
| 22-Dec-2021 |
Jeremy L Thompson <jeremy@jeremylt.org> |
gpu - split common cuda/hip data into separate folder
|
| #
ebc204c0
|
| 15-Apr-2021 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Merge pull request #740 from CEED/natalie/device-id
Update device ID selection for HIP/CUDA/MAGMA backends
|
| #
6dbfb411
|
| 05-Apr-2021 |
nbeams <246972+nbeams@users.noreply.github.com> |
Update device ID selection for HIP/CUDA backends; add for MAGMA backends
|
| #
874019bc
|
| 31-Mar-2021 |
Jed Brown <jed@jedbrown.org> |
Merge pull request #716 from CEED/jed/install-backend.h
Jed/install backend.h
|
| #
ec3da8bc
|
| 26-Mar-2021 |
Jed Brown <jed@jedbrown.org> |
Install install backend headers under include/ceed/
This makes it possible to distribute source plugins that provide additional backends. It's also used in MFEM, perhaps temporarily.
Deprecate ceed-backend.h, which was not previously installed, but some users accessed it from an in-place build.
Also install CUDA and HIP headers that allow users to provide CUfunction and hipFunction_t.
Co-authored-by: Jeremy L. Thompson <jeremy.thompson@colorado.edu>
Requested-by: Andrew T. Barker <barker29@llnl.gov>
|
| #
3d576824
|
| 29-Jan-2021 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
headers - clearify includes to not rely on transitive includes (#701)
* headers - clearify includes to not rely on transitive includes
* style - add header recommendations from 'include-what-you-use'
* style - apply 'include-what-you-use' changes to CUDA backends
* style - 'include-what-you-use' for hip backends
* style - drop ceed.h includes in gallery qf source
* docs - add dev notes for header files
* style - header style and alphabetize
|
| #
621cd461
|
| 16-Mar-2020 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Merge pull request #421 from SanderA/sanderarens/fix_ceed_cuda_subclasses
Add Ceed_Cuda struct to Ceed_Cuda_ref/shared/gen.
|
| #
abfaacbb
|
| 17-Nov-2019 |
Sander Arens <sanderarens@gmail.com> |
Add Ceed_Cuda struct to Ceed_Cuda_ref/shared/gen.
Now Ceed_Cuda_ref/shared/gen act like subclasses and can be properly cast to Ceed_Cuda.
|
| #
ac421f39
|
| 17-Sep-2019 |
Yohann <dudouit1@llnl.gov> |
Improved performance of cuda-gen backend (#341)
Thanks-to: Tim Warburton
Some of these optimizations are the results of the knowledge and experience gathered by Tim Warburton and his team in libParanumal and then ported to libCEED.
* Add colocated gradient in 3D.
* Treat the qFunction by slice in 3d to avoid using too many registers.
* Minor fix
* Minor fix.
* Minor fix
* Compute the colocated gradient slice by slice.
* Add synchthreads after initialization of the matrices.
* Remove code print.
* Add a critical #pragma unroll
* Fix typo on "collocated".
* Remove dead code.
* Use ColloGrad3d functions.
* Fix cuda-gen backend when collocated gradient is not available.
* make style
* make style
* Add some comments.
* Replace int by CeedInt.
|
| #
a62270dd
|
| 27-Aug-2019 |
Jeremy L Thompson <25011573+jeremylt@users.noreply.github.com> |
Merge pull request #314 from CEED/jeremy/dof-to-node
Update DoF to Node and Style Changes
|
| #
8795c945
|
| 22-Aug-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Rename NDoF to NNodes and style updates
|
| #
1226057f
|
| 27-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Merge branch 'master' into yohann/cuda-restr-opt
Conflicts:
backends/cuda-reg/ceed-cuda-reg-restriction.c
backends/cuda-shared/ceed-cuda-shared-basis.c
|
| #
9d77422e
|
| 26-Jun-2019 |
Jed Brown <jed@jedbrown.org> |
Merge branch 'yohann/cuda-non-tensor' [PR #249]
* yohann/cuda-non-tensor:
ceed-cuda: resolve -Wsign-compare for CUresult (unsigned enum) in CeedError
make style.
namespace cuda backends functions.
Minor: styling
Add CUDA_LIB_DIR_STUBS for systems that don't have CUDA drivers installed
make style
Remove useless function declaration.
Add a reference non-tensor BasisApply for cuda backends.
|
| #
df4cfd6d
|
| 04-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Remove dead or unnecessary code.
|
| #
074be161
|
| 03-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Optimization of weight kernel and dynamic allocation of shared memory.
- First optimization of weight kernel, kernels are now coalesce but might not be fully using SMs (need to batch elements per block)
- Switch to dynamic shared memory allocation in order to batch elements for interpolation and gradient in cuda-shared backend.
- Add GetPreferedMemoryType for cuda-reg and cuda-shared backends. (Can be removed in the future with delegation of this function)
|
| #
469f0220
|
| 16-May-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Remove useless function declaration.
|
| #
c532df63
|
| 16-May-2019 |
Yohann <dudouit1@llnl.gov> |
Cuda backend using shared memory (#247)
Add a GPU backend based on Cuda using shared memory.
* Draft of a shared memory backend
* New basis apply passes all tests.
* Add the possibility to treat several elements in one block of threads.
* Fix an error in 2D and 3D gradient.
* Put the cuda-shared backend in its own folder.
* Minor cleaning.
* Replace <ceed-impl.h> with <ceed-backend.h>
* make style
* Add a few CeedChk_Cu
|