| 8795c945 | 22-Aug-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Rename NDoF to NNodes and style updates |
| 1226057f | 27-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Merge branch 'master' into yohann/cuda-restr-opt
Conflicts:
    backends/cuda-reg/ceed-cuda-reg-restriction.c
    backends/cuda-shared/ceed-cuda-shared-basis.c |
| 9d77422e | 26-Jun-2019 |
Jed Brown <jed@jedbrown.org> |
Merge branch 'yohann/cuda-non-tensor' [PR #249]
* yohann/cuda-non-tensor:
  ceed-cuda: resolve -Wsign-compare for CUresult (unsigned enum) in CeedError
  make style.
  namespace cuda backends functions.
  Minor: styling
  Add CUDA_LIB_DIR_STUBS for systems that don't have CUDA drivers installed
  make style
  Remove useless function declaration.
  Add a reference non-tensor BasisApply for cuda backends.
|
| 961116ec | 17-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
make style. |
| 4a6d4bbd | 17-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
namespace cuda backends functions. |
| 0109ba86 | 04-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Minor: styling |
| df4cfd6d | 04-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Remove dead or unnecessary code. |
| 3f63d318 | 04-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Remove dead code. Cuda-reg restriction optimization. |
| 698ebc35 | 03-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Optimization of 3D kernels for cuda-shared backend. |
| d94769d2 | 03-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Optimization of 1D kernels for cuda-shared backend. |
| 4247ecf3 | 03-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Optimization of 2D kernels for cuda-shared backend. |
| 717ff8a3 | 03-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Minor bug fix |
| 074be161 | 03-Jun-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Optimization of weight kernel and dynamic allocation of shared memory.
- First optimization of weight kernel, kernels are now coalesce but might not be fully using SMs (need to batch elements per block)
- Switch to dynamic shared memory allocation in order to batch elements for interpolation and gradient in cuda-shared backend.
- Add GetPreferedMemoryType for cuda-reg and cuda-shared backends. (Can be removed in the future with delegation of this function)
|
| a4999edd | 24-May-2019 |
jeremylt <jeremy.thompson@colorado.edu> |
Update Ceed Delegate refrencing |
| 469f0220 | 16-May-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Remove useless function declaration. |
| 9ad45357 | 16-May-2019 |
Yohann Dudouit <yohann.dudouit@gmail.com> |
Add a reference non-tensor BasisApply for cuda backends. |
| c532df63 | 16-May-2019 |
Yohann <dudouit1@llnl.gov> |
Cuda backend using shared memory (#247)
Add a GPU backend based on Cuda using shared memory.
* Draft of a shared memory backend
* New basis apply passes all tests.
* Add the possibility to treat several elements in one block of threads.
* Fix an error in 2D and 3D gradient.
* Put the cuda-shared backend in its own folder.
* Minor cleaning.
* Replace <ceed-impl.h> with <ceed-backend.h>
* make style
* Add a few CeedChk_Cu
|