Searched hist:"074be161bac8d8f2ff6efdceafa0bbdf1835071b" (Results 1 – 3 of 3) sorted by relevance

/libCEED/backends/cuda-shared/
ceed-cuda-shared.h 074be161bac8d8f2ff6efdceafa0bbdf1835071b Mon Jun 03 19:41:28 UTC 2019 Yohann Dudouit <yohann.dudouit@gmail.com> Optimization of weight kernel and dynamic allocation of shared memory.

- First optimization of weight kernel, kernels are now coalesced but
might not be fully using SMs (need to batch elements per block)
- Switch to dynamic shared memory allocation in order to batch elements
for interpolation and gradient in cuda-shared backend.
- Add GetPreferedMemoryType for cuda-reg and cuda-shared backends.
(Can be removed in the future with delegation of this function)
ceed-cuda-shared.c 074be161bac8d8f2ff6efdceafa0bbdf1835071b Mon Jun 03 19:41:28 UTC 2019 Yohann Dudouit <yohann.dudouit@gmail.com> Optimization of weight kernel and dynamic allocation of shared memory.

ceed-cuda-shared-basis.c 074be161bac8d8f2ff6efdceafa0bbdf1835071b Mon Jun 03 19:41:28 UTC 2019 Yohann Dudouit <yohann.dudouit@gmail.com> Optimization of weight kernel and dynamic allocation of shared memory.