| c47bfe2b | 16-Feb-2022 |
Jed Brown <jed@jedbrown.org> |
backends/cuda-shared: limit 1D thread counts
We need to avoid this error:
CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: max_threads_per_block 512 on block size (24,1,32), shared_size 0, num_regs 106
A proper solution is to use cuOccupancyMaxPotentialBlockSize to place a number of elements per block that stays within resource limits. This would involve a bit more refactoring to do cleanly.
|
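The block size in the error, (24,1,32), asks for 24 × 32 = 768 threads, which exceeds the 512 allowed for that kernel. A minimal sketch of the kind of cap the commit title describes is shown below. The function name `elems_per_block_capped` and its parameters are hypothetical and are not taken from the commit; the 512 limit and the 24 × 32 block shape come from the error message above. This is a fixed-limit workaround under those assumptions, not the `cuOccupancyMaxPotentialBlockSize` approach the message calls the proper solution.

```c
/* Hypothetical helper (not from the commit): pick how many elements share one
 * thread block when each element uses thread_1d threads in the x dimension.
 * The result is used as the z extent of the block, so the total thread count
 * thread_1d * result stays at or below max_threads_per_block when possible. */
int elems_per_block_capped(int thread_1d, int max_threads_per_block, int requested) {
  if (thread_1d < 1) thread_1d = 1;            /* guard against division by zero */
  int cap = max_threads_per_block / thread_1d; /* integer division rounds down */
  if (cap < 1) cap = 1;                        /* always process at least one element */
  return requested < cap ? requested : cap;    /* never exceed what was requested */
}
```

With the values from the error, `elems_per_block_capped(24, 512, 32)` gives 21, so the block becomes (24,1,21) with 504 threads. If `thread_1d` alone exceeds the limit, the helper still returns 1 and the launch would still fail, which is one reason an occupancy query is the more robust route.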