| #
60259892
|
| 26-Dec-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'barry/2023-12-22/rm-libbase' into 'main'
LIBBASE is no longer used in make so remove it
See merge request petsc/petsc!7139
|
| #
9140fee1
|
| 22-Dec-2023 |
Barry Smith <bsmith@mcs.anl.gov> |
LIBBASE is no longer used in make so remove it
|
| #
360cdf6b
|
| 28-Oct-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'barry/2023-10-25/rename-rules-doc' into 'main'
Rename rules.doc and rules.utils because GitLab treats the former as a MS Word document.
See merge request petsc/petsc!6965
|
| #
cb5db241
|
| 25-Oct-2023 |
Barry Smith <bsmith@mcs.anl.gov> |
Rename rules.doc and rules.utils because GitLab treats the former as a MS Word document.
Thanks-to: Jed Brown
|
| #
dd874c20
|
| 10-Apr-2023 |
Satish Balay <balay@mcs.anl.gov> |
Merge branch 'hongzh/sell-cuda' into 'main'
SELL-based SpMV
See merge request petsc/petsc!3428
|
| #
2d1451d4
|
| 09-Jan-2020 |
Hong Zhang <hongzhang@anl.gov> |
Initial commit for porting SELL to GPU
- Add tiled SpMV and basic SpMV for SeqSELL
- Tested in serial
- Offloadmask is used to determine when the matrix should be copied to GPU
- Use different slice height for CUDA version
- By checking the nonzerostate, PETSc can decide if the whole matrix needs to be copied or just the values need to be copied
- Make the convert function public so that the very slow MatConvert_Basic can be avoided sometimes. E.g. one can use a two-step convert method: AIJ->SELL, SELL->SELLCUDA instead of the direct convert AIJ->SELLCUDA
- Make the FLOPS count for SELL the same as that for AIJCUSPARSE.
- MatDisAssemble is not needed.
- Change slice height from 32 to 16 for GPU
- To overlap communication with MatMult, VecScatterBegin() should be called before MatMult() for the diagonal part.
- SLICE_HEIGHT is defined to be 32 to match the warp size of GPU. For other cases, it is still 8.
Funded-by:
Project: PETSc for GPU
Time: 42 hours
Reported-by:
Thanks-to:
|