History log of /petsc/src/mat/impls/sell/seq/seqcuda/makefile (Results 1 – 6 of 6)
Revision Date Author Comments
# 60259892 26-Dec-2023 Satish Balay <balay@mcs.anl.gov>

Merge branch 'barry/2023-12-22/rm-libbase' into 'main'

LIBBASE is no longer used in make so remove it

See merge request petsc/petsc!7139


# 9140fee1 22-Dec-2023 Barry Smith <bsmith@mcs.anl.gov>

LIBBASE is no longer used in make so remove it


# 360cdf6b 28-Oct-2023 Satish Balay <balay@mcs.anl.gov>

Merge branch 'barry/2023-10-25/rename-rules-doc' into 'main'

Rename rules.doc and rules.utils because GitLab treats the former as a MS Word document.

See merge request petsc/petsc!6965


# cb5db241 25-Oct-2023 Barry Smith <bsmith@mcs.anl.gov>

Rename rules.doc and rules.utils because GitLab treats the former as a MS Word document.

Thanks-to: Jed Brown


# dd874c20 10-Apr-2023 Satish Balay <balay@mcs.anl.gov>

Merge branch 'hongzh/sell-cuda' into 'main'

SELL-based SpMV

See merge request petsc/petsc!3428


# 2d1451d4 09-Jan-2020 Hong Zhang <hongzhang@anl.gov>

Initial commit for porting SELL to GPU

- Add tiled SpMV and basic SpMV for SeqSELL
- Tested in serial
- Offloadmask is used to determine when the matrix should be copied to GPU
- Use different slice height for CUDA version
- By checking the nonzerostate, PETSc can decide whether the whole matrix needs to be copied or only the values need to be copied
- Make the convert function public so that the very slow MatConvert_Basic can be avoided sometimes. E.g. one can use a two-step convert method: AIJ->SELL,SELL->SELLCUDA instead of the direct convert AIJ->SELLCUDA
- Make the FLOPS count for SELL same as that for AIJCUSPARSE.
- MatDisAssemble is not needed.
- Change slice height from 32 to 16 for GPU
- To overlap communication with MatMult, VecScatterBegin() should be called before MatMult() for the diagonal part.
- SLICE_HEIGHT is defined to be 32 to match the warp size of GPU. For other cases, it is still 8.

Funded-by:
Project: PETSc for GPU
Time: 42 hours
Reported-by:
Thanks-to:
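
For readers unfamiliar with the format, the following is a minimal sketch of sliced ELLPACK (SELL) storage and the basic SpMV that the commit message refers to. It is not PETSc's implementation: the function names `csr_to_sell` and `sell_spmv`, the array layout, and the use of column index 0 for padding are all assumptions made for illustration. Rows are grouped into slices of `slice_height` rows, each slice is padded to the length of its longest row, and entries are stored column-major within a slice so that neighbouring rows (one per GPU thread in a CUDA version) read neighbouring memory.

```python
def csr_to_sell(indptr, indices, data, slice_height):
    """Pack a CSR matrix into sliced-ELLPACK arrays (illustrative layout only)."""
    n_rows = len(indptr) - 1
    slice_ptr = [0]            # offset of each slice in col_idx/values
    col_idx, values = [], []
    for start in range(0, n_rows, slice_height):
        rows = range(start, min(start + slice_height, n_rows))
        # Slice width = longest row in this slice.
        width = max((indptr[r + 1] - indptr[r] for r in rows), default=0)
        # Column-major within the slice: the j-th entry of every row is contiguous.
        for j in range(width):
            for lane in range(slice_height):
                r = start + lane
                if r < n_rows and j < indptr[r + 1] - indptr[r]:
                    k = indptr[r] + j
                    col_idx.append(indices[k])
                    values.append(data[k])
                else:
                    # Padding entry (assumed convention: column 0, value 0.0).
                    col_idx.append(0)
                    values.append(0.0)
        slice_ptr.append(len(values))
    return slice_ptr, col_idx, values


def sell_spmv(slice_ptr, col_idx, values, x, n_rows, slice_height):
    """Compute y = A @ x for a matrix stored by csr_to_sell."""
    y = [0.0] * n_rows
    for s in range(len(slice_ptr) - 1):
        base = slice_ptr[s]
        width = (slice_ptr[s + 1] - base) // slice_height
        for lane in range(slice_height):   # one lane per row of the slice
            r = s * slice_height + lane
            if r >= n_rows:
                continue                   # lane belongs to padding rows
            acc = 0.0
            for j in range(width):
                k = base + j * slice_height + lane
                acc += values[k] * x[col_idx[k]]
            y[r] = acc
    return y
```

In this sketch the slice height only changes the amount of padding and the memory layout, not the result, which is why a value such as 16 or 32 can be chosen to suit the GPU warp size while a smaller value is kept elsewhere.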
