History log of /petsc/src/mat/impls/sell/mpi/mpicuda/mpicudamatimpl.h (Results 1 – 5 of 5)
Revision Date Author Comments
# 9dd11ecf 25-Aug-2023 Satish Balay <balay@mcs.anl.gov>

Merge branch 'jacobf/2023-08-17/header-guard-check' into 'main'

Check header guards

See merge request petsc/petsc!6822


# a4963045 18-Aug-2023 Jacob Faibussowitsch <jacob.fai@gmail.com>

Convert all header guards to pragma once


# 3ea99036 17-Aug-2023 Jacob Faibussowitsch <jacob.fai@gmail.com>

Fix some malformed if !defined() header guards


# dd874c20 10-Apr-2023 Satish Balay <balay@mcs.anl.gov>

Merge branch 'hongzh/sell-cuda' into 'main'

SELL-based SpMV

See merge request petsc/petsc!3428


# 2d1451d4 09-Jan-2020 Hong Zhang <hongzhang@anl.gov>

Initial commit for porting SELL to GPU

- Add tiled SPMV and basic SpMVfor SeqSELL
- Tested in serial
- Offloadmask is used to determine when the matrix should be copied to GPU
- Use different slice

Initial commit for porting SELL to GPU

- Add tiled SPMV and basic SpMVfor SeqSELL
- Tested in serial
- Offloadmask is used to determine when the matrix should be copied to GPU
- Use different slice height for CUDA version
- By checking the nonzerostate, PETSc can decide if the whole matrix need to be copied or just the values need to be copied
- Make the convert function public so that the very slow MatConvert_Basic can be avoided sometimes. E.g. one can use a two-step convert method: AIJ->SELL,SELL->SELLCUDA instead of the direct convert AIJ->SELLCUDA
- Make the FLOPS count for SELL same as that for AIJCUSPARSE.
- MatDisAssemble is not needed.
- Change slice height from 32 to 16 for GPU
- To overlap communication with MatMult, VecScatterBegin() should be called before MatMult() for the diagonal part.
- SLICE_HEIGHT is defined to be 32 to match the warp size of GPU. For other cases, it is still 8.

Funded-by:
Project: PETSc for GPU
Time: 42 hours
Reported-by:
Thanks-to:

show more ...