mpicudamatimpl.h - OpenGrok history log for /petsc/src/mat/impls/sell/mpi/mpicuda/mpicudamatimpl.h

Revision	Date	Author	Comments
# 9dd11ecf	25-Aug-2023	Satish Balay <balay@mcs.anl.gov>	Merge branch 'jacobf/2023-08-17/header-guard-check' into 'main'. Check header guards. See merge request petsc/petsc!6822
# a4963045	18-Aug-2023	Jacob Faibussowitsch <jacob.fai@gmail.com>	Convert all header guards to pragma once
# 3ea99036	17-Aug-2023	Jacob Faibussowitsch <jacob.fai@gmail.com>	Fix some malformed if !defined() header guards
# dd874c20	10-Apr-2023	Satish Balay <balay@mcs.anl.gov>	Merge branch 'hongzh/sell-cuda' into 'main'. SELL-based SpMV. See merge request petsc/petsc!3428
# 2d1451d4	09-Jan-2020	Hong Zhang <hongzhang@anl.gov>	Initial commit for porting SELL to GPU - Add tiled SpMV and basic SpMV for SeqSELL - Tested in serial - Offloadmask is used to determine when the matrix should be copied to GPU - Use different slice height for CUDA version - By checking the nonzerostate, PETSc can decide if the whole matrix needs to be copied or just the values need to be copied - Make the convert function public so that the very slow MatConvert_Basic can be avoided sometimes. E.g. one can use a two-step convert method: AIJ->SELL, SELL->SELLCUDA instead of the direct convert AIJ->SELLCUDA - Make the FLOPS count for SELL same as that for AIJCUSPARSE. - MatDisAssemble is not needed. - Change slice height from 32 to 16 for GPU - To overlap communication with MatMult, VecScatterBegin() should be called before MatMult() for the diagonal part. - SLICE_HEIGHT is defined to be 32 to match the warp size of GPU. For other cases, it is still 8. Funded-by: Project: PETSc for GPU Time: 42 hours Reported-by: Thanks-to: