| d016bdde | 26-Mar-2025 |
Toby Isaac <toby.isaac@gmail.com> |
Mat: Fix and improve the performance of dense matrix multiplication
Mat: Add MATDENSEFROMVECTYPE constructor type
Now in a testset you can write:
```
testset:
  args: -mat_type densefromvectype
  test: test_cuda
    requires: cuda
    args: -vec_type veccuda
  test: test_hip
    requires: hip
    args: -vec_type vechip
```
(This assumes that you call `MatSetVecType()` before you call `MatSetFromOptions()`)
Mat_MPIDense: Cache offsets of MatDenseGetSubMatrix() to avoid communication in more cases
Mat: Add missing implementations for internal "MatMultColumnRange()" interface
Mat_MPIDense: Fix the zeroing of buffers in multiplication routines
Mat_MPIDense: Add optimization of MatMatMult routines when all columns are owned by rank 0
The communication for intermediate buffers can be handled with allreduce / bcast operations, but we use the PetscSF matvec context instead of calling MPI routines directly, so that GPU-aware MPI is used when available.
|