(ch_matrices)=

# Matrices

PETSc provides a variety of matrix implementations because no single matrix format is appropriate for all problems. Currently, we support dense storage and compressed sparse row storage (both sequential and parallel versions) for CPU and GPU based matrices, as well as several specialized formats. Additional specialized formats can be easily added.

This chapter describes the basics of using PETSc matrices in general (regardless of the particular format chosen) and discusses tips for efficient use of the several simple uniprocess and parallel matrix types. The use of PETSc matrices involves the following actions: create a particular type of matrix, insert values into it, process the matrix, use the matrix for various computations, and finally destroy the matrix. The application code does not need to know or care about the particular storage formats of the matrices.

(sec_matcreate)=

## Creating matrices

As with vectors, PETSc has APIs that allow the user to specify the exact details of the matrix creation process, but also `DM`-based creation routines that handle most of the details automatically for specific families of applications. This is done with

```
DMCreateMatrix(DM dm,Mat *A)
```

The type of matrix created can be controlled with either

```
DMSetMatType(DM dm,MatType <MATAIJ or MATBAIJ or MATAIJCUSPARSE etc>)
```

or with

```
DMSetFromOptions(DM dm)
```

and the options database option `-dm_mat_type <aij or baij or aijcusparse etc>`. Matrices can be created for CPU usage, for GPU usage, and for usage on both the CPUs and GPUs.

The creation of `DM` objects is discussed in {any}`sec_struct`, {any}`sec_unstruct`, {any}`sec_network`.

## Low-level matrix creation routines

When using a `DM` is not practical for a particular application, one can create matrices directly using

```
MatCreate(MPI_Comm comm,Mat *A)
MatSetSizes(Mat A,PetscInt m,PetscInt n,PetscInt M,PetscInt N)
```

This routine generates a sequential matrix when running one process and a parallel matrix for two or more processes; the particular matrix format is set by the user via options database commands. The user specifies either the global matrix dimensions, given by `M` and `N`, or the local dimensions, given by `m` and `n`, while PETSc completely controls memory allocation. This routine facilitates switching among various matrix types, for example, to determine the format that is most efficient for a certain application. By default, `MatCreate()` employs the sparse AIJ format, which is discussed in detail in {any}`sec_matsparse`. See the manual pages for further information about available matrix formats.

## Assembling (putting values into) matrices

To insert or add entries to a matrix on CPUs, one can call a variant of `MatSetValues()`, either

```
MatSetValues(Mat A,PetscInt m,const PetscInt idxm[],PetscInt n,const PetscInt idxn[],const PetscScalar values[],INSERT_VALUES);
```

or

```
MatSetValues(Mat A,PetscInt m,const PetscInt idxm[],PetscInt n,const PetscInt idxn[],const PetscScalar values[],ADD_VALUES);
```

This routine inserts or adds a logically dense subblock of dimension `m*n` into the matrix. The integer indices `idxm` and `idxn`, respectively, indicate the global row and column numbers to be inserted. `MatSetValues()` uses the standard C convention, where the row and column matrix indices begin with zero *regardless of the programming language employed*. The array `values` is logically two-dimensional, containing the values that are to be inserted.
By default the values are given in row-major order, which is the opposite of the Fortran convention, meaning that the value to be put in row `idxm[i]` and column `idxn[j]` is located in `values[i*n+j]`. To allow the insertion of values in column-major order, one can call the command

```
MatSetOption(Mat A,MAT_ROW_ORIENTED,PETSC_FALSE);
```

*Warning*: Several of the sparse implementations do *not* currently support the column-oriented option.

This notation should be familiar. For example, to insert one matrix into another when using MATLAB, one uses the command `A(im,in) = B;` where `im` and `in` contain the indices for the rows and columns. This action is identical to the calls above to `MatSetValues()`.

When using the block compressed sparse row matrix format (`MATSEQBAIJ` or `MATMPIBAIJ`), one can insert elements more efficiently using the block variant, `MatSetValuesBlocked()` or `MatSetValuesBlockedLocal()`.

The function `MatSetOption()` accepts several other inputs; see the manual page for details.

After the matrix elements have been inserted or added into the matrix, they must be processed (also called “assembled”) before they can be used. The routines for matrix processing are

```
MatAssemblyBegin(Mat A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(Mat A,MAT_FINAL_ASSEMBLY);
```

By placing other code between these two calls, the user can perform computations while messages are in transit. Calls to `MatSetValues()` with the `INSERT_VALUES` and `ADD_VALUES` options *cannot* be mixed without intervening calls to the assembly routines. For such intermediate assembly calls the second routine argument typically should be `MAT_FLUSH_ASSEMBLY`, which omits some of the work of the full assembly process. `MAT_FINAL_ASSEMBLY` is required only in the last matrix assembly before a matrix is used.

Even though one may insert values into PETSc matrices without regard to which process eventually stores them, for efficiency reasons we usually recommend generating most entries on the process where they are destined to be stored. To help the application programmer with this task for matrices that are distributed across the processes by ranges, the routine

```
MatGetOwnershipRange(Mat A,PetscInt *first_row,PetscInt *last_row);
```

informs the user that all rows from `first_row` to `last_row-1` (since the value returned in `last_row` is one more than the global index of the last local row) will be stored on the local process.

If the `Mat` was obtained from a `DM` with `DMCreateMatrix()`, then the range values are determined by the specific `DM`. If the `Mat` was created directly, the range values are determined by the local sizes passed to `MatSetSizes()` or `MatCreateAIJ()` (and similar low-level functions for other `MatType`s). If `PETSC_DECIDE` was passed as the local size, then the matrix uses default values for the range computed with `PetscSplitOwnership()`. For certain `DM`, such as `DMDA`, it is better to use `DM`-specific routines, such as `DMDAGetGhostCorners()`, to determine the local values in the matrix. See {any}`sec_matlayout` for full details on row and column layouts.

In the sparse matrix implementations, once the assembly routines have been called, the matrices are compressed and can be used for matrix-vector multiplication, etc. Any space for preallocated nonzeros that was not filled by a call to `MatSetValues()` or a related routine is compressed out by assembling with `MAT_FINAL_ASSEMBLY`. If you intend to use that extra space later, be sure to insert explicit zeros before assembling with `MAT_FINAL_ASSEMBLY` so the space will not be compressed out.
Once the matrix has been assembled, inserting new values will be expensive since it will require copies and possibly memory allocation.

One may repeatedly assemble matrices that retain the same nonzero pattern (such as within a nonlinear or time-dependent problem). Where possible, data structures and communication information will be reused (instead of regenerated) during successive steps, thereby increasing efficiency. See <a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/src/ksp/ksp/tutorials/ex5.c.html">KSP Tutorial ex5</a> for a simple example of solving two linear systems that use the same matrix data structure.

For matrices associated with `DMDA` there is a higher-level interface for providing the numerical values based on the concept of stencils. See the manual page of `MatSetValuesStencil()` for usage.

For GPUs the routines `MatSetPreallocationCOO()` and `MatSetValuesCOO()` should be used for efficient matrix assembly instead of `MatSetValues()`.

We now introduce the various families of PETSc matrices. `DMCreateMatrix()` manages the preallocation process (introduced below) automatically, so many users do not need to worry about the details of the preallocation process.

(sec_matlayout)=

### Matrix and Vector Layouts and Storage Locations

The layout of PETSc matrices across MPI ranks is defined by two things:

- the layout of the two compatible vectors in the computation of the matrix-vector product y = A * x, and
- the memory where various parts of the matrix are stored across the MPI ranks.

PETSc vectors always have a contiguous range of vector entries stored on each MPI rank. The first rank has entries from 0 to `rend1` - 1, the next rank has entries from `rend1` to `rend2` - 1, etc. Thus the ownership range on each rank runs from `rstart` to `rend`; these values can be obtained with `VecGetOwnershipRange`(`Vec` x, `PetscInt` * `rstart`, `PetscInt` * `rend`).
Each PETSc `Vec` has a `PetscLayout` object that contains this information.

All PETSc matrices have two `PetscLayout`s; they define the vector layouts for y and x in the product y = A * x. Their ownership range information can be obtained with `MatGetOwnershipRange()`, `MatGetOwnershipRangeColumn()`, `MatGetOwnershipRanges()`, and `MatGetOwnershipRangesColumn()`. Note that `MatCreateVecs()` provides two vectors with layouts compatible with the matrix.

For most PETSc matrices, excluding `MATELEMENTAL` and `MATSCALAPACK`, the row ownership range obtained with `MatGetOwnershipRange()` also defines where the matrix entries are stored; the matrix entries for rows `rstart` to `rend - 1` are stored on the corresponding MPI rank. For other matrices the rank where each matrix entry is stored is more complicated; information about the storage locations can be obtained with `MatGetOwnershipIS()`. Note that for most PETSc matrices the values returned by `MatGetOwnershipIS()` are the same as those returned by `MatGetOwnershipRange()` and `MatGetOwnershipRangeColumn()`.

The PETSc object `PetscLayout` contains the ownership information that is provided by `VecGetOwnershipRange()` and by `MatGetOwnershipRange()` and `MatGetOwnershipRangeColumn()`. Each vector has one layout, which can be obtained with `VecGetLayout()`; each matrix has two, obtained with `MatGetLayouts()`. Layouts support the routines `PetscLayoutGetLocalSize()`, `PetscLayoutGetSize()`, `PetscLayoutGetBlockSize()`, `PetscLayoutGetRanges()`, and `PetscLayoutCompare()`, as well as a variety of creation routines. These are used by the `Vec` and `Mat` and so are rarely needed directly. Finally, `PetscSplitOwnership()` is a utility routine that does the same splitting of ownership ranges as `PetscLayout`.

(sec_matsparse)=

### Sparse Matrices

The default matrix representation within PETSc is the general sparse AIJ format (also called the compressed sparse row format, CSR).
This section discusses tips for *efficiently* using this matrix format for large-scale applications. Additional formats (such as block compressed row and block symmetric storage, which are generally much more efficient for problems with multiple degrees of freedom per node) are discussed below. Beginning users need not concern themselves initially with such details and may wish to proceed directly to {any}`sec_matoptions`. However, when an application code progresses to the point of tuning for efficiency and/or generating timing results, it is *crucial* to read this information.

#### Sequential AIJ Sparse Matrices

In the PETSc AIJ matrix formats, we store the nonzero elements by rows, along with an array of corresponding column numbers and an array of pointers to the beginning of each row. Note that the diagonal matrix entries are stored with the rest of the nonzeros (not separately).

To create a sequential AIJ sparse matrix, `A`, with `m` rows and `n` columns, one uses the command

```
MatCreateSeqAIJ(PETSC_COMM_SELF,PetscInt m,PetscInt n,PetscInt nz,PetscInt *nnz,Mat *A);
```

where `nz` or `nnz` can be used to preallocate matrix memory, as discussed below. The user can set `nz=0` and `nnz=NULL` for PETSc to control all matrix memory allocation.

The sequential and parallel AIJ matrix storage formats by default employ *i-nodes* (identical nodes) when possible. We search for consecutive rows with the same nonzero structure, thereby reusing matrix information for increased efficiency. Related options database keys are `-mat_no_inode` (do not use i-nodes) and `-mat_inode_limit <limit>` (set the i-node limit; the maximum is 5). Note that problems with a single degree of freedom per grid node will automatically not use i-nodes.

The internal data representation for the AIJ formats employs zero-based indexing.

#### Preallocation of Memory for Sequential AIJ Sparse Matrices

The dynamic process of allocating new memory and copying from the old storage to the new is *intrinsically very expensive*. Thus, to obtain good performance when assembling an AIJ matrix, it is crucial to preallocate the memory needed for the sparse matrix. The user has two choices for preallocating matrix memory via `MatCreateSeqAIJ()`.

One can use the scalar `nz` to specify the expected number of nonzeros for each row. This is generally fine if the number of nonzeros per row is roughly the same throughout the matrix (or as a quick and easy first step for preallocation). If one underestimates the actual number of nonzeros in a given row, then during the assembly process PETSc will automatically allocate additional needed space. However, this extra memory allocation can slow the computation.

If different rows have very different numbers of nonzeros, one should attempt to indicate (nearly) the exact number of elements intended for the various rows with the optional array, `nnz` of length `m`, where `m` is the number of rows, for example

```
PetscInt nnz[m];
nnz[0] = <nonzeros in row 0>
nnz[1] = <nonzeros in row 1>
....
nnz[m-1] = <nonzeros in row m-1>
```

In this case, the assembly process will require no additional memory allocations if the `nnz` estimates are correct. If, however, the `nnz` estimates are incorrect, PETSc will automatically obtain the additional needed space, at a slight loss of efficiency.

Using the array `nnz` to preallocate memory is especially important for efficient matrix assembly if the number of nonzeros varies considerably among the rows.
One can generally set `nnz` either by knowing in advance the problem structure (e.g., the stencil for finite difference problems on a structured grid) or by precomputing the information by using a segment of code similar to that for the regular matrix assembly. The overhead of determining the `nnz` array will be quite small compared with the overhead of the inherently expensive `malloc`s and moves of data that are needed for dynamic allocation during matrix assembly. Always guess high if an exact value is not known (extra space is cheaper than too little).

Thus, when assembling a sparse matrix with very different numbers of nonzeros in various rows, one could proceed as follows for finite difference methods:

1. Allocate an integer array `nnz`.
2. Loop over the grid, counting the expected number of nonzeros for the row(s) associated with the various grid points.
3. Create the sparse matrix via `MatCreateSeqAIJ()` or an alternative.
4. Loop over the grid, generating matrix entries and inserting them into the matrix via `MatSetValues()`.

For (vertex-based) finite element type calculations, an analogous procedure is as follows:

1. Allocate an integer array `nnz`.
2. Loop over vertices, computing the number of neighbor vertices, which determines the number of nonzeros for the corresponding matrix row(s).
3. Create the sparse matrix via `MatCreateSeqAIJ()` or an alternative.
4. Loop over elements, generating matrix entries and inserting them into the matrix via `MatSetValues()`.

The `-info` option causes the routines `MatAssemblyBegin()` and `MatAssemblyEnd()` to print information about the success of the preallocation.
Consider the following example for the `MATSEQAIJ` matrix format:

```
MatAssemblyEnd_SeqAIJ:Matrix size 10 X 10; storage space:20 unneeded, 100 used
MatAssemblyEnd_SeqAIJ:Number of mallocs during MatSetValues is 0
```

The first line indicates that the user preallocated 120 spaces but only 100 were used. The second line indicates that the user preallocated enough space so that PETSc did not have to internally allocate additional space (an expensive operation). In the next example the user did not preallocate sufficient space, as indicated by the fact that the number of mallocs is very large (bad for efficiency):

```
MatAssemblyEnd_SeqAIJ:Matrix size 10 X 10; storage space:47 unneeded, 1000 used
MatAssemblyEnd_SeqAIJ:Number of mallocs during MatSetValues is 40000
```

Although at first glance such procedures for determining the matrix structure in advance may seem unusual, they are actually very efficient because they alleviate the need for dynamic construction of the matrix data structure, which can be very expensive.

#### Parallel AIJ Sparse Matrices

Parallel sparse matrices with the AIJ format can be created with the command

```
MatCreateAIJ(MPI_Comm comm,PetscInt m,PetscInt n,PetscInt M,PetscInt N,PetscInt d_nz,PetscInt *d_nnz, PetscInt o_nz,PetscInt *o_nnz,Mat *A);
```

`A` is the newly created matrix, while the arguments `m`, `M`, and `N` indicate the number of local rows and the number of global rows and columns, respectively. In the PETSc partitioning scheme, all the matrix columns are local and `n` is the number of columns corresponding to the local part of a parallel vector. Either the local or global parameters can be replaced with `PETSC_DECIDE`, so that PETSc will determine them. The matrix is stored with a fixed number of rows on each process, given by `m`, or determined by PETSc if `m` is `PETSC_DECIDE`.

If `PETSC_DECIDE` is not used for the arguments `m` and `n`, then the user must ensure that they are chosen to be compatible with the vectors. To do this, one first considers the matrix-vector product $y = A x$. The `m` that is used in the matrix creation routine `MatCreateAIJ()` must match the local size used in the vector creation routine `VecCreateMPI()` for `y`. Likewise, the `n` used must match that used as the local size in `VecCreateMPI()` for `x`.

The user must set `d_nz=0`, `o_nz=0`, `d_nnz=NULL`, and `o_nnz=NULL` for PETSc to control dynamic allocation of matrix memory space. Analogous to `nz` and `nnz` for the routine `MatCreateSeqAIJ()`, these arguments optionally specify nonzero information for the diagonal (`d_nz` and `d_nnz`) and off-diagonal (`o_nz` and `o_nnz`) parts of the matrix. For a square global matrix, we define each process’s diagonal portion to be its local rows and the corresponding columns (a square submatrix); each process’s off-diagonal portion encompasses the remainder of the local matrix (a rectangular submatrix). The rank in the MPI communicator determines the absolute ordering of the blocks. That is, the process with rank 0 in the communicator given to `MatCreateAIJ()` contains the top rows of the matrix; the i$^{th}$ process in that communicator contains the i$^{th}$ block of the matrix.

#### Preallocation of Memory for Parallel AIJ Sparse Matrices

As discussed above, preallocation of memory is critical for achieving good performance during matrix assembly, as this reduces the number of allocations and copies required. We present an example for three processes to indicate how this may be done for the `MATMPIAIJ` matrix format. Consider the 8 by 8 matrix, which is partitioned by default with three rows on the first process, three on the second, and two on the third.

$$
\left( \begin{array}{cccccccccc}
1 & 2 & 0 & | & 0 & 3 & 0 & | & 0 & 4 \\
0 & 5 & 6 & | & 7 & 0 & 0 & | & 8 & 0 \\
9 & 0 & 10 & | & 11 & 0 & 0 & | & 12 & 0 \\
\hline \\
13 & 0 & 14 & | & 15 & 16 & 17 & | & 0 & 0 \\
0 & 18 & 0 & | & 19 & 20 & 21 & | & 0 & 0 \\
0 & 0 & 0 & | & 22 & 23 & 0 & | & 24 & 0 \\
\hline \\
25 & 26 & 27 & | & 0 & 0 & 28 & | & 29 & 0 \\
30 & 0 & 0 & | & 31 & 32 & 33 & | & 0 & 34
\end{array} \right)
$$

The “diagonal” submatrix, `d`, on the first process is given by

$$
\left( \begin{array}{ccc}
1 & 2 & 0 \\
0 & 5 & 6 \\
9 & 0 & 10
\end{array} \right),
$$

while the “off-diagonal” submatrix, `o`, is given by

$$
\left( \begin{array}{ccccc}
 0 & 3 & 0 & 0 & 4 \\
 7 & 0 & 0 & 8 & 0 \\
 11 & 0 & 0 & 12 & 0 \\
\end{array} \right).
$$

For the first process one could set `d_nz` to 2 (since each row has 2 nonzeros) or, alternatively, set `d_nnz` to $\{2,2,2\}$. The `o_nz` could be set to 2 since each row of the `o` matrix has 2 nonzeros, or `o_nnz` could be set to $\{2,2,2\}$.

For the second process the `d` submatrix is given by

$$
\left( \begin{array}{ccc}
 15 & 16 & 17 \\
 19 & 20 & 21 \\
 22 & 23 & 0
\end{array} \right) .
$$

Thus, one could set `d_nz` to 3, since the maximum number of nonzeros in each row is 3, or alternatively one could set `d_nnz` to $\{3,3,2\}$, thereby indicating that the first two rows will have 3 nonzeros while the third has 2. The corresponding `o` submatrix for the second process is

$$
\left( \begin{array}{ccccc}
13 & 0 & 14 & 0 & 0 \\
0 & 18 & 0 & 0 & 0 \\
0 & 0 & 0 & 24 & 0 \\
\end{array} \right)
$$

so that one could set `o_nz` to 2 or `o_nnz` to $\{2,1,1\}$.

Note that the user never directly works with the `d` and `o` submatrices, except when preallocating storage space as indicated above.
Also, the user need not preallocate exactly the correct amount of space; as long as a sufficiently close estimate is given, the high efficiency for matrix assembly will remain.

As described above, the option `-info` will print information about the success of preallocation during matrix assembly. For the `MATMPIAIJ` and `MATMPIBAIJ` formats, PETSc will also list the number of elements owned by each process that were generated on a different process. For example, the statements

```
MatAssemblyBegin_MPIAIJ:Stash has 10 entries, uses 0 mallocs
MatAssemblyBegin_MPIAIJ:Stash has 3 entries, uses 0 mallocs
MatAssemblyBegin_MPIAIJ:Stash has 5 entries, uses 0 mallocs
```

indicate that very few values have been generated on different processes. On the other hand, the statements

```
MatAssemblyBegin_MPIAIJ:Stash has 100000 entries, uses 100 mallocs
MatAssemblyBegin_MPIAIJ:Stash has 77777 entries, uses 70 mallocs
```

indicate that many values have been generated on the “wrong” processes. This situation can be very inefficient, since the transfer of values to the “correct” process is generally expensive. By using the command `MatGetOwnershipRange()` in application codes, the user should be able to generate most entries on the owning process.

*Note*: It is fine to generate some entries on the “wrong” process. Often this can lead to cleaner, simpler, less buggy code. One should never make code overly complicated in order to generate all values locally. Rather, one should organize the code in such a way that *most* values are generated locally.

The routine `MatCreateAIJCUSPARSE()` allows one to create GPU-based matrices for NVIDIA systems. `MatCreateAIJKokkos()` can create matrices for use with CPU, OpenMP, NVIDIA, AMD, or Intel based GPU systems.

It is sometimes difficult to compute the required preallocation information efficiently, hence PETSc provides a special `MatType`, `MATPREALLOCATOR`, that helps make computing this information more straightforward. One first creates a matrix of this type and then, using the same code that one would use to actually compute the matrix’s numerical values, calls `MatSetValues()` for this matrix, without needing to provide any preallocation information (one need not provide the matrix’s numerical values). Once this is complete, one uses `MatPreallocatorPreallocate()` to provide the accumulated preallocation information to the actual matrix one will use for the computations. We hope to simplify this process in the future, allowing the removal of `MATPREALLOCATOR`; instead, its efficient insertion process would simply be used automatically during the first assembly of any matrix type, without requiring the detailed preallocation information.

See {any}`doc_matrix` for a table of the matrix types available in PETSc.

(sec_matlmvm)=

#### Limited-Memory Variable Metric (LMVM) Matrices

Variable metric methods, also known as quasi-Newton methods, are frequently used for root finding problems and approximate Jacobian matrices or their inverses via sequential nonlinear updates based on the secant condition. The limited-memory variants do not store the full explicit Jacobian, and instead compute forward products and inverse applications based on a fixed number of stored update vectors.

```{eval-rst}
.. list-table:: PETSc LMVM matrix implementations.
   :name: tab_matlmvmimpl
   :header-rows: 1

   * - Method
     - PETSc Type
     - Name
     - Property
   * - "Good" Broyden :cite:`keyprefix-griewank2012broyden`
     - ``MATLMVMBrdn``
     - ``lmvmbrdn``
     - Square
   * - "Bad" Broyden :cite:`keyprefix-griewank2012broyden`
     - ``MATLMVMBadBrdn``
     - ``lmvmbadbrdn``
     - Square
   * - Symmetric Rank-1 :cite:`keyprefix-nocedal2006numerical`
     - ``MATLMVMSR1``
     - ``lmvmsr1``
     - Symmetric
   * - Davidon-Fletcher-Powell (DFP) :cite:`keyprefix-nocedal2006numerical`
     - ``MATLMVMDFP``
     - ``lmvmdfp``
     - SPD
   * - Dense Davidon-Fletcher-Powell (DFP) :cite:`keyprefix-ErwayMarcia2017`
     - ``MATLMVMDDFP``
     - ``lmvmddfp``
     - SPD
   * - Broyden-Fletcher-Goldfarb-Shanno (BFGS) :cite:`keyprefix-nocedal2006numerical`
     - ``MATLMVMBFGS``
     - ``lmvmbfgs``
     - SPD
   * - Dense Broyden-Fletcher-Goldfarb-Shanno (BFGS) :cite:`keyprefix-ErwayMarcia2017`
     - ``MATLMVMDBFGS``
     - ``lmvmdbfgs``
     - SPD
   * - Dense Quasi-Newton
     - ``MATLMVMDQN``
     - ``lmvmdqn``
     - SPD
   * - Restricted Broyden Family :cite:`keyprefix-erway2017solving`
     - ``MATLMVMSymBrdn``
     - ``lmvmsymbrdn``
     - SPD
   * - Restricted Broyden Family (full-memory diagonal)
     - ``MATLMVMDiagBrdn``
     - ``lmvmdiagbrdn``
     - SPD
```

PETSc implements the LMVM matrices listed in the table above. They can be created using the `MatCreate()` and `MatSetType()` workflow, and share a number of common interface functions. We will review the most important ones below:

- `MatLMVMAllocate(Mat B, Vec X, Vec F)` – Creates the internal data structures necessary to store nonlinear updates and compute forward/inverse applications. The `X` vector defines the solution space while the `F` defines the function space for the history of updates.
597- `MatLMVMUpdate(Mat B, Vec X, Vec F)` – Applies a nonlinear update to 598 the approximate Jacobian such that $s_k = x_k - x_{k-1}$ and 599 $y_k = f(x_k) - f(x_{k-1})$, where $k$ is the index for 600 the update. 601- `MatLMVMReset(Mat B, PetscBool destructive)` – Flushes the 602 accumulated nonlinear updates and resets the matrix to the initial 603 state. If `destructive = PETSC_TRUE`, the reset also destroys the 604 internal data structures and necessitates another allocation call 605 before the matrix can be updated and used for products and solves. 606- `MatLMVMSetJ0(Mat B, Mat J0)` – Defines the initial Jacobian to 607 apply the updates to. If no initial Jacobian is provided, the updates 608 are applied to an identity matrix. 609 610LMVM matrices can be applied to vectors in forward mode via 611`MatMult()` or `MatMultAdd()`, and in inverse mode via 612`MatSolve()`. They also support `MatCreateVecs()`, `MatDuplicate()` 613and `MatCopy()` operations. 614 615Restricted Broyden Family, DFP and BFGS methods, including their dense 616versions, additionally implement special Jacobian initialization and 617scaling options available via 618`-mat_lmvm_scale_type <none,scalar,diagonal>`. We describe these 619choices below: 620 621- `none` – Sets the initial Jacobian to be equal to the identity 622 matrix. No extra computations are required when obtaining the search 623 direction or updating the approximation. However, the number of 624 function evaluations required to converge the Newton solution is 625 typically much larger than what is required when using other 626 initializations. 627 628- `scalar` – Defines the initial Jacobian as a scalar multiple of the 629 identity matrix. 
  The scalar value $\sigma$ is chosen by solving the one-dimensional optimization problem

  $$
  \min_\sigma \|\sigma^\alpha Y - \sigma^{\alpha - 1} S\|_F^2,
  $$

  where $S$ and $Y$ are the matrices whose columns contain a subset of update vectors $s_k$ and $y_k$, and $\alpha \in [0, 1]$ is defined by the user via `-mat_lmvm_alpha` and has a different default value for each LMVM implementation (e.g.: default $\alpha = 1$ for BFGS produces the well-known $y_k^T s_k / y_k^T y_k$ scalar initialization). The number of updates to be used in the $S$ and $Y$ matrices is 1 by default (i.e.: the latest update only) and can be changed via `-mat_lmvm_sigma_hist`. This technique is inspired by Gilbert and Lemarechal {cite}`keyprefix-gilbert-lemarechal`.

- `diagonal` – Uses a full-memory restricted Broyden update formula to construct a diagonal matrix for the Jacobian initialization. Although the full-memory formula is utilized, the actual memory footprint is restricted to only the vector representing the diagonal and some additional work vectors used in its construction. The diagonal terms are also re-scaled with every update as suggested in {cite}`keyprefix-gilbert-lemarechal`. This initialization requires the most computational effort of the available choices but typically results in a significant reduction in the number of function evaluations taken to compute a solution.

The dense implementations are numerically equivalent to DFP and BFGS, but they try to minimize memory transfer at the cost of storage {cite}`keyprefix-ErwayMarcia2017`. Generally, dense formulations of DFP and BFGS, `MATLMVMDDFP` and `MATLMVMDBFGS`, should be faster than classical recursive versions on both CPU and GPU.
It should be noted that `MatMult` of dense BFGS and `MatSolve` of dense DFP require a Cholesky factorization, which may be numerically unstable if a Jacobian initialization other than `none` is used. Therefore, the default implementation enables the classical recursive algorithms to avoid the Cholesky factorization. This option can be toggled via `-mat_lbfgs_recursive` and `-mat_ldfp_recursive`.

Dense Quasi-Newton, `MATLMVMDQN`, is an implementation that uses `MatSolve` of `MATLMVMDBFGS` for its `MatSolve`, and uses `MatMult` of `MATLMVMDDFP` for its `MatMult`. It can be seen as a hybrid implementation that avoids both the recursive implementation and the Cholesky factorization, trading numerical accuracy for performance.

Note that a user-provided initial Jacobian via `MatLMVMSetJ0()` overrides and disables all built-in initialization methods.

(sec_matdense)=

### Dense Matrices

PETSc provides both sequential and parallel dense matrix formats, where each process stores its entries in a column-major array in the usual Fortran style. To create a sequential, dense PETSc matrix, `A`, of dimensions `m` by `n`, the user should call

```
MatCreateSeqDense(PETSC_COMM_SELF,PetscInt m,PetscInt n,PetscScalar *data,Mat *A);
```

The variable `data` enables the user to optionally provide the location of the data for matrix storage (intended for Fortran users who wish to allocate their own storage space). Most users should merely set `data` to `NULL` for PETSc to control matrix memory allocation. To create a parallel, dense matrix, `A`, the user should call

```
MatCreateDense(MPI_Comm comm,PetscInt m,PetscInt n,PetscInt M,PetscInt N,PetscScalar *data,Mat *A)
```

The arguments `m`, `n`, `M`, and `N` indicate the number of local rows and columns and the number of global rows and columns, respectively.
Either the local or global parameters can be replaced with 705`PETSC_DECIDE`, so that PETSc will determine them. The matrix is 706stored with a fixed number of rows on each process, given by `m`, or 707determined by PETSc if `m` is `PETSC_DECIDE`. 708 709PETSc does not provide parallel dense direct solvers, instead 710interfacing to external packages that provide these solvers. Our focus 711is on sparse iterative solvers. 712 713(sec_matnest)= 714 715### Block Matrices 716 717Block matrices arise when coupling variables with different meaning, 718especially when solving problems with constraints (e.g. incompressible 719flow) and “multi-physics” problems. Usually the number of blocks is 720small and each block is partitioned in parallel. We illustrate for a 721$3\times 3$ system with components labeled $a,b,c$. With 722some numbering of unknowns, the matrix could be written as 723 724$$ 725\left( \begin{array}{ccc} 726 A_{aa} & A_{ab} & A_{ac} \\ 727 A_{ba} & A_{bb} & A_{bc} \\ 728 A_{ca} & A_{cb} & A_{cc} 729 \end{array} \right) . 730$$ 731 732There are two fundamentally different ways that this matrix could be 733stored, as a single assembled sparse matrix where entries from all 734blocks are merged together (“monolithic”), or as separate assembled 735matrices for each block (“nested”). These formats have different 736performance characteristics depending on the operation being performed. 737In particular, many preconditioners require a monolithic format, but 738some that are very effective for solving block systems (see 739{any}`sec_block_matrices`) are more efficient when a nested 740format is used. In order to stay flexible, we would like to be able to 741use the same code to assemble block matrices in both monolithic and 742nested formats. 
Additionally, for software maintainability and testing,
especially in a multi-physics context where different groups might be
responsible for assembling each of the blocks, it is desirable to be
able to use exactly the same code to assemble a single block
independently as to assemble it as part of a larger system. To do this,
we introduce the four spaces shown in {numref}`fig_localspaces`.

- The monolithic global space is the space in which the Krylov and
  Newton solvers operate, with collective semantics across the entire
  block system.
- The split global space splits the blocks apart, but each split still
  has collective semantics.
- The split local space adds ghost points and separates the blocks.
  Operations in this space can be performed with no parallel
  communication. This is often the most natural, and certainly the most
  powerful, space for matrix assembly code.
- The monolithic local space can be thought of as adding ghost points
  to the monolithic global space, but it is often more natural to use
  it simply as a concatenation of split local spaces on each process.
  It is not common to explicitly manipulate vectors or matrices in this
  space (at least not during assembly), but it is useful for
  declaring which part of a matrix is being assembled.

:::{figure} /images/manual/localspaces.svg
:alt: The relationship between spaces used for coupled assembly.
:name: fig_localspaces

The relationship between spaces used for coupled assembly.
:::

The key to format-independent assembly is the function

```
MatGetLocalSubMatrix(Mat A,IS isrow,IS iscol,Mat *submat);
```

which provides a “view” `submat` into a matrix `A` that operates in
the monolithic global space. The `submat` transforms from the split
local space defined by `iscol` to the split local space defined by
`isrow`.
The index sets specify the parts of the monolithic local 782space that `submat` should operate in. If a nested matrix format is 783used, then `MatGetLocalSubMatrix()` finds the nested block and returns 784it without making any copies. In this case, `submat` is fully 785functional and has a parallel communicator. If a monolithic matrix 786format is used, then `MatGetLocalSubMatrix()` returns a proxy matrix 787on `PETSC_COMM_SELF` that does not provide values or implement 788`MatMult()`, but does implement `MatSetValuesLocal()` and, if 789`isrow,iscol` have a constant block size, 790`MatSetValuesBlockedLocal()`. Note that although `submat` may not be 791a fully functional matrix and the caller does not even know a priori 792which communicator it will reside on, it always implements the local 793assembly functions (which are not collective). The index sets 794`isrow,iscol` can be obtained using `DMCompositeGetLocalISs()` if 795`DMCOMPOSITE` is being used. `DMCOMPOSITE` can also be used to create 796matrices, in which case the `MATNEST` format can be specified using 797`-prefix_dm_mat_type nest` and `MATAIJ` can be specified using 798`-prefix_dm_mat_type aij`. See 799<a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/src/snes/tutorials/ex28.c.html">SNES Tutorial ex28</a> 800for a simple example using this interface. 801 802(sec_matoptions)= 803 804## Basic Matrix Operations 805 806Table {any}`2.2 <fig_matrixops>` summarizes basic PETSc matrix operations. 807We briefly discuss a few of these routines in more detail below. 808 809The parallel matrix can multiply a vector with `n` local entries, 810returning a vector with `m` local entries. 
That is, to form the 811product 812 813``` 814MatMult(Mat A,Vec x,Vec y); 815``` 816 817the vectors `x` and `y` should be generated with 818 819``` 820VecCreateMPI(MPI_Comm comm,n,N,&x); 821VecCreateMPI(MPI_Comm comm,m,M,&y); 822``` 823 824By default, if the user lets PETSc decide the number of components to be 825stored locally (by passing in `PETSC_DECIDE` as the second argument to 826`VecCreateMPI()` or using `VecCreate()`), vectors and matrices of 827the same dimension are automatically compatible for parallel 828matrix-vector operations. 829 830Along with the matrix-vector multiplication routine, there is a version 831for the transpose of the matrix, 832 833``` 834MatMultTranspose(Mat A,Vec x,Vec y); 835``` 836 837There are also versions that add the result to another vector: 838 839``` 840MatMultAdd(Mat A,Vec x,Vec y,Vec w); 841MatMultTransposeAdd(Mat A,Vec x,Vec y,Vec w); 842``` 843 844These routines, respectively, produce $w = A*x + y$ and 845$w = A^{T}*x + y$ . In C it is legal for the vectors `y` and 846`w` to be identical. In Fortran, this situation is forbidden by the 847language standard, but we allow it anyway. 848 849One can print a matrix (sequential or parallel) to the screen with the 850command 851 852``` 853MatView(Mat mat,PETSC_VIEWER_STDOUT_WORLD); 854``` 855 856Other viewers can be used as well. For instance, one can draw the 857nonzero structure of the matrix into the default X-window with the 858command 859 860``` 861MatView(Mat mat,PETSC_VIEWER_DRAW_WORLD); 862``` 863 864Also one can use 865 866``` 867MatView(Mat mat,PetscViewer viewer); 868``` 869 870where `viewer` was obtained with `PetscViewerDrawOpen()`. Additional 871viewers and options are given in the `MatView()` man page and 872{any}`sec_viewers`. 873 874```{eval-rst} 875.. 
list-table:: PETSc Matrix Operations
   :name: fig_matrixops
   :header-rows: 1

   * - Function Name
     - Operation
   * - ``MatAXPY(Mat Y, PetscScalar a, Mat X, MatStructure s);``
     - :math:`Y = Y + a*X`
   * - ``MatAYPX(Mat Y, PetscScalar a, Mat X, MatStructure s);``
     - :math:`Y = a*Y + X`
   * - ``MatMult(Mat A,Vec x, Vec y);``
     - :math:`y = A*x`
   * - ``MatMultAdd(Mat A,Vec x, Vec y,Vec z);``
     - :math:`z = y + A*x`
   * - ``MatMultTranspose(Mat A,Vec x, Vec y);``
     - :math:`y = A^{T}*x`
   * - ``MatMultTransposeAdd(Mat A, Vec x, Vec y, Vec z);``
     - :math:`z = y + A^{T}*x`
   * - ``MatNorm(Mat A,NormType type, PetscReal *r);``
     - :math:`r = \|A\|_{type}`
   * - ``MatDiagonalScale(Mat A,Vec l,Vec r);``
     - :math:`A = \text{diag}(l)*A*\text{diag}(r)`
   * - ``MatScale(Mat A,PetscScalar a);``
     - :math:`A = a*A`
   * - ``MatConvert(Mat A, MatType type, Mat *B);``
     - :math:`B = A`
   * - ``MatCopy(Mat A, Mat B, MatStructure s);``
     - :math:`B = A`
   * - ``MatGetDiagonal(Mat A, Vec x);``
     - :math:`x = \text{diag}(A)`
   * - ``MatTranspose(Mat A, MatReuse reuse, Mat* B);``
     - :math:`B = A^{T}`
   * - ``MatZeroEntries(Mat A);``
     - :math:`A = 0`
   * - ``MatShift(Mat Y, PetscScalar a);``
     - :math:`Y = Y + a*I`
```

```{eval-rst}
.. list-table:: Values of MatStructure
   :name: fig_matstructure
   :header-rows: 1

   * - Name
     - Meaning
   * - ``SAME_NONZERO_PATTERN``
     - the matrices have an identical nonzero pattern
   * - ``DIFFERENT_NONZERO_PATTERN``
     - the matrices may have a different nonzero pattern
   * - ``SUBSET_NONZERO_PATTERN``
     - the second matrix has a subset of the nonzeros in the first matrix
   * - ``UNKNOWN_NONZERO_PATTERN``
     - there is nothing known about the relation between the nonzero patterns of the two matrices
```

The `NormType` argument to `MatNorm()` is one of `NORM_1`,
`NORM_INFINITY`, and `NORM_FROBENIUS`.

(sec_matrixfree)=

## Application Specific Custom Matrices

Some people like to use matrix-free methods, which do
not require explicit storage of the matrix, for the numerical solution
of partial differential equations.
Similarly, users may already have a custom matrix data structure and routines
for that data structure and would like to wrap their code up into a `Mat`;
that is, provide their own custom matrix type.

To use the PETSc-provided matrix-free matrix that uses finite differencing to approximate the matrix-vector product,
use `MatCreateMFFD()`; see {any}`sec_nlmatrixfree`.
To provide your own matrix operations (such as `MatMult()`),
use the following command to create a `Mat` structure
without ever actually generating the matrix:

```
MatCreateShell(MPI_Comm comm, PetscInt m, PetscInt n, PetscInt M, PetscInt N, void *ctx, Mat *mat);
```

Here `M` and `N` are the global matrix dimensions (rows and
columns), `m` and `n` are the local matrix dimensions, and `ctx`
is a pointer to data needed by any user-defined shell matrix operations;
the manual page has additional details about these parameters. Most
matrix-free algorithms require only the application of the linear
operator to a vector. To provide this action, the user must write a
routine with the calling sequence

```
PetscErrorCode UserMult(Mat mat,Vec x,Vec y);
```

and then associate it with the matrix, `mat`, by using the command

```
MatShellSetOperation(Mat mat,MatOperation MATOP_MULT,(void (*)(void))UserMult);
```

Here `MATOP_MULT` is the name of the operation for matrix-vector
multiplication. Within each user-defined routine (such as
`UserMult()`), the user should call `MatShellGetContext()` to obtain
the user-defined context, `ctx`, that was set by `MatCreateShell()`.
This shell matrix can be used with the iterative linear equation solvers
discussed in the following chapters.

The routine `MatShellSetOperation()` can be used to set any other
matrix operations as well. The file
`$PETSC_DIR/include/petscmat.h` (<a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/include/petscmat.h.html">source</a>)
provides a complete list of matrix operations, which have the form
`MATOP_<OPERATION>`, where `<OPERATION>` is the name (in all capital
letters) of the user interface routine (for example, `MatMult()`
$\to$ `MATOP_MULT`). All user-provided functions have the same
calling sequence as the usual matrix interface routines, since the
user-defined functions are intended to be accessed through the same
interface, e.g., `MatMult(Mat,Vec,Vec)` $\to$
`UserMult(Mat,Vec,Vec)`. The final argument for
`MatShellSetOperation()` needs to be cast to a `void (*)(void)`, since
the final argument could (depending on the `MatOperation`) be a variety
of different functions.

Note that `MatShellSetOperation()` can also be used as a “backdoor”
means of introducing user-defined changes in matrix operations for other
storage formats (for example, to override the default LU factorization
routine supplied within PETSc for the `MATSEQAIJ` format). However, we
urge anyone who introduces such changes to use caution, since it would
be very easy to accidentally create a bug in the new routine that could
affect other routines as well.

See also {any}`sec_nlmatrixfree` for details on one set of
helpful utilities for using the matrix-free approach for nonlinear
solvers.

(sec_mattranspose)=

## Transposes of Matrices

PETSc provides several ways to work with the transposes of matrices.

```
MatTranspose(Mat A,MatReuse MAT_INITIAL_MATRIX or MAT_INPLACE_MATRIX or MAT_REUSE_MATRIX,Mat *B)
```

will form the transpose of the matrix explicitly, either in-place or
out-of-place. After it has been called
with `MAT_INITIAL_MATRIX`, it may be called again with `MAT_REUSE_MATRIX` and it will recompute the transpose if the A
matrix has changed. Internally it keeps track of whether the nonzero pattern of A has changed, so
it can reuse the symbolic transpose when possible for efficiency.

```
MatTransposeSymbolic(Mat A,Mat *B)
```

only performs the symbolic transpose of the matrix. After it is called, `MatTranspose()` may be called with
`MAT_REUSE_MATRIX` to compute the numerical transpose.

Occasionally one may already have a B matrix with the needed sparsity pattern to store the transpose and wish to reuse that
space instead of creating a new matrix with `MatTranspose(A,MAT_INITIAL_MATRIX,&B)`. Since one cannot simply call
`MatTranspose(A,MAT_REUSE_MATRIX,&B)` in this situation, one can instead call `MatTransposeSetPrecursor(A,B)` and then call
`MatTranspose(A,MAT_REUSE_MATRIX,&B)`. This routine just provides to B the metadata it needs to compute the numerical
transpose efficiently.

The routine `MatCreateTranspose(A,&B)` provides a surrogate matrix B that behaves like the transpose of A without forming
the transpose explicitly. For example, `MatMult(B,x,y)` will compute the matrix-vector product of the transpose of A with x.

(sec_othermat)=

## Other Matrix Operations

In many iterative calculations (for instance, in a nonlinear equations
solver), it is important for efficiency purposes to reuse the nonzero
structure of a matrix, rather than determining it anew every time the
matrix is generated.
To retain a given matrix but reinitialize its 1045contents, one can employ 1046 1047``` 1048MatZeroEntries(Mat A); 1049``` 1050 1051This routine will zero the matrix entries in the data structure but keep 1052all the data that indicates where the nonzeros are located. In this way 1053a new matrix assembly will be much less expensive, since no memory 1054allocations or copies will be needed. Of course, one can also explicitly 1055set selected matrix elements to zero by calling `MatSetValues()`. 1056 1057By default, if new entries are made in locations where no nonzeros 1058previously existed, space will be allocated for the new entries. To 1059prevent the allocation of additional memory and simply discard those new 1060entries, one can use the option 1061 1062``` 1063MatSetOption(Mat A,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); 1064``` 1065 1066Once the matrix has been assembled, one can factor it numerically 1067without repeating the ordering or the symbolic factorization. This 1068option can save some computational time, although it does require that 1069the factorization is not done in-place. 1070 1071In the numerical solution of elliptic partial differential equations, it 1072can be cumbersome to deal with Dirichlet boundary conditions. In 1073particular, one would like to assemble the matrix without regard to 1074boundary conditions and then at the end apply the Dirichlet boundary 1075conditions. In numerical analysis classes this process is usually 1076presented as moving the known boundary conditions to the right-hand side 1077and then solving a smaller linear system for the interior unknowns. 1078Unfortunately, implementing this requires extracting a large submatrix 1079from the original matrix and creating its corresponding data structures. 1080This process can be expensive in terms of both time and memory. 

One simple way to deal with this difficulty is to replace those rows in
the matrix associated with known boundary conditions by rows of the
identity matrix (or some scaling of it). This action can be done with
the command

```
MatZeroRows(Mat A,PetscInt numRows,PetscInt rows[],PetscScalar diag_value,Vec x,Vec b),
```

or equivalently,

```
MatZeroRowsIS(Mat A,IS rows,PetscScalar diag_value,Vec x,Vec b);
```

For sparse matrices this removes the data structures for certain rows of
the matrix. If `diag_value` is zero, it even removes the diagonal
entry. If `diag_value` is nonzero, that value is placed in the diagonal
entry of the eliminated rows.

One nice feature of this approach is that when solving a nonlinear
problem such that at each iteration the Dirichlet boundary conditions
are in the same positions and the matrix retains the same nonzero
structure, the user can call `MatZeroRows()` in the first iteration.
Then, before generating the matrix in the second iteration the user
should call

```
MatSetOption(Mat A,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);
```

From that point, no new values will be inserted into those (boundary)
rows of the matrix.

The functions `MatZeroRowsLocal()` and `MatZeroRowsLocalIS()` can
also be used if for each process one provides the Dirichlet locations in
the local numbering of the matrix. A drawback of `MatZeroRows()` is
that it destroys the symmetry of a matrix.
Thus one can use

```
MatZeroRowsColumns(Mat A,PetscInt numRows,PetscInt rows[],PetscScalar diag_value,Vec x,Vec b),
```

or equivalently,

```
MatZeroRowsColumnsIS(Mat A,IS rows,PetscScalar diag_value,Vec x,Vec b);
```

Note that, for a given assembled matrix, each of these routines can be
called only once to update the x and b vectors. They cannot be used if
one wishes to solve multiple right-hand side problems for the same
matrix, since the matrix entries needed for updating the b vector are
removed in the first use.

Once the zeroed rows are removed, the new matrix possibly has many rows
with only a diagonal entry, which affects the parallel load balance. The
`PCREDISTRIBUTE` preconditioner removes all the zeroed rows (and the
associated columns, adjusting the right-hand side based on the removed
columns) and then rebalances the resulting rows of the smaller matrix
across the processes. Thus one can use `MatZeroRows()` to set the
Dirichlet points and then solve with the preconditioner
`PCREDISTRIBUTE`. Note that if the original matrix was symmetric, the
smaller solved matrix will also be symmetric.

Another matrix routine of interest is

```
MatConvert(Mat mat,MatType newtype,Mat *M)
```

which converts the matrix `mat` to a new matrix, `M`, that has either
the same or a different format. Set `newtype` to `MATSAME` to copy the
matrix, keeping the same matrix format. See
`$PETSC_DIR/include/petscmat.h` (<a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/include/petscmat.h.html">source</a>)
for other available matrix types; standard ones are `MATSEQDENSE`,
`MATSEQAIJ`, `MATMPIAIJ`, `MATSEQBAIJ` and `MATMPIBAIJ`.

In certain applications it may be necessary for application codes to
directly access elements of a matrix.
This may be done by using the
command (for local rows only)

```
MatGetRow(Mat A,PetscInt row,PetscInt *ncols,const PetscInt *cols[],const PetscScalar *vals[]);
```

The argument `ncols` returns the number of nonzeros in that row, while
`cols` and `vals` return the column indices (with indices starting
at zero) and values in the row. If only the column indices are needed
(and not the corresponding matrix elements), one can use `NULL` for
the `vals` argument. Similarly, one can use `NULL` for the `cols`
argument. The user can only examine the values extracted with
`MatGetRow()`; the values *cannot* be altered. To change the matrix
entries, one must use `MatSetValues()`.

Once the user has finished using a row, he or she *must* call

```
MatRestoreRow(Mat A,PetscInt row,PetscInt *ncols,const PetscInt *cols[],const PetscScalar *vals[]);
```

to free any space that was allocated during the call to `MatGetRow()`.

(sec_symbolic_numeric)=

## Symbolic and Numeric Stages in Sparse Matrix Operations

Many sparse matrix operations can be optimized by dividing the computation into two stages: a symbolic stage that
creates any required data structures and does all the computations that do not require the matrices' numerical values, followed by one or more uses of a
numerical stage that use the symbolically computed information. Examples of such operations include `MatTranspose()`, `MatCreateSubMatrices()`,
`MatCholeskyFactorSymbolic()`, and `MatCholeskyFactorNumeric()`.
PETSc uses two different APIs to take advantage of these optimizations.

The first approach explicitly divides the computation in the API. This approach is used, for example, with `MatCholeskyFactorSymbolic()` and `MatCholeskyFactorNumeric()`.
The caller can take advantage of their knowledge of changes in the nonzero structure of the sparse matrices to call the appropriate routines as needed. In fact, they can
use `MatGetNonzeroState()` to determine if a new symbolic computation is needed. The drawback of this approach is that the caller of these routines has to
manage the creation of new matrices when the nonzero structure changes.

The second approach, as exemplified by `MatTranspose()`, does not expose the two stages explicitly in the API; instead a flag, `MatReuse`, is passed through the
API to indicate if a symbolic data structure is already available or needs to be computed. Thus `MatTranspose(A,MAT_INITIAL_MATRIX,&B)` is called first, then
`MatTranspose(A,MAT_REUSE_MATRIX,&B)` can be called repeatedly with new numerical values in the A matrix. In theory, if the nonzero structure of A changes, the
symbolic computations for B could be redone automatically inside the same B matrix when there is a change in the nonzero state of the A matrix. In practice, in PETSc,
`MAT_REUSE_MATRIX` for most PETSc routines only works if the nonzero structure does not change, and the code may crash otherwise. The advantage of this approach
(when the nonzero structure changes are handled correctly) is that the calling code does not need to keep track of the nonzero state of the matrices; everything
"just works". However, the caller must still know when it is the first call to the routine so that the flag `MAT_INITIAL_MATRIX` is used. If the underlying implementation language supported detecting a yet-to-be-initialized variable at run time, the `MatReuse` flag would not be needed.

PETSc uses two approaches because the same programming problem was solved in two different ways during PETSc's early development.
A better model would combine both approaches: an explicit
separation of the stages and a unified operation that internally utilizes the two stages appropriately and also handles changes to the nonzero structure. Code could be simplified in many places with this approach; in most places the use of the unified API would replace the use of the separate stages.

See {any}`sec_matsub` and {any}`sec_matmatproduct`.

(sec_graph)=

## Graph Operations

PETSc has four families of graph operations that treat a sparse `Mat` as representing a graph.

```{eval-rst}
.. list-table::
   :widths: auto
   :align: center
   :header-rows: 1

   * - Operation
     - Type
     - Available methods
     - User guide
   * - Ordering to reduce fill
     - N/A
     - ``MatOrderingType``
     - :any:`sec_matfactor`
   * - Partitioning for parallelism
     - ``MatPartitioning``
     - ``MatPartitioningType``
     - :any:`sec_partitioning`
   * - Coloring for parallelism or Jacobians
     - ``MatColoring``
     - ``MatColoringType``
     - :any:`sec_fdmatrix`
   * - Coarsening for multigrid
     - ``MatCoarsen``
     - ``MatCoarsenType``
     - :any:`sec_amg`
```

(sec_partitioning)=

## Partitioning

For almost all unstructured grid computation, the distribution of
portions of the grid across the processes' workload and memory can have
a very large impact on performance. In most PDE calculations the grid
partitioning and distribution across the processes can (and should) be
done in a “pre-processing” step before the numerical computations.
However, this does not mean it need be done in a separate, sequential
program; rather, it should be done before one sets up the parallel grid
data structures in the actual program.
PETSc provides an interface to
ParMETIS (developed by George Karypis; see
[the PETSc installation instructions](https://petsc.org/release/install/)
for directions on installing PETSc to use ParMETIS) to allow the
partitioning to be done in parallel. PETSc does not currently provide
direct support for dynamic repartitioning, load balancing by migrating
matrix entries between processes, etc. For problems that require mesh
refinement, PETSc uses the “rebuild the data structure” approach, as
opposed to the “maintain dynamic data structures that support the
insertion/deletion of additional vector and matrix rows and columns
entries” approach.

Partitioning in PETSc is organized around the `MatPartitioning`
object. One first creates a parallel matrix that contains the
connectivity information about the grid (or other graph-type object)
that is to be partitioned. This is done with the command

```
MatCreateMPIAdj(MPI_Comm comm,int mlocal,PetscInt n,const PetscInt ia[],const PetscInt ja[],PetscInt *weights,Mat *Adj);
```

The argument `mlocal` indicates the number of rows of the graph being
provided by the given process, and `n` is the total number of columns,
equal to the sum of all the `mlocal`. The arguments `ia` and `ja`
are the row pointers and column indices for the given rows; these are
the usual format for parallel compressed sparse row storage, using
indices starting at 0, *not* 1.

:::{figure} /images/manual/usg.*
:alt: Numbering on Simple Unstructured Grid
:name: fig_usg

Numbering on Simple Unstructured Grid
:::

This, of course, assumes that one has already distributed the grid
(graph) information among the processes.
The details of this initial
distribution are not important; it could be simply determined by
assigning to the first process the first $n_0$ nodes from a file,
the second process the next $n_1$ nodes, etc.

For example, we demonstrate the form of the `ia` and `ja` for a
triangular grid where we

1. partition by element (triangle)

- Process 0: `mlocal = 2`, `n = 4`, `ja = {2,3, 3}`,
  `ia = {0,2,3}`
- Process 1: `mlocal = 2`, `n = 4`, `ja = {0, 0,1}`,
  `ia = {0,1,3}`

Note that elements are not connected to themselves and we only indicate
edge connections (in some contexts single vertex connections between
elements may also be included). We use a space above to denote the
transition between rows in the matrix; and

2. partition by vertex.

- Process 0: `mlocal = 3`, `n = 6`,
  `ja = {3,4, 4,5, 3,4,5}`, `ia = {0, 2, 4, 7}`
- Process 1: `mlocal = 3`, `n = 6`,
  `ja = {0,2, 4, 0,1,2,3,5, 1,2,4}`,
  `ia = {0, 3, 8, 11}`.

Once the connectivity matrix has been created, the following code will
generate the renumbering required for the new partition

```
MatPartitioningCreate(MPI_Comm comm,MatPartitioning *part);
MatPartitioningSetAdjacency(MatPartitioning part,Mat Adj);
MatPartitioningSetFromOptions(MatPartitioning part);
MatPartitioningApply(MatPartitioning part,IS *is);
MatPartitioningDestroy(MatPartitioning *part);
MatDestroy(Mat *Adj);
ISPartitioningToNumbering(IS is,IS *isg);
```

The resulting `isg` contains, for each local node, the new global
number of that node. The resulting `is` contains the new process
number to which each local node has been assigned.

Now that a new numbering of the nodes has been determined, one must
renumber all the nodes and migrate the grid information to the correct
process.
The command

```
AOCreateBasicIS(isg,NULL,&ao);
```

generates (see {any}`sec_ao`) an `AO` object that can be
used in conjunction with the `is` and `isg` to move the relevant
grid information to the correct process and renumber the nodes, etc. In
this context, the new ordering is the “application” ordering, so
`AOPetscToApplication()` converts old global indices to new global
indices and `AOApplicationToPetsc()` converts new global indices back
to old global indices.

PETSc does not currently provide tools that completely manage the
migration and node renumbering, since this depends on the
particular data structure you use to store the grid information and the
type of grid information that you need for your application. We do plan
to include more support for this in the future, but designing the
appropriate general user interface and providing a scalable
implementation that can be used for a wide variety of different grids
requires a great deal of time.

See {any}`sec_fdmatrix` and {any}`sec_matfactor` for discussions on performing graph coloring and computing graph reorderings to
reduce fill in sparse matrix factorizations.

```{eval-rst}
.. bibliography:: /petsc.bib
   :filter: docname in docnames
   :keyprefix: keyprefix-
   :labelprefix: ref-
```