xref: /petsc/doc/manual/mat.md (revision 4e8208cbcbc709572b8abe32f33c78b69c819375)
(ch_matrices)=

# Matrices

PETSc provides a variety of matrix implementations because no single
matrix format is appropriate for all problems. Currently, we support
dense storage and compressed sparse row storage (both sequential and
parallel versions) for CPU- and GPU-based matrices, as well as several specialized formats. Additional
specialized formats can be easily added.

This chapter describes the basics of using PETSc matrices in general
(regardless of the particular format chosen) and discusses tips for
efficient use of the several simple uniprocess and parallel matrix
types. The use of PETSc matrices involves the following actions: create
a particular type of matrix, insert values into it, process the matrix,
use the matrix for various computations, and finally destroy the matrix.
The application code does not need to know or care about the particular
storage formats of the matrices.

(sec_matcreate)=

## Creating matrices

As with vectors, PETSc has APIs that allow the user to specify the exact details of the matrix
creation process but also `DM`-based creation routines that handle most of the details automatically
for specific families of applications. This is done with

```
DMCreateMatrix(DM dm,Mat *A)
```

The type of matrix created can be controlled with either

```
DMSetMatType(DM dm,MatType <MATAIJ or MATBAIJ or MATAIJCUSPARSE etc>)
```

or with

```
DMSetFromOptions(DM dm)
```

and the options database option `-dm_mat_type <aij or baij or aijcusparse etc>`. Matrices can be created for CPU usage, for GPU usage, and for usage on
both the CPUs and GPUs.

The creation of `DM` objects is discussed in {any}`sec_struct`, {any}`sec_unstruct`, and {any}`sec_network`.

## Low-level matrix creation routines

When using a `DM` is not practical for a particular application, one can create matrices directly
using

```
MatCreate(MPI_Comm comm,Mat *A)
MatSetSizes(Mat A,PetscInt m,PetscInt n,PetscInt M,PetscInt N)
```

This sequence generates a sequential matrix when running on one process and
a parallel matrix for two or more processes; the particular matrix
format is set by the user via options database commands. The user
specifies either the global matrix dimensions, given by `M` and `N`,
or the local dimensions, given by `m` and `n`, while PETSc completely
controls memory allocation. This approach facilitates switching among
various matrix types, for example, to determine the format that is most
efficient for a certain application. By default, `MatCreate()` employs
the sparse AIJ format, which is discussed in detail in
{any}`sec_matsparse`. See the manual pages for further
information about available matrix formats.

## Assembling (putting values into) matrices

To insert or add entries to a matrix on CPUs, one can call a variant of
`MatSetValues()`, either

```
MatSetValues(Mat A,PetscInt m,const PetscInt idxm[],PetscInt n,const PetscInt idxn[],const PetscScalar values[],INSERT_VALUES);
```

or

```
MatSetValues(Mat A,PetscInt m,const PetscInt idxm[],PetscInt n,const PetscInt idxn[],const PetscScalar values[],ADD_VALUES);
```

This routine inserts or adds a logically dense subblock of dimension
`m*n` into the matrix. The integer indices `idxm` and `idxn`,
respectively, indicate the global row and column numbers to be inserted.
`MatSetValues()` uses the standard C convention, where the row and
column matrix indices begin with zero *regardless of the programming language
employed*. The array `values` is logically two-dimensional, containing
the values that are to be inserted. By default the values are given in
row-major order, which is the opposite of the Fortran convention,
meaning that the value to be put in row `idxm[i]` and column
`idxn[j]` is located in `values[i*n+j]`. To allow the insertion of
values in column-major order, one can call the command

```
MatSetOption(Mat A,MAT_ROW_ORIENTED,PETSC_FALSE);
```

*Warning*: Several of the sparse implementations do *not* currently
support the column-oriented option.

This notation will be familiar to MATLAB users. For example, to insert
one matrix into another in MATLAB, one uses the command
`A(im,in) = B;` where `im` and `in` contain the indices for the
rows and columns. This action is identical to the calls above to
`MatSetValues()`.

When using the block compressed sparse row matrix format (`MATSEQBAIJ`
or `MATMPIBAIJ`), one can insert elements more efficiently using the
block variants, `MatSetValuesBlocked()` or
`MatSetValuesBlockedLocal()`.

The function `MatSetOption()` accepts several other inputs; see the
manual page for details.

After the matrix elements have been inserted or added into the matrix,
they must be processed (also called “assembled”) before they can be
used. The routines for matrix processing are

```
MatAssemblyBegin(Mat A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(Mat A,MAT_FINAL_ASSEMBLY);
```

By placing other code between these two calls, the user can perform
computations while messages are in transit. Calls to `MatSetValues()`
with the `INSERT_VALUES` and `ADD_VALUES` options *cannot* be mixed
without intervening calls to the assembly routines. For such
intermediate assembly calls the second routine argument typically should
be `MAT_FLUSH_ASSEMBLY`, which omits some of the work of the full
assembly process. `MAT_FINAL_ASSEMBLY` is required only in the last
matrix assembly before a matrix is used.

Even though one may insert values into PETSc matrices without regard to
which process eventually stores them, for efficiency reasons we usually
recommend generating most entries on the process where they are destined
to be stored. To help the application programmer with this task for
matrices that are distributed across the processes by ranges, the
routine

```
MatGetOwnershipRange(Mat A,PetscInt *first_row,PetscInt *last_row);
```

informs the user that all rows from `first_row` to `last_row-1`
(since the value returned in `last_row` is one more than the global
index of the last local row) will be stored on the local process.

If the `Mat` was obtained from a `DM` with `DMCreateMatrix()`, then the range values are determined by the specific `DM`.
If the `Mat` was created directly, the range values are determined by the local sizes passed to `MatSetSizes()` or `MatCreateAIJ()` (and similar low-level functions for other `MatType`s).
If `PETSC_DECIDE` was passed as the local size, then PETSc chooses default values for the range using `PetscSplitOwnership()`.
For certain `DM`s, such as `DMDA`, it is better to use `DM`-specific routines, such as `DMDAGetGhostCorners()`, to determine
the local values in the matrix. See {any}`sec_matlayout` for full details on row and column layouts.

In the sparse matrix implementations, once the assembly routines have
been called, the matrices are compressed and can be used for
matrix-vector multiplication, etc. Any space for preallocated nonzeros
that was not filled by a call to `MatSetValues()` or a related routine
is compressed out by assembling with `MAT_FINAL_ASSEMBLY`. If you
intend to use that extra space later, be sure to insert explicit zeros
before assembling with `MAT_FINAL_ASSEMBLY` so the space will not be
compressed out. Once the matrix has been assembled, inserting new values
will be expensive since it will require copies and possibly memory
allocation.

One may repeatedly assemble matrices that retain the same
nonzero pattern (such as within a nonlinear or time-dependent problem).
Where possible, data structures and communication
information will be reused (instead of regenerated) during successive
steps, thereby increasing efficiency. See
<a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/src/ksp/ksp/tutorials/ex5.c.html">KSP Tutorial ex5</a>
for a simple example of solving two linear systems that use the same
matrix data structure.

For matrices associated with a `DMDA` there is a higher-level interface for providing
the numerical values based on the concept of stencils. See the manual page of `MatSetValuesStencil()` for usage.

For GPUs the routines `MatSetPreallocationCOO()` and `MatSetValuesCOO()` should be used for efficient matrix assembly
instead of `MatSetValues()`.

We now introduce the various families of PETSc matrices. `DMCreateMatrix()` manages
the preallocation process (introduced below) automatically, so many users do not need to
worry about its details.

(sec_matlayout)=

### Matrix and Vector Layouts and Storage Locations

The layout of PETSc matrices across MPI ranks is defined by two things:

- the layout of the two compatible vectors in the computation of the matrix-vector product y = A * x, and
- the memory where various parts of the matrix are stored across the MPI ranks.

PETSc vectors always have a contiguous range of vector entries stored on each MPI rank. The first rank has entries from 0 to `rend1` - 1, the
next rank has entries from `rend1` to `rend2` - 1, etc. Thus the ownership range on each rank runs from `rstart` to `rend`; these values can be
obtained with `VecGetOwnershipRange`(`Vec` x, `PetscInt` * `rstart`, `PetscInt` * `rend`). Each PETSc `Vec` has a `PetscLayout` object that contains this information.

All PETSc matrices have two `PetscLayout`s; they define the vector layouts for y and x in the product y = A * x. Their ownership range information
can be obtained with `MatGetOwnershipRange()`, `MatGetOwnershipRangeColumn()`, `MatGetOwnershipRanges()`, and `MatGetOwnershipRangesColumn()`.
Note that `MatCreateVecs()` provides two vectors with layouts compatible with the matrix.

For most PETSc matrices, excluding `MATELEMENTAL` and `MATSCALAPACK`, the row ownership range obtained with `MatGetOwnershipRange()` also defines
where the matrix entries are stored; the matrix entries for rows `rstart` to `rend - 1` are stored on the corresponding MPI rank. For other matrices
the rank where each matrix entry is stored is more complicated; information about the storage locations can be obtained with `MatGetOwnershipIS()`.
Note that for
most PETSc matrices the values returned by `MatGetOwnershipIS()` are the same as those returned by `MatGetOwnershipRange()` and
`MatGetOwnershipRangeColumn()`.

The PETSc object `PetscLayout` contains the ownership information provided by `VecGetOwnershipRange()`, `MatGetOwnershipRange()`, and `MatGetOwnershipRangeColumn()`. Each vector has one layout, which can be obtained with `VecGetLayout()`; each matrix has two, which can be obtained with `MatGetLayouts()`. Layouts support the routines `PetscLayoutGetLocalSize()`, `PetscLayoutGetSize()`, `PetscLayoutGetBlockSize()`, `PetscLayoutGetRanges()`, and `PetscLayoutCompare()`, as well as a variety of creation routines. These are used internally by `Vec` and `Mat` and so are rarely needed directly. Finally, `PetscSplitOwnership()` is a utility routine that does the same splitting of ownership ranges as `PetscLayout`.

(sec_matsparse)=

### Sparse Matrices

The default matrix representation within PETSc is the general sparse AIJ
format (also called the compressed sparse
row format, CSR). This section discusses tips for *efficiently* using
this matrix format for large-scale applications. Additional formats
(such as block compressed row and block symmetric storage, which are
generally much more efficient for problems with multiple degrees of
freedom per node) are discussed below. Beginning users need not concern
themselves initially with such details and may wish to proceed directly
to {any}`sec_matoptions`. However, when an application code
progresses to the point of tuning for efficiency and/or generating
timing results, it is *crucial* to read this information.

#### Sequential AIJ Sparse Matrices

In the PETSc AIJ matrix formats, we store the nonzero elements by rows,
along with an array of corresponding column numbers and an array of
pointers to the beginning of each row. Note that the diagonal matrix
entries are stored with the rest of the nonzeros (not separately).

To create a sequential AIJ sparse matrix, `A`, with `m` rows and
`n` columns, one uses the command

```
MatCreateSeqAIJ(PETSC_COMM_SELF,PetscInt m,PetscInt n,PetscInt nz,PetscInt *nnz,Mat *A);
```

where `nz` or `nnz` can be used to preallocate matrix memory, as
discussed below. The user can set `nz=0` and `nnz=NULL` for PETSc to
control all matrix memory allocation.

The sequential and parallel AIJ matrix storage formats by default employ
*i-nodes* (identical nodes) when possible. We search for consecutive
rows with the same nonzero structure, thereby reusing matrix information
for increased efficiency. Related options database keys are
`-mat_no_inode` (do not use i-nodes) and `-mat_inode_limit <limit>`
(set the i-node limit; the maximum limit is 5). Note that problems with a single degree
of freedom per grid node will automatically not use i-nodes.

The internal data representation for the AIJ formats employs zero-based
indexing.

#### Preallocation of Memory for Sequential AIJ Sparse Matrices

The dynamic process of allocating new memory and copying from the old
storage to the new is *intrinsically very expensive*. Thus, to obtain
good performance when assembling an AIJ matrix, it is crucial to
preallocate the memory needed for the sparse matrix. The user has two
choices for preallocating matrix memory via `MatCreateSeqAIJ()`.

One can use the scalar `nz` to specify the expected number of nonzeros
for each row. This is generally fine if the number of nonzeros per row
is roughly the same throughout the matrix (or as a quick and easy first
step for preallocation). If one underestimates the actual number of
nonzeros in a given row, then during the assembly process PETSc will
automatically allocate additional needed space. However, this extra
memory allocation can slow the computation.

If different rows have very different numbers of nonzeros, one should
attempt to indicate (nearly) the exact number of elements intended for
the various rows with the optional array, `nnz`, of length `m`, where
`m` is the number of rows, for example

```
PetscInt nnz[m];
nnz[0] = <nonzeros in row 0>
nnz[1] = <nonzeros in row 1>
....
nnz[m-1] = <nonzeros in row m-1>
```

In this case, the assembly process will require no additional memory
allocations if the `nnz` estimates are correct. If, however, the
`nnz` estimates are incorrect, PETSc will automatically obtain the
additional needed space, at a slight loss of efficiency.

Using the array `nnz` to preallocate memory is especially important
for efficient matrix assembly if the number of nonzeros varies
considerably among the rows. One can generally set `nnz` either by
knowing in advance the problem structure (e.g., the stencil for finite
difference problems on a structured grid) or by precomputing the
information by using a segment of code similar to that for the regular
matrix assembly. The overhead of determining the `nnz` array will be
quite small compared with the overhead of the inherently expensive
`malloc`s and moves of data that are needed for dynamic allocation
during matrix assembly. Always guess high if an exact value is not known
(extra space is cheaper than too little).

Thus, when assembling a sparse matrix with very different numbers of
nonzeros in various rows, one could proceed as follows for finite
difference methods:

1. Allocate integer array `nnz`.
2. Loop over the grid, counting the expected number of nonzeros for the
   row(s) associated with the various grid points.
3. Create the sparse matrix via `MatCreateSeqAIJ()` or an alternative.
4. Loop over the grid, generating matrix entries and inserting them into the matrix
   via `MatSetValues()`.

For (vertex-based) finite element type calculations, an analogous
procedure is as follows:

1. Allocate integer array `nnz`.
2. Loop over vertices, computing the number of neighbor vertices, which
   determines the number of nonzeros for the corresponding matrix
   row(s).
3. Create the sparse matrix via `MatCreateSeqAIJ()` or an alternative.
4. Loop over elements, generating matrix entries and inserting them into the matrix
   via `MatSetValues()`.

The `-info` option causes the routines `MatAssemblyBegin()` and
`MatAssemblyEnd()` to print information about the success of the
preallocation. Consider the following example for the `MATSEQAIJ`
matrix format:

```
MatAssemblyEnd_SeqAIJ:Matrix size 10 X 10; storage space:20 unneeded, 100 used
MatAssemblyEnd_SeqAIJ:Number of mallocs during MatSetValues is 0
```

The first line indicates that the user preallocated 120 spaces but only
100 were used. The second line indicates that the user preallocated
enough space so that PETSc did not have to internally allocate
additional space (an expensive operation). In the next example the user
did not preallocate sufficient space, as indicated by the fact that the
number of mallocs is very large (bad for efficiency):

```
MatAssemblyEnd_SeqAIJ:Matrix size 10 X 10; storage space:47 unneeded, 1000 used
MatAssemblyEnd_SeqAIJ:Number of mallocs during MatSetValues is 40000
```

Although at first glance such procedures for determining the matrix
structure in advance may seem unusual, they are actually very efficient
because they alleviate the need for dynamic construction of the matrix
data structure, which can be very expensive.

#### Parallel AIJ Sparse Matrices

Parallel sparse matrices with the AIJ format can be created with the
command

```
MatCreateAIJ(MPI_Comm comm,PetscInt m,PetscInt n,PetscInt M,PetscInt N,PetscInt d_nz,PetscInt *d_nnz, PetscInt o_nz,PetscInt *o_nnz,Mat *A);
```

`A` is the newly created matrix, while the arguments `m`, `M`, and
`N` indicate the number of local rows and the number of global rows
and columns, respectively. In the PETSc partitioning scheme, all the
matrix columns are local and `n` is the number of columns
corresponding to the local part of a parallel vector. Either the local or
global parameters can be replaced with `PETSC_DECIDE`, so that PETSc
will determine them. The matrix is stored with a fixed number of rows on
each process, given by `m`, or determined by PETSc if `m` is
`PETSC_DECIDE`.

If `PETSC_DECIDE` is not used for the arguments `m` and `n`, then
the user must ensure that they are chosen to be compatible with the
vectors. To do this, one first considers the matrix-vector product
$y = A x$. The `m` that is used in the matrix creation routine
`MatCreateAIJ()` must match the local size used in the vector creation
routine `VecCreateMPI()` for `y`. Likewise, the `n` used must
match that used as the local size in `VecCreateMPI()` for `x`.

The user must set `d_nz=0`, `o_nz=0`, `d_nnz=NULL`, and
`o_nnz=NULL` for PETSc to control dynamic allocation of matrix memory
space. Analogous to `nz` and `nnz` for the routine
`MatCreateSeqAIJ()`, these arguments optionally specify nonzero
information for the diagonal (`d_nz` and `d_nnz`) and off-diagonal
(`o_nz` and `o_nnz`) parts of the matrix. For a square global
matrix, we define each process’s diagonal portion to be its local rows
and the corresponding columns (a square submatrix); each process’s
off-diagonal portion encompasses the remainder of the local matrix (a
rectangular submatrix). The rank in the MPI communicator determines the
absolute ordering of the blocks. That is, the process with rank 0 in the
communicator given to `MatCreateAIJ()` contains the top rows of the
matrix; the i$^{th}$ process in that communicator contains the
i$^{th}$ block of the matrix.

#### Preallocation of Memory for Parallel AIJ Sparse Matrices

As discussed above, preallocation of memory is critical for achieving
good performance during matrix assembly, as this reduces the number of
allocations and copies required. We present an example for three
processes to indicate how this may be done for the `MATMPIAIJ` matrix
format. Consider the 8 by 8 matrix, which is partitioned by default with
three rows on the first process, three on the second, and two on the
third.

$$
\left( \begin{array}{cccccccccc}
1  & 2  & 0  & | & 0  & 3  & 0  & |  & 0  & 4  \\
0  & 5  & 6  & | & 7  & 0  & 0  & |  & 8  & 0 \\
9  & 0  & 10 & | & 11 & 0  & 0  & |  & 12 & 0  \\
\hline \\
13 & 0  & 14 & | & 15 & 16 & 17 & |  & 0  & 0  \\
0  & 18 & 0  & | & 19 & 20 & 21 & |  & 0  & 0 \\
0  & 0  & 0  & | & 22 & 23 & 0  & |  & 24 & 0 \\
\hline \\
25 & 26 & 27 & | & 0  & 0  & 28 & |  & 29 & 0 \\
30 & 0  & 0  & | & 31 & 32 & 33 & |  & 0  &34
\end{array} \right)
$$

The “diagonal” submatrix, `d`, on the first process is given by

$$
\left( \begin{array}{ccc}
1  & 2  & 0  \\
0  & 5  & 6  \\
9  & 0  & 10
\end{array} \right),
$$

while the “off-diagonal” submatrix, `o`, is given by

$$
\left( \begin{array}{ccccc}
 0  & 3  & 0   & 0  & 4  \\
 7  & 0  & 0   & 8  & 0  \\
 11 & 0  & 0   & 12 & 0  \\
\end{array} \right).
$$

For the first process one could set `d_nz` to 2 (since each row has 2
nonzeros) or, alternatively, set `d_nnz` to $\{2,2,2\}.$ The
`o_nz` could be set to 2 since each row of the `o` matrix has 2
nonzeros, or `o_nnz` could be set to $\{2,2,2\}$.

For the second process the `d` submatrix is given by

$$
\left( \begin{array}{cccccccccc}
 15 & 16 & 17 \\
 19 & 20 & 21 \\
 22 & 23 & 0
\end{array} \right) .
$$

Thus, one could set `d_nz` to 3, since the maximum number of nonzeros
in each row is 3, or alternatively one could set `d_nnz` to
$\{3,3,2\}$, thereby indicating that the first two rows will have
3 nonzeros while the third has 2. The corresponding `o` submatrix for
the second process is

$$
\left( \begin{array}{cccccccccc}
13 & 0  & 14 &  0  & 0  \\
0  & 18 & 0  &  0  & 0 \\
0  & 0  & 0  &  24 & 0 \\
\end{array} \right)
$$

so that one could set `o_nz` to 2 or `o_nnz` to $\{2,1,1\}$.

Note that the user never directly works with the `d` and `o`
submatrices, except when preallocating storage space as indicated above.
Also, the user need not preallocate exactly the correct amount of space;
as long as a sufficiently close estimate is given, the high efficiency
for matrix assembly will remain.

As described above, the option `-info` will print information about
the success of preallocation during matrix assembly. For the
`MATMPIAIJ` and `MATMPIBAIJ` formats, PETSc will also list the
number of elements owned by each process that were generated on a
different process. For example, the statements

```
MatAssemblyBegin_MPIAIJ:Stash has 10 entries, uses 0 mallocs
MatAssemblyBegin_MPIAIJ:Stash has 3 entries, uses 0 mallocs
MatAssemblyBegin_MPIAIJ:Stash has 5 entries, uses 0 mallocs
```

indicate that very few values have been generated on different
processes. On the other hand, the statements

```
MatAssemblyBegin_MPIAIJ:Stash has 100000 entries, uses 100 mallocs
MatAssemblyBegin_MPIAIJ:Stash has 77777 entries, uses 70 mallocs
```

indicate that many values have been generated on the “wrong” processes.
This situation can be very inefficient, since the transfer of values to
the “correct” process is generally expensive. By using the command
`MatGetOwnershipRange()` in application codes, the user should be able
to generate most entries on the owning process.

*Note*: It is fine to generate some entries on the “wrong” process.
Often this can lead to cleaner, simpler, less buggy codes. One should
never make code overly complicated in order to generate all values
locally. Rather, one should organize the code in such a way that *most*
values are generated locally.

The routine `MatCreateAIJCUSPARSE()` allows one to create GPU-based matrices for NVIDIA systems.
`MatCreateAIJKokkos()` can create matrices for use with CPU, OpenMP, NVIDIA, AMD, or Intel GPU-based systems.

It is sometimes difficult to compute the required preallocation information efficiently; hence PETSc provides a
special `MatType`, `MATPREALLOCATOR`, that helps make computing this information more straightforward. One first creates a matrix of this type and then, using the same
code that one would use to actually compute the matrix's numerical values, calls `MatSetValues()` for this matrix, without needing to provide any
preallocation information (one need not provide the matrix's numerical values). Once this is complete, one uses `MatPreallocatorPreallocate()` to
provide the accumulated preallocation information to
the actual matrix one will use for the computations. We hope to simplify this process in the future, allowing the removal of `MATPREALLOCATOR` and
instead simply allowing the use of its efficient insertion process automatically during the first assembly of any matrix type, without
requiring the detailed preallocation information.

See {any}`doc_matrix` for a table of the matrix types available in PETSc.

(sec_matlmvm)=

#### Limited-Memory Variable Metric (LMVM) Matrices

Variable metric methods, also known as quasi-Newton methods, are
frequently used for root-finding problems; they approximate Jacobian
matrices or their inverses via sequential nonlinear updates based on the
secant condition. The limited-memory variants do not store the full
explicit Jacobian, and instead compute forward products and inverse
applications based on a fixed number of stored update vectors.

```{eval-rst}
.. list-table:: PETSc LMVM matrix implementations.
  :name: tab_matlmvmimpl
  :header-rows: 1

  * - Method
    - PETSc Type
    - Name
    - Property
  * - "Good" Broyden :cite:`keyprefix-griewank2012broyden`
    - ``MATLMVMBrdn``
    - ``lmvmbrdn``
    - Square
  * - "Bad" Broyden :cite:`keyprefix-griewank2012broyden`
    - ``MATLMVMBadBrdn``
    - ``lmvmbadbrdn``
    - Square
  * - Symmetric Rank-1 :cite:`keyprefix-nocedal2006numerical`
    - ``MATLMVMSR1``
    - ``lmvmsr1``
    - Symmetric
  * - Davidon-Fletcher-Powell (DFP) :cite:`keyprefix-nocedal2006numerical`
    - ``MATLMVMDFP``
    - ``lmvmdfp``
    - SPD
  * - Dense Davidon-Fletcher-Powell (DFP) :cite:`keyprefix-ErwayMarcia2017`
    - ``MATLMVMDDFP``
    - ``lmvmddfp``
    - SPD
  * - Broyden-Fletcher-Goldfarb-Shanno (BFGS) :cite:`keyprefix-nocedal2006numerical`
    - ``MATLMVMBFGS``
    - ``lmvmbfgs``
    - SPD
  * - Dense Broyden-Fletcher-Goldfarb-Shanno (BFGS) :cite:`keyprefix-ErwayMarcia2017`
    - ``MATLMVMDBFGS``
    - ``lmvmdbfgs``
    - SPD
  * - Dense Quasi-Newton
    - ``MATLMVMDQN``
    - ``lmvmdqn``
    - SPD
  * - Restricted Broyden Family :cite:`keyprefix-erway2017solving`
    - ``MATLMVMSymBrdn``
    - ``lmvmsymbrdn``
    - SPD
  * - Restricted Broyden Family (full-memory diagonal)
    - ``MATLMVMDiagBrdn``
    - ``lmvmdiagbrdn``
    - SPD
```

PETSc implements the LMVM matrix types listed in the
table above. They can be created using the
`MatCreate()` and `MatSetType()` workflow, and share a number of
common interface functions. We review the most important ones
below:

- `MatLMVMAllocate(Mat B, Vec X, Vec F)` – Creates the internal data
  structures necessary to store nonlinear updates and compute
  forward/inverse applications. The `X` vector defines the solution
  space while `F` defines the function space for the history of
  updates.
- `MatLMVMUpdate(Mat B, Vec X, Vec F)` – Applies a nonlinear update to
  the approximate Jacobian such that $s_k = x_k - x_{k-1}$ and
  $y_k = f(x_k) - f(x_{k-1})$, where $k$ is the index for
  the update.
- `MatLMVMReset(Mat B, PetscBool destructive)` – Flushes the
  accumulated nonlinear updates and resets the matrix to the initial
  state. If `destructive = PETSC_TRUE`, the reset also destroys the
  internal data structures and necessitates another allocation call
  before the matrix can be updated and used for products and solves.
- `MatLMVMSetJ0(Mat B, Mat J0)` – Defines the initial Jacobian to
  apply the updates to. If no initial Jacobian is provided, the updates
  are applied to an identity matrix.

LMVM matrices can be applied to vectors in forward mode via
`MatMult()` or `MatMultAdd()`, and in inverse mode via
`MatSolve()`. They also support `MatCreateVecs()`, `MatDuplicate()`,
and `MatCopy()` operations.

The Restricted Broyden Family, DFP, and BFGS methods, including their dense
versions, additionally implement special Jacobian initialization and
scaling options available via
`-mat_lmvm_scale_type <none,scalar,diagonal>`. We describe these
choices below:

- `none` – Sets the initial Jacobian to be equal to the identity
  matrix. No extra computations are required when obtaining the search
  direction or updating the approximation. However, the number of
  function evaluations required to converge the Newton solution is
  typically much larger than what is required when using other
  initializations.

- `scalar` – Defines the initial Jacobian as a scalar multiple of the
  identity matrix. The scalar value $\sigma$ is chosen by solving
  the one-dimensional optimization problem

  $$
  \min_\sigma \|\sigma^\alpha Y - \sigma^{\alpha - 1} S\|_F^2,
  $$

  where $S$ and $Y$ are the matrices whose columns contain
  a subset of update vectors $s_k$ and $y_k$, and
  $\alpha \in [0, 1]$ is defined by the user via
  `-mat_lmvm_alpha` and has a different default value for each LMVM
  implementation (e.g.: the default $\alpha = 1$ for BFGS produces
  the well-known $y_k^T s_k / y_k^T y_k$ scalar initialization).
  The number of updates to be used in the $S$ and $Y$
  matrices is 1 by default (i.e.: the latest update only) and can be
  changed via `-mat_lmvm_sigma_hist`. This technique is inspired by
  Gilbert and Lemarechal {cite}`keyprefix-gilbert-lemarechal`.

- `diagonal` – Uses a full-memory restricted Broyden update formula
  to construct a diagonal matrix for the Jacobian initialization.
  Although the full-memory formula is utilized, the actual memory
  footprint is restricted to only the vector representing the diagonal
  and some additional work vectors used in its construction. The
  diagonal terms are also re-scaled with every update as suggested in
  {cite}`keyprefix-gilbert-lemarechal`. This initialization requires
  the most computational effort of the available choices but typically
  results in a significant reduction in the number of function
  evaluations taken to compute a solution.
657
The dense implementations are numerically equivalent to DFP and BFGS,
but they try to minimize memory transfer at the cost of storage
{cite}`keyprefix-ErwayMarcia2017`. Generally, the dense formulations of DFP
and BFGS, `MATLMVMDDFP` and `MATLMVMDBFGS`, should be faster than the
classical recursive versions on both CPU and GPU. Note
that the `MatMult` of dense BFGS and the `MatSolve` of dense DFP require a
Cholesky factorization, which may be numerically unstable if a Jacobian
option other than `none` is used. Therefore, the default is to
use the classical recursive algorithms, avoiding
the Cholesky factorization. This behavior can be toggled via
`-mat_lbfgs_recursive` and `-mat_ldfp_recursive`.
669
Dense Quasi-Newton, `MATLMVMDQN`, is an implementation that uses the
`MatSolve` of `MATLMVMDBFGS` for its `MatSolve` and the
`MatMult` of `MATLMVMDDFP` for its `MatMult`. It can be
seen as a hybrid implementation that avoids both the recursive implementation
and the Cholesky factorization, trading numerical accuracy for performance.
675
676Note that the user-provided initial Jacobian via `MatLMVMSetJ0()`
677overrides and disables all built-in initialization methods.
678
679(sec_matdense)=
680
681### Dense Matrices
682
683PETSc provides both sequential and parallel dense matrix formats, where
684each process stores its entries in a column-major array in the usual
Fortran style. To create a sequential dense PETSc matrix `A` of
686dimensions `m` by `n`, the user should call
687
688```
689MatCreateSeqDense(PETSC_COMM_SELF,PetscInt m,PetscInt n,PetscScalar *data,Mat *A);
690```
691
692The variable `data` enables the user to optionally provide the
693location of the data for matrix storage (intended for Fortran users who
694wish to allocate their own storage space). Most users should merely set
695`data` to `NULL` for PETSc to control matrix memory allocation. To
696create a parallel, dense matrix, `A`, the user should call
697
698```
699MatCreateDense(MPI_Comm comm,PetscInt m,PetscInt n,PetscInt M,PetscInt N,PetscScalar *data,Mat *A)
700```
701
The arguments `m`, `n`, `M`, and `N` indicate the number of
703local rows and columns and the number of global rows and columns,
704respectively. Either the local or global parameters can be replaced with
705`PETSC_DECIDE`, so that PETSc will determine them. The matrix is
706stored with a fixed number of rows on each process, given by `m`, or
707determined by PETSc if `m` is `PETSC_DECIDE`.
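
For example, a minimal sketch of creating a parallel dense matrix while letting PETSc choose the local sizes (the global dimensions here are arbitrary; error checking omitted):

```
Mat      A;
PetscInt M = 100, N = 50; /* global dimensions */

/* PETSC_DECIDE lets PETSc pick m and n; NULL lets PETSc allocate the storage */
MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, M, N, NULL, &A);
/* ... set values, assemble, use A ... */
MatDestroy(&A);
```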
708
709PETSc does not provide parallel dense direct solvers, instead
710interfacing to external packages that provide these solvers. Our focus
711is on sparse iterative solvers.
712
713(sec_matnest)=
714
715### Block Matrices
716
717Block matrices arise when coupling variables with different meaning,
718especially when solving problems with constraints (e.g. incompressible
719flow) and “multi-physics” problems. Usually the number of blocks is
720small and each block is partitioned in parallel. We illustrate for a
721$3\times 3$ system with components labeled $a,b,c$. With
722some numbering of unknowns, the matrix could be written as
723
724$$
725\left( \begin{array}{ccc}
726    A_{aa} & A_{ab} & A_{ac} \\
727    A_{ba} & A_{bb} & A_{bc} \\
728    A_{ca} & A_{cb} & A_{cc}
729  \end{array} \right) .
730$$
731
732There are two fundamentally different ways that this matrix could be
733stored, as a single assembled sparse matrix where entries from all
734blocks are merged together (“monolithic”), or as separate assembled
735matrices for each block (“nested”). These formats have different
736performance characteristics depending on the operation being performed.
737In particular, many preconditioners require a monolithic format, but
738some that are very effective for solving block systems (see
739{any}`sec_block_matrices`) are more efficient when a nested
740format is used. In order to stay flexible, we would like to be able to
741use the same code to assemble block matrices in both monolithic and
742nested formats. Additionally, for software maintainability and testing,
743especially in a multi-physics context where different groups might be
744responsible for assembling each of the blocks, it is desirable to be
745able to use exactly the same code to assemble a single block
746independently as to assemble it as part of a larger system. To do this,
747we introduce the four spaces shown in {numref}`fig_localspaces`.
748
749- The monolithic global space is the space in which the Krylov and
750  Newton solvers operate, with collective semantics across the entire
751  block system.
752- The split global space splits the blocks apart, but each split still
753  has collective semantics.
754- The split local space adds ghost points and separates the blocks.
755  Operations in this space can be performed with no parallel
756  communication. This is often the most natural, and certainly the most
757  powerful, space for matrix assembly code.
758- The monolithic local space can be thought of as adding ghost points
759  to the monolithic global space, but it is often more natural to use
760  it simply as a concatenation of split local spaces on each process.
761  It is not common to explicitly manipulate vectors or matrices in this
  space (at least not during assembly), but it is useful for
763  declaring which part of a matrix is being assembled.
764
765:::{figure} /images/manual/localspaces.svg
766:alt: The relationship between spaces used for coupled assembly.
767:name: fig_localspaces
768
769The relationship between spaces used for coupled assembly.
770:::
771
772The key to format-independent assembly is the function
773
774```
775MatGetLocalSubMatrix(Mat A,IS isrow,IS iscol,Mat *submat);
776```
777
778which provides a “view” `submat` into a matrix `A` that operates in
779the monolithic global space. The `submat` transforms from the split
780local space defined by `iscol` to the split local space defined by
781`isrow`. The index sets specify the parts of the monolithic local
782space that `submat` should operate in. If a nested matrix format is
783used, then `MatGetLocalSubMatrix()` finds the nested block and returns
784it without making any copies. In this case, `submat` is fully
785functional and has a parallel communicator. If a monolithic matrix
786format is used, then `MatGetLocalSubMatrix()` returns a proxy matrix
787on `PETSC_COMM_SELF` that does not provide values or implement
788`MatMult()`, but does implement `MatSetValuesLocal()` and, if
789`isrow,iscol` have a constant block size,
790`MatSetValuesBlockedLocal()`. Note that although `submat` may not be
791a fully functional matrix and the caller does not even know a priori
792which communicator it will reside on, it always implements the local
793assembly functions (which are not collective). The index sets
794`isrow,iscol` can be obtained using `DMCompositeGetLocalISs()` if
795`DMCOMPOSITE` is being used. `DMCOMPOSITE` can also be used to create
796matrices, in which case the `MATNEST` format can be specified using
797`-prefix_dm_mat_type nest` and `MATAIJ` can be specified using
798`-prefix_dm_mat_type aij`. See
799<a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/src/snes/tutorials/ex28.c.html">SNES Tutorial ex28</a>
800for a simple example using this interface.
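
A minimal sketch of format-independent assembly using this interface might look like the following (the matrix `A`, the index sets `isrow` and `iscol`, and the local indices `row`, `col` and value `v` are assumed to have been set up already, e.g. with `DMCompositeGetLocalISs()`; error checking omitted):

```
Mat submat;

MatGetLocalSubMatrix(A, isrow, iscol, &submat);
/* assemble the block using local (per-split) indices */
MatSetValuesLocal(submat, 1, &row, 1, &col, &v, ADD_VALUES);
MatRestoreLocalSubMatrix(A, isrow, iscol, &submat);
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
```

The same assembly code works whether `A` is stored as `MATAIJ` or `MATNEST`.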
801
802(sec_matoptions)=
803
804## Basic Matrix Operations
805
806Table {any}`2.2 <fig_matrixops>` summarizes basic PETSc matrix operations.
807We briefly discuss a few of these routines in more detail below.
808
809The parallel matrix can multiply a vector with `n` local entries,
810returning a vector with `m` local entries. That is, to form the
811product
812
813```
814MatMult(Mat A,Vec x,Vec y);
815```
816
817the vectors `x` and `y` should be generated with
818
819```
820VecCreateMPI(MPI_Comm comm,n,N,&x);
821VecCreateMPI(MPI_Comm comm,m,M,&y);
822```
823
824By default, if the user lets PETSc decide the number of components to be
825stored locally (by passing in `PETSC_DECIDE` as the second argument to
826`VecCreateMPI()` or using `VecCreate()`), vectors and matrices of
827the same dimension are automatically compatible for parallel
828matrix-vector operations.
829
830Along with the matrix-vector multiplication routine, there is a version
831for the transpose of the matrix,
832
833```
834MatMultTranspose(Mat A,Vec x,Vec y);
835```
836
837There are also versions that add the result to another vector:
838
839```
840MatMultAdd(Mat A,Vec x,Vec y,Vec w);
841MatMultTransposeAdd(Mat A,Vec x,Vec y,Vec w);
842```
843
844These routines, respectively, produce $w = A*x + y$ and
$w = A^{T}*x + y$. In C it is legal for the vectors `y` and
846`w` to be identical. In Fortran, this situation is forbidden by the
847language standard, but we allow it anyway.
848
849One can print a matrix (sequential or parallel) to the screen with the
850command
851
852```
853MatView(Mat mat,PETSC_VIEWER_STDOUT_WORLD);
854```
855
856Other viewers can be used as well. For instance, one can draw the
857nonzero structure of the matrix into the default X-window with the
858command
859
860```
861MatView(Mat mat,PETSC_VIEWER_DRAW_WORLD);
862```
863
864Also one can use
865
866```
867MatView(Mat mat,PetscViewer viewer);
868```
869
870where `viewer` was obtained with `PetscViewerDrawOpen()`. Additional
871viewers and options are given in the `MatView()` man page and
872{any}`sec_viewers`.
873
874```{eval-rst}
875.. list-table:: PETSc Matrix Operations
876  :name: fig_matrixops
877  :header-rows: 1
878
879  * - Function Name
880    - Operation
881  * - ``MatAXPY(Mat Y, PetscScalar a, Mat X, MatStructure s);``
882    - :math:`Y = Y + a*X`
883  * - ``MatAYPX(Mat Y, PetscScalar a, Mat X, MatStructure s);``
884    - :math:`Y = a*Y + X`
885  * - ``MatMult(Mat A,Vec x, Vec y);``
886    - :math:`y = A*x`
887  * - ``MatMultAdd(Mat A,Vec x, Vec y,Vec z);``
888    - :math:`z = y + A*x`
889  * - ``MatMultTranspose(Mat A,Vec x, Vec y);``
890    - :math:`y = A^{T}*x`
891  * - ``MatMultTransposeAdd(Mat A, Vec x, Vec y, Vec z);``
892    - :math:`z = y + A^{T}*x`
893  * - ``MatNorm(Mat A,NormType type, PetscReal *r);``
    - :math:`r = \|A\|_{type}`
895  * - ``MatDiagonalScale(Mat A,Vec l,Vec r);``
896    - :math:`A = \text{diag}(l)*A*\text{diag}(r)`
897  * - ``MatScale(Mat A,PetscScalar a);``
898    - :math:`A = a*A`
899  * - ``MatConvert(Mat A, MatType type, Mat *B);``
900    - :math:`B = A`
901  * - ``MatCopy(Mat A, Mat B, MatStructure s);``
902    - :math:`B = A`
903  * - ``MatGetDiagonal(Mat A, Vec x);``
904    - :math:`x = \text{diag}(A)`
905  * - ``MatTranspose(Mat A, MatReuse, Mat* B);``
906    - :math:`B = A^{T}`
907  * - ``MatZeroEntries(Mat A);``
908    - :math:`A = 0`
909  * - ``MatShift(Mat Y, PetscScalar a);``
910    - :math:`Y =  Y + a*I`
911```
912
913```{eval-rst}
914.. list-table:: Values of MatStructure
915  :name: fig_matstructure
916  :header-rows: 1
917
918  * - Name
919    - Meaning
920  * - ``SAME_NONZERO_PATTERN``
921    - the matrices have an identical nonzero pattern
922  * - ``DIFFERENT_NONZERO_PATTERN``
923    - the matrices may have a different nonzero pattern
924  * - ``SUBSET_NONZERO_PATTERN``
925    - the second matrix has a subset of the nonzeros in the first matrix
926  * - ``UNKNOWN_NONZERO_PATTERN``
927    - there is nothing known about the relation between the nonzero patterns of the two matrices
928```
929
930The `NormType` argument to `MatNorm()` is one of `NORM_1`,
931`NORM_INFINITY`, and `NORM_FROBENIUS`.
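
For example, a small sketch computing the Frobenius norm of an assembled matrix `A` (error checking omitted):

```
PetscReal nrm;

MatNorm(A, NORM_FROBENIUS, &nrm);
PetscPrintf(PETSC_COMM_WORLD, "||A||_F = %g\n", (double)nrm);
```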
932
933(sec_matrixfree)=
934
935## Application Specific Custom Matrices
936
937Some people like to use matrix-free methods, which do
938not require explicit storage of the matrix, for the numerical solution
939of partial differential equations.
940Similarly, users may already have a custom matrix data structure and routines
941for that data structure and would like to wrap their code up into a `Mat`;
942that is, provide their own custom matrix type.
943
To use the PETSc-provided matrix-free matrix that uses finite differencing to approximate the matrix-vector product,
use `MatCreateMFFD()`; see {any}`sec_nlmatrixfree`.
946To provide your own matrix operations (such as `MatMult()`)
947use the following command to create a `Mat` structure
948without ever actually generating the matrix:
949
950```
MatCreateShell(MPI_Comm comm, PetscInt m, PetscInt n, PetscInt M, PetscInt N, void *ctx, Mat *mat);
952```
953
954Here `M` and `N` are the global matrix dimensions (rows and
955columns), `m` and `n` are the local matrix dimensions, and `ctx`
956is a pointer to data needed by any user-defined shell matrix operations;
957the manual page has additional details about these parameters. Most
958matrix-free algorithms require only the application of the linear
959operator to a vector. To provide this action, the user must write a
960routine with the calling sequence
961
962```
963UserMult(Mat mat,Vec x,Vec y);
964```
965
966and then associate it with the matrix, `mat`, by using the command
967
968```
MatShellSetOperation(Mat mat,MatOperation MATOP_MULT,(void (*)(void)) UserMult);
970```
971
972Here `MATOP_MULT` is the name of the operation for matrix-vector
973multiplication. Within each user-defined routine (such as
974`UserMult()`), the user should call `MatShellGetContext()` to obtain
975the user-defined context, `ctx`, that was set by `MatCreateShell()`.
976This shell matrix can be used with the iterative linear equation solvers
977discussed in the following chapters.
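
For example, a sketch of a shell matrix whose action is a pointwise diagonal scaling, with the diagonal stored in a hypothetical user-defined context (error checking abbreviated):

```
typedef struct {
  Vec diag; /* hypothetical user data: the operator's diagonal */
} UserCtx;

PetscErrorCode UserMult(Mat mat, Vec x, Vec y)
{
  UserCtx *ctx;

  PetscFunctionBeginUser;
  PetscCall(MatShellGetContext(mat, &ctx));
  PetscCall(VecPointwiseMult(y, ctx->diag, x)); /* y = diag * x, entrywise */
  PetscFunctionReturn(PETSC_SUCCESS);
}

/* ... in the calling code ... */
MatCreateShell(PETSC_COMM_WORLD, m, n, M, N, &user, &A);
MatShellSetOperation(A, MATOP_MULT, (void (*)(void))UserMult);
```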
978
979The routine `MatShellSetOperation()` can be used to set any other
980matrix operations as well. The file
981`$PETSC_DIR/include/petscmat.h` (<a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/include/petscmat.h.html">source</a>)
982provides a complete list of matrix operations, which have the form
983`MATOP_<OPERATION>`, where `<OPERATION>` is the name (in all capital
984letters) of the user interface routine (for example, `MatMult()`
985$\to$ `MATOP_MULT`). All user-provided functions have the same
986calling sequence as the usual matrix interface routines, since the
987user-defined functions are intended to be accessed through the same
988interface, e.g., `MatMult(Mat,Vec,Vec)` $\to$
`UserMult(Mat,Vec,Vec)`. The final argument for
`MatShellSetOperation()` needs to be cast to a `void (*)(void)`, since the
final argument could (depending on the `MatOperation`) be a variety of
different functions.
993
994Note that `MatShellSetOperation()` can also be used as a “backdoor”
995means of introducing user-defined changes in matrix operations for other
996storage formats (for example, to override the default LU factorization
997routine supplied within PETSc for the `MATSEQAIJ` format). However, we
998urge anyone who introduces such changes to use caution, since it would
999be very easy to accidentally create a bug in the new routine that could
1000affect other routines as well.
1001
1002See also {any}`sec_nlmatrixfree` for details on one set of
1003helpful utilities for using the matrix-free approach for nonlinear
1004solvers.
1005
1006(sec_mattranspose)=
1007
1008## Transposes of Matrices
1009
PETSc provides several ways to work with the transpose of a matrix.
1011
1012```
1013MatTranspose(Mat A,MatReuse MAT_INITIAL_MATRIX or MAT_INPLACE_MATRIX or MAT_REUSE_MATRIX,Mat *B)
1014```
1015
will explicitly form the transpose of the matrix, either in-place or out-of-place. After it has been called
with `MAT_INITIAL_MATRIX`, it may be called again with `MAT_REUSE_MATRIX`, and it will recompute the transpose if the
matrix A has changed. Internally it keeps track of whether the nonzero pattern of A has changed and
reuses the symbolic transpose when possible for efficiency.
1020
1021```
1022MatTransposeSymbolic(Mat A,Mat *B)
1023```
1024
1025only does the symbolic transpose on the matrix. After it is called `MatTranspose()` may be called with
1026`MAT_REUSE_MATRIX` to compute the numerical transpose.
1027
Occasionally one may already have a matrix B with the needed sparsity pattern to store the transpose and wish to reuse that
space instead of creating a new matrix with `MatTranspose(A,MAT_INITIAL_MATRIX,&B)`. In this situation one cannot simply call
`MatTranspose(A,MAT_REUSE_MATRIX,&B)`; instead one first calls `MatTransposeSetPrecursor(A,B)` and then calls
`MatTranspose(A,MAT_REUSE_MATRIX,&B)`. This routine simply provides B with the metadata it needs to compute the numerical
transpose efficiently.
1033
The routine `MatCreateTranspose(A,&B)` provides a surrogate matrix B that behaves like the transpose of A without forming
the transpose explicitly. For example, `MatMult(B,x,y)` computes the matrix-vector product of the transpose of A with x.
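
For example, a sketch applying the action of the transpose without ever forming it (error checking omitted):

```
Mat B;

MatCreateTranspose(A, &B); /* B behaves as the transpose of A; no entries are copied */
MatMult(B, x, y);          /* computes y = A^T x using A's storage */
MatDestroy(&B);
```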
1036
1037(sec_othermat)=
1038
1039## Other Matrix Operations
1040
1041In many iterative calculations (for instance, in a nonlinear equations
1042solver), it is important for efficiency purposes to reuse the nonzero
1043structure of a matrix, rather than determining it anew every time the
1044matrix is generated. To retain a given matrix but reinitialize its
1045contents, one can employ
1046
1047```
1048MatZeroEntries(Mat A);
1049```
1050
1051This routine will zero the matrix entries in the data structure but keep
1052all the data that indicates where the nonzeros are located. In this way
1053a new matrix assembly will be much less expensive, since no memory
1054allocations or copies will be needed. Of course, one can also explicitly
1055set selected matrix elements to zero by calling `MatSetValues()`.
1056
1057By default, if new entries are made in locations where no nonzeros
1058previously existed, space will be allocated for the new entries. To
1059prevent the allocation of additional memory and simply discard those new
1060entries, one can use the option
1061
1062```
1063MatSetOption(Mat A,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);
1064```
1065
1066Once the matrix has been assembled, one can factor it numerically
1067without repeating the ordering or the symbolic factorization. This
1068option can save some computational time, although it does require that
1069the factorization is not done in-place.
1070
1071In the numerical solution of elliptic partial differential equations, it
1072can be cumbersome to deal with Dirichlet boundary conditions. In
1073particular, one would like to assemble the matrix without regard to
1074boundary conditions and then at the end apply the Dirichlet boundary
1075conditions. In numerical analysis classes this process is usually
1076presented as moving the known boundary conditions to the right-hand side
1077and then solving a smaller linear system for the interior unknowns.
1078Unfortunately, implementing this requires extracting a large submatrix
1079from the original matrix and creating its corresponding data structures.
1080This process can be expensive in terms of both time and memory.
1081
1082One simple way to deal with this difficulty is to replace those rows in
the matrix associated with known boundary conditions by rows of the
1084identity matrix (or some scaling of it). This action can be done with
1085the command
1086
1087```
1088MatZeroRows(Mat A,PetscInt numRows,PetscInt rows[],PetscScalar diag_value,Vec x,Vec b),
1089```
1090
1091or equivalently,
1092
1093```
1094MatZeroRowsIS(Mat A,IS rows,PetscScalar diag_value,Vec x,Vec b);
1095```
1096
For sparse matrices this removes the data structures for the given rows of
the matrix. If `diag_value` is zero, it even removes
the diagonal entry. Otherwise, the given value is placed in the diagonal
entry of each eliminated row.
1101
1102One nice feature of this approach is that when solving a nonlinear
1103problem such that at each iteration the Dirichlet boundary conditions
1104are in the same positions and the matrix retains the same nonzero
1105structure, the user can call `MatZeroRows()` in the first iteration.
1106Then, before generating the matrix in the second iteration the user
1107should call
1108
1109```
1110MatSetOption(Mat A,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);
1111```
1112
1113From that point, no new values will be inserted into those (boundary)
1114rows of the matrix.
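
For example, a sketch applying Dirichlet conditions to a few rows (the row indices here are hypothetical; error checking omitted):

```
PetscInt rows[] = {0, 10, 20}; /* hypothetical global row numbers of Dirichlet points */

/* zero the rows, place 1.0 on their diagonals, and adjust x and b accordingly */
MatZeroRows(A, 3, rows, 1.0, x, b);
```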
1115
1116The functions `MatZeroRowsLocal()` and `MatZeroRowsLocalIS()` can
1117also be used if for each process one provides the Dirichlet locations in
1118the local numbering of the matrix. A drawback of `MatZeroRows()` is
1119that it destroys the symmetry of a matrix. Thus one can use
1120
1121```
1122MatZeroRowsColumns(Mat A,PetscInt numRows,PetscInt rows[],PetscScalar diag_value,Vec x,Vec b),
1123```
1124
1125or equivalently,
1126
1127```
1128MatZeroRowsColumnsIS(Mat A,IS rows,PetscScalar diag_value,Vec x,Vec b);
1129```
1130
Note that for a given assembled matrix, each of these routines can be
called only once to update the x and b vectors. They cannot be used if one
wishes to solve multiple right-hand-side problems for the same matrix,
since the matrix entries needed for updating the b vector are removed in
their first use.
1136
Once rows have been zeroed, the matrix may have many rows containing only
a diagonal entry, which affects the parallel load balance. The
`PCREDISTRIBUTE` preconditioner removes all the zeroed rows (and
associated columns, and adjusts the right-hand side based on the removed
columns) and then rebalances the resulting rows of the smaller matrix across
the processes. Thus one can use `MatZeroRows()` to set the Dirichlet
points and then solve with the preconditioner `PCREDISTRIBUTE`. Note that
if the original matrix was symmetric, the smaller solved matrix is also
symmetric.
1146
1147Another matrix routine of interest is
1148
1149```
1150MatConvert(Mat mat,MatType newtype,Mat *M)
1151```
1152
1153which converts the matrix `mat` to new matrix, `M`, that has either
1154the same or different format. Set `newtype` to `MATSAME` to copy the
1155matrix, keeping the same matrix format. See
1156`$PETSC_DIR/include/petscmat.h` (<a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/include/petscmat.h.html">source</a>)
1157for other available matrix types; standard ones are `MATSEQDENSE`,
1158`MATSEQAIJ`, `MATMPIAIJ`, `MATSEQBAIJ` and `MATMPIBAIJ`.
1159
1160In certain applications it may be necessary for application codes to
directly access elements of a matrix. This may be done by using the
1162command (for local rows only)
1163
1164```
MatGetRow(Mat A,PetscInt row,PetscInt *ncols,const PetscInt *cols[],const PetscScalar *vals[]);
1166```
1167
The argument `ncols` returns the number of nonzeros in that row, while
`cols` and `vals` return the column indices (with indices starting
1170at zero) and values in the row. If only the column indices are needed
1171(and not the corresponding matrix elements), one can use `NULL` for
1172the `vals` argument. Similarly, one can use `NULL` for the `cols`
1173argument. The user can only examine the values extracted with
1174`MatGetRow()`; the values *cannot* be altered. To change the matrix
1175entries, one must use `MatSetValues()`.
1176
Once the user has finished using a row, they *must* call
1178
1179```
MatRestoreRow(Mat A,PetscInt row,PetscInt *ncols,const PetscInt *cols[],const PetscScalar *vals[]);
1181```
1182
1183to free any space that was allocated during the call to `MatGetRow()`.
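
For example, a sketch examining all locally owned rows (error checking omitted):

```
PetscInt           rstart, rend, row, ncols;
const PetscInt    *cols;
const PetscScalar *vals;

MatGetOwnershipRange(A, &rstart, &rend);
for (row = rstart; row < rend; row++) {
  MatGetRow(A, row, &ncols, &cols, &vals);
  /* ... examine, but do not alter, cols[] and vals[] ... */
  MatRestoreRow(A, row, &ncols, &cols, &vals);
}
```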
1184
1185(sec_symbolic_numeric)=
1186
1187## Symbolic and Numeric Stages in Sparse Matrix Operations
1188
Many sparse matrix operations can be optimized by dividing the computation into two stages: a symbolic stage that
creates any required data structures and performs all the computations that do not require the matrices' numerical values, followed by one or more uses of a
numerical stage that use the symbolically computed information. Examples of such operations include `MatTranspose()`, `MatCreateSubMatrices()`,
`MatCholeskyFactorSymbolic()`, and `MatCholeskyFactorNumeric()`.
PETSc uses two different APIs to take advantage of these optimizations.
1194
The first approach explicitly divides the computation in the API. This approach is used, for example, with `MatCholeskyFactorSymbolic()` and `MatCholeskyFactorNumeric()`.
1196The caller can take advantage of their knowledge of changes in the nonzero structure of the sparse matrices to call the appropriate routines as needed. In fact, they can
1197use `MatGetNonzeroState()` to determine if a new symbolic computation is needed. The drawback of this approach is that the caller of these routines has to
1198manage the creation of new matrices when the nonzero structure changes.
1199
The second approach, as exemplified by `MatTranspose()`, does not expose the two stages explicitly in the API; instead a flag, `MatReuse`, is passed through the
API to indicate whether a symbolic data structure is already available or needs to be computed. Thus `MatTranspose(A,MAT_INITIAL_MATRIX,&B)` is called first, then
`MatTranspose(A,MAT_REUSE_MATRIX,&B)` can be called repeatedly with new numerical values in the A matrix. In theory, if the nonzero structure of A changes, the
symbolic computations for B could be redone automatically inside the same B matrix when there is a change in the nonzero state of the A matrix. In practice, in PETSc,
`MAT_REUSE_MATRIX` works for most PETSc routines only if the nonzero structure does not change, and the code may crash otherwise. The advantage of this approach
(when the nonzero structure changes are handled correctly) is that the calling code does not need to keep track of the nonzero state of the matrices; everything
"just works". However, the caller must still know when it is the first call to the routine so that the flag `MAT_INITIAL_MATRIX` can be used. If the underlying implementation language supported detecting a yet-to-be-initialized variable at run time, the `MatReuse` flag would not be needed.
1207
PETSc uses two approaches because the same programming problem was solved in two different ways during PETSc's early development.
A better model would combine both approaches: an explicit
separation of the stages and a unified operation that internally utilizes the two stages appropriately and also handles changes to the nonzero structure. Code could be simplified in many places with this approach; in most places the use of the unified API would replace the use of the separate stages.
1211
1212See {any}`sec_matsub` and {any}`sec_matmatproduct`.
1213
1214(sec_graph)=
1215
1216## Graph Operations
1217
1218PETSc has four families of graph operations that treat sparse `Mat` as representing graphs.
1219
1220```{eval-rst}
1221.. list-table::
1222   :widths: auto
1223   :align: center
1224   :header-rows: 1
1225
1226   * - Operation
1227     - Type
1228     - Available methods
1229     - User guide
1230   * - Ordering to reduce fill
1231     - N/A
1232     - ``MatOrderingType``
1233     - :any:`sec_matfactor`
1234   * - Partitioning for parallelism
1235     - ``MatPartitioning``
1236     - ``MatPartitioningType``
1237     - :any:`sec_partitioning`
1238   * - Coloring for parallelism or Jacobians
1239     - ``MatColoring``
1240     - ``MatColoringType``
1241     - :any:`sec_fdmatrix`
1242   * - Coarsening for multigrid
1243     - ``MatCoarsen``
1244     - ``MatCoarsenType``
1245     - :any:`sec_amg`
1246```
1247
1248(sec_partitioning)=
1249
1250## Partitioning
1251
For almost all unstructured grid computation, the distribution of
portions of the grid across the processes’ workloads and memory can have
a very large impact on performance. In most PDE calculations the grid
partitioning and distribution across the processes can (and should) be
done in a “pre-processing” step before the numerical computations.
However, this does not mean it need be done in a separate, sequential
program; rather, it should be done before one sets up the parallel grid
data structures in the actual program. PETSc provides an interface to
ParMETIS (developed by George Karypis; see
[the PETSc installation instructions](https://petsc.org/release/install/)
for directions on installing PETSc to use ParMETIS) to allow the
partitioning to be done in parallel. PETSc does not currently provide
direct support for dynamic repartitioning, load balancing by migrating
matrix entries between processes, etc. For problems that require mesh
refinement, PETSc uses the “rebuild the data structure” approach, as
opposed to the “maintain dynamic data structures that support the
insertion/deletion of additional vector and matrix rows and columns
entries” approach.
1270
1271Partitioning in PETSc is organized around the `MatPartitioning`
1272object. One first creates a parallel matrix that contains the
1273connectivity information about the grid (or other graph-type object)
1274that is to be partitioned. This is done with the command
1275
1276```
1277MatCreateMPIAdj(MPI_Comm comm,int mlocal,PetscInt n,const PetscInt ia[],const PetscInt ja[],PetscInt *weights,Mat *Adj);
1278```
1279
The argument `mlocal` indicates the number of rows of the graph being
provided by the given process, and `n` is the total number of columns,
equal to the sum of all the `mlocal` values. The arguments `ia` and `ja`
are the row pointers and column indices for the given rows; these are
in the usual format for parallel compressed sparse row storage, using
indices starting at 0, *not* 1.
1286
1287:::{figure} /images/manual/usg.*
1288:alt: Numbering on Simple Unstructured Grid
1289:name: fig_usg
1290
1291Numbering on Simple Unstructured Grid
1292:::
1293
1294This, of course, assumes that one has already distributed the grid
1295(graph) information among the processes. The details of this initial
distribution are not important; they could be determined simply by
1297assigning to the first process the first $n_0$ nodes from a file,
1298the second process the next $n_1$ nodes, etc.
1299
1300For example, we demonstrate the form of the `ia` and `ja` for a
1301triangular grid where we
1302
13031. partition by element (triangle)
1304
- Process 0: `mlocal = 2`, `n = 4`, `ja = {2,3, 3}`,
  `ia = {0,2,3}`
- Process 1: `mlocal = 2`, `n = 4`, `ja = {0, 0,1}`,
  `ia = {0,1,3}`
1309
1310Note that elements are not connected to themselves and we only indicate
1311edge connections (in some contexts single vertex connections between
1312elements may also be included). We use a space above to denote the
1313transition between rows in the matrix; and
1314
13152. partition by vertex.
1316
- Process 0: `mlocal = 3`, `n = 6`,
  `ja = {3,4, 4,5, 3,4,5}`, `ia = {0, 2, 4, 7}`
- Process 1: `mlocal = 3`, `n = 6`,
  `ja = {0,2, 4, 0,1,2,3,5, 1,2,4}`,
  `ia = {0, 3, 8, 11}`.
1322
1323Once the connectivity matrix has been created the following code will
1324generate the renumbering required for the new partition
1325
1326```
1327MatPartitioningCreate(MPI_Comm comm,MatPartitioning *part);
1328MatPartitioningSetAdjacency(MatPartitioning part,Mat Adj);
1329MatPartitioningSetFromOptions(MatPartitioning part);
1330MatPartitioningApply(MatPartitioning part,IS *is);
1331MatPartitioningDestroy(MatPartitioning *part);
1332MatDestroy(Mat *Adj);
1333ISPartitioningToNumbering(IS is,IS *isg);
1334```
1335
1336The resulting `isg` contains for each local node the new global number
1337of that node. The resulting `is` contains the new process number that
1338each local node has been assigned to.
1339
1340Now that a new numbering of the nodes has been determined, one must
1341renumber all the nodes and migrate the grid information to the correct
1342process. The command
1343
1344```
1345AOCreateBasicIS(isg,NULL,&ao);
1346```
1347
generates an `AO` object (see {any}`sec_ao`) that can be
1349used in conjunction with the `is` and `isg` to move the relevant
1350grid information to the correct process and renumber the nodes etc. In
1351this context, the new ordering is the “application” ordering so
1352`AOPetscToApplication()` converts old global indices to new global
1353indices and `AOApplicationToPetsc()` converts new global indices back
1354to old global indices.
1355
1356PETSc does not currently provide tools that completely manage the
1357migration and node renumbering, since it will be dependent on the
1358particular data structure you use to store the grid information and the
1359type of grid information that you need for your application. We do plan
1360to include more support for this in the future, but designing the
1361appropriate general user interface and providing a scalable
1362implementation that can be used for a wide variety of different grids
1363requires a great deal of time.
1364
1365See {any}`sec_fdmatrix` and {any}`sec_matfactor` for discussions on performing graph coloring and computing graph reorderings to
1366reduce fill in sparse matrix factorizations.
1367
1368```{eval-rst}
1369.. bibliography:: /petsc.bib
1370   :filter: docname in docnames
1371   :keyprefix: keyprefix-
1372   :labelprefix: ref-
1373```
1374