xref: /petsc/doc/manual/ksp.md (revision 4e8208cbcbc709572b8abe32f33c78b69c819375)
1(ch_ksp)=
2
3# KSP: Linear System Solvers
4
5The `KSP` object is the heart of PETSc, because it provides uniform
6and efficient access to all of the package’s linear system solvers,
7including parallel and sequential, direct and iterative. `KSP` is
8intended for solving systems of the form
9
10$$
11A x = b,
12$$ (eq_axeqb)
13
14where $A$ denotes the matrix representation of a linear operator,
15$b$ is the right-hand-side vector, and $x$ is the solution
16vector. `KSP` uses the same calling sequence for both direct and
17iterative solution of a linear system. In addition, particular solution
18techniques and their associated options can be selected at runtime.
19
20`KSP` can also be used to solve least squares problems, using, for example, `KSPLSQR`. See
21`PETSCREGRESSORLINEAR` for tools focusing on linear regression.
22
23The combination of a Krylov subspace method and a preconditioner is at
24the center of most modern numerical codes for the iterative solution of
25linear systems. Many textbooks (e.g. {cite}`fgn` {cite}`vandervorst2003`, or {cite}`saad2003`) provide an
26overview of the theory of such methods.
27The `KSP` package, discussed in
28{any}`sec_ksp`, provides many popular Krylov subspace
29iterative methods; the `PC` module, described in
30{any}`sec_pc`, includes a variety of preconditioners.
31
32(sec_usingksp)=
33
34## Using KSP
35
36To solve a linear system with `KSP`, one must first create a solver
37context with the command
38
39```
40KSPCreate(MPI_Comm comm,KSP *ksp);
41```
42
43Here `comm` is the MPI communicator and `ksp` is the newly formed
44solver context. Before actually solving a linear system with `KSP`,
45the user must call the following routine to set the matrices associated
46with the linear system:
47
48```
49KSPSetOperators(KSP ksp,Mat Amat,Mat Pmat);
50```
51
52The argument `Amat`, representing the matrix that defines the linear
53system, is a symbolic placeholder for any kind of matrix or operator. In
54particular, `KSP` *does* support matrix-free methods. The routine
55`MatCreateShell()` in {any}`sec_matrixfree`
56provides further information regarding matrix-free methods. Typically,
57the matrix from which the preconditioner is to be constructed, `Pmat`,
58is the same as the matrix that defines the linear system, `Amat`;
59however, occasionally these matrices differ (for instance, when a
60matrix used to compute the preconditioner is obtained from a lower order method than that
61employed to form the linear system matrix).
62
63Much of the power of `KSP` can be accessed through the single routine
64
65```
66KSPSetFromOptions(KSP ksp);
67```
68
69This routine accepts the option `-help` as well as any of
70the `KSP` and `PC` options discussed below. To solve a linear
71system, one sets the right hand size and solution vectors using the
72command
73
74```
75KSPSolve(KSP ksp,Vec b,Vec x);
76```
77
78where `b` and `x` respectively denote the right-hand side and
79solution vectors. On return, the iteration number at which the iterative
80process stopped can be obtained using
81
82```
83KSPGetIterationNumber(KSP ksp, PetscInt *its);
84```
85
86Note that this does not state that the method converged at this
87iteration: it can also have reached the maximum number of iterations, or
88have diverged.
89
90{any}`sec_convergencetests` gives more details
91regarding convergence testing. Note that multiple linear solves can be
92performed by the same `KSP` context. Once the `KSP` context is no
93longer needed, it should be destroyed with the command
94
95```
96KSPDestroy(KSP *ksp);
97```
98
99The above procedure is sufficient for general use of the `KSP`
100package. One additional step is required for users who wish to customize
101certain preconditioners (e.g., see {any}`sec_bjacobi`) or
102to log certain performance data using the PETSc profiling facilities (as
103discussed in {any}`ch_profiling`). In this case, the user can
104optionally explicitly call
105
106```
107KSPSetUp(KSP ksp);
108```
109
110before calling `KSPSolve()` to perform any setup required for the
111linear solvers. The explicit call of this routine enables the separate
112profiling of any computations performed during the set up phase, such
113as incomplete factorization for the ILU preconditioner.
114
115The default solver within `KSP` is restarted GMRES, `KSPGMRES`, preconditioned for
116the uniprocess case with ILU(0), and for the multiprocess case with the
117block Jacobi method (with one block per process, each of which is solved
118with ILU(0)). A variety of other solvers and options are also available.
119To allow application programmers to set any of the preconditioner or
120Krylov subspace options directly within the code, we provide routines
121that extract the `PC` and `KSP` contexts,
122
123```
124KSPGetPC(KSP ksp,PC *pc);
125```
126
127The application programmer can then directly call any of the `PC` or
128`KSP` routines to modify the corresponding default options.
129
130To solve a linear system with a direct solver (supported by
131PETSc for sequential matrices, and by several external solvers through
132PETSc interfaces, see {any}`sec_externalsol`) one may use
133the options `-ksp_type` `preonly` (or the equivalent `-ksp_type` `none`)
134`-pc_type` `lu` or `-pc_type` `cholesky` (see below).
135
136By default, if a direct solver is used, the factorization is *not* done
137in-place. This approach prevents the user from the unexpected surprise
138of having a corrupted matrix after a linear solve. The routine
139`PCFactorSetUseInPlace()`, discussed below, causes factorization to be
140done in-place.
141
142## Solving Successive Linear Systems
143
144When solving multiple linear systems of the same size with the same
145method, several options are available. To solve successive linear
146systems having the *same* matrix from which to construct the preconditioner (i.e., the same data
147structure with exactly the same matrix elements) but different
148right-hand-side vectors, the user should simply call `KSPSolve()`
149multiple times. The preconditioner setup operations (e.g., factorization
150for ILU) will be done during the first call to `KSPSolve()` only; such
151operations will *not* be repeated for successive solves.
152
153To solve successive linear systems that have *different* matrix values, because you
154have changed the matrix values in the `Mat` objects you passed to `KSPSetOperators()`,
155still simply call `KPSSolve()`. In this case the preconditioner will be recomputed
156automatically. Use the option `-ksp_reuse_preconditioner true`, or call
157`KSPSetReusePreconditioner()`, to reuse the previously computed preconditioner.
158For many problems, if the matrix changes values only slightly, reusing the
159old preconditioner can be more efficient.
160
161If you wish to reuse the `KSP` with a different sized matrix and vectors, you must
162call `KSPReset()` before calling `KSPSetOperators()` with the new matrix.
163
164(sec_ksp)=
165
166## Krylov Methods
167
168The Krylov subspace methods accept a number of options, many of which
169are discussed below. First, to set the Krylov subspace method that is to
170be used, one calls the command
171
172```
173KSPSetType(KSP ksp,KSPType method);
174```
175
176The type can be one of `KSPRICHARDSON`, `KSPCHEBYSHEV`, `KSPCG`,
177`KSPGMRES`, `KSPTCQMR`, `KSPBCGS`, `KSPCGS`, `KSPTFQMR`,
178`KSPCR`, `KSPLSQR`, `KSPBICG`, `KSPPREONLY` (or the equivalent `KSPNONE`), or others; see
179{any}`tab-kspdefaults` or the `KSPType` man page for more.
180The `KSP` method can also be set with the options database command
181`-ksp_type`, followed by one of the options `richardson`,
182`chebyshev`, `cg`, `gmres`, `tcqmr`, `bcgs`, `cgs`,
183`tfqmr`, `cr`, `lsqr`, `bicg`, `preonly` (or the equivalent `none`), or others (see
184{any}`tab-kspdefaults` or the `KSPType` man page). There are
185method-specific options. For instance, for the Richardson, Chebyshev, and
186GMRES methods:
187
188```
189KSPRichardsonSetScale(KSP ksp,PetscReal scale);
190KSPChebyshevSetEigenvalues(KSP ksp,PetscReal emax,PetscReal emin);
191KSPGMRESSetRestart(KSP ksp,PetscInt max_steps);
192```
193
194The default parameter values are
195`scale=1.0, emax=0.01, emin=100.0`, and `max_steps=30`. The
196GMRES restart and Richardson damping factor can also be set with the
197options `-ksp_gmres_restart <n>` and
198`-ksp_richardson_scale <factor>`.
199
200The default technique for orthogonalization of the Krylov vectors in
201GMRES is the unmodified (classical) Gram-Schmidt method, which can be
202set with
203
204```
205KSPGMRESSetOrthogonalization(KSP ksp,KSPGMRESClassicalGramSchmidtOrthogonalization);
206```
207
208or the options database command `-ksp_gmres_classicalgramschmidt`. By
209default this will *not* use iterative refinement to improve the
210stability of the orthogonalization. This can be changed with the option
211
212```
213KSPGMRESSetCGSRefinementType(KSP ksp,KSPGMRESCGSRefinementType type)
214```
215
216or via the options database with
217
218```
219-ksp_gmres_cgs_refinement_type <refine_never,refine_ifneeded,refine_always>
220```
221
222The values for `KSPGMRESCGSRefinementType()` are
223`KSP_GMRES_CGS_REFINE_NEVER`, `KSP_GMRES_CGS_REFINE_IFNEEDED`
224and `KSP_GMRES_CGS_REFINE_ALWAYS`.
225
226One can also use modified Gram-Schmidt, by using the orthogonalization
227routine `KSPGMRESModifiedGramSchmidtOrthogonalization()` or by using
228the command line option `-ksp_gmres_modifiedgramschmidt`.
229
230For the conjugate gradient method with complex numbers, there are two
231slightly different algorithms depending on whether the matrix is
232Hermitian symmetric or truly symmetric (the default is to assume that it
233is Hermitian symmetric). To indicate that it is symmetric, one uses the
234command
235
236```
237KSPCGSetType(ksp,KSP_CG_SYMMETRIC);
238```
239
240Note that this option is not valid for all matrices.
241
242Some `KSP` types do not support preconditioning. For instance,
243the CGLS algorithm does not involve a preconditioner; any preconditioner
244set to work with the `KSP` object is ignored if `KSPCGLS` was
245selected.
246
247By default, `KSP` assumes an initial guess of zero by zeroing the
248initial value for the solution vector that is given; this zeroing is
249done at the call to `KSPSolve()`. To use a nonzero initial guess, the
250user *must* call
251
252```
253KSPSetInitialGuessNonzero(KSP ksp,PetscBool flg);
254```
255
256(sec_ksppc)=
257
258### Preconditioning within KSP
259
260Since the rate of convergence of Krylov projection methods for a
261particular linear system is strongly dependent on its spectrum,
262preconditioning is typically used to alter the spectrum and hence
263accelerate the convergence rate of iterative techniques. Preconditioning
264can be applied to the system {eq}`eq_axeqb` by
265
266$$
267(M_L^{-1} A M_R^{-1}) \, (M_R x) = M_L^{-1} b,
268$$ (eq_prec)
269
270where $M_L$ and $M_R$ indicate preconditioning matrices (or,
271matrices from which the preconditioner is to be constructed). If
272$M_L = I$ in {eq}`eq_prec`, right preconditioning
273results, and the residual of {eq}`eq_axeqb`,
274
275$$
276r \equiv b - Ax = b - A M_R^{-1} \, M_R x,
277$$
278
279is preserved. In contrast, the residual is altered for left
280($M_R = I$) and symmetric preconditioning, as given by
281
282$$
283r_L \equiv M_L^{-1} b - M_L^{-1} A x = M_L^{-1} r.
284$$
285
286By default, most KSP implementations use left preconditioning. Some more
287naturally use other options, though. For instance, `KSPQCG` defaults
288to use symmetric preconditioning and `KSPFGMRES` uses right
289preconditioning by default. Right preconditioning can be activated for
290some methods by using the options database command
291`-ksp_pc_side right` or calling the routine
292
293```
294KSPSetPCSide(ksp,PC_RIGHT);
295```
296
297Attempting to use right preconditioning for a method that does not
298currently support it results in an error message of the form
299
300```none
301KSPSetUp_Richardson:No right preconditioning for KSPRICHARDSON
302```
303
304```{eval-rst}
305.. list-table:: KSP Objects
306  :name: tab-kspdefaults
307  :header-rows: 1
308
309  * - Method
310    - KSPType
311    - Options Database
312  * - Richardson
313    - ``KSPRICHARDSON``
314    - ``richardson``
315  * - Chebyshev
316    - ``KSPCHEBYSHEV``
317    - ``chebyshev``
318  * - Conjugate Gradient :cite:`hs:52`
319    - ``KSPCG``
320    - ``cg``
321  * - Pipelined Conjugate Gradients :cite:`ghyselsvanroose2014`
322    - ``KSPPIPECG``
323    - ``pipecg``
324  * - Pipelined Conjugate Gradients (Gropp)
325    - ``KSPGROPPCG``
326    - ``groppcg``
327  * - Pipelined Conjugate Gradients with Residual Replacement
328    - ``KSPPIPECGRR``
329    - ``pipecgrr``
330  * - Conjugate Gradients for the Normal Equations
331    - ``KSPCGNE``
332    - ``cgne``
333  * - Flexible Conjugate Gradients :cite:`flexiblecg`
334    - ``KSPFCG``
335    - ``fcg``
336  * -  Pipelined, Flexible Conjugate Gradients :cite:`sananschneppmay2016`
337    - ``KSPPIPEFCG``
338    - ``pipefcg``
339  * - Conjugate Gradients for Least Squares
340    - ``KSPCGLS``
341    - ``cgls``
342  * - Conjugate Gradients with Constraint (1)
343    - ``KSPNASH``
344    - ``nash``
345  * - Conjugate Gradients with Constraint (2)
346    - ``KSPSTCG``
347    - ``stcg``
348  * - Conjugate Gradients with Constraint (3)
349    - ``KSPGLTR``
350    - ``gltr``
351  * - Conjugate Gradients with Constraint (4)
352    - ``KSPQCG``
353    - ``qcg``
354  * - BiConjugate Gradient
355    - ``KSPBICG``
356    - ``bicg``
357  * - BiCGSTAB :cite:`v:92`
358    - ``KSPBCGS``
359    - ``bcgs``
360  * - Improved BiCGSTAB
361    - ``KSPIBCGS``
362    - ``ibcgs``
363  * - QMRCGSTAB :cite:`chan1994qmrcgs`
364    - ``KSPQMRCGS``
365    - ``qmrcgs``
366  * - Flexible BiCGSTAB
367    - ``KSPFBCGS``
368    - ``fbcgs``
369  * - Flexible BiCGSTAB (variant)
370    - ``KSPFBCGSR``
371    - ``fbcgsr``
372  * - Enhanced BiCGSTAB(L)
373    - ``KSPBCGSL``
374    - ``bcgsl``
375  * - Minimal Residual Method :cite:`paige.saunders:solution`
376    - ``KSPMINRES``
377    - ``minres``
378  * - Generalized Minimal Residual :cite:`saad.schultz:gmres`
379    - ``KSPGMRES``
380    - ``gmres``
381  * - Flexible Generalized Minimal Residual :cite:`saad1993`
382    - ``KSPFGMRES``
383    - ``fgmres``
384  * - Deflated Generalized Minimal Residual
385    - ``KSPDGMRES``
386    - ``dgmres``
387  * - Pipelined Generalized Minimal Residual :cite:`ghyselsashbymeerbergenvanroose2013`
388    - ``KSPPGMRES``
389    - ``pgmres``
390  * - Pipelined, Flexible Generalized Minimal Residual :cite:`sananschneppmay2016`
391    - ``KSPPIPEFGMRES``
392    - ``pipefgmres``
393  * - Generalized Minimal Residual with Accelerated Restart
394    - ``KSPLGMRES``
395    - ``lgmres``
396  * - Conjugate Residual :cite:`eisenstat1983variational`
397    - ``KSPCR``
398    - ``cr``
399  * - Generalized Conjugate Residual
400    - ``KSPGCR``
401    - ``gcr``
402  * - Pipelined Conjugate Residual
403    - ``KSPPIPECR``
404    - ``pipecr``
405  * - Pipelined, Flexible Conjugate Residual :cite:`sananschneppmay2016`
406    - ``KSPPIPEGCR``
407    - ``pipegcr``
408  * - FETI-DP
409    - ``KSPFETIDP``
410    - ``fetidp``
411  * - Conjugate Gradient Squared :cite:`so:89`
412    - ``KSPCGS``
413    - ``cgs``
414  * - Transpose-Free Quasi-Minimal Residual (1) :cite:`f:93`
415    - ``KSPTFQMR``
416    - ``tfqmr``
417  * - Transpose-Free Quasi-Minimal Residual (2)
418    - ``KSPTCQMR``
419    - ``tcqmr``
420  * - Least Squares Method
421    - ``KSPLSQR``
422    - ``lsqr``
423  * - Symmetric LQ Method :cite:`paige.saunders:solution`
424    - ``KSPSYMMLQ``
425    - ``symmlq``
426  * - TSIRM
427    - ``KSPTSIRM``
428    - ``tsirm``
429  * - Python Shell
430    - ``KSPPYTHON``
431    - ``python``
432  * - Shell for no ``KSP`` method
433    - ``KSPNONE``
434    - ``none``
435
436```
437
438Note: the bi-conjugate gradient method requires application of both the
439matrix and its transpose plus the preconditioner and its transpose.
440Currently not all matrices and preconditioners provide this support and
441thus the `KSPBICG` cannot always be used.
442
443Note: PETSc implements the FETI-DP (Finite Element Tearing and
444Interconnecting Dual-Primal) method as an implementation of `KSP` since it recasts the
445original problem into a constrained minimization one with Lagrange
446multipliers. The only matrix type supported is `MATIS`. Support for
447saddle point problems is provided. See the man page for `KSPFETIDP` for
448further details.
449
450(sec_convergencetests)=
451
452### Convergence Tests
453
454The default convergence test, `KSPConvergedDefault()`, uses the \$ l_2 \$ norm of the preconditioned \$ B(b - A x) \$ or unconditioned residual \$ b - Ax\$, depending on the `KSPType` and the value of `KSPNormType` set with `KSPSetNormType`. For `KSPCG` and `KSPGMRES` the default is the norm of the preconditioned residual.
455The preconditioned residual is used by default for
456convergence testing of all left-preconditioned `KSP` methods. For the
457conjugate gradient, Richardson, and Chebyshev methods the true residual
458can be used by the options database command
459`-ksp_norm_type unpreconditioned` or by calling the routine
460
461```
462KSPSetNormType(ksp, KSP_NORM_UNPRECONDITIONED);
463```
464
465`KSPCG` also supports using the natural norm induced by the symmetric positive-definite
466matrix that defines the linear system with the options database command `-ksp_norm_type natural` or by calling the routine
467
468```
469KSPSetNormType(ksp, KSP_NORM_NATURAL);
470```
471
472Convergence (or divergence) is decided
473by three quantities: the decrease of the residual norm relative to the
474norm of the right-hand side, `rtol`, the absolute size of the residual
475norm, `atol`, and the relative increase in the residual, `dtol`.
476Convergence is detected at iteration $k$ if
477
478$$
479\| r_k \|_2 < {\rm max} ( \text{rtol} * \| b \|_2, \text{atol}),
480$$
481
482where $r_k = b - A x_k$. Divergence is detected if
483
484$$
485\| r_k \|_2 > \text{dtol} * \| b \|_2.
486$$
487
488These parameters, as well as the maximum number of allowable iterations,
489can be set with the routine
490
491```
492KSPSetTolerances(KSP ksp,PetscReal rtol,PetscReal atol,PetscReal dtol,PetscInt maxits);
493```
494
495The user can retain the current value of any of these parameters by
496specifying `PETSC_CURRENT` as the corresponding tolerance; the
497defaults are `rtol=1e-5`, `atol=1e-50`, `dtol=1e5`, and
498`maxits=1e4`. Using `PETSC_DETERMINE` will set the parameters back to their
499initial values when the object's type was set. These parameters can also be set from the options
500database with the commands `-ksp_rtol` `<rtol>`, `-ksp_atol`
501`<atol>`, `-ksp_divtol` `<dtol>`, and `-ksp_max_it` `<its>`.
502
503In addition to providing an interface to a simple convergence test,
504`KSP` allows the application programmer the flexibility to provide
505customized convergence-testing routines. The user can specify a
506customized routine with the command
507
508```
509KSPSetConvergenceTest(KSP ksp, PetscErrorCode (*test)(KSP ksp, PetscInt it, PetscReal rnorm, KSPConvergedReason *reason, PetscCtx ctx), PetscCtx ctx, PetscErrorCode (*destroy)(PetscCtxRt ctx));
510```
511
512The final routine argument, `ctx`, is an optional context for private
513data for the user-defined convergence routine, `test`. Other `test`
514routine arguments are the iteration number, `it`, and the residual’s
515norm, `rnorm`. The routine for detecting convergence,
516`test`, should set `reason` to positive for convergence, 0 for no
517convergence, and negative for failure to converge. A full list of
518possible values is given in the `KSPConvergedReason` manual page.
519You can use `KSPGetConvergedReason()` after
520`KSPSolve()` to see why convergence/divergence was detected.
521
522(sec_kspmonitor)=
523
524### Convergence Monitoring
525
526By default, the Krylov solvers, `KSPSolve()`, run silently without displaying
527information about the iterations. The user can indicate that the norms
528of the residuals should be displayed at each iteration by using `-ksp_monitor` with
529the options database. To display the residual norms in a graphical
530window (running under X Windows), one should use
531`-ksp_monitor draw::draw_lg`. Application programmers can also
532provide their own routines to perform the monitoring by using the
533command
534
535```
536KSPMonitorSet(KSP ksp, PetscErrorCode (*mon)(KSP ksp, PetscInt it, PetscReal rnorm, PetscCtx ctx), PetscCtx ctx, (PetscCtxDestroyFn *)mondestroy);
537```
538
539The final routine argument, `ctx`, is an optional context for private
540data for the user-defined monitoring routine, `mon`. Other `mon`
541routine arguments are the iteration number (`it`) and the residual’s
542norm (`rnorm`), as discussed above in {any}`sec_convergencetests`.
543A helpful routine within user-defined
544monitors is `PetscObjectGetComm((PetscObject)ksp,MPI_Comm *comm)`,
545which returns in `comm` the MPI communicator for the `KSP` context.
546See {any}`sec_writing` for more discussion of the use of
547MPI communicators within PETSc.
548
549Many monitoring routines are supplied with PETSc, including
550
551```
552KSPMonitorResidual(KSP, PetscInt, PetscReal, PetscCtx);
553KSPMonitorSingularValue(KSP, PetscInt, PetscReal, PetscCtx);
554KSPMonitorTrueResidual(KSP, PetscInt, PetscReal, PetscCtx);
555```
556
557The default monitor simply prints an estimate of a norm of
558the residual at each iteration. The routine
559`KSPMonitorSingularValue()` is appropriate only for use with the
560conjugate gradient method or GMRES, since it prints estimates of the
561extreme singular values of the preconditioned operator at each
562iteration computed via the Lanczos or Arnoldi algorithms.
563
564Since `KSPMonitorTrueResidual()` prints the true
565residual at each iteration by actually computing the residual using the
566formula $r = b - Ax$, the routine is slow and should be used only
567for testing or convergence studies, not for timing. These `KSPSolve()` monitors may
568be accessed with the command line options `-ksp_monitor`,
569`-ksp_monitor_singular_value`, and `-ksp_monitor_true_residual`.
570
571To employ the default graphical monitor, one should use the command
572`-ksp_monitor draw::draw_lg`.
573
574One can cancel hardwired monitoring routines for KSP at runtime with
575`-ksp_monitor_cancel`.
576
577### Understanding the Operator’s Spectrum
578
579Since the convergence of Krylov subspace methods depends strongly on the
580spectrum (eigenvalues) of the preconditioned operator, PETSc has
581specific routines for eigenvalue approximation via the Arnoldi or
582Lanczos iteration. First, before the linear solve one must call
583
584```
585KSPSetComputeEigenvalues(ksp,PETSC_TRUE);
586```
587
588Then after the `KSP` solve one calls
589
590```
591KSPComputeEigenvalues(KSP ksp,PetscInt n,PetscReal *realpart,PetscReal *complexpart,PetscInt *neig);
592```
593
594Here, `n` is the size of the two arrays and the eigenvalues are
595inserted into those two arrays. `neig` is the number of eigenvalues
596computed; this number depends on the size of the Krylov space generated
597during the linear system solution, for GMRES it is never larger than the
598`restart` parameter. There is an additional routine
599
600```
601KSPComputeEigenvaluesExplicitly(KSP ksp, PetscInt n,PetscReal *realpart,PetscReal *complexpart);
602```
603
604that is useful only for very small problems. It explicitly computes the
605full representation of the preconditioned operator and calls LAPACK to
606compute its eigenvalues. It should be only used for matrices of size up
607to a couple hundred. The `PetscDrawSP*()` routines are very useful for
608drawing scatter plots of the eigenvalues.
609
610The eigenvalues may also be computed and displayed graphically with the
611options data base commands `-ksp_view_eigenvalues draw` and
612`-ksp_view_eigenvalues_explicit draw`. Or they can be dumped to the
613screen in ASCII text via `-ksp_view_eigenvalues` and
614`-ksp_view_eigenvalues_explicit`.
615
616(sec_flexibleksp)=
617
618### Flexible Krylov Methods
619
620Standard Krylov methods require that the preconditioner be a linear operator, thus, for example, a standard `KSP` method
621cannot use a `KSP` in its preconditioner, as is common in the Block-Jacobi method `PCBJACOBI`, for example.
622Flexible Krylov methods are a subset of methods that allow (with modest additional requirements
623on memory) the preconditioner to be nonlinear. For example, they can be used with the `PCKSP` preconditioner.
624The flexible `KSP` methods have the label "Flexible" in {any}`tab-kspdefaults`.
625
626One can use `KSPMonitorDynamicTolerance()` to control the tolerances used by inner `KSP` solvers in `PCKSP`, `PCBJACOBI`, and `PCDEFLATION`.
627
628In addition to supporting `PCKSP`, the flexible methods support `KSPFlexibleSetModifyPC()` to
629allow the user to provide a callback function that changes the preconditioner at each Krylov iteration. Its calling sequence is as follows.
630
631```
632PetscErrorCode f(KSP ksp, PetscInt total_its, PetscInt its_since_restart, PetscReal res_norm, PetscCtx ctx);
633```
634
635(sec_pipelineksp)=
636
637### Pipelined Krylov Methods
638
639Standard Krylov methods have one or more global reductions resulting from the computations of inner products or norms in each iteration.
640These reductions need to block until all MPI processes have received the results. For a large number of MPI processes (this number is machine dependent
641but can be above 10,000 processes) this synchronization is very time consuming and can significantly slow the computation. Pipelined Krylov
642methods overlap the reduction operations with local computations (generally the application of the matrix-vector products and precondtiioners)
643thus effectively "hiding" the time of the reductions. In addition, they may reduce the number of global synchronizations by rearranging the
644computations in a way that some of them can be collapsed, e.g., two or more calls to `MPI_Allreduce()` may be combined into one call.
645The pipeline `KSP` methods have the label "Pipeline" in {any}`tab-kspdefaults`.
646
647Special configuration of MPI may be necessary for reductions to make asynchronous progress, which is important for
648performance of pipelined methods. See {any}`doc_faq_pipelined` for details.
649
650### Other KSP Options
651
652To obtain the solution vector and right-hand side from a `KSP`
653context, one uses
654
655```
656KSPGetSolution(KSP ksp,Vec *x);
657KSPGetRhs(KSP ksp,Vec *rhs);
658```
659
660During the iterative process the solution may not yet have been
661calculated or it may be stored in a different location. To access the
662approximate solution during the iterative process, one uses the command
663
664```
665KSPBuildSolution(KSP ksp,Vec w,Vec *v);
666```
667
668where the solution is returned in `v`. The user can optionally provide
669a vector in `w` as the location to store the vector; however, if `w`
670is `NULL`, space allocated by PETSc in the `KSP` context is used.
671One should not destroy this vector. For certain `KSP` methods (e.g.,
672GMRES), the construction of the solution is expensive, while for many
673others it doesn’t even require a vector copy.
674
675Access to the residual is done in a similar way with the command
676
677```
678KSPBuildResidual(KSP ksp,Vec t,Vec w,Vec *v);
679```
680
681Again, for GMRES and certain other methods this is an expensive
682operation.
683
684(sec_pc)=
685
686## Preconditioners
687
688As discussed in {any}`sec_ksppc`, Krylov subspace methods
689are typically used in conjunction with a preconditioner. To employ a
690particular preconditioning method, the user can either select it from
691the options database using input of the form `-pc_type <methodname>`
692or set the method with the command
693
694```
695PCSetType(PC pc,PCType method);
696```
697
698In {any}`tab-pcdefaults` we summarize the basic
699preconditioning methods supported in PETSc. See the `PCType` manual
700page for a complete list.
701
702The `PCSHELL` preconditioner allows users to provide their own
703specific, application-provided custom preconditioner.
704
705The direct
706preconditioner, `PCLU` , is, in fact, a direct solver for the linear
707system that uses LU factorization. `PCLU` is included as a
708preconditioner so that PETSc has a consistent interface among direct and
709iterative linear solvers.
710
711PETSc provides several domain decomposition methods/preconditioners including
712`PCASM`, `PCGASM`, `PCBDDC`, and `PCHPDDM`. In addition PETSc provides
713multiple multigrid solvers/preconditioners including `PCMG`, `PCGAMG`, `PCHYPRE`,
714and `PCML`. See further discussion below.
715
716```{eval-rst}
717.. list-table:: PETSc Preconditioners (partial list)
718   :name: tab-pcdefaults
719   :header-rows: 1
720
721   * - Method
722     - PCType
723     - Options Database
724   * - Jacobi
725     - ``PCJACOBI``
726     - ``jacobi``
727   * - Block Jacobi
728     - ``PCBJACOBI``
729     - ``bjacobi``
730   * - SOR (and SSOR)
731     - ``PCSOR``
732     - ``sor``
733   * - SOR with Eisenstat trick
734     - ``PCEISENSTAT``
735     - ``eisenstat``
736   * - Incomplete Cholesky
737     - ``PCICC``
738     - ``icc``
739   * - Incomplete LU
740     - ``PCILU``
741     - ``ilu``
742   * - Additive Schwarz
743     - ``PCASM``
744     - ``asm``
745   * - Generalized Additive Schwarz
746     - ``PCGASM``
747     - ``gasm``
748   * - Algebraic Multigrid
749     - ``PCGAMG``
750     - ``gamg``
751   * - Balancing Domain Decomposition by Constraints
752     - ``PCBDDC``
753     - ``bddc``
754   * - Linear solver
755     - ``PCKSP``
756     - ``ksp``
757   * - Combination of preconditioners
758     - ``PCCOMPOSITE``
759     - ``composite``
760   * - LU
761     - ``PCLU``
762     - ``lu``
763   * - Cholesky
764     - ``PCCHOLESKY``
765     - ``cholesky``
766   * - No preconditioning
767     - ``PCNONE``
768     - ``none``
769   * - Shell for user-defined ``PC``
770     - ``PCSHELL``
771     - ``shell``
772```
773
774Each preconditioner may have associated with it a set of options, which
775can be set with routines and options database commands provided for this
776purpose. Such routine names and commands are all of the form
777`PC<TYPE><Option>` and `-pc_<type>_<option> [value]`. A complete
778list can be found by consulting the `PCType` manual page; we discuss
779just a few in the sections below.
780
781(sec_ilu_icc)=
782
783### ILU and ICC Preconditioners
784
785Some of the options for ILU preconditioner are
786
787```
788PCFactorSetLevels(PC pc,PetscInt levels);
789PCFactorSetReuseOrdering(PC pc,PetscBool flag);
790PCFactorSetDropTolerance(PC pc,PetscReal dt,PetscReal dtcol,PetscInt dtcount);
791PCFactorSetReuseFill(PC pc,PetscBool flag);
792PCFactorSetUseInPlace(PC pc,PetscBool flg);
793PCFactorSetAllowDiagonalFill(PC pc,PetscBool flg);
794```
795
796When repeatedly solving linear systems with the same `KSP` context,
797one can reuse some information computed during the first linear solve.
798In particular, `PCFactorSetReuseOrdering()` causes the ordering (for
799example, set with `-pc_factor_mat_ordering_type` `order`) computed
800in the first factorization to be reused for later factorizations.
801`PCFactorSetUseInPlace()` is often used with `PCASM` or
802`PCBJACOBI` when zero fill is used, since it reuses the matrix space
803to store the incomplete factorization it saves memory and copying time.
804Note that in-place factorization is not appropriate with any ordering
805besides natural and cannot be used with the drop tolerance
806factorization. These options may be set in the database with
807
808- `-pc_factor_levels <levels>`
809- `-pc_factor_reuse_ordering`
810- `-pc_factor_reuse_fill`
811- `-pc_factor_in_place`
812- `-pc_factor_nonzeros_along_diagonal`
813- `-pc_factor_diagonal_fill`
814
815See {any}`sec_symbolfactor` for information on
816preallocation of memory for anticipated fill during factorization. By
817alleviating the considerable overhead for dynamic memory allocation,
818such tuning can significantly enhance performance.
819
820PETSc supports incomplete factorization preconditioners
821for several matrix types for sequential matrices (for example
822`MATSEQAIJ`, `MATSEQBAIJ`, and `MATSEQSBAIJ`).
823
824### SOR and SSOR Preconditioners
825
826PETSc provides only a sequential SOR preconditioner; it can only be
827used with sequential matrices or as the subblock preconditioner when
828using block Jacobi or ASM preconditioning (see below).
829
830The options for SOR preconditioning with `PCSOR` are
831
832```
833PCSORSetOmega(PC pc,PetscReal omega);
834PCSORSetIterations(PC pc,PetscInt its,PetscInt lits);
835PCSORSetSymmetric(PC pc,MatSORType type);
836```
837
838The first of these commands sets the relaxation factor for successive
839over (under) relaxation. The second command sets the number of inner
840iterations `its` and local iterations `lits` (the number of
841smoothing sweeps on a process before doing a ghost point update from the
842other processes) to use between steps of the Krylov space method. The
843total number of SOR sweeps is given by `its*lits`. The third command
844sets the kind of SOR sweep, where the argument `type` can be one of
845`SOR_FORWARD_SWEEP`, `SOR_BACKWARD_SWEEP` or
846`SOR_SYMMETRIC_SWEEP`, the default being `SOR_FORWARD_SWEEP`.
847Setting the type to be `SOR_SYMMETRIC_SWEEP` produces the SSOR method.
848In addition, each process can locally and independently perform the
849specified variant of SOR with the types `SOR_LOCAL_FORWARD_SWEEP`,
850`SOR_LOCAL_BACKWARD_SWEEP`, and `SOR_LOCAL_SYMMETRIC_SWEEP`. These
851variants can also be set with the options `-pc_sor_omega <omega>`,
852`-pc_sor_its <its>`, `-pc_sor_lits <lits>`, `-pc_sor_backward`,
853`-pc_sor_symmetric`, `-pc_sor_local_forward`,
854`-pc_sor_local_backward`, and `-pc_sor_local_symmetric`.
855
856The Eisenstat trick {cite}`eisenstat81` for SSOR
857preconditioning can be employed with the method `PCEISENSTAT`
858(`-pc_type` `eisenstat`). By using both left and right
859preconditioning of the linear system, this variant of SSOR requires
860about half of the floating-point operations for conventional SSOR. The
861option `-pc_eisenstat_no_diagonal_scaling` (or the routine
862`PCEisenstatSetNoDiagonalScaling()`) turns off diagonal scaling in
863conjunction with Eisenstat SSOR method, while the option
864`-pc_eisenstat_omega <omega>` (or the routine
865`PCEisenstatSetOmega(PC pc,PetscReal omega)`) sets the SSOR relaxation
866coefficient, `omega`, as discussed above.
867
868(sec_factorization)=
869
870### LU Factorization
871
872The LU preconditioner provides several options. The first, given by the
873command
874
875```
876PCFactorSetUseInPlace(PC pc,PetscBool flg);
877```
878
879causes the factorization to be performed in-place and hence destroys the
880original matrix. The options database variant of this command is
881`-pc_factor_in_place`. Another direct preconditioner option is
882selecting the ordering of equations with the command
883`-pc_factor_mat_ordering_type <ordering>`. The possible orderings are
884
885- `MATORDERINGNATURAL` - Natural
886- `MATORDERINGND` - Nested Dissection
887- `MATORDERING1WD` - One-way Dissection
888- `MATORDERINGRCM` - Reverse Cuthill-McKee
889- `MATORDERINGQMD` - Quotient Minimum Degree
890
891These orderings can also be set through the options database by
892specifying one of the following: `-pc_factor_mat_ordering_type`
893`natural`, or `nd`, or `1wd`, or `rcm`, or `qmd`. In addition,
894see `MatGetOrdering()`, discussed in {any}`sec_matfactor`.
895
896The sparse LU factorization provided in PETSc does not perform pivoting
897for numerical stability (since they are designed to preserve nonzero
898structure), and thus occasionally an LU factorization will fail with a
899zero pivot when, in fact, the matrix is non-singular. The option
900`-pc_factor_nonzeros_along_diagonal <tol>` will often help eliminate
901the zero pivot, by preprocessing the column ordering to remove small
902values from the diagonal. Here, `tol` is an optional tolerance to
903decide if a value is nonzero; by default it is `1.e-10`.
904
905In addition, {any}`sec_symbolfactor` provides information
906on preallocation of memory for anticipated fill during factorization.
907Such tuning can significantly enhance performance, since it eliminates
908the considerable overhead for dynamic memory allocation.
909
910(sec_bjacobi)=
911
912### Block Jacobi and Overlapping Additive Schwarz Preconditioners
913
914The block Jacobi and overlapping additive Schwarz (domain decomposition) methods in PETSc are
915supported in parallel; however, only the uniprocess version of the block
916Gauss-Seidel method is available. By default, the PETSc
917implementations of these methods employ ILU(0) factorization on each
918individual block (that is, the default solver on each subblock is
919`PCType=PCILU`, `KSPType=KSPPREONLY` (or equivalently `KSPType=KSPNONE`); the user can set alternative
920linear solvers via the options `-sub_ksp_type` and `-sub_pc_type`.
921In fact, all of the `KSP` and `PC` options can be applied to the
922subproblems by inserting the prefix `-sub_` at the beginning of the
923option name. These options database commands set the particular options
924for *all* of the blocks within the global problem. In addition, the
925routines
926
927```
928PCBJacobiGetSubKSP(PC pc,PetscInt *n_local,PetscInt *first_local,KSP **subksp);
929PCASMGetSubKSP(PC pc,PetscInt *n_local,PetscInt *first_local,KSP **subksp);
930```
931
932extract the `KSP` context for each local block. The argument
933`n_local` is the number of blocks on the calling process, and
934`first_local` indicates the global number of the first block on the
935process. The blocks are numbered successively by processes from zero
936through $b_g-1$, where $b_g$ is the number of global blocks.
937The array of `KSP` contexts for the local blocks is given by
938`subksp`. This mechanism enables the user to set different solvers for
939the various blocks. To set the appropriate data structures, the user
940*must* explicitly call `KSPSetUp()` before calling
941`PCBJacobiGetSubKSP()` or `PCASMGetSubKSP(`). For further details,
942see
943<a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/src/ksp/ksp/tutorials/ex7.c.html">KSP Tutorial ex7</a>
944or
945<a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/src/ksp/ksp/tutorials/ex8.c.html">KSP Tutorial ex8</a>.
946
947The block Jacobi, block Gauss-Seidel, and additive Schwarz
948preconditioners allow the user to set the number of blocks into which
949the problem is divided. The options database commands to set this value
950are `-pc_bjacobi_blocks` `n` and `-pc_bgs_blocks` `n`, and,
951within a program, the corresponding routines are
952
953```
954PCBJacobiSetTotalBlocks(PC pc,PetscInt blocks,PetscInt *size);
955PCASMSetTotalSubdomains(PC pc,PetscInt n,IS *is,IS *islocal);
956PCASMSetType(PC pc,PCASMType type);
957```
958
959The optional argument `size` is an array indicating the size of each
960block. Currently, for certain parallel matrix formats, only a single
961block per process is supported. However, the `MATMPIAIJ` and
962`MATMPIBAIJ` formats support the use of general blocks as long as no
963blocks are shared among processes. The `is` argument contains the
964index sets that define the subdomains.
965
966The object `PCASMType` is one of `PC_ASM_BASIC`,
967`PC_ASM_INTERPOLATE`, `PC_ASM_RESTRICT`, or `PC_ASM_NONE` and may
968also be set with the options database `-pc_asm_type` `[basic`,
969`interpolate`, `restrict`, `none]`. The type `PC_ASM_BASIC` (or
970`-pc_asm_type` `basic`) corresponds to the standard additive Schwarz
971method that uses the full restriction and interpolation operators. The
972type `PC_ASM_RESTRICT` (or `-pc_asm_type` `restrict`) uses a full
973restriction operator, but during the interpolation process ignores the
974off-process values. Similarly, `PC_ASM_INTERPOLATE` (or
975`-pc_asm_type` `interpolate`) uses a limited restriction process in
976conjunction with a full interpolation, while `PC_ASM_NONE` (or
977`-pc_asm_type` `none`) ignores off-process values for both
978restriction and interpolation. The ASM types with limited restriction or
979interpolation were suggested by Xiao-Chuan Cai and Marcus Sarkis
980{cite}`cs99`. `PC_ASM_RESTRICT` is the PETSc default, as
981it saves substantial communication and for many problems has the added
982benefit of requiring fewer iterations for convergence than the standard
983additive Schwarz method.
984
985The user can also set the number of blocks and sizes on a per-process
986basis with the commands
987
988```
989PCBJacobiSetLocalBlocks(PC pc,PetscInt blocks,PetscInt *size);
990PCASMSetLocalSubdomains(PC pc,PetscInt N,IS *is,IS *islocal);
991```
992
993For the ASM preconditioner one can use the following command to set the
994overlap to compute in constructing the subdomains.
995
996```
997PCASMSetOverlap(PC pc,PetscInt overlap);
998```
999
1000The overlap defaults to 1, so if one desires that no additional overlap
1001be computed beyond what may have been set with a call to
1002`PCASMSetTotalSubdomains()` or `PCASMSetLocalSubdomains()`, then
1003`overlap` must be set to be 0. In particular, if one does *not*
1004explicitly set the subdomains in an application code, then all overlap
1005would be computed internally by PETSc, and using an overlap of 0 would
1006result in an ASM variant that is equivalent to the block Jacobi
1007preconditioner. Note that one can define initial index sets `is` with
1008*any* overlap via `PCASMSetTotalSubdomains()` or
1009`PCASMSetLocalSubdomains()`; the routine `PCASMSetOverlap()` merely
1010allows PETSc to extend that overlap further if desired.
1011
1012`PCGASM` is a generalization of `PCASM` that allows
1013the user to specify subdomains that span multiple MPI processes. This can be
1014useful for problems where small subdomains result in poor convergence.
1015To be effective, the multi-processor subproblems must be solved using a
1016sufficiently strong subsolver, such as `PCLU`, for which `SuperLU_DIST` or a
1017similar parallel direct solver could be used; other choices may include
1018a multigrid solver on the subdomains.
1019
1020The interface for `PCGASM` is similar to that of `PCASM`. In
1021particular, `PCGASMType` is one of `PC_GASM_BASIC`,
1022`PC_GASM_INTERPOLATE`, `PC_GASM_RESTRICT`, `PC_GASM_NONE`. These
1023options have the same meaning as with `PCASM` and may also be set with
1024the options database `-pc_gasm_type` `[basic`, `interpolate`,
1025`restrict`, `none]`.
1026
1027Unlike `PCASM`, however, `PCGASM` allows the user to define
1028subdomains that span multiple MPI processes. The simplest way to do this is
1029using a call to `PCGASMSetTotalSubdomains(PC pc,PetscInt N)` with
1030the total number of subdomains `N` that is smaller than the MPI
1031communicator `size`. In this case `PCGASM` will coalesce `size/N`
1032consecutive single-rank subdomains into a single multi-rank subdomain.
1033The single-rank subdomains contain the degrees of freedom corresponding
1034to the locally-owned rows of the `PCGASM` matrix used to compute the preconditioner –
1035these are the subdomains `PCASM` and `PCGASM` use by default.
1036
1037Each of the multirank subdomain subproblems is defined on the
1038subcommunicator that contains the coalesced `PCGASM` processes. In general
1039this might not result in a very good subproblem if the single-rank
1040problems corresponding to the coalesced processes are not very strongly
1041connected. In the future this will be addressed with a hierarchical
1042partitioner that generates well-connected coarse subdomains first before
1043subpartitioning them into the single-rank subdomains.
1044
1045In the meantime the user can provide his or her own multi-rank
1046subdomains by calling `PCGASMSetSubdomains(PC,IS[],IS[])` where each
1047of the `IS` objects on the list defines the inner (without the
1048overlap) or the outer (including the overlap) subdomain on the
1049subcommunicator of the `IS` object. A helper subroutine
1050`PCGASMCreateSubdomains2D()` is similar to PCASM’s but is capable of
1051constructing multi-rank subdomains that can be then used with
1052`PCGASMSetSubdomains()`. An alternative way of creating multi-rank
1053subdomains is by using the underlying `DM` object, if it is capable of
1054generating such decompositions via `DMCreateDomainDecomposition()`.
1055Ordinarily the decomposition specified by the user via
1056`PCGASMSetSubdomains()` takes precedence, unless
1057`PCGASMSetUseDMSubdomains()` instructs `PCGASM` to prefer
1058`DM`-created decompositions.
1059
1060Currently there is no support for increasing the overlap of multi-rank
1061subdomains via `PCGASMSetOverlap()` – this functionality works only
1062for subdomains that fit within a single MPI process, exactly as in
1063`PCASM`.
1064
1065Examples of the described `PCGASM` usage can be found in
1066<a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/src/ksp/ksp/tutorials/ex62.c.html">KSP Tutorial ex62</a>.
1067In particular, `runex62_superlu_dist` illustrates the use of
1068`SuperLU_DIST` as the subdomain solver on coalesced multi-rank
1069subdomains. The `runex62_2D_*` examples illustrate the use of
1070`PCGASMCreateSubdomains2D()`.
1071
1072(sec_amg)=
1073
1074### Algebraic Multigrid (AMG) Preconditioners
1075
1076PETSc has a native algebraic multigrid preconditioner `PCGAMG` –
1077*gamg* – and interfaces to three external AMG packages: *hypre*, *ML*
1078and *AMGx* (CUDA platforms only) that can be downloaded in the
1079configuration phase (e.g., `--download-hypre` ) and used by
1080specifying that command line parameter (e.g., `-pc_type hypre`).
1081*Hypre* is relatively monolithic in that a PETSc matrix is converted into a hypre
1082matrix, and then *hypre* is called to solve the entire problem. *ML* is more
1083modular because PETSc only has *ML* generate the coarse grid spaces
1084(columns of the prolongation operator), which is the core of an AMG method,
1085and then constructs a `PCMG` with Galerkin coarse grid operator
1086construction. `PCGAMG` is designed from the beginning to be modular, to
1087allow for new components to be added easily and also populates a
1088multigrid preconditioner `PCMG` so generic multigrid parameters are
1089used (see {any}`sec_mg`). PETSc provides a fully supported (smoothed) aggregation AMG, but supports the addition of new methods
1090(`-pc_type gamg -pc_gamg_type agg` or `PCSetType(pc,PCGAMG)` and
1091`PCGAMGSetType(pc, PCGAMGAGG)`. Examples of extension are reference implementations of
1092a classical AMG method (`-pc_gamg_type classical`), a (2D) hybrid geometric
1093AMG method (`-pc_gamg_type geo`) that are not supported. A 2.5D AMG method DofColumns
1094{cite}`isaacstadlerghattas2015` supports 2D coarsenings extruded in the third dimension. `PCGAMG` does require the use
1095of `MATAIJ` matrices. For instance, `MATBAIJ` matrices are not supported. One
1096can use `MATAIJ` instead of `MATBAIJ` without changing any code other than the
1097constructor (or the `-mat_type` from the command line). For instance,
1098`MatSetValuesBlocked` works with `MATAIJ` matrices.
1099
1100**Important parameters for PCGAMGAGG**
1101
1102- Control the generation of the coarse grid
1103
1104  > - `-pc_gamg_aggressive_coarsening` \<n:int:1> Use aggressive coarsening on the finest n levels to construct the coarser mesh.
1105  >   See `PCGAMGAGGSetNSmooths()`. The larger value produces a faster preconditioner to create and solve, but the convergence may be slower.
1106  > - `-pc_gamg_low_memory_threshold_filter` \<bool:false> Filter small matrix entries before coarsening the mesh.
1107  >   See `PCGAMGSetLowMemoryFilter()`.
1108  > - `-pc_gamg_threshold` \<tol:real:0.0> The threshold of small values to drop when `-pc_gamg_low_memory_threshold_filter` is used. A
1109  >   negative value means keeping even the locations with 0.0. See `PCGAMGSetThreshold()`
1110  > - `-pc_gamg_threshold_scale` \<v>:real:1.0> Set a scale factor applied to each coarser level when `-pc_gamg_low_memory_threshold_filter` is used.
1111  >   See `PCGAMGSetThresholdScale()`.
1112  > - `-pc_gamg_mat_coarsen_type` \<mis|hem|misk:misk> Algorithm used to coarsen the matrix graph. See `MatCoarsenSetType()`.
1113  > - `-pc_gamg_mat_coarsen_max_it` \<it:int:4> Maximum HEM iterations to use. See `MatCoarsenSetMaximumIterations()`.
1114  > - `-pc_gamg_aggressive_mis_k` \<k:int:2> k distance in MIS coarsening (>2 is 'aggressive') to use in coarsening.
1115  >   See `PCGAMGMISkSetAggressive()`. The larger value produces a preconditioner that is faster to create and solve with but the convergence may be slower.
1116  >   This option and the previous option work to determine how aggressively the grids are coarsened.
1117  > - `-pc_gamg_mis_k_minimum_degree_ordering` \<bool:false> Use a minimum degree ordering in the greedy MIS algorithm used to coarsen.
1118  >   See `PCGAMGMISkSetMinDegreeOrdering()`
1119
1120- Control the generation of the prolongation for `PCGAMGAGG`
1121
1122  > - `-pc_gamg_agg_nsmooths` \<n:int:1> Number of smoothing steps to be used in constructing the prolongation. For symmetric problems,
1123  >   generally, one or more is best. For some strongly nonsymmetric problems, 0 may be best. See `PCGAMGSetNSmooths()`.
1124
1125- Control the amount of parallelism on the levels
1126
1127  > - `-pc_gamg_process_eq_limit` \<n:int:50> Sets the minimum number of equations allowed per process when coarsening (otherwise, fewer MPI processes
1128  >   are used for the coarser mesh). A larger value will cause the coarser problems to be run on fewer MPI processes, resulting
1129  >   in less communication and possibly a faster time to solution. See `PCGAMGSetProcEqLim()`.
1130  >
1131  > - `-pc_gamg_rank_reduction_factors` \<rn,rn-1,...,r1:int> Set a schedule for MPI rank reduction on coarse grids. `See PCGAMGSetRankReductionFactors()`
1132  >   This overrides the lessening of processes that would arise from `-pc_gamg_process_eq_limit`.
1133  >
1134  > - `-pc_gamg_repartition` \<bool:false> Run a partitioner on each coarser mesh generated rather than using the default partition arising from the
1135  >   finer mesh. See `PCGAMGSetRepartition()`. This increases the preconditioner setup time but will result in less time per
1136  >   iteration of the solver.
1137  >
1138  > - `-pc_gamg_parallel_coarse_grid_solver` \<bool:false> Allow the coarse grid solve to run in parallel, depending on the value of `-pc_gamg_coarse_eq_limit`.
1139  >   See `PCGAMGSetParallelCoarseGridSolve()`. If the coarse grid problem is large then this can
1140  >   improve the time to solution.
1141  >
1142  >   - `-pc_gamg_coarse_eq_limit` \<n:int:50> Sets the minimum number of equations allowed per process on the coarsest level when coarsening
1143  >     (otherwise fewer MPI processes will be used). A larger value will cause the coarse problems to be run on fewer MPI processes.
1144  >     This only applies if `-pc_gamg_parallel_coarse_grid_solver` is set to true. See `PCGAMGSetCoarseEqLim()`.
1145
1146- Control the smoothers
1147
1148  > - `-pc_mg_levels` \<n:int> Set the maximum number of levels to use.
1149  > - `-mg_levels_ksp_type` \<KSPType:chebyshev> If `KSPCHEBYSHEV` or `KSPRICHARDSON` is not used, then the Krylov
1150  >   method for the entire multigrid solve has to be a flexible method such as `KSPFGMRES`. Generally, the
1151  >   stronger the Krylov method the faster the convergence, but with more cost per iteration. See `KSPSetType()`.
1152  > - `-mg_levels_ksp_max_it` \<its:int:2> Sets the number of iterations to run the smoother on each level. Generally, the more iterations
1153  >   , the faster the convergence, but with more cost per multigrid iteration. See `PCMGSetNumberSmooth()`.
1154  > - `-mg_levels_ksp_xxx` Sets options for the `KSP` in the smoother on the levels.
1155  > - `-mg_levels_pc_type` \<PCType:jacobi> Sets the smoother to use on each level. See `PCSetType()`. Generally, the
1156  >   stronger the preconditioner the faster the convergence, but with more cost per iteration.
1157  > - `-mg_levels_pc_xxx` Sets options for the `PC` in the smoother on the levels.
1158  > - `-mg_coarse_ksp_type` \<KSPType:none> Sets the solver `KSPType` to use on the coarsest level.
1159  > - `-mg_coarse_pc_type` \<PCType:lu> Sets the solver `PCType` to use on the coarsest level.
1160  > - `-pc_gamg_asm_use_agg` \<bool:false> Use `PCASM` as the smoother on each level with the aggregates defined by the coarsening process are
1161  >   the subdomains. This option automatically switches the smoother on the levels to be `PCASM`.
1162  > - `-mg_levels_pc_asm_overlap` \<n:int:0> Use non-zero overlap with `-pc_gamg_asm_use_agg`. See `PCASMSetOverlap()`.
1163
1164- Control the multigrid algorithm
1165
1166  > - `-pc_mg_type` \<additive|multiplicative|full|kaskade:multiplicative> The type of multigrid to use. Usually, multiplicative is the fastest.
1167  > - `-pc_mg_cycle_type` \<v|w:v> Use V- or W-cycle with `-pc_mg_type` `multiplicative`
1168
1169`PCGAMG` provides unsmoothed aggregation (`-pc_gamg_agg_nsmooths 0`) and
1170smoothed aggregation (`-pc_gamg_agg_nsmooths 1` or
1171`PCGAMGSetNSmooths(pc,1)`). Smoothed aggregation (SA), {cite}`vanek1996algebraic`, {cite}`vanek2001convergence`, is recommended
1172for symmetric positive definite systems. Unsmoothed aggregation can be
1173useful for asymmetric problems and problems where the highest eigenestimates are problematic. If poor convergence rates are observed using
1174the smoothed version, one can test unsmoothed aggregation.
1175
1176**Eigenvalue estimates:** The parameters for the KSP eigen estimator,
1177used for SA, can be set with `-pc_gamg_esteig_ksp_max_it` and
1178`-pc_gamg_esteig_ksp_type`. For example, CG generally converges to the
1179highest eigenvalue faster than GMRES (the default for KSP) if your problem
1180is symmetric positive definite. One can specify CG with
1181`-pc_gamg_esteig_ksp_type cg`. The default for
1182`-pc_gamg_esteig_ksp_max_it` is 10, which we have found is pretty safe
1183with a (default) safety factor of 1.1. One can specify the range of real
1184eigenvalues in the same way as with Chebyshev KSP solvers
1185(smoothers), with `-pc_gamg_eigenvalues <emin,emax>`. GAMG sets the MG
1186smoother type to chebyshev by default. By default, GAMG uses its eigen
1187estimate, if it has one, for Chebyshev smoothers if the smoother uses
1188Jacobi preconditioning. This can be overridden with
1189`-pc_gamg_use_sa_esteig  <true,false>`.
1190
1191AMG methods require knowledge of the number of degrees of freedom per
1192vertex; the default is one (a scalar problem). Vector problems like
1193elasticity should set the block size of the matrix appropriately with
1194`-mat_block_size bs` or `MatSetBlockSize(mat,bs)`. Equations must be
1195ordered in “vertex-major” ordering (e.g.,
1196$x_1,y_1,z_1,x_2,y_2,...$).
1197
1198**Near null space:** Smoothed aggregation requires an explicit
1199representation of the (near) null space of the operator for optimal
1200performance. One can provide an orthonormal set of null space vectors
1201with `MatSetNearNullSpace()`. The vector of all ones is the default
1202for each variable given by the block size (e.g., the translational rigid
1203body modes). For elasticity, where rotational rigid body modes are
1204required to complete the near null-space you can use
1205`MatNullSpaceCreateRigidBody()` to create the null space vectors and
1206then `MatSetNearNullSpace()`.
1207
1208**Coarse grid data model:** The GAMG framework provides for reducing the
1209number of active processes on coarse grids to reduce communication costs
1210when there is not enough parallelism to keep relative communication
1211costs down. Most AMG solvers reduce to just one active process on the
1212coarsest grid (the PETSc MG framework also supports redundantly solving
1213the coarse grid on all processes to reduce communication
1214costs potentially). However, this forcing to one process can be overridden if one
1215wishes to use a parallel coarse grid solver. GAMG generalizes this by
1216reducing the active number of processes on other coarse grids.
1217GAMG will select the number of active processors by fitting the desired
1218number of equations per process (set with
1219`-pc_gamg_process_eq_limit <50>,`) at each level given that size of
1220each level. If $P_i < P$ processors are desired on a level
1221$i$, then the first $P_i$ processes are populated with the grid
1222and the remaining are empty on that grid. One can, and probably should,
1223repartition the coarse grids with `-pc_gamg_repartition <true>`,
1224otherwise an integer process reduction factor ($q$) is selected
1225and the equations on the first $q$ processes are moved to process
12260, and so on. As mentioned, multigrid generally coarsens the problem
1227until it is small enough to be solved with an exact solver (e.g., LU or
1228SVD) in a relatively short time. GAMG will stop coarsening when the
1229number of the equation on a grid falls below the threshold given by
1230`-pc_gamg_coarse_eq_limit <50>,`.
1231
1232**Coarse grid parameters:** There are several options to provide
1233parameters to the coarsening algorithm and parallel data layout. Run a
1234code using `PCGAMG` with `-help` to get a full listing of GAMG
1235parameters with short descriptions. The rate of coarsening is
1236critical in AMG performance – too slow coarsening will result in an
1237overly expensive solver per iteration and too fast coarsening will
1238result in decrease in the convergence rate. `-pc_gamg_threshold <-1>`
1239and `-pc_gamg_aggressive_coarsening <N>` are the primary parameters that
1240control coarsening rates, which is very important for AMG performance. A
1241greedy maximal independent set (MIS) algorithm is used in coarsening.
1242Squaring the graph implements MIS-2; the root vertex in an
1243aggregate is more than two edges away from another root vertex instead
1244of more than one in MIS. The threshold parameter sets a normalized
1245threshold for which edges are removed from the MIS graph, thereby
1246coarsening slower. Zero will keep all non-zero edges, a negative number
1247will keep zero edges, and a positive number will drop small edges. Typical
1248finite threshold values are in the range of $0.01 - 0.05$. There
1249are additional parameters for changing the weights on coarse grids.
1250
1251The parallel MIS algorithms require symmetric weights/matrices. Thus `PCGAMG`
1252will automatically make the graph symmetric if it is not symmetric. Since this
1253has additional cost, users should indicate the symmetry of the matrices they
1254provide by calling
1255
1256```
1257MatSetOption(mat,MAT_SYMMETRIC,PETSC_TRUE (or PETSC_FALSE))
1258```
1259
1260or
1261
1262```
1263MatSetOption(mat,MAT_STRUCTURALLY_SYMMETRIC,PETSC_TRUE (or PETSC_FALSE)).
1264```
1265
1266If they know that the matrix will always have symmetry despite future changes
1267to the matrix (with, for example, `MatSetValues()`) then they should also call
1268
1269```
1270MatSetOption(mat,MAT_SYMMETRY_ETERNAL,PETSC_TRUE (or PETSC_FALSE))
1271```
1272
1273or
1274
1275```
1276MatSetOption(mat,MAT_STRUCTURAL_SYMMETRY_ETERNAL,PETSC_TRUE (or PETSC_FALSE)).
1277```
1278
1279Using this information allows the algorithm to skip unnecessary computations.
1280
1281**Troubleshooting algebraic multigrid methods:** If `PCGAMG`, *ML*, *AMGx* or
1282*hypre* does not perform well; the first thing to try is one of the other
1283methods. Often, the default parameters or just the strengths of different
1284algorithms can fix performance problems or provide useful information to
1285guide further debugging. There are several sources of poor performance
1286of AMG solvers and often special purpose methods must be developed to
1287achieve the full potential of multigrid. To name just a few sources of
1288performance degradation that may not be fixed with parameters in PETSc
1289currently: non-elliptic operators, curl/curl operators, highly stretched
1290grids or highly anisotropic problems, large jumps in material
1291coefficients with complex geometry (AMG is particularly well suited to
1292jumps in coefficients, but it is not a perfect solution), highly
1293incompressible elasticity, not to mention ill-posed problems and many
1294others. For Grad-Div and Curl-Curl operators, you may want to try the
1295Auxiliary-space Maxwell Solver (AMS,
1296`-pc_type hypre -pc_hypre_type ams`) or the Auxiliary-space Divergence
1297Solver (ADS, `-pc_type hypre -pc_hypre_type ads`) solvers. These
1298solvers need some additional information on the underlying mesh;
1299specifically, AMS needs the discrete gradient operator, which can be
1300specified via `PCHYPRESetDiscreteGradient()`. In addition to the
1301discrete gradient, ADS also needs the specification of the discrete curl
1302operator, which can be set using `PCHYPRESetDiscreteCurl()`.
1303
1304**I am converging slowly, what do I do?** AMG methods are sensitive to
1305coarsening rates and methods; for GAMG use `-pc_gamg_threshold <x>`
1306or `PCGAMGSetThreshold()` to regulate coarsening rates; higher values decrease
1307the coarsening rate. A high threshold (e.g., $x=0.08$) will result in an
1308expensive but potentially powerful preconditioner, and a low threshold
1309(e.g., $x=0.0$) will result in faster coarsening, fewer levels,
1310cheaper solves, and generally worse convergence rates.
1311
1312Aggressive_coarsening is the second mechanism for
1313increasing the coarsening rate and thereby decreasing the cost of the
1314coarse grids and generally decreasing the solver convergence rate.
1315Use `-pc_gamg_aggressive_coarsening <N>`, or
1316`PCGAMGSetAggressiveLevels(pc,N)`, to aggressively coarsen the graph on the finest N
1317levels. The default is $N=1$. There are two options for aggressive coarsening: 1) the
1318default, square graph: use $A^T A$ in the MIS coarsening algorithm and 2) coarsen with MIS-2 (instead
1319of the default of MIS-1). Use `-pc_gamg_aggressive_square_graph false`
1320to use MIS-k coarsening and `-pc_gamg_aggressive_mis_k k` to select
1321the level of MIS other than the default $k=2$.
1322The square graph approach seems to coarsen slower, which results in
1323larger coarse grids and is more expensive, but generally improves
1324the convergence rate.
1325If the coarse grids are expensive to compute, and use a lot of memory,
1326using MIS-2 is a good alternative (setting MIS-1 effectively turns
1327aggressive coarsening off). Note that MIS-3 is also supported.
1328
1329One can run with `-info :pc` and grep for `PCGAMG` to get statistics on
1330each level, which can be used to see if you are coarsening at an
1331appropriate rate. With smoothed aggregation, you generally want to coarse
1332at about a rate of 3:1 in each dimension. Coarsening too slowly will
1333result in large numbers of non-zeros per row on coarse grids (this is
1334reported). The number of non-zeros can go up very high, say about 300
1335(times the degrees of freedom per vertex) on a 3D hex mesh. One can also
1336look at the grid complexity, which is also reported (the ratio of the
1337total number of matrix entries for all levels to the number of matrix
1338entries on the fine level). Grid complexity should be well under 2.0 and
1339preferably around $1.3$ or lower. If convergence is poor and the
1340Galerkin coarse grid construction is much smaller than the time for each
1341solve, one can safely decrease the coarsening rate.
1342`-pc_gamg_threshold` $-1.0$ is the simplest and most robust
1343option and is recommended if poor convergence rates are observed, at
1344least until the source of the problem is discovered. In conclusion, decreasing the coarsening rate (increasing the
1345threshold) should be tried if convergence is slow.
1346
1347**A note on Chebyshev smoothers.** Chebyshev solvers are attractive as
1348multigrid smoothers because they can target a specific interval of the
1349spectrum, which is the purpose of a smoother. The spectral bounds for
1350Chebyshev solvers are simple to compute because they rely on the highest
1351eigenvalue of your (diagonally preconditioned) operator, which is
1352conceptually simple to compute. However, if this highest eigenvalue
1353estimate is not accurate (too low), the solvers can fail with an
1354indefinite preconditioner message. One can run with `-info` and grep
1355for `PCGAMG` to get these estimates or use `-ksp_view`. These highest
1356eigenvalues are generally between 1.5-3.0. For symmetric positive
1357definite systems, CG is a better eigenvalue estimator
1358`-mg_levels_esteig_ksp_type cg`. Bad Eigen estimates often cause indefinite matrix messages. Explicitly damped Jacobi or Krylov
1359smoothers can provide an alternative to Chebyshev, and *hypre* has
1360alternative smoothers.
1361
1362**Now, am I solving alright? Can I expect better?** If you find that you
1363are getting nearly one digit in reduction of the residual per iteration
1364and are using a modest number of point smoothing steps (e.g., 1-4
1365iterations of SOR), then you may be fairly close to textbook multigrid
1366efficiency. However, you also need to check the setup costs. This can be
1367determined by running with `-log_view` and check that the time for the
1368Galerkin coarse grid construction (`MatPtAP()`) is not (much) more than
1369the time spent in each solve (`KSPSolve()`). If the `MatPtAP()` time is
1370too large, then one can increase the coarsening rate by decreasing the
1371threshold and using aggressive coarsening
1372(`-pc_gamg_aggressive_coarsening <N>`, squares the graph on the finest N
1373levels). Likewise, if your `MatPtAP()` time is short and your convergence
1374If the rate is not ideal, you could decrease the coarsening rate.
1375
1376PETSc’s AMG solver is a framework for developers to
1377easily add AMG capabilities, like new AMG methods or an AMG component
1378like a matrix triple product. Contact us directly if you are interested
1379in contributing.
1380
1381Using algebraic multigrid as a "standalone" solver is possible but not recommended, as it does not accelerate it with a Krylov method.
1382Use a `KSPType` of `KSPRICHARDSON`
1383(or equivalently `-ksp_type richardson`) to achieve this. Using `KSPPREONLY` will not work since it only applies a single multigrid cycle.
1384
1385#### Adaptive Interpolation
1386
1387**Interpolation** transfers a function from the coarse space to the fine space. We would like this process to be accurate for the functions resolved by the coarse grid, in particular the approximate solution computed there. By default, we create these matrices using local interpolation of the fine grid dual basis functions in the coarse basis. However, an adaptive procedure can optimize the coefficients of the interpolator to reproduce pairs of coarse/fine functions which should approximate the lowest modes of the generalized eigenproblem
1388
1389$$
1390A x = \lambda M x
1391$$
1392
1393where $A$ is the system matrix and $M$ is the smoother. Note that for defect-correction MG, the interpolated solution from the coarse space need not be as accurate as the fine solution, for the same reason that updates in iterative refinement can be less accurate. However, in FAS or in the final interpolation step for each level of Full Multigrid, we must have interpolation as accurate as the fine solution since we are moving the entire solution itself.
1394
1395**Injection** should accurately transfer the fine solution to the coarse grid. Accuracy here means that the action of a coarse dual function on either should produce approximately the same result. In the structured grid case, this means that we just use the same values on coarse points. This can result in aliasing.
1396
1397**Restriction** is intended to transfer the fine residual to the coarse space. Here we use averaging (often the transpose of the interpolation operation) to damp out the fine space contributions. Thus, it is less accurate than injection, but avoids aliasing of the high modes.
1398
1399For a multigrid cycle, the interpolator $P$ is intended to accurately reproduce "smooth" functions from the coarse space in the fine space, keeping the energy of the interpolant about the same. For the Laplacian on a structured mesh, it is easy to determine what these low-frequency functions are. They are the Fourier modes. However an arbitrary operator $A$ will have different coarse modes that we want to resolve accurately on the fine grid, so that our coarse solve produces a good guess for the fine problem. How do we make sure that our interpolator $P$ can do this?
1400
1401We first must decide what we mean by accurate interpolation of some functions. Suppose we know the continuum function $f$ that we care about, and we are only interested in a finite element description of discrete functions. Then the coarse function representing $f$ is given by
1402
1403$$
1404f^C = \sum_i f^C_i \phi^C_i,
1405$$
1406
1407and similarly the fine grid form is
1408
1409$$
1410f^F = \sum_i f^F_i \phi^F_i.
1411$$
1412
1413Now we would like the interpolant of the coarse representer to the fine grid to be as close as possible to the fine representer in a least squares sense, meaning we want to solve the minimization problem
1414
1415$$
1416\min_{P} \| f^F - P f^C \|_2
1417$$
1418
1419Now we can express $P$ as a matrix by looking at the matrix elements $P_{ij} = \phi^F_i P \phi^C_j$. Then we have
1420
1421$$
1422\begin{aligned}
1423  &\phi^F_i f^F - \phi^F_i P f^C \\
1424= &f^F_i - \sum_j P_{ij} f^C_j
1425\end{aligned}
1426$$
1427
1428so that our discrete optimization problem is
1429
1430$$
1431\min_{P_{ij}} \| f^F_i - \sum_j P_{ij} f^C_j \|_2
1432$$
1433
1434and we will treat each row of the interpolator as a separate optimization problem. We could allow an arbitrary sparsity pattern, or try to determine adaptively, as is done in sparse approximate inverse preconditioning. However, we know the supports of the basis functions in finite elements, and thus the naive sparsity pattern from local interpolation can be used.
1435
1436We note here that the BAMG framework of Brannick et al. {cite}`brandtbrannickkahllivshits2011` does not use fine and coarse functions spaces, but rather a fine point/coarse point division which we will not employ here. Our general PETSc routine should work for both since the input would be the checking set (fine basis coefficients or fine space points) and the approximation set (coarse basis coefficients in the support or coarse points in the sparsity pattern).
1437
1438We can easily solve the above problem using QR factorization. However, there are many smooth functions from the coarse space that we want interpolated accurately, and a single $f$ would not constrain the values $P_{ij}`$ well. Therefore, we will use several functions $\{f_k\}$ in our minimization,
1439
1440$$
1441\begin{aligned}
1442  &\min_{P_{ij}} \sum_k w_k \| f^{F,k}_i - \sum_j P_{ij} f^{C,k}_j \|_2 \\
1443= &\min_{P_{ij}} \sum_k \| \sqrt{w_k} f^{F,k}_i - \sqrt{w_k} \sum_j P_{ij} f^{C,k}_j \|_2 \\
1444= &\min_{P_{ij}} \| W^{1/2} \mathbf{f}^{F}_i - W^{1/2} \mathbf{f}^{C} p_i \|_2
1445\end{aligned}
1446$$
1447
1448where
1449
1450$$
1451\begin{aligned}
1452W         &= \begin{pmatrix} w_0 & & \\ & \ddots & \\ & & w_K \end{pmatrix} \\
1453\mathbf{f}^{F}_i &= \begin{pmatrix} f^{F,0}_i \\ \vdots \\ f^{F,K}_i \end{pmatrix} \\
1454\mathbf{f}^{C}   &= \begin{pmatrix} f^{C,0}_0 & \cdots & f^{C,0}_n \\ \vdots & \ddots &  \vdots \\ f^{C,K}_0 & \cdots & f^{C,K}_n \end{pmatrix} \\
1455p_i       &= \begin{pmatrix} P_{i0} \\ \vdots \\ P_{in} \end{pmatrix}
1456\end{aligned}
1457$$
1458
1459or alternatively
1460
1461$$
1462\begin{aligned}
1463[W]_{kk}     &= w_k \\
1464[f^{F}_i]_k  &= f^{F,k}_i \\
1465[f^{C}]_{kj} &= f^{C,k}_j \\
1466[p_i]_j      &= P_{ij}
1467\end{aligned}
1468$$
1469
1470We thus have a standard least-squares problem
1471
1472$$
1473\min_{P_{ij}} \| b - A x \|_2
1474$$
1475
1476where
1477
1478$$
1479\begin{aligned}
1480A &= W^{1/2} f^{C} \\
1481b &= W^{1/2} f^{F}_i \\
1482x &= p_i
1483\end{aligned}
1484$$
1485
1486which can be solved using LAPACK.
1487
1488We will typically perform this optimization on a multigrid level $l$ when the change in eigenvalue from level $l+1$ is relatively large, meaning
1489
1490$$
1491\frac{|\lambda_l - \lambda_{l+1}|}{|\lambda_l|}.
1492$$
1493
1494This indicates that the generalized eigenvector associated with that eigenvalue was not adequately represented by $P^l_{l+1}`$, and the interpolator should be recomputed.
1495
1496```{raw} html
1497<hr>
1498```
1499
1500### Balancing Domain Decomposition by Constraints
1501
1502PETSc provides the Balancing Domain Decomposition by Constraints (`PCBDDC`)
1503method for preconditioning parallel finite element problems stored in
1504unassembled format (see `MATIS`). `PCBDDC` is a 2-level non-overlapping
1505domain decomposition method which can be easily adapted to different
1506problems and discretizations by means of few user customizations. The
1507application of the preconditioner to a vector consists in the static
1508condensation of the residual at the interior of the subdomains by means
1509of local Dirichlet solves, followed by an additive combination of Neumann
1510local corrections and the solution of a global coupled coarse problem.
1511Command line options for the underlying `KSP` objects are prefixed by
1512`-pc_bddc_dirichlet`, `-pc_bddc_neumann`, and `-pc_bddc_coarse`
1513respectively.
1514
1515The implementation supports any kind of linear system, and
1516assumes a one-to-one mapping between subdomains and MPI processes.
1517Complex numbers are supported as well. For non-symmetric problems, use
1518the runtime option `-pc_bddc_symmetric 0`.
1519
1520Unlike conventional non-overlapping methods that iterates just on the
1521degrees of freedom at the interface between subdomain, `PCBDDC`
1522iterates on the whole set of degrees of freedom, allowing the use of
1523approximate subdomain solvers. When using approximate solvers, the
1524command line switches `-pc_bddc_dirichlet_approximate` and/or
1525`-pc_bddc_neumann_approximate` should be used to inform `PCBDDC`. If
1526any of the local problems is singular, the nullspace of the local
1527operator should be attached to the local matrix via
1528`MatSetNullSpace()`.
1529
1530At the basis of the method there’s the analysis of the connected
1531components of the interface for the detection of vertices, edges and
1532faces equivalence classes. Additional information on the degrees of
1533freedom can be supplied to `PCBDDC` by using the following functions:
1534
1535- `PCBDDCSetDofsSplitting()`
1536- `PCBDDCSetLocalAdjacencyGraph()`
1537- `PCBDDCSetPrimalVerticesLocalIS()`
1538- `PCBDDCSetNeumannBoundaries()`
1539- `PCBDDCSetDirichletBoundaries()`
1540- `PCBDDCSetNeumannBoundariesLocal()`
1541- `PCBDDCSetDirichletBoundariesLocal()`
1542
1543Crucial for the convergence of the iterative process is the
1544specification of the primal constraints to be imposed at the interface
1545between subdomains. `PCBDDC` uses by default vertex continuities and
1546edge arithmetic averages, which are enough for the three-dimensional
1547Poisson problem with constant coefficients. The user can switch on and
1548off the usage of vertices, edges or face constraints by using the
1549command line switches `-pc_bddc_use_vertices`, `-pc_bddc_use_edges`,
1550`-pc_bddc_use_faces`. A customization of the constraints is available
1551by attaching a `MatNullSpace` object to the  matrix used to compute the preconditioner via
1552`MatSetNearNullSpace()`. The vectors of the `MatNullSpace` object
1553should represent the constraints in the form of quadrature rules;
1554quadrature rules for different classes of the interface can be listed in
1555the same vector. The number of vectors of the `MatNullSpace` object
1556corresponds to the maximum number of constraints that can be imposed for
1557each class. Once all the quadrature rules for a given interface class
1558have been extracted, an SVD operation is performed to retain the
1559non-singular modes. As an example, the rigid body modes represent an
1560effective choice for elasticity, even in the almost incompressible case.
1561For particular problems, e.g. edge-based discretization with Nedelec
1562elements, a user defined change of basis of the degrees of freedom can
1563be beneficial for `PCBDDC`; use `PCBDDCSetChangeOfBasisMat()` to
1564customize the change of basis.
1565
1566The `PCBDDC` method is usually robust with respect to jumps in the material
1567parameters aligned with the interface; for PDEs with more than one
1568material parameter you may also consider to use the so-called deluxe
1569scaling, available via the command line switch
1570`-pc_bddc_use_deluxe_scaling`. Other scalings are available, see
1571`PCISSetSubdomainScalingFactor()`,
1572`PCISSetSubdomainDiagonalScaling()` or
1573`PCISSetUseStiffnessScaling()`. However, the convergence properties of
1574the `PCBDDC` method degrades in presence of large jumps in the material
1575coefficients not aligned with the interface; for such cases, PETSc has
1576the capability of adaptively computing the primal constraints. Adaptive
1577selection of constraints could be requested by specifying a threshold
1578value at command line by using `-pc_bddc_adaptive_threshold x`. Valid
1579values for the threshold `x` ranges from 1 to infinity, with smaller
1580values corresponding to more robust preconditioners. For SPD problems in
15812D, or in 3D with only face degrees of freedom (like in the case of
1582Raviart-Thomas or Brezzi-Douglas-Marini elements), such a threshold is a
1583very accurate estimator of the condition number of the resulting
1584preconditioned operator. Since the adaptive selection of constraints for
1585`PCBDDC` methods is still an active topic of research, its implementation is
1586currently limited to SPD problems; moreover, because the technique
1587requires the explicit knowledge of the local Schur complements, it needs
1588the external package MUMPS.
1589
1590When solving problems decomposed in thousands of subdomains or more, the
1591solution of the `PCBDDC` coarse problem could become a bottleneck; in order
1592to overcome this issue, the user could either consider to solve the
1593parallel coarse problem on a subset of the communicator associated with
1594`PCBDDC` by using the command line switch
1595`-pc_bddc_coarse_redistribute`, or instead use a multilevel approach.
1596The latter can be requested by specifying the number of requested level
1597at command line (`-pc_bddc_levels`) or by using `PCBDDCSetLevels()`.
1598An additional parameter (see `PCBDDCSetCoarseningRatio()`) controls
1599the number of subdomains that will be generated at the next level; the
1600larger the coarsening ratio, the lower the number of coarser subdomains.
1601
1602For further details, see the example
1603<a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/src/ksp/ksp/tutorials/ex59.c">KSP Tutorial ex59</a>
1604and the online documentation for `PCBDDC`.
1605
1606### Shell Preconditioners
1607
1608The shell preconditioner simply uses an application-provided routine to
1609implement the preconditioner. That is, it allows users to write or wrap their
1610own custom preconditioners as a `PC` and use it with `KSP`, etc.
1611
1612To provide a custom preconditioner application, use
1613
1614```
1615PCShellSetApply(PC pc,PetscErrorCode (*apply)(PC,Vec,Vec));
1616```
1617
1618Often a preconditioner needs access to an application-provided data
1619structured. For this, one should use
1620
1621```
1622PCShellSetContext(PC pc, PetscCtx ctx);
1623```
1624
1625to set this data structure and
1626
1627```
1628PCShellGetContext(PC pc, PetscCtxRt ctx);
1629```
1630
1631to retrieve it in `apply`. The three routine arguments of `apply()`
1632are the `PC`, the input vector, and the output vector, respectively.
1633
1634For a preconditioner that requires some sort of “setup” before being
1635used, that requires a new setup every time the operator is changed, one
1636can provide a routine that is called every time the operator is changed
1637(usually via `KSPSetOperators()`).
1638
1639```
1640PCShellSetSetUp(PC pc,PetscErrorCode (*setup)(PC));
1641```
1642
1643The argument to the `setup` routine is the same `PC` object which
1644can be used to obtain the operators with `PCGetOperators()` and the
1645application-provided data structure that was set with
1646`PCShellSetContext()`.
1647
1648(sec_combining_pcs)=
1649
1650### Combining Preconditioners
1651
1652The `PC` type `PCCOMPOSITE` allows one to form new preconditioners
1653by combining already-defined preconditioners and solvers. Combining
1654preconditioners usually requires some experimentation to find a
1655combination of preconditioners that works better than any single method.
1656It is a tricky business and is not recommended until your application
1657code is complete and running and you are trying to improve performance.
1658In many cases using a single preconditioner is better than a
1659combination; an exception is the multigrid/multilevel preconditioners
1660(solvers) that are always combinations of some sort, see {any}`sec_mg`.
1661
1662Let $B_1$ and $B_2$ represent the application of two
1663preconditioners of type `type1` and `type2`. The preconditioner
1664$B = B_1 + B_2$ can be obtained with
1665
1666```
1667PCSetType(pc,PCCOMPOSITE);
1668PCCompositeAddPCType(pc,type1);
1669PCCompositeAddPCType(pc,type2);
1670```
1671
1672Any number of preconditioners may added in this way.
1673
1674This way of combining preconditioners is called additive, since the
1675actions of the preconditioners are added together. This is the default
1676behavior. An alternative can be set with the option
1677
1678```
1679PCCompositeSetType(pc,PC_COMPOSITE_MULTIPLICATIVE);
1680```
1681
1682In this form the new residual is updated after the application of each
1683preconditioner and the next preconditioner applied to the next residual.
1684For example, with two composed preconditioners: $B_1$ and
1685$B_2$; $y = B x$ is obtained from
1686
1687$$
1688\begin{aligned}
1689y    = B_1 x \\
1690w_1  = x - A y \\
1691y    = y + B_2 w_1\end{aligned}
1692$$
1693
1694Loosely, this corresponds to a Gauss-Seidel iteration, while additive
1695corresponds to a Jacobi iteration.
1696
1697Under most circumstances, the multiplicative form requires one-half the
1698number of iterations as the additive form; however, the multiplicative
1699form does require the application of $A$ inside the
1700preconditioner.
1701
1702In the multiplicative version, the calculation of the residual inside
1703the preconditioner can be done in two ways: using the original linear
1704system matrix or using the matrix used to build the preconditioners
1705$B_1$, $B_2$, etc. By default it uses the “preconditioner
1706matrix”, to use the `Amat` matrix use the option
1707
1708```
1709PCSetUseAmat(PC pc);
1710```
1711
1712The individual preconditioners can be accessed (in order to set options)
1713via
1714
1715```
1716PCCompositeGetPC(PC pc,PetscInt count,PC *subpc);
1717```
1718
1719For example, to set the first sub preconditioners to use ILU(1)
1720
1721```
1722PC subpc;
1723PCCompositeGetPC(pc,0,&subpc);
1724PCFactorSetFill(subpc,1);
1725```
1726
1727One can also change the operator that is used to construct a particular
1728`PC` in the composite `PC` calling `PCSetOperators()` on the obtained `PC`.
1729`PCFIELDSPLIT`, {any}`sec_block_matrices`, provides an alternative approach to defining composite preconditioners
1730with a variety of pre-defined compositions.
1731
1732These various options can also be set via the options database. For
1733example, `-pc_type` `composite` `-pc_composite_pcs` `jacobi,ilu`
1734causes the composite preconditioner to be used with two preconditioners:
1735Jacobi and ILU. The option `-pc_composite_type` `multiplicative`
1736initiates the multiplicative version of the algorithm, while
1737`-pc_composite_type` `additive` the additive version. Using the
1738`Amat` matrix is obtained with the option `-pc_use_amat`. One sets
1739options for the sub-preconditioners with the extra prefix `-sub_N_`
1740where `N` is the number of the sub-preconditioner. For example,
1741`-sub_0_pc_ifactor_fill` `0`.
1742
1743PETSc also allows a preconditioner to be a complete `KSPSolve()` linear solver. This
1744is achieved with the `PCKSP` type.
1745
1746```
1747PCSetType(PC pc,PCKSP);
1748PCKSPGetKSP(pc,&ksp);
1749 /* set any KSP/PC options */
1750```
1751
1752From the command line one can use 5 iterations of biCG-stab with ILU(0)
1753preconditioning as the preconditioner with
1754`-pc_type ksp -ksp_pc_type ilu -ksp_ksp_max_it 5 -ksp_ksp_type bcgs`.
1755
1756By default the inner `KSP` solver uses the outer preconditioner
1757matrix, `Pmat`, as the matrix to be solved in the linear system; to
1758use the matrix that defines the linear system, `Amat` use the option
1759
1760```
1761PCSetUseAmat(PC pc);
1762```
1763
1764or at the command line with `-pc_use_amat`.
1765
1766Naturally, one can use a `PCKSP` preconditioner inside a composite
1767preconditioner. For example,
1768`-pc_type composite -pc_composite_pcs ilu,ksp -sub_1_pc_type jacobi -sub_1_ksp_max_it 10`
1769uses two preconditioners: ILU(0) and 10 iterations of GMRES with Jacobi
1770preconditioning. However, it is not clear whether one would ever wish to
1771do such a thing.
1772
1773(sec_mg)=
1774
1775### Multigrid Preconditioners
1776
1777A large suite of routines is available for using geometric multigrid as
1778a preconditioner [^id3]. In the `PCMG` framework, the user is required to
1779provide the coarse grid solver, smoothers, restriction and interpolation
1780operators, and code to calculate residuals. The `PCMG` package allows
1781these components to be encapsulated within a PETSc-compliant
1782preconditioner. We fully support both matrix-free and matrix-based
1783multigrid solvers.
1784
1785A multigrid preconditioner is created with the four commands
1786
1787```
1788KSPCreate(MPI_Comm comm,KSP *ksp);
1789KSPGetPC(KSP ksp,PC *pc);
1790PCSetType(PC pc,PCMG);
1791PCMGSetLevels(pc,PetscInt levels,MPI_Comm *comms);
1792```
1793
1794If the number of levels is not set with `PCMGSetLevels()` or `-pc_mg_levels` and no `DM`
1795has been attached to the `PCMG` with `KSPSetDM()` (or `SNESSetDM()` or `TSSetDM()`), then
1796`PCMG` uses only one level! This is different from the algebraic multigrid methods
1797such as `PCGAMG`, `PCML`, and `PCHYPRE` which internally determine the number of levels
1798to use.
1799
1800A large number of parameters affect the multigrid behavior. The command
1801
1802```
1803PCMGSetType(PC pc,PCMGType mode);
1804```
1805
1806indicates which form of multigrid to apply {cite}`1sbg`.
1807
1808For standard V or W-cycle multigrids, one sets the `mode` to be
1809`PC_MG_MULTIPLICATIVE`; for the additive form (which in certain cases
1810reduces to the BPX method, or additive multilevel Schwarz, or multilevel
1811diagonal scaling), one uses `PC_MG_ADDITIVE` as the `mode`. For a
1812variant of full multigrid, one can use `PC_MG_FULL`, and for the
1813Kaskade algorithm `PC_MG_KASKADE`. For the multiplicative and full
1814multigrid options, one can use a W-cycle by calling
1815
1816```
1817PCMGSetCycleType(PC pc,PCMGCycleType ctype);
1818```
1819
1820with a value of `PC_MG_CYCLE_W` for `ctype`. The commands above can
1821also be set from the options database. The option names are
1822`-pc_mg_type [multiplicative, additive, full, kaskade]`, and
1823`-pc_mg_cycle_type` `<ctype>`.
1824
1825The user can control the amount of smoothing by configuring the solvers
1826on the levels. By default, the up and down smoothers are identical. If
1827separate configuration of up and down smooths is required, it can be
1828requested with the option `-pc_mg_distinct_smoothup` or the routine
1829
1830```
1831PCMGSetDistinctSmoothUp(PC pc);
1832```
1833
1834The multigrid routines, which determine the solvers and
1835interpolation/restriction operators that are used, are mandatory. To set
1836the coarse grid solver, one must call
1837
1838```
1839PCMGGetCoarseSolve(PC pc,KSP *ksp);
1840```
1841
1842and set the appropriate options in `ksp`. Similarly, the smoothers are
1843controlled by first calling
1844
1845```
1846PCMGGetSmoother(PC pc,PetscInt level,KSP *ksp);
1847```
1848
1849and then setting the various options in the `ksp.` For example,
1850
1851```
1852PCMGGetSmoother(pc,1,&ksp);
1853KSPSetOperators(ksp,A1,A1);
1854```
1855
1856sets the matrix that defines the smoother on level 1 of the multigrid.
1857While
1858
1859```
1860PCMGGetSmoother(pc,1,&ksp);
1861KSPGetPC(ksp,&pc);
1862PCSetType(pc,PCSOR);
1863```
1864
1865sets SOR as the smoother to use on level 1.
1866
1867To use a different pre- or postsmoother, one should call the following
1868routines instead.
1869
1870```
1871PCMGGetSmootherUp(PC pc,PetscInt level,KSP *upksp);
1872PCMGGetSmootherDown(PC pc,PetscInt level,KSP *downksp);
1873```
1874
1875Use
1876
1877```
1878PCMGSetInterpolation(PC pc,PetscInt level,Mat P);
1879```
1880
1881and
1882
1883```
1884PCMGSetRestriction(PC pc,PetscInt level,Mat R);
1885```
1886
1887to define the intergrid transfer operations. If only one of these is
1888set, its transpose will be used for the other.
1889
1890It is possible for these interpolation operations to be matrix-free (see
1891{any}`sec_matrixfree`); One should then make
1892sure that these operations are defined for the (matrix-free) matrices
1893passed in. Note that this system is arranged so that if the
1894interpolation is the transpose of the restriction, you can pass the same
1895`mat` argument to both `PCMGSetRestriction()` and
1896`PCMGSetInterpolation()`.
1897
1898On each level except the coarsest, one must also set the routine to
1899compute the residual. The following command suffices:
1900
1901```
1902PCMGSetResidual(PC pc,PetscInt level,PetscErrorCode (*residual)(Mat,Vec,Vec,Vec),Mat mat);
1903```
1904
1905The `residual()` function normally does not need to be set if one’s
1906operator is stored in `Mat` format. In certain circumstances, where it
1907is much cheaper to calculate the residual directly, rather than through
1908the usual formula $b - Ax$, the user may wish to provide an
1909alternative.
1910
1911Finally, the user may provide three work vectors for each level (except
1912on the finest, where only the residual work vector is required). The
1913work vectors are set with the commands
1914
1915```
1916PCMGSetRhs(PC pc,PetscInt level,Vec b);
1917PCMGSetX(PC pc,PetscInt level,Vec x);
1918PCMGSetR(PC pc,PetscInt level,Vec r);
1919```
1920
1921The `PC` references these vectors, so you should call `VecDestroy()`
1922when you are finished with them. If any of these vectors are not
1923provided, the preconditioner will allocate them.
1924
1925One can control the `KSP` and `PC` options used on the various
1926levels (as well as the coarse grid) using the prefix `mg_levels_`
1927(`mg_coarse_` for the coarse grid). For example,
1928`-mg_levels_ksp_type cg` will cause the CG method to be used as the
1929Krylov method for each level. Or
1930`-mg_levels_pc_type ilu -mg_levels_pc_factor_levels 2` will cause the
1931ILU preconditioner to be used on each level with two levels of fill in
1932the incomplete factorization.
1933
1934If `KSPSetDM()` (or `SNESSetDM()` or `TSSetDM()`) has been called, then `PCMG` will use geometric
1935information from the `DM` to construct the multigrid hierarchy automatically. In this case one
1936does not need to call the various `PCMGSet` routines listed above.
1937
1938(sec_block_matrices)=
1939
1940## Solving Block Matrices with PCFIELDSPLIT
1941
1942Block matrices represent an important class of problems in numerical
1943linear algebra and offer the possibility of far more efficient iterative
1944solvers than just treating the entire matrix as a black box. In this
1945section, we use the common linear algebra definition of block matrices, where matrices are divided into a small, problem-size independent (two,
1946three, or so) number of very large blocks. These blocks arise naturally
1947from the underlying physics or discretization of the problem, such as the velocity and pressure. Under a certain numbering of
1948unknowns, the matrix can be written as
1949
1950$$
1951\left( \begin{array}{cccc}
1952A_{00}   & A_{01} & A_{02} & A_{03} \\
1953A_{10}   & A_{11} & A_{12} & A_{13} \\
1954A_{20}   & A_{21} & A_{22} & A_{23} \\
1955A_{30}   & A_{31} & A_{32} & A_{33} \\
1956\end{array} \right),
1957$$
1958
1959where each $A_{ij}$ is an entire block. The matrices on a parallel computer are not explicitly stored this way. Instead, each process will
1960own some rows of $A_{0*}$, $A_{1*}$ etc. On a
1961process, the blocks may be stored in one block followed by another
1962
1963$$
1964\left( \begin{array}{ccccccc}
1965A_{{00}_{00}}   & A_{{00}_{01}} & A_{{00}_{02}} & ... & A_{{01}_{00}} & A_{{01}_{01}} & ...  \\
1966A_{{00}_{10}}   & A_{{00}_{11}} & A_{{00}_{12}} & ... & A_{{01}_{10}} & A_{{01}_{11}} & ... \\
1967A_{{00}_{20}}   & A_{{00}_{21}} & A_{{00}_{22}} & ... & A_{{01}_{20}} & A_{{01}_{21}}  & ...\\
1968... \\
1969A_{{10}_{00}}   & A_{{10}_{01}} & A_{{10}_{02}} & ... & A_{{11}_{00}} & A_{{11}_{01}}  & ... \\
1970A_{{10}_{10}}   & A_{{10}_{11}} & A_{{10}_{12}} & ... & A_{{11}_{10}} & A_{{11}_{11}}  & ... \\
1971... \\
1972\end{array} \right)
1973$$
1974
1975or interlaced, for example, with four blocks
1976
1977$$
1978\left( \begin{array}{ccccc}
1979A_{{00}_{00}}   & A_{{01}_{00}} &  A_{{00}_{01}} & A_{{01}_{01}} &  ... \\
1980A_{{10}_{00}}   & A_{{11}_{00}} &  A_{{10}_{01}} & A_{{11}_{01}} &  ... \\
1981A_{{00}_{10}}   & A_{{01}_{10}} & A_{{00}_{11}} & A_{{01}_{11}} & ...\\
1982A_{{10}_{10}}   & A_{{11}_{10}} & A_{{10}_{11}} & A_{{11}_{11}} & ...\\
1983...
1984\end{array} \right).
1985$$
1986
1987Note that for interlaced storage, the number of rows/columns of each
1988block must be the same size. Matrices obtained with `DMCreateMatrix()`
1989where the `DM` is a `DMDA` are always stored interlaced. Block
1990matrices can also be stored using the `MATNEST` format, which holds
1991separate assembled blocks. Each of these nested matrices is itself
1992distributed in parallel. It is more efficient to use `MATNEST` with
1993the methods described in this section because there are fewer copies and
1994better formats (e.g., `MATBAIJ` or `MATSBAIJ`) can be used for the
1995components, but it is not possible to use many other methods with
1996`MATNEST`. See {any}`sec_matnest` for more on assembling
1997block matrices without depending on a specific matrix format.
1998
1999The PETSc `PCFIELDSPLIT` preconditioner implements the
2000“block” solvers in PETSc, {cite}`elman2008tcp`. There are three ways to provide the
2001information that defines the blocks. If the matrices are stored as
2002interlaced then `PCFieldSplitSetFields()` can be called repeatedly to
2003indicate which fields belong to each block. More generally
2004`PCFieldSplitSetIS()` can be used to indicate exactly which
2005rows/columns of the matrix belong to a particular block (field). You can provide
2006names for each block with these routines; if you do not, they are numbered from 0. With these two approaches, the blocks may
2007overlap (though they generally will not overlap). If only one block is defined,
2008then the complement of the matrices is used to define the other block.
2009Finally, the option `-pc_fieldsplit_detect_saddle_point` causes two
2010diagonal blocks to be found, one associated with all rows/columns that
2011have zeros on the diagonals and the rest.
2012
2013**Important parameters for PCFIELDSPLIT**
2014
2015- Control the fields used
2016
2017  - `-pc_fieldsplit_detect_saddle_point` \<bool:false> Generate two fields, the first consists of all rows with a nonzero on the diagonal, and the second will be all rows
2018    with zero on the diagonal. See `PCFieldSplitSetDetectSaddlePoint()`.
2019
2020  - `-pc_fieldsplit_dm_splits` \<bool:true> Use the `DM` attached to the preconditioner to determine the fields. See `PCFieldSplitSetDMSplits()` and
2021    `DMCreateFieldDecomposition()`.
2022
2023  - `-pc_fieldsplit_%d_fields` \<f1,f2,...:int> Use f1, f2, .. to define field `d`. The `fn` are in the range of 0, ..., bs-1 where bs is the block size
2024    of the matrix or set with `PCFieldSplitSetBlockSize()`. See `PCFieldSplitSetFields()`.
2025
2026    - `-pc_fieldsplit_default` \<bool:true> Automatically add any fields needed that have not been supplied explicitly by `-pc_fieldsplit_%d_fields`.
2027
2028  - `DMFieldsplitSetIS()` Provide the `IS` that defines a particular field.
2029
2030- Control the type of the block preconditioner
2031
2032  - `-pc_fieldsplit_type` \<additive|multiplicative|symmetric_multiplicative|schur|gkb:multiplicative> The order in which the field solves are applied.
2033    For symmetric problems where `KSPCG` is used `symmetric_multiplicative` must be used instead of `multiplicative`. `additive` is the least expensive
2034    to apply but provides the worst convergence. `schur` requires either a good preconditioner for the Schur complement or a naturally well-conditioned
2035    Schur complement, but when it works well can be extremely effective. See `PCFieldSplitSetType()`. `gkb` is for symmetric saddle-point problems (the lower-right
2036    the block is zero).
2037
2038  - `-pc_fieldsplit_diag_use_amat` \<bool:false> Use the first matrix that is passed to `KSPSetJacobian()` to construct the block-diagonal sub-matrices used in the algorithms,
2039    by default, the second matrix is used.
2040
2041  - Options for Schur preconditioner: `-pc_fieldsplit_type`
2042    `schur`
2043
2044    - `-pc_fieldsplit_schur_fact_type` \<diag|lower|upper|full:diag> See `PCFieldSplitSetSchurFactType()`. `full` reduces the iterations but each iteration requires additional
2045      field solves.
2046
2047    - `-pc_fieldsplit_schur_precondition` \<self|selfp|user|a11|full:user> How the Schur complement is preconditioned. See `PCFieldSplitSetSchurPre()`.
2048
2049      - `-fieldsplit_1_mat_schur_complement_ainv_type` \<diag|lump:diag> Use the lumped diagonal of $A_{00}$ when `-pc_fieldsplit_schur_precondition`
2050        `selfp` is used.
2051
2052    - `-pc_fieldsplit_schur_scale` \<real:-1.0> Controls the sign flip of S for `-pc_fieldsplit_schur_fact_type` `diag`.
2053      See `PCFieldSplitSetSchurScale()`
2054
2055    - `fieldsplit_1_xxx` controls the solver for the Schur complement system.
2056      If a `DM` provided the fields, use the second field name set in the `DM` instead of 1.
2057
2058      - `-fieldsplit_1_pc_type` `lsc` `-fieldsplit_1_lsc_pc_xxx` use
2059        the least squares commutators {cite}`elmanhowleshadidshuttleworthtuminaro2006` {cite}`silvester2001efficient`
2060        preconditioner for the Schur complement with any preconditioner for the least-squares matrix, see `PCLSC`.
2061        If a `DM` provided the fields, use the second field name set in the `DM` instead of 1.
2062
2063    - `-fieldsplit_upper_xxx` Set options for the solver in the upper solver when `-pc_fieldsplit_schur_fact_type`
2064      `upper` or `full` is used. Defaults to
2065      using the solver as provided with `-fieldsplit_0_xxx`.
2066
2067    - `-fieldsplit_1_inner_xxx` Set the options for the solver inside the application of the Schur complement;
2068      defaults to using the solver as provided with `-fieldsplit_0_xxx`. If a `DM` provides the fields use the name of the second field name set in the `DM` instead of 1.
2069
2070  - Options for GKB preconditioner: `-pc_fieldsplit_type` gkb
2071
2072    - `-pc_fieldsplit_gkb_tol` \<real:1e-5> See `PCFieldSplitSetGKBTol()`.
2073    - `-pc_fieldsplit_gkb_delay` \<int:5> See `PCFieldSplitSetGKBDelay()`.
2074    - `-pc_fieldsplit_gkb_nu` \<real:1.0> See `PCFieldSplitSetGKBNu()`.
2075    - `-pc_fieldsplit_gkb_maxit` \<int:100> See `PCFieldSplitSetGKBMaxit()`.
2076    - `-pc_fieldsplit_gkb_monitor` \<bool:false> Monitor the convergence of the inner solver.
2077
2078- Options for additive and multiplication field solvers:
2079
2080   - `-fieldsplit_%d_xxx` Set options for the solver for field number `d`. For example, `-fieldsplit_0_pc_type`
2081    `jacobi`. When the fields are obtained from a `DM` use the
2082    field name instead of `d`.
2083
2084For simplicity, we restrict our matrices to two-by-two blocks in the rest of the section. So the matrix is
2085
2086$$
2087\left( \begin{array}{cc}
2088A_{00}   & A_{01} \\
2089A_{10}   & A_{11} \\
2090\end{array} \right).
2091$$
2092
2093On occasion, the user may provide another matrix that is used to
2094construct parts of the preconditioner
2095
2096$$
2097\left( \begin{array}{cc}
2098Ap_{00}   & Ap_{01} \\
2099Ap_{10}   & Ap_{11} \\
2100\end{array} \right).
2101$$
2102
2103For notational simplicity define $\text{ksp}(A,Ap)$ to mean
2104approximately solving a linear system using `KSP` with the operator
2105$A$ and preconditioner built from matrix $Ap$.
2106
2107For matrices defined with any number of blocks, there are three “block”
2108algorithms available: block Jacobi,
2109
2110$$
2111\left( \begin{array}{cc}
2112  \text{ksp}(A_{00},Ap_{00})   & 0 \\
2113  0   & \text{ksp}(A_{11},Ap_{11}) \\
2114\end{array} \right)
2115$$
2116
2117block Gauss-Seidel,
2118
2119$$
2120\left( \begin{array}{cc}
2121I   & 0 \\
21220 & A^{-1}_{11} \\
2123\end{array} \right)
2124\left( \begin{array}{cc}
2125I   & 0 \\
2126-A_{10} & I \\
2127\end{array} \right)
2128\left( \begin{array}{cc}
2129A^{-1}_{00}   & 0 \\
21300 & I \\
2131\end{array} \right)
2132$$
2133
2134which is implemented [^id4] as
2135
2136$$
2137\left( \begin{array}{cc}
2138I   & 0 \\
2139  0 & \text{ksp}(A_{11},Ap_{11}) \\
2140\end{array} \right)
2141$$
2142
2143$$
2144\left[
2145\left( \begin{array}{cc}
21460   & 0 \\
21470 & I \\
2148\end{array} \right) +
2149\left( \begin{array}{cc}
2150I   & 0 \\
2151-A_{10} & -A_{11} \\
2152\end{array} \right)
2153\left( \begin{array}{cc}
2154I   & 0 \\
21550 & 0 \\
2156\end{array} \right)
2157\right]
2158$$
2159
2160$$
2161\left( \begin{array}{cc}
2162  \text{ksp}(A_{00},Ap_{00})   & 0 \\
21630 & I \\
2164\end{array} \right)
2165$$
2166
2167and symmetric block Gauss-Seidel
2168
2169$$
2170\left( \begin{array}{cc}
2171A_{00}^{-1}   & 0 \\
21720 & I \\
2173\end{array} \right)
2174\left( \begin{array}{cc}
2175I   & -A_{01} \\
21760 & I \\
2177\end{array} \right)
2178\left( \begin{array}{cc}
2179A_{00}   & 0 \\
21800 & A_{11}^{-1} \\
2181\end{array} \right)
2182\left( \begin{array}{cc}
2183I   & 0 \\
2184-A_{10} & I \\
2185\end{array} \right)
2186\left( \begin{array}{cc}
2187A_{00}^{-1}   & 0 \\
21880 & I \\
2189\end{array} \right).
2190$$
2191
2192These can be accessed with
2193`-pc_fieldsplit_type <additive,multiplicative,symmetric_multiplicative>`
2194or the function `PCFieldSplitSetType()`. The option prefixes for the
2195internal KSPs are given by `-fieldsplit_name_`.
2196
2197By default blocks $A_{00}, A_{01}$ and so on are extracted out of
2198`Pmat`, the matrix that the `KSP` uses to build the preconditioner,
2199and not out of `Amat` (i.e., $A$ itself). As discussed above, in
2200{any}`sec_combining_pcs`, however, it is
2201possible to use `Amat` instead of `Pmat` by calling
2202`PCSetUseAmat(pc)` or using `-pc_use_amat` on the command line.
2203Alternatively, you can have `PCFIELDSPLIT` extract the diagonal blocks
2204$A_{00}, A_{11}$ etc. out of `Amat` by calling
2205`PCFieldSplitSetDiagUseAmat(pc,PETSC_TRUE)` or supplying command-line
2206argument `-pc_fieldsplit_diag_use_amat`. Similarly,
2207`PCFieldSplitSetOffDiagUseAmat(pc,{PETSC_TRUE`) or
2208`-pc_fieldsplit_off_diag_use_amat` will cause the off-diagonal blocks
2209$A_{01},A_{10}$ etc. to be extracted out of `Amat`.
2210
2211For two-by-two blocks only, there is another family of solvers based on
2212Schur complements. The inverse of the Schur complement factorization is
2213
2214$$
2215\left[
2216\left( \begin{array}{cc}
2217I   & 0 \\
2218A_{10}A_{00}^{-1} & I \\
2219\end{array} \right)
2220\left( \begin{array}{cc}
2221A_{00}  & 0 \\
22220 & S \\
2223\end{array} \right)
2224\left( \begin{array}{cc}
2225I   & A_{00}^{-1} A_{01} \\
22260 & I \\
2227\end{array} \right)
2228\right]^{-1} =
2229$$
2230
2231$$
2232\left( \begin{array}{cc}
2233I   & A_{00}^{-1} A_{01} \\
22340 & I \\
2235\end{array} \right)^{-1}
2236\left( \begin{array}{cc}
2237A_{00}^{-1}  & 0 \\
22380 & S^{-1} \\
2239\end{array} \right)
2240\left( \begin{array}{cc}
2241I   & 0 \\
2242A_{10}A_{00}^{-1} & I \\
2243\end{array} \right)^{-1} =
2244$$
2245
2246$$
2247\left( \begin{array}{cc}
2248I   & -A_{00}^{-1} A_{01} \\
22490 & I \\
2250\end{array} \right)
2251\left( \begin{array}{cc}
2252A_{00}^{-1}  & 0 \\
22530 & S^{-1} \\
2254\end{array} \right)
2255\left( \begin{array}{cc}
2256I   & 0 \\
2257-A_{10}A_{00}^{-1} & I \\
2258\end{array} \right) =
2259$$
2260
2261$$
2262\left( \begin{array}{cc}
2263A_{00}^{-1}   & 0 \\
22640 & I \\
2265\end{array} \right)
2266\left( \begin{array}{cc}
2267I   & -A_{01} \\
22680 & I \\
2269\end{array} \right)
2270\left( \begin{array}{cc}
2271A_{00} & 0 \\
22720 & S^{-1} \\
2273\end{array} \right)
2274\left( \begin{array}{cc}
2275I   & 0 \\
2276-A_{10} & I \\
2277\end{array} \right)
2278\left( \begin{array}{cc}
2279A_{00}^{-1}   & 0 \\
22800 & I \\
2281\end{array} \right).
2282$$
2283
2284The preconditioner is accessed with `-pc_fieldsplit_type` `schur` and is
2285implemented as
2286
2287$$
2288\left( \begin{array}{cc}
2289  \text{ksp}(A_{00},Ap_{00})   & 0 \\
22900 & I \\
2291\end{array} \right)
2292\left( \begin{array}{cc}
2293I   & -A_{01} \\
22940 & I \\
2295\end{array} \right)
2296$$
2297
2298$$
2299\left( \begin{array}{cc}
2300I  & 0 \\
2301  0 & \text{ksp}(\hat{S},\hat{S}p) \\
2302\end{array} \right)
2303\left( \begin{array}{cc}
2304I   & 0 \\
2305  -A_{10} \text{ksp}(A_{00},Ap_{00}) & I \\
2306\end{array} \right).
2307$$
2308
2309Where
2310$\hat{S} = A_{11} - A_{10} \text{ksp}(A_{00},Ap_{00}) A_{01}$ is
2311the approximate Schur complement.
2312
2313There are several variants of the Schur complement preconditioner
2314obtained by dropping some of the terms; these can be obtained with
2315`-pc_fieldsplit_schur_fact_type <diag,lower,upper,full>` or the
2316function `PCFieldSplitSetSchurFactType()`. Note that the `diag` form
2317uses the preconditioner
2318
2319$$
2320\left( \begin{array}{cc}
2321  \text{ksp}(A_{00},Ap_{00})   & 0 \\
2322  0 & -\text{ksp}(\hat{S},\hat{S}p) \\
2323\end{array} \right).
2324$$
2325
2326This is done to ensure the preconditioner is positive definite for a
2327a common class of problems, saddle points with a positive definite
2328$A_{00}$: for these, the Schur complement is negative definite.
2329
2330The effectiveness of the Schur complement preconditioner depends on the
2331availability of a good preconditioner $\hat Sp$ for the Schur
2332complement matrix. In general, you are responsible for supplying
2333$\hat Sp$ via
2334`PCFieldSplitSetSchurPre(pc,PC_FIELDSPLIT_SCHUR_PRE_USER,Sp)`.
2335Without a good problem-specific $\hat Sp$, you can use
2336some built-in options.
2337
2338Using `-pc_fieldsplit_schur_precondition user` on the command line
2339activates the matrix supplied programmatically, as explained above.
2340
2341With `-pc_fieldsplit_schur_precondition a11` (default)
2342$\hat Sp = A_{11}$ is used to build a preconditioner for
2343$\hat S$.
2344
2345Otherwise, `-pc_fieldsplit_schur_precondition self` will set
2346$\hat Sp = \hat S$ and use the Schur complement matrix itself to
2347build the preconditioner.
2348
2349The problem with the last approach is that $\hat S$ is used in
2350the unassembled, matrix-free form, and many preconditioners (e.g., ILU)
2351cannot be built out of such matrices. Instead, you can *assemble* an
2352approximation to $\hat S$ by inverting $A_{00}$, but only
2353approximately, to ensure the sparsity of $\hat Sp$ as much
2354as possible. Specifically, using
2355`-pc_fieldsplit_schur_precondition selfp` will assemble
2356$\hat Sp = A_{11} - A_{10} \text{inv}(A_{00}) A_{01}$.
2357
2358By default $\text{inv}(A_{00})$ is the inverse of the diagonal of
2359$A_{00}$, but using
2360`-fieldsplit_1_mat_schur_complement_ainv_type lump` will lump
2361$A_{00}$ first. Using
2362`-fieldsplit_1_mat_schur_complement_ainv_type blockdiag` will use the
2363inverse of the block diagonal of $A_{00}$. Option
2364`-mat_schur_complement_ainv_type` applies to any matrix of
2365`MatSchurComplement` type and here it is used with the prefix
2366`-fieldsplit_1` of the linear system in the second split.
2367
2368Finally, you can use the `PCLSC` preconditioner for the Schur
2369complement with `-pc_fieldsplit_type schur -fieldsplit_1_pc_type lsc`.
2370This uses for the preconditioner to $\hat{S}$ the operator
2371
2372$$
2373\text{ksp}(A_{10} A_{01},A_{10} A_{01}) A_{10} A_{00} A_{01} \text{ksp}(A_{10} A_{01},A_{10} A_{01})
2374$$
2375
2376Which, of course, introduces two additional inner solves for each
2377application of the Schur complement. The options prefix for this inner
2378`KSP` is `-fieldsplit_1_lsc_`. Instead of constructing the matrix
2379$A_{10} A_{01}$, users can provide their own matrix. This is
2380done by attaching the matrix/matrices to the $Sp$ matrix they
2381provide with
2382
2383```
2384PetscObjectCompose((PetscObject)Sp,"LSC_L",(PetscObject)L);
2385PetscObjectCompose((PetscObject)Sp,"LSC_Lp",(PetscObject)Lp);
2386```
2387
2388(sec_singular)=
2389
2390## Solving Singular Systems
2391
2392Sometimes one is required to solver singular linear systems. In this
2393case, the system matrix has a nontrivial null space. For example, the
2394discretization of the Laplacian operator with Neumann boundary
2395conditions has a null space of the constant functions. PETSc has tools
2396to help solve these systems. This approach is only guaranteed to work for left preconditioning (see `KSPSetPCSide()`); for example it
2397may not work in some situations with `KSPFGMRES`.
2398
2399First, one must know what the null space is and store it using an
2400orthonormal basis in an array of PETSc Vecs. The constant functions can
2401be handled separately, since they are such a common case. Create a
2402`MatNullSpace` object with the command
2403
2404```
2405MatNullSpaceCreate(MPI_Comm,PetscBool hasconstants,PetscInt dim,Vec *basis,MatNullSpace *nsp);
2406```
2407
2408Here, `dim` is the number of vectors in `basis` and `hasconstants`
2409indicates if the null space contains the constant functions. If the null
2410space contains the constant functions you do not need to include it in
2411the `basis` vectors you provide, nor in the count `dim`.
2412
2413One then tells the `KSP` object you are using what the null space is
2414with the call
2415
2416```
2417MatSetNullSpace(Mat Amat,MatNullSpace nsp);
2418```
2419
2420The `Amat` should be the *first* matrix argument used with
2421`KSPSetOperators()`, `SNESSetJacobian()`, or `TSSetIJacobian()`.
2422The PETSc solvers will now
2423handle the null space during the solution process.
2424
2425If the right-hand side of linear system is not in the range of `Amat`, that is it is not
2426orthogonal to the null space of `Amat` transpose, then the residual
2427norm of the Krylov iteration will not converge to zero; it will converge to a non-zero value while the
2428solution is converging to the least squares solution of the linear system. One can, if one desires,
2429apply `MatNullSpaceRemove()` with the null space of `Amat` transpose to the right-hand side before calling
2430`KSPSolve()`. Then the residual norm will converge to zero.
2431
2432If one chooses a direct solver (or an incomplete factorization) it may
2433still detect a zero pivot. You can run with the additional options or
2434`-pc_factor_shift_type NONZERO`
2435`-pc_factor_shift_amount  <dampingfactor>` to prevent the zero pivot.
2436A good choice for the `dampingfactor` is 1.e-10.
2437
2438If the matrix is non-symmetric and you wish to solve the transposed linear system
2439you must provide the null space of the transposed matrix with `MatSetTransposeNullSpace()`.
2440
2441(sec_externalsol)=
2442
2443## Using External Linear Solvers
2444
2445PETSc interfaces to several external linear solvers (also see {any}`acknowledgements`).
2446To use these solvers, one may:
2447
24481. Run `configure` with the additional options
2449   `--download-packagename` e.g. `--download-superlu_dist`
2450   `--download-parmetis` (SuperLU_DIST needs ParMetis) or
2451   `--download-mumps` `--download-scalapack` (MUMPS requires
2452   ScaLAPACK).
24532. Build the PETSc libraries.
24543. Use the runtime option: `-ksp_type preonly` (or equivalently `-ksp_type none`) `-pc_type <pctype>`
2455   `-pc_factor_mat_solver_type <packagename>`. For eg:
2456   `-ksp_type preonly` `-pc_type lu`
2457   `-pc_factor_mat_solver_type superlu_dist`.
2458
2459```{eval-rst}
2460.. list-table:: Options for External Solvers
2461   :name: tab-externaloptions
2462   :header-rows: 1
2463
2464   * - MatType
2465     - PCType
2466     - MatSolverType
2467     - Package
2468   * - ``seqaij``
2469     - ``lu``
2470     - ``MATSOLVERESSL``
2471     - ``essl``
2472   * - ``seqaij``
2473     - ``lu``
2474     - ``MATSOLVERLUSOL``
2475     -  ``lusol``
2476   * - ``seqaij``
2477     - ``lu``
2478     - ``MATSOLVERMATLAB``
2479     - ``matlab``
2480   * - ``aij``
2481     - ``lu``
2482     - ``MATSOLVERMUMPS``
2483     - ``mumps``
2484   * - ``aij``
2485     - ``cholesky``
2486     - -
2487     - -
2488   * - ``sbaij``
2489     - ``cholesky``
2490     - -
2491     - -
2492   * - ``seqaij``
2493     - ``lu``
2494     - ``MATSOLVERSUPERLU``
2495     - ``superlu``
2496   * - ``aij``
2497     - ``lu``
2498     - ``MATSOLVERSUPERLU_DIST``
2499     - ``superlu_dist``
2500   * - ``seqaij``
2501     - ``lu``
2502     - ``MATSOLVERUMFPACK``
2503     - ``umfpack``
2504   * - ``seqaij``
2505     - ``cholesky``
2506     - ``MATSOLVERCHOLMOD``
2507     - ``cholmod``
2508   * - ``seqaij``
2509     - ``lu``
2510     - ``MATSOLVERKLU``
2511     -  ``klu``
2512   * - ``dense``
2513     - ``lu``
2514     - ``MATSOLVERELEMENTAL``
2515     -  ``elemental``
2516   * - ``dense``
2517     - ``cholesky``
2518     - -
2519     - -
2520   * - ``seqaij``
2521     - ``lu``
2522     - ``MATSOLVERMKL_PARDISO``
2523     - ``mkl_pardiso``
2524   * - ``aij``
2525     - ``lu``
2526     - ``MATSOLVERMKL_CPARDISO``
2527     - ``mkl_cpardiso``
2528   * - ``aij``
2529     - ``lu``
2530     - ``MATSOLVERPASTIX``
2531     -  ``pastix``
2532   * - ``aij``
2533     - ``cholesky``
2534     - ``MATSOLVERBAS``
2535     -  ``bas``
2536   * - ``aijcusparse``
2537     - ``lu``
2538     - ``MATSOLVERCUSPARSE``
2539     - ``cusparse``
2540   * - ``aijcusparse``
2541     - ``cholesky``
2542     -  -
2543     -  -
2544   * - ``aij``
2545     - ``lu``, ``cholesky``
2546     - ``MATSOLVERPETSC``
2547     - ``petsc``
2548   * - ``baij``
2549     - -
2550     - -
2551     - -
2552   * - ``aijcrl``
2553     - -
2554     - -
2555     - -
2556   * - ``aijperm``
2557     - -
2558     - -
2559     - -
2560   * - ``seqdense``
2561     - -
2562     - -
2563     - -
2564   * - ``aij``
2565     - -
2566     - -
2567     - -
2568   * - ``baij``
2569     - -
2570     - -
2571     - -
2572   * - ``aijcrl``
2573     - -
2574     - -
2575     - -
2576   * - ``aijperm``
2577     - -
2578     - -
2579     - -
2580   * - ``seqdense``
2581     - -
2582     - -
2583     - -
2584```
2585
2586The default and available input options for each external software can
2587be found by specifying `-help` at runtime.
2588
2589As an alternative to using runtime flags to employ these external
2590packages, procedural calls are provided for some packages. For example,
2591the following procedural calls are equivalent to runtime options
2592`-ksp_type preonly` (or equivalently `-ksp_type none`) `-pc_type lu`
2593`-pc_factor_mat_solver_type mumps` `-mat_mumps_icntl_7 3`:
2594
2595```
2596KSPSetType(ksp, KSPPREONLY); // (or equivalently KSPSetType(ksp, KSPNONE))
2597KSPGetPC(ksp, &pc);
2598PCSetType(pc, PCLU);
2599PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);
2600PCFactorSetUpMatSolverType(pc);
2601PCFactorGetMatrix(pc, &F);
2602icntl=7; ival = 3;
2603MatMumpsSetIcntl(F, icntl, ival);
2604```
2605
2606One can also create matrices with the appropriate capabilities by
2607calling `MatCreate()` followed by `MatSetType()` specifying the
2608desired matrix type from {any}`tab-externaloptions`. These
2609matrix types inherit capabilities from their PETSc matrix parents:
2610`MATSEQAIJ`, `MATMPIAIJ`, etc. As a result, the preallocation routines
2611`MatSeqAIJSetPreallocation()`, `MatMPIAIJSetPreallocation()`, etc.
2612and any other type specific routines of the base class are supported.
2613One can also call `MatConvert()` inplace to convert the matrix to and
2614from its base class without performing an expensive data copy.
2615`MatConvert()` cannot be called on matrices that have already been
2616factored.
2617
2618In {any}`tab-externaloptions`, the base class `aij` refers
2619to the fact that inheritance is based on `MATSEQAIJ` when constructed
2620with a single process communicator, and from `MATMPIAIJ` otherwise.
2621The same holds for `baij` and `sbaij`. For codes that are intended
2622to be run as both a single process or with multiple processes, depending
2623on the `mpiexec` command, it is recommended that both sets of
2624preallocation routines are called for these communicator morphing types.
2625The call for the incorrect type will simply be ignored without any harm
2626or message.
2627
2628(sec_pcmpi)=
2629
2630## Using PETSc's MPI parallel linear solvers from a non-MPI program
2631
2632Using PETSc's MPI linear solver server it is possible to use multiple MPI processes to solve a
2633a linear system when the application code, including the matrix generation, is run on a single
2634MPI process (with or without OpenMP). The application code must be built with MPI and must call
2635`PetscInitialize()` at the very beginning of the program and end with `PetscFinalize()`. The
2636application code may utilize OpenMP.
2637The code may create multiple matrices and `KSP` objects and call `KSPSolve()`, similarly the
2638code may utilize the `SNES` nonlinear solvers, the `TS` ODE integrators, and the `Tao` optimization algorithms
2639which use `KSP`.
2640
2641The program must then be launched using the standard approaches for launching MPI programs with the additional
2642PETSc option `-mpi_linear_solver_server`. The linear solves are controlled via the options database in the usual manner (using any options prefix
2643you may have provided via `KSPSetOptionsPrefix()`, for example `-ksp_type cg -ksp_monitor -pc_type bjacobi -ksp_view`. The solver options cannot be set via
2644the functional interface, for example `KSPSetType()` etc.
2645
2646The option `-mpi_linear_solver_server_view` will print
2647a summary of all the systems solved by the MPI linear solver server when the program completes. By default the linear solver server
2648will only parallelize the linear solve to the extent that it believes is appropriate to obtain speedup for the parallel solve, for example, if the
2649matrix has 1,000 rows and columns the solution will not be parallelized by default. One can use the option `-mpi_linear_solver_server_minimum_count_per_rank 5000`
2650to cause the linear solver server to allow as few as 5,000 unknowns per MPI process in the parallel solve.
2651
2652See `PCMPI`, `PCMPIServerBegin()`, and `PCMPIServerEnd()` for more details on the solvers.
2653
2654For help when anything goes wrong with the MPI linear solver server see `PCMPIServerBegin()`.
2655
2656Amdahl's law makes clear that parallelizing only a portion of a numerical code can only provide a limited improvement
2657in the computation time; thus it is crucial to understand what phases of a computation must be parallelized (via MPI, OpenMP, or some other model)
2658to ensure a useful increase in performance. One of the crucial phases is likely the generation of the matrix entries; the
2659use of `MatSetPreallocationCOO()` and `MatSetValuesCOO()` in an OpenMP code allows parallelizing the generation of the matrix.
2660
2661See {any}`sec_pcmpi_study` for a study of the use of `PCMPI` on a specific PETSc application.
2662
2663```{rubric} Footnotes
2664```
2665
2666[^id3]: See {any}`sec_amg` for information on using algebraic multigrid.
2667
2668[^id4]: This may seem an odd way to implement since it involves the "extra"
2669    multiply by $-A_{11}$. The reason is this is implemented this
2670    way is that this approach works for any number of blocks that may
2671    overlap.
2672
2673```{rubric} References
2674```
2675
2676```{eval-rst}
2677.. bibliography:: /petsc.bib
2678   :filter: docname in docnames
2679```
2680