xref: /petsc/doc/tutorials/physics/guide_to_stokes.md (revision 0baf8eba40dbc839082666f9f7396a225d6f663c)
---
orphan: true
---

(tut_stokes)=

# Guide to the Stokes Equations using Finite Elements

The Stokes equations

$$
\begin{aligned}
-\nabla \cdot \left(\mu \left(\nabla u + \nabla u^T \right)\right) + \nabla p - f &= 0 \\
\nabla\cdot u &= 0
\end{aligned}
$$

describe the slow flow of an incompressible fluid with velocity $u$, pressure $p$, and body force $f$.

This guide accompanies <a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/src/snes/tutorials/ex62.c.html">SNES Example 62</a> and <a href="PETSC_DOC_OUT_ROOT_PLACEHOLDER/src/snes/tutorials/ex69.c.html">SNES Example 69</a>.

The Stokes equations for a fluid, a steady-state form of the Navier-Stokes equations, start with the balance of momentum, just as in elastostatics,

$$
\nabla \cdot \sigma + f = 0,
$$

where $\sigma$ is the stress tensor and $f$ is the body force, combined with the conservation of mass

$$
\nabla \cdot (\rho u) = 0,
$$

where $\rho$ is the density and $u$ is the fluid velocity. If we assume that the density is constant, making the fluid incompressible, and that the rheology is Newtonian, meaning that the viscous stress is linearly proportional to the local strain rate, then we have

$$
\begin{aligned}
  \nabla \cdot \mu \left( \nabla u + \nabla u^T \right) - \nabla p + f &= 0 \\
  \nabla \cdot u &= 0
\end{aligned}
$$

where $p$ is the pressure and $\mu$ is the dynamic shear viscosity, with units $N\cdot s/m^2$ or $Pa\cdot s$. If we divided by the constant density, we would instead have the kinematic viscosity $\nu$ and a force per unit mass. The second equation demands that the velocity field be divergence-free, indicating that the flow is incompressible. The pressure in this case can be thought of as the Lagrange multiplier enforcing the incompressibility constraint. In the compressible case, we would need an equation of state to relate the pressure to the density, and perhaps the temperature.

We will discretize the Stokes equations with finite elements, so the first step is to write a variational weak form of the equations. We choose a Ritz-Galerkin setup, with velocity $u \in V$ and pressure $p \in Q$, so that

$$
\begin{aligned}
  \left< \nabla v, \mu \left( \nabla u + \nabla u^T \right) \right> + \left< v, \frac{\partial\sigma}{\partial n} \right>_\Gamma - \left< \nabla\cdot v, p \right> - \left< v, f \right> &= 0 & \text{for all} \ v \in V\\
  \left< q, -\nabla \cdot u \right> &= 0 & \text{for all} \ q \in Q
\end{aligned}
$$

where integration by parts has introduced a boundary integral of the traction (the stress acting through the boundary normal), so that natural boundary conditions correspond to stress-free boundaries. We have multiplied the continuity equation by minus one in order to preserve symmetry.

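
The boundary term arises from the componentwise divergence theorem. Schematically, in the inner-product notation used above,

$$
\left< v, \nabla \cdot \sigma \right> = -\left< \nabla v, \sigma \right> + \left< v, \sigma \cdot \hat{n} \right>_\Gamma,
$$

so integrating by parts trades a volume term for a boundary traction term, and that term vanishes on stress-free boundaries.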
## Equation Definition

The test functions $v, q$ and their derivatives are determined by the discretization, whereas the form of the integrand is determined by the physics. Given a quadrature rule to evaluate the form integral, we only need the evaluation of the physics integrand at the quadrature points, given the values of the fields and their derivatives. The entire scheme is detailed in {cite}`knepleybrownruppsmith13`. We will call the kernels paired with test functions $f_0$, and those paired with gradients of test functions $f_1$.

For example, the kernel for the continuity equation, paired with the pressure test function, is called `f0_p` and can be seen here

```{literalinclude} /../src/snes/tutorials/ex62.c
:end-at: '}'
:start-at: static void f0_p(
```

We use the components of the Jacobian of $u$ to build up its divergence. For the balance of momentum, excluding the body force, we test against the gradient of the test function, as seen in `f1_u`,

```{literalinclude} /../src/snes/tutorials/ex62.c
:end-at: '}'
:start-at: static void f1_u(
```

Notice how the pressure $p$ is referred to using `u[uOff[1]]`, so that we can have many fields with different numbers of components. `DMPlex` uses these point functions to construct the residual, and a similar set of point functions to build the Jacobian. The last piece of our physics specification is the construction of exact solutions using the Method of Manufactured Solutions (MMS).

## MMS Solutions

An MMS solution is chosen to elucidate some property of the problem, and to check that it is being solved accurately, since the error can be calculated explicitly. For our Stokes problem, we first choose a solution with quadratic velocity and linear pressure,

$$
u = \begin{pmatrix} x^2 + y^2 \\ 2 x^2 - 2 x y \end{pmatrix} \quad \mathrm{or} \quad \begin{pmatrix} 2 x^2 + y^2 + z^2 \\ 2 x^2 - 2xy \\ 2 x^2 - 2xz \end{pmatrix}
$$

```{literalinclude} /../src/snes/tutorials/ex62.c
:append: '}'
:end-at: return 0;
:start-at: static PetscErrorCode quadratic_u
```

$$
p = x + y - 1 \quad \mathrm{or} \quad x + y + z - \frac{3}{2}
$$

```{literalinclude} /../src/snes/tutorials/ex62.c
:append: '}'
:end-at: return 0;
:start-at: static PetscErrorCode quadratic_p
```

By plugging these solutions into our equations, assuming that the velocity we choose is divergence-free, we can determine the body force necessary to make them satisfy the Stokes equations. For the quadratic solution above, we find

$$
f = \begin{pmatrix} 1 - 4\mu \\ 1 - 4\mu \end{pmatrix} \quad \mathrm{or} \quad \begin{pmatrix} 1 - 8\mu \\ 1 - 4\mu \\ 1 - 4\mu \end{pmatrix}
$$

which is implemented in our `f0_quadratic_u` pointwise function

```{literalinclude} /../src/snes/tutorials/ex62.c
:end-at: '}'
:start-at: static void f0_quadratic_u
```

We let PETSc know about these solutions

```{literalinclude} /../src/snes/tutorials/ex62.c
:end-at: PetscCall(PetscDSSetExactSolution(ds, 1
:start-at: PetscCall(PetscDSSetExactSolution(ds, 0
```

These solutions will be captured exactly by the $P_2-P_1$ finite element space. We can use the `-dmsnes_check` option to activate function space checks. It gives the $L_2$ error, or *discretization* error, of the exact solution, the residual computed using the interpolation of the exact solution into our finite element space, and a Taylor test checking that our Jacobian matches the residual. The Taylor test should converge at order 2, or be exact in the case of linear equations like Stokes. The $P_2-P_1$ runs in the PETSc test section at the bottom of the source file,

```{literalinclude} /../src/snes/tutorials/ex62.c
:lines: 1-3
:start-at: 'suffix: 2d_p2_p1_check'
```

```{literalinclude} /../src/snes/tutorials/ex62.c
:lines: 1-3
:start-at: 'suffix: 3d_p2_p1_check'
```

verify these claims, as we can see from the output files

```{literalinclude} /../src/snes/tutorials/output/ex62_2d_p2_p1_check.out
:language: none
```

```{literalinclude} /../src/snes/tutorials/output/ex62_3d_p2_p1_check.out
:language: none
```

We can carry out the same tests for the $Q_2-Q_1$ element,

```{literalinclude} /../src/snes/tutorials/ex62.c
:lines: 1-2
:start-at: 'suffix: 2d_q2_q1_check'
```

```{literalinclude} /../src/snes/tutorials/ex62.c
:lines: 1-2
:start-at: 'suffix: 3d_q2_q1_check'
```

The quadratic solution, however, cannot tell us whether our discretization is attaining the correct order of convergence, especially for higher-order elements. Thus, we define another solution based on trigonometric functions,

$$
u = \begin{pmatrix} \sin(\pi x) + \sin(\pi y) \\ -\pi \cos(\pi x) y \end{pmatrix} \quad \mathrm{or} \quad
    \begin{pmatrix} 2 \sin(\pi x) + \sin(\pi y) + \sin(\pi z) \\ -\pi \cos(\pi x) y \\ -\pi \cos(\pi x) z \end{pmatrix}
$$

```{literalinclude} /../src/snes/tutorials/ex62.c
:append: '}'
:end-at: return 0;
:start-at: static PetscErrorCode trig_u
```

$$
p = \sin(2 \pi x) + \sin(2 \pi y) \quad \mathrm{or} \quad \sin(2 \pi x) + \sin(2 \pi y) + \sin(2 \pi z)
$$

```{literalinclude} /../src/snes/tutorials/ex62.c
:append: '}'
:end-at: return 0;
:start-at: static PetscErrorCode trig_p
```

$$
f = \begin{pmatrix} 2 \pi \cos(2 \pi x) + \mu \pi^2 \sin(\pi x) + \mu \pi^2 \sin(\pi y) \\ 2 \pi \cos(2 \pi y) - \mu \pi^3 \cos(\pi x) y \end{pmatrix} \quad \mathrm{or} \quad
\begin{pmatrix} 2 \pi \cos(2 \pi x) + 2 \mu \pi^2 \sin(\pi x) + \mu \pi^2 \sin(\pi y) + \mu \pi^2 \sin(\pi z) \\ 2 \pi \cos(2 \pi y) - \mu \pi^3 \cos(\pi x) y \\ 2 \pi \cos(2 \pi z) - \mu \pi^3 \cos(\pi x) z \end{pmatrix}
$$

```{literalinclude} /../src/snes/tutorials/ex62.c
:end-at: '}'
:start-at: static void f0_trig_u
```

We can now use `-snes_convergence_estimate` to determine the convergence exponent for the discretization. This option solves the problem on a series of refined meshes, calculates the error on each mesh, and determines the slope on a logarithmic scale. For example, we do this in two dimensions, refining our mesh twice using `-convest_num_refine 2`, in the following test.

```{literalinclude} /../src/snes/tutorials/ex62.c
:end-before: 'test:'
:start-at: 'suffix: 2d_p2_p1_conv'
```

However, the test needs an accurate linear solver. Sparse LU factorizations do not, in general, do full pivoting, so we must deal with the zero pressure block explicitly. We use the `PCFIELDSPLIT` preconditioner and the full Schur complement factorization, but we still need a preconditioner for the Schur complement $B^T A^{-1} B$. We can have PETSc construct that matrix automatically, but the cost rises steeply as the problem size increases. Instead, we use the fact that the Schur complement is spectrally equivalent to the pressure mass matrix $M_p$. We can make a preconditioning matrix which has the diagonal blocks we will use to build the preconditioners, letting PETSc know that we get the off-diagonal blocks from the original system with `-pc_fieldsplit_off_diag_use_amat`, and that it should build the Schur complement from the original matrix using `-pc_use_amat`,

```{literalinclude} /../src/snes/tutorials/ex62.c
:end-at: PetscCall(PetscDSSetJacobianPreconditioner(ds, 1
:start-at: PetscCall(PetscDSSetJacobianPreconditioner(ds, 0
```

Putting this all together, and using exact solvers on the subblocks, we have

```{literalinclude} /../src/snes/tutorials/ex62.c
:end-before: 'test:'
:start-at: 'suffix: 2d_p2_p1_conv'
```

and we see that it converges; however, the pressure is superconverging,

```{literalinclude} /../src/snes/tutorials/output/ex62_2d_p2_p1_conv.out
```

If we refine the mesh using `-dm_refine 3`, the convergence rates become `[3.0, 2.1]`.

## Dealing with Parameters

Like most physical problems, the Stokes problem has a parameter, the dynamic shear viscosity, which determines what solution regime we are in. To handle such parameters in PETSc, we first define a C struct to hold them,

```{literalinclude} /../src/snes/tutorials/ex62.c
:end-at: '} Parameter;'
:start-at: typedef struct
```

and then add a `PetscBag` object to our application context. We then set up the parameter object,

```{literalinclude} /../src/snes/tutorials/ex62.c
:append: '}'
:end-at: PetscFunctionReturn(PETSC_SUCCESS);
:start-at: static PetscErrorCode SetupParameters
```

which allows us to set the value from the command line using `-mu`. The `PetscBag` can also be persisted to disk with `PetscBagLoad/View()`. We can make these values available as constants to our pointwise functions through the `PetscDS` object.

```{literalinclude} /../src/snes/tutorials/ex62.c
:end-at: '}'
:start-after: /* Make constant values
```

## Investigating Convergence

In order to look at the convergence of some harder problems, we will examine `SNES ex69`. This example provides an exact solution to the variable-viscosity Stokes equations. The sharp viscosity variation will allow us to investigate convergence of the solver and discretization. Briefly, a sharp viscosity variation is created across the unit square, imposed on a background pressure with a given fundamental frequency. For example, we can create examples with period one half and viscosity $e^{2 B x}$ (solKx)

```console
$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-dm_refine 5 -dm_view hdf5:$PETSC_DIR/sol.h5 -snes_view_solution hdf5:$PETSC_DIR/sol.h5::append -exact_vec_view hdf5:$PETSC_DIR/sol.h5::append -m 2 -n 2 -B 1"
$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-dm_refine 5 -dm_view hdf5:$PETSC_DIR/sol.h5 -snes_view_solution hdf5:$PETSC_DIR/sol.h5::append -exact_vec_view hdf5:$PETSC_DIR/sol.h5::append -m 2 -n 2 -B 3.75"
```

which are shown in the figure below.

```{eval-rst}
.. list-table::

  * - .. figure:: /images/tutorials/physics/ex69_sol_m_2_n_2_B_1.png

         Solution for :math:`m=2`, :math:`n=2`, :math:`B=1`

    - .. figure:: /images/tutorials/physics/ex69_sol_m_2_n_2_B_375.png

         Solution for :math:`m=2`, :math:`n=2`, :math:`B=3.75`
```

### Debugging

If we provide the `PetscDS` object in our problem with the exact solution function, PETSc has good support for debugging our discretization and solver. We can use the `PetscConvEst` object to check the convergence behavior of our element automatically. For example, if we use the `-snes_convergence_estimate` option, PETSc will solve our nonlinear equations on a series of refined meshes, use our exact solution to calculate the error, and then fit a line to the error on a log-log scale to get the convergence rate,

```{literalinclude} /../src/snes/tutorials/ex69.c
:end-before: 'test:'
:start-at: 'suffix: p2p1_conv'
```

If we initially refine the mesh twice, `-dm_refine 2`, we get

> L_2 convergence rate: [3.0, 2.2]

which are the convergence rates we expect for the velocity and pressure using a $P_2-P_1$ discretization. For $Q_1-P_0$,

```{literalinclude} /../src/snes/tutorials/ex69.c
:end-before: 'test:'
:start-at: 'suffix: q1p0_conv'
```

we get

> L_2 convergence rate: [2.0, 1.0]

This is a sensitive check that everything is working correctly. However, if something is wrong, where do we start? More fine-grained checks are available using the `-dmsnes_check` option. Using this for our $P_2-P_1$ example (the `p2p1` test), we have

```{literalinclude} /../src/snes/tutorials/output/ex69_p2p1.out
```

The first line records the discretization error for our exact solution. This means that we project the solution function into the finite element space and then calculate the $L_2$ norm of the difference between the exact solution and its projection. The norm is computed for each field separately. Next, PETSc calculates the residual using the projected exact solution as input. This should be small, and it should approach zero as the mesh is refined. Last, PETSc uses a Taylor test to determine how the error in the linear model scales as a function of the perturbation $h$. Thus, in a nonlinear situation we would expect

> Taylor approximation converging at order 2.0

In this case, since the viscosity does not depend on the velocity or pressure fields, we detect that the linear model is exact

> Function appears to be linear

Suppose that we have made an error in the Jacobian. For instance, let us accidentally flip the sign of the pressure term in the momentum Jacobian.

```{literalinclude} /../src/snes/tutorials/ex69.c
:end-at: '}'
:start-at: static void stokes_momentum_pres_J
```

When we run, the nonlinear solver fails. Our checking reveals that the Jacobian is wrong, because the Taylor test converges at order 1 instead of 2, meaning the linear term in our model is not correct.

```console
$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-snes_monitor -ksp_monitor_true_residual -ksp_converged_reason"
L_2 Error: [0.000439127, 0.0376629]
L_2 Residual: 0.0453958
Taylor approximation converging at order 1.00
  0 SNES Function norm 1.170604545948e-01
    0 KSP preconditioned resid norm 4.965098891419e-01 true resid norm 1.170604545948e-01 ||r(i)||/||b|| 1.000000000000e+00
    1 KSP preconditioned resid norm 9.236805404733e-11 true resid norm 1.460082233654e-12 ||r(i)||/||b|| 1.247289051378e-11
  Linear solve converged due to CONVERGED_ATOL iterations 1
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR:
[0]PETSC ERROR: SNESSolve has not converged
```

In order to track down the error, we can use `-snes_test_jacobian`, which computes a finite difference approximation to the Jacobian and compares it to the analytic Jacobian. We ignore the first test, which occurs during our checking of the Jacobian, and look at the test that happens during the first Newton iterate. We see that the relative error in the Frobenius norm is about one percent, which indicates that we have a real problem.

```console
$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -snes_test_jacobian"
L_2 Error: [0.000439127, 0.0376629]
L_2 Residual: 0.0453958
  ---------- Testing Jacobian -------------
  Run with -snes_test_jacobian_view and optionally -snes_test_jacobian <threshold> to show difference
    of hand-coded and finite difference Jacobian entries greater than <threshold>.
  Testing hand-coded Jacobian, if (for double precision runs) ||J - Jfd||_F/||J||_F is
    O(1.e-8), the hand-coded Jacobian is probably correct.
  ||J - Jfd||_F/||J||_F = 136.793, ||J - Jfd||_F = 136.793
  ---------- Testing Jacobian for preconditioner -------------
  ||J - Jfd||_F/||J||_F = 136.793, ||J - Jfd||_F = 136.793
Taylor approximation converging at order 1.00
  0 SNES Function norm 1.170604545948e-01
  ---------- Testing Jacobian -------------
  ||J - Jfd||_F/||J||_F = 0.0119377, ||J - Jfd||_F = 1.63299
  ---------- Testing Jacobian for preconditioner -------------
  ||J - Jfd||_F/||J||_F = 0.008471, ||J - Jfd||_F = 1.15873
    0 KSP preconditioned resid norm 4.965098891419e-01 true resid norm 1.170604545948e-01 ||r(i)||/||b|| 1.000000000000e+00
    1 KSP preconditioned resid norm 9.236804064319e-11 true resid norm 1.460031196842e-12 ||r(i)||/||b|| 1.247245452699e-11
  Linear solve converged due to CONVERGED_ATOL iterations 1
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR:
[0]PETSC ERROR: SNESSolve has not converged
```

At this point, we could just go back and check the code. However, PETSc will also print out the differences between the analytic and approximate Jacobians. When we give the `-snes_test_jacobian_view` option, the code will print both Jacobians (which we omit) and then their difference, and will also do this for the preconditioning matrix (which we omit). It is clear from the output that the $u-p$ block of the Jacobian is wrong, and thus we know right where to look for our error. Moreover, if we look at the values in row 15, we see that the entries differ only by a sign.

```console
$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -snes_test_jacobian -snes_test_jacobian_view"
        Hand-coded minus finite-difference Jacobian with tolerance 1e-05 ----------
Mat Object: 1 MPI process
  type: seqaij
row 0:
row 1:
row 2:
row 3:
row 4:
row 5:
row 6:
row 7:
row 8:
row 9:
row 10:
row 11:
row 12:
row 13:
row 14:
row 15: (0, 0.166667)  (2, -0.166667)
row 16: (0, 0.166667)  (2, -0.166667)  (5, 0.166667)  (8, -0.166667)
row 17: (0, 0.166667)  (2, 0.166667)  (5, -0.166667)  (8, -0.166667)
row 18: (0, 0.166667)  (5, -0.166667)
row 19: (5, 0.166667)  (8, -0.166667)  (11, 0.166667)  (13, -0.166667)
row 20: (5, 0.166667)  (8, 0.166667)  (11, -0.166667)  (13, -0.166667)
row 21: (5, 0.166667)  (11, -0.166667)
row 22: (5, 0.333333)  (8, -0.333333)
row 23: (2, 0.166667)  (5, 0.166667)  (8, -0.166667)  (11, -0.166667)
row 24: (2, 0.166667)  (3, -0.166667)  (5, 0.166667)  (8, -0.166667)
row 25: (2, 0.333333)  (8, -0.333333)
row 26: (2, 0.166667)  (3, -0.166667)  (8, 0.166667)  (10, -0.166667)
row 27: (2, 0.166667)  (3, 0.166667)  (8, -0.166667)  (10, -0.166667)
row 28: (3, 0.166667)  (10, -0.166667)
row 29: (8, 0.333333)  (10, -0.333333)
row 30: (3, 0.166667)  (8, 0.166667)  (10, -0.166667)  (13, -0.166667)
row 31: (2, 0.166667)  (3, -0.166667)
row 32: (8, 0.166667)  (10, -0.166667)  (13, 0.166667)  (14, -0.166667)
row 33: (8, 0.166667)  (10, 0.166667)  (13, -0.166667)  (14, -0.166667)
row 34: (10, 0.166667)  (14, -0.166667)
row 35: (13, 0.166667)  (14, -0.166667)
row 36: (8, 0.166667)  (10, -0.166667)  (11, 0.166667)  (13, -0.166667)
row 37: (8, 0.333333)  (13, -0.333333)
row 38: (11, 0.166667)  (13, -0.166667)
          0 KSP preconditioned resid norm 4.965098891419e-01 true resid norm 1.170604545948e-01 ||r(i)||/||b|| 1.000000000000e+00
          1 KSP preconditioned resid norm 9.236804067326e-11 true resid norm 1.460031196842e-12 ||r(i)||/||b|| 1.247245452699e-11
        Linear solve converged due to CONVERGED_ATOL iterations 1
      [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
      [0]PETSC ERROR:
      [0]PETSC ERROR: SNESSolve has not converged
```

### Optimizing the Solver

In order to see exactly what solver we have employed, we can use the `-snes_view` option. When checking $P_2-P_1$ convergence, we use an exact solver, but it must have several parts in order to deal with the saddle point in the Jacobian. Using the test system to provide our extra option, we get

```console
$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-snes_view"
SNES Object: 1 MPI process
  type: newtonls
  maximum iterations=50, maximum function evaluations=10000
  tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
  total number of linear solver iterations=1
  total number of function evaluations=2
  norm schedule ALWAYS
  SNESLineSearch Object: 1 MPI process
    type: bt
      interpolation: cubic
      alpha=1.000000e-04
    maxstep=1.000000e+08, minlambda=1.000000e-12
    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
    maximum iterations=40
  KSP Object: 1 MPI process
    type: gmres
      restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
      happy breakdown tolerance 1e-30
    maximum iterations=10000, initial guess is zero
    tolerances: relative=1e-09, absolute=1e-10, divergence=10000.
    left preconditioning
    using PRECONDITIONED norm type for convergence test
  PC Object: 1 MPI process
    type: fieldsplit
      FieldSplit with Schur preconditioner, factorization FULL
      Preconditioner for the Schur complement formed from A11
      Split info:
      Split number 0 Defined by IS
      Split number 1 Defined by IS
      KSP solver for A00 block
        KSP Object: (fieldsplit_velocity_) 1 MPI process
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10000, initial guess is zero
          tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        PC Object: (fieldsplit_velocity_) 1 MPI process
          type: lu
            out-of-place factorization
            tolerance for zero pivot 2.22045e-14
            matrix ordering: nd
            factor fill ratio given 5., needed 1.15761
              Factored matrix follows:
                Mat Object: 1 MPI process
                  type: seqaij
                  rows=30, cols=30
                  package used to perform factorization: petsc
                  total: nonzeros=426, allocated nonzeros=426
                    using I-node routines: found 17 nodes, limit used is 5
          linear system matrix followed by preconditioner matrix:
          Mat Object: 1 MPI process
            type: seqaij
            rows=30, cols=30
            total: nonzeros=368, allocated nonzeros=368
            total number of mallocs used during MatSetValues calls=0
              using I-node routines: found 20 nodes, limit used is 5
          Mat Object: (fieldsplit_velocity_) 1 MPI process
            type: seqaij
            rows=30, cols=30
            total: nonzeros=368, allocated nonzeros=368
            total number of mallocs used during MatSetValues calls=0
              using I-node routines: found 20 nodes, limit used is 5
      KSP solver for S = A11 - A10 inv(A00) A01
        KSP Object: (fieldsplit_pressure_) 1 MPI process
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10000, initial guess is zero
          tolerances: relative=1e-09, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        PC Object: (fieldsplit_pressure_) 1 MPI process
          type: lu
            out-of-place factorization
            tolerance for zero pivot 2.22045e-14
            matrix ordering: nd
            factor fill ratio given 5., needed 1.2439
              Factored matrix follows:
                Mat Object: 1 MPI process
                  type: seqaij
                  rows=9, cols=9
                  package used to perform factorization: petsc
                  total: nonzeros=51, allocated nonzeros=51
                    not using I-node routines
          linear system matrix followed by preconditioner matrix:
          Mat Object: (fieldsplit_pressure_) 1 MPI process
            type: schurcomplement
            rows=9, cols=9
              has attached null space
              Schur complement A11 - A10 inv(A00) A01
              A11
                Mat Object: 1 MPI process
                  type: seqaij
                  rows=9, cols=9
                  total: nonzeros=41, allocated nonzeros=41
                  total number of mallocs used during MatSetValues calls=0
                    has attached null space
                    not using I-node routines
              A10
                Mat Object: 1 MPI process
                  type: seqaij
                  rows=9, cols=30
                  total: nonzeros=122, allocated nonzeros=122
                  total number of mallocs used during MatSetValues calls=0
                    not using I-node routines
              KSP solver for A00 block viewable with the additional option -fc_fieldsplit_velocity_ksp_view
              A01
                Mat Object: 1 MPI process
                  type: seqaij
                  rows=30, cols=9
                  total: nonzeros=122, allocated nonzeros=122
                  total number of mallocs used during MatSetValues calls=0
                    using I-node routines: found 20 nodes, limit used is 5
          Mat Object: (fieldsplit_pressure_) 1 MPI process
            type: seqaij
            rows=9, cols=9
            total: nonzeros=41, allocated nonzeros=41
            total number of mallocs used during MatSetValues calls=0
              not using I-node routines
    linear system matrix followed by preconditioner matrix:
    Mat Object: 1 MPI process
      type: seqaij
      rows=39, cols=39
      total: nonzeros=653, allocated nonzeros=653
      total number of mallocs used during MatSetValues calls=0
        has attached null space
        using I-node routines: found 24 nodes, limit used is 5
    Mat Object: (prec_) 1 MPI process
      type: seqaij
      rows=39, cols=39
      total: nonzeros=653, allocated nonzeros=653
      total number of mallocs used during MatSetValues calls=0
        using I-node routines: found 24 nodes, limit used is 5
```

Going through this piece by piece, we can see all the parts of our solver. At the top level, we have a `SNES` using Newton's method

```console
$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-snes_view"
SNES Object: 1 MPI process
  type: newtonls
  maximum iterations=50, maximum function evaluations=10000
  tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
  total number of linear solver iterations=1
  total number of function evaluations=2
  norm schedule ALWAYS
  SNESLineSearch Object: 1 MPI process
    type: bt
      interpolation: cubic
      alpha=1.000000e-04
    maxstep=1.000000e+08, minlambda=1.000000e-12
    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
    maximum iterations=40
```

For each nonlinear step, we use `KSPGMRES` to solve the Newton equation, preconditioned by `PCFIELDSPLIT`. We split the problem into two blocks, with the split determined by our `DM`, and combine those blocks using a Schur complement. The Schur complement factorization is exact since we use the `FULL` factorization type.

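
For reference, the `FULL` factorization is the block LU decomposition of the saddle point system (up to the sign convention for $S$),

$$
\begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix} =
\begin{pmatrix} I & 0 \\ B^T A^{-1} & I \end{pmatrix}
\begin{pmatrix} A & 0 \\ 0 & S \end{pmatrix}
\begin{pmatrix} I & A^{-1} B \\ 0 & I \end{pmatrix},
\qquad S = -B^T A^{-1} B,
$$

so applying the preconditioner requires a solve with $A$ and a solve with $S$, which is why an inner `KSP` appears for each block.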
```console
$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-snes_view"
  KSP Object: 1 MPI process
    type: gmres
      restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
      happy breakdown tolerance 1e-30
    maximum iterations=10000, initial guess is zero
    tolerances: relative=1e-09, absolute=1e-10, divergence=10000.
    left preconditioning
    using PRECONDITIONED norm type for convergence test
  PC Object: 1 MPI process
    type: fieldsplit
      FieldSplit with Schur preconditioner, factorization FULL
      Preconditioner for the Schur complement formed from A11
      Split info:
      Split number 0 Defined by IS
      Split number 1 Defined by IS
```

We form the preconditioner for the Schur complement from the (1,1) block of our preconditioning matrix, which we have set to be the viscosity-weighted mass matrix

```{literalinclude} /../src/snes/tutorials/ex69.c
:end-before: /*
:start-at: static void stokes_identity_J_kx
```

The solver for the first block, representing the velocity, is GMRES/LU. Note that the prefix is `fieldsplit_velocity_`, constructed automatically from the name of the field in our `DM`. Also note that there are two matrices, one from our original system and one from our preconditioning matrix, but they are identical. In an optimized, scalable solver, this block would likely be solved by multigrid, but here we use LU for verification purposes.

```console
$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-snes_view"
      KSP solver for A00 block
        KSP Object: (fieldsplit_velocity_) 1 MPI process
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10000, initial guess is zero
          tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        PC Object: (fieldsplit_velocity_) 1 MPI process
          type: lu
            out-of-place factorization
            tolerance for zero pivot 2.22045e-14
            matrix ordering: nd
            factor fill ratio given 5., needed 1.15761
              Factored matrix follows:
                Mat Object: 1 MPI process
                  type: seqaij
                  rows=30, cols=30
                  package used to perform factorization: petsc
                  total: nonzeros=426, allocated nonzeros=426
                    using I-node routines: found 17 nodes, limit used is 5
          linear system matrix followed by preconditioner matrix:
          Mat Object: 1 MPI process
            type: seqaij
            rows=30, cols=30
            total: nonzeros=368, allocated nonzeros=368
            total number of mallocs used during MatSetValues calls=0
              using I-node routines: found 20 nodes, limit used is 5
          Mat Object: (fieldsplit_velocity_) 1 MPI process
            type: seqaij
            rows=30, cols=30
            total: nonzeros=368, allocated nonzeros=368
            total number of mallocs used during MatSetValues calls=0
              using I-node routines: found 20 nodes, limit used is 5
```

The solver for the second block, with prefix `fieldsplit_pressure_`, is also GMRES/LU; however, we cannot factor the Schur complement operator, since we never explicitly assemble it. Thus we assemble the viscosity-weighted mass matrix on the pressure space as an approximation. Notice that the Schur complement has the velocity solver embedded in it.
The solver for the second block, with prefix `fieldsplit_pressure_`, is also GMRES/LU; however, we cannot factor the Schur complement operator since we never assemble it explicitly. Thus we assemble the viscosity-weighted mass matrix on the pressure space as an approximation. Notice that the Schur complement has the velocity solver embedded in it.
639
640```console
641$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-snes_view"
642      KSP solver for S = A11 - A10 inv(A00) A01
643        KSP Object: (fieldsplit_pressure_) 1 MPI process
644          type: gmres
645            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
646            happy breakdown tolerance 1e-30
647          maximum iterations=10000, initial guess is zero
648          tolerances: relative=1e-09, absolute=1e-50, divergence=10000.
649          left preconditioning
650          using PRECONDITIONED norm type for convergence test
651        PC Object: (fieldsplit_pressure_) 1 MPI process
652          type: lu
653            out-of-place factorization
654            tolerance for zero pivot 2.22045e-14
655            matrix ordering: nd
656            factor fill ratio given 5., needed 1.2439
657              Factored matrix follows:
658                Mat Object: 1 MPI process
659                  type: seqaij
660                  rows=9, cols=9
661                  package used to perform factorization: petsc
662                  total: nonzeros=51, allocated nonzeros=51
663                    not using I-node routines
664          linear system matrix followed by preconditioner matrix:
665          Mat Object: (fieldsplit_pressure_) 1 MPI process
666            type: schurcomplement
667            rows=9, cols=9
668              has attached null space
669              Schur complement A11 - A10 inv(A00) A01
670              A11
671                Mat Object: 1 MPI process
672                  type: seqaij
673                  rows=9, cols=9
674                  total: nonzeros=41, allocated nonzeros=41
675                  total number of mallocs used during MatSetValues calls=0
676                    has attached null space
677                    not using I-node routines
678              A10
679                Mat Object: 1 MPI process
680                  type: seqaij
681                  rows=9, cols=30
682                  total: nonzeros=122, allocated nonzeros=122
683                  total number of mallocs used during MatSetValues calls=0
684                    not using I-node routines
685              KSP of A00
686                KSP Object: (fieldsplit_velocity_) 1 MPI process
687                  type: gmres
688                    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
689                    happy breakdown tolerance 1e-30
690                  maximum iterations=10000, initial guess is zero
691                  tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
692                  left preconditioning
693                  using PRECONDITIONED norm type for convergence test
694                PC Object: (fieldsplit_velocity_) 1 MPI process
695                  type: lu
696                    out-of-place factorization
697                    tolerance for zero pivot 2.22045e-14
698                    matrix ordering: nd
699                    factor fill ratio given 5., needed 1.15761
700                      Factored matrix follows:
701                        Mat Object: 1 MPI process
702                          type: seqaij
703                          rows=30, cols=30
704                          package used to perform factorization: petsc
705                          total: nonzeros=426, allocated nonzeros=426
706                            using I-node routines: found 17 nodes, limit used is 5
707                  linear system matrix followed by preconditioner matrix:
708                  Mat Object: 1 MPI process
709                    type: seqaij
710                    rows=30, cols=30
711                    total: nonzeros=368, allocated nonzeros=368
712                    total number of mallocs used during MatSetValues calls=0
713                      using I-node routines: found 20 nodes, limit used is 5
714                  Mat Object: (fieldsplit_velocity_) 1 MPI process
715                    type: seqaij
716                    rows=30, cols=30
717                    total: nonzeros=368, allocated nonzeros=368
718                    total number of mallocs used during MatSetValues calls=0
719                      using I-node routines: found 20 nodes, limit used is 5
720              A01
721                Mat Object: 1 MPI process
722                  type: seqaij
723                  rows=30, cols=9
724                  total: nonzeros=122, allocated nonzeros=122
725                  total number of mallocs used during MatSetValues calls=0
726                    using I-node routines: found 20 nodes, limit used is 5
727          Mat Object: (fieldsplit_pressure_) 1 MPI process
728            type: seqaij
729            rows=9, cols=9
730            total: nonzeros=41, allocated nonzeros=41
731            total number of mallocs used during MatSetValues calls=0
732              not using I-node routines
733```
734
735Finally, the SNES viewer reports the system matrix and preconditioning matrix
736
737```console
738$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-snes_view"
739    linear system matrix followed by preconditioner matrix:
740    Mat Object: 1 MPI process
741      type: seqaij
742      rows=39, cols=39
743      total: nonzeros=653, allocated nonzeros=653
744      total number of mallocs used during MatSetValues calls=0
745        has attached null space
746        using I-node routines: found 24 nodes, limit used is 5
747    Mat Object: (prec_) 1 MPI process
748      type: seqaij
749      rows=39, cols=39
750      total: nonzeros=653, allocated nonzeros=653
751      total number of mallocs used during MatSetValues calls=0
752        using I-node routines: found 24 nodes, limit used is 5
753```
754
We see that they have the same nonzero pattern, even though the preconditioning matrix only contains the diagonal blocks. This is because zeros were inserted to define the nonzero structure. We can remove these zeros by telling the `DM` not to insert zeros at preallocation time, and also telling the matrix itself to ignore zeros during the assembly process.
756
757```console
758$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-snes_view -dm_preallocate_only -prec_mat_ignore_zero_entries"
759    linear system matrix followed by preconditioner matrix:
760    Mat Object: 1 MPI process
761      type: seqaij
762      rows=39, cols=39
763      total: nonzeros=653, allocated nonzeros=653
764      total number of mallocs used during MatSetValues calls=0
765        has attached null space
766        using I-node routines: found 24 nodes, limit used is 5
767    Mat Object: (prec_) 1 MPI process
768      type: seqaij
769      rows=39, cols=39
770      total: nonzeros=409, allocated nonzeros=653
771      total number of mallocs used during MatSetValues calls=0
772        using I-node routines: found 29 nodes, limit used is 5
773```
774
775We can see a sparsity portrait of the system and preconditioning matrices if the installation supports X-windows visualization
776
777```console
778$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-ksp_view_mat draw -prec_mat_view draw -draw_pause -1"
779$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-ksp_view_mat draw -prec_mat_view draw -draw_save $PETSC_DIR/mat.png"
780$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-dm_preallocate_only -mat_ignore_zero_entries -prec_mat_ignore_zero_entries -ksp_view_mat draw -prec_mat_view draw -draw_save $PETSC_DIR/mat_sparse.png"
781```
782
783```{eval-rst}
784.. list-table::
785
786  * - .. figure:: /images/tutorials/physics/stokes_p2p1_sys_mat.png
787
788         System matrix
789
790    - .. figure:: /images/tutorials/physics/stokes_p2p1_sys_mat_sparse.png
791
792         System matrix with sparse stencil
793
794  * - .. figure:: /images/tutorials/physics/stokes_p2p1_prec_mat.png
795
796         Preconditioning matrix
797
798    - .. figure:: /images/tutorials/physics/stokes_p2p1_prec_mat_sparse.png
799
800         Preconditioning matrix with sparse stencil
801```
802
If we want to check the convergence of the solver, we can also do that using options. Both the linear and nonlinear solvers converge in a single iteration, which is exactly what we want. For this to happen, the tolerances on both the outer KSP solver and the inner Schur complement solver must be tight enough. Notice that the Schur complement solver is used twice, and converges in seven iterates each time.
804
805```console
806$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -fieldsplit_pressure_ksp_converged_reason"
8070 SNES Function norm 1.170604545948e-01
808  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 7
809  0 KSP preconditioned resid norm 4.965098891419e-01 true resid norm 1.170604545948e-01 ||r(i)||/||b|| 1.000000000000e+00
810  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 7
811  1 KSP preconditioned resid norm 9.236813926190e-11 true resid norm 1.460072673561e-12 ||r(i)||/||b|| 1.247280884579e-11
812Linear solve converged due to CONVERGED_ATOL iterations 1
8131 SNES Function norm 1.460070661322e-12
814```
815
816We can look at the scalability of the solve by refining the mesh. We see that the Schur complement solve looks robust to grid refinement.
817
818```console
819$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-dm_refine 2 -snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -fieldsplit_pressure_ksp_converged_reason"
8200 SNES Function norm 3.503062983054e-02
821  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 8
822  0 KSP preconditioned resid norm 9.943095979973e-01 true resid norm 3.503062983054e-02 ||r(i)||/||b|| 1.000000000000e+00
823  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 8
824  1 KSP preconditioned resid norm 1.148772629230e-10 true resid norm 2.693482255004e-13 ||r(i)||/||b|| 7.688934706664e-12
825Linear solve converged due to CONVERGED_RTOL iterations 1
8261 SNES Function norm 2.693649920420e-13
827$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-dm_refine 4 -snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -fieldsplit_pressure_ksp_converged_reason"
8280 SNES Function norm 8.969202737759e-03
829  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 6
830  0 KSP preconditioned resid norm 3.322375727167e+00 true resid norm 8.969202737759e-03 ||r(i)||/||b|| 1.000000000000e+00
831  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 6
832  1 KSP preconditioned resid norm 6.112282404006e-10 true resid norm 8.543800889926e-14 ||r(i)||/||b|| 9.525708292843e-12
833Linear solve converged due to CONVERGED_RTOL iterations 1
8341 SNES Function norm 8.543893996362e-14
835```
836
Starting off with an exact solver allows us to check that the discretization, equations, and boundary conditions are correct. Moreover, choosing the Schur complement formulation, rather than a sparse direct solve, gives us a path to incrementally boost the scalability. Our first step will be to replace the direct solve of the momentum operator, which has cost superlinear in $N$, with a more scalable alternative. Since the operator is still elliptic, despite the viscosity variation, we should be able to use some form of multigrid. We will start with algebraic multigrid because it handles coefficient variation well, even though its setup time is larger than that of the geometric variant.
838
839```console
840$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1" EXTRA_OPTIONS="-dm_refine 2 -snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -fieldsplit_pressure_ksp_converged_reason -fieldsplit_velocity_pc_type gamg -fieldsplit_velocity_ksp_converged_reason"
8410 SNES Function norm 3.503062983054e-02
842  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 8
843  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
844  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
845  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
846  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 10
847  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
848  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
849  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 8
850  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 8
851  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 8
852  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
853  0 KSP preconditioned resid norm 9.943097452179e-01 true resid norm 3.503062983054e-02 ||r(i)||/||b|| 1.000000000000e+00
854  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 8
855  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
856  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
857  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
858  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 10
859  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
860  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
861  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 8
862  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 8
863  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 8
864  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
865  1 KSP preconditioned resid norm 1.503326145261e-05 true resid norm 1.089276827085e-06 ||r(i)||/||b|| 3.109498265814e-05
866  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 10
867  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
868  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
869  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
870  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
871  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
872  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
873  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
874  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
875  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 9
876  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 9
877  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 10
878  2 KSP preconditioned resid norm 1.353007845554e-10 true resid norm 6.056095141823e-11 ||r(i)||/||b|| 1.728799959098e-09
879Linear solve converged due to CONVERGED_RTOL iterations 2
8801 SNES Function norm 6.056096909907e-11
881```
882
This looks alright, but the number of iterates grows with refinement: it is 16 at 3 refinements, 30 at 4 refinements, and 70 at 5 refinements. Increasing the number of smoother iterates to four, `-fieldsplit_velocity_mg_levels_ksp_max_it 4`, brings down the number of iterates, but not the growth. Using w-cycles and full multigrid does not help either. It is likely that the coarse grids made by MIS coarsening are inaccurate for the $P_2$ discretization.
884
We can instead use geometric multigrid, where we would hope to obtain more accurate coarse bases. The `-dm_refine_hierarchy` option lets us build a hierarchy of refined meshes and sets the number of multigrid levels automatically. Then all we need to specify is `-fieldsplit_velocity_pc_type mg`, as we see in the test
886
887```{literalinclude} /../src/snes/tutorials/ex69.c
888:end-before: 'test:'
889:start-at: 'suffix: p2p1_gmg'
890```
891
892This behaves well for the initial mesh,
893
894```console
895$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1_gmg" EXTRA_OPTIONS="-dm_refine_hierarchy 2 -snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -fieldsplit_pressure_ksp_converged_reason -fieldsplit_velocity_ksp_converged_reason"
8960 SNES Function norm 3.503062983054e-02
897  0 KSP unpreconditioned resid norm 3.503062983054e-02 true resid norm 3.503062983054e-02 ||r(i)||/||b|| 1.000000000000e+00
898  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
899  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
900  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
901  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
902  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
903  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 5
904  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 5
905  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
906  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
907  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 8
908  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
909  1 KSP unpreconditioned resid norm 4.643855168829e-06 true resid norm 4.643855168807e-06 ||r(i)||/||b|| 1.325655630878e-04
910  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 5
911  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 5
912  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
913  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
914  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
915  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
916  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
917  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
918  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
919  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 5
920  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 9
921  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 5
922  2 KSP unpreconditioned resid norm 1.520240889941e-11 true resid norm 1.520239396618e-11 ||r(i)||/||b|| 4.339743258890e-10
923Linear solve converged due to CONVERGED_ATOL iterations 2
9241 SNES Function norm 1.520237877998e-11
925```
926
927and is also stable under refinement
928
929```console
930$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1_gmg" EXTRA_OPTIONS="-dm_refine_hierarchy 4 -snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -fieldsplit_pressure_ksp_converged_reason -fieldsplit_velocity_ksp_converged_reason"
9310 SNES Function norm 3.503062983054e-02
932  0 KSP unpreconditioned resid norm 3.503062983054e-02 true resid norm 3.503062983054e-02 ||r(i)||/||b|| 1.000000000000e+00
933  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
934  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
935  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
936  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
937  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
938  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 5
939  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 5
940  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
941  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
942  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 8
943  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
944  1 KSP unpreconditioned resid norm 4.643855168829e-06 true resid norm 4.643855168807e-06 ||r(i)||/||b|| 1.325655630878e-04
945  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 5
946  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 5
947  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
948  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
949  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
950  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
951  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
952  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
953  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 4
954  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 5
955  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 9
956  Linear fieldsplit_velocity_ solve converged due to CONVERGED_RTOL iterations 5
957  2 KSP unpreconditioned resid norm 1.520240889941e-11 true resid norm 1.520239396618e-11 ||r(i)||/||b|| 4.339743258890e-10
958Linear solve converged due to CONVERGED_ATOL iterations 2
9591 SNES Function norm 1.520237877998e-11
960```
961
Finally, we can back off the pressure solve. `ILU(0)` is good enough to maintain a constant number of iterates as we refine the grid. We could continue to refine our preconditioner by tuning the tolerances of the inner multigrid and Schur complement solves, trading fewer inner iterates for more outer iterates.
963
964```console
965$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1_gmg" EXTRA_OPTIONS="-dm_refine_hierarchy 2 -snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -fieldsplit_pressure_ksp_converged_reason -fieldsplit_pressure_pc_type ilu"
9660 SNES Function norm 3.503062983054e-02
967  0 KSP unpreconditioned resid norm 3.503062983054e-02 true resid norm 3.503062983054e-02 ||r(i)||/||b|| 1.000000000000e+00
968  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 10
969  1 KSP unpreconditioned resid norm 4.643855785779e-06 true resid norm 4.643855785812e-06 ||r(i)||/||b|| 1.325655807011e-04
970  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 10
971  2 KSP unpreconditioned resid norm 1.521944777036e-11 true resid norm 1.521942998859e-11 ||r(i)||/||b|| 4.344606437913e-10
972Linear solve converged due to CONVERGED_ATOL iterations 2
9731 SNES Function norm 1.521943449163e-11
974$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1_gmg" EXTRA_OPTIONS="-dm_refine_hierarchy 4 -snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -fieldsplit_pressure_ksp_converged_reason -fieldsplit_pressure_pc_type ilu"
9750 SNES Function norm 8.969202737759e-03
976  0 KSP unpreconditioned resid norm 8.969202737759e-03 true resid norm 8.969202737759e-03 ||r(i)||/||b|| 1.000000000000e+00
977  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 10
978  1 KSP unpreconditioned resid norm 2.234849111673e-05 true resid norm 2.234849111674e-05 ||r(i)||/||b|| 2.491692045566e-03
979  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 10
980  2 KSP unpreconditioned resid norm 1.205594722917e-10 true resid norm 1.205594316079e-10 ||r(i)||/||b|| 1.344148807121e-08
981  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 10
982  3 KSP unpreconditioned resid norm 1.461086575333e-15 true resid norm 2.284323415523e-15 ||r(i)||/||b|| 2.546852247977e-13
983Linear solve converged due to CONVERGED_ATOL iterations 3
9841 SNES Function norm 2.317901194143e-15
985$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1_gmg" EXTRA_OPTIONS="-dm_refine_hierarchy 6 -snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -fieldsplit_pressure_ksp_converged_reason -fieldsplit_pressure_pc_type ilu"
9860 SNES Function norm 2.252260693635e-03
987  0 KSP unpreconditioned resid norm 2.252260693635e-03 true resid norm 2.252260693635e-03 ||r(i)||/||b|| 1.000000000000e+00
988  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 9
989  1 KSP unpreconditioned resid norm 1.220195757583e-05 true resid norm 1.220195757579e-05 ||r(i)||/||b|| 5.417648858445e-03
990  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 10
991  2 KSP unpreconditioned resid norm 2.683367607036e-09 true resid norm 2.683367591382e-09 ||r(i)||/||b|| 1.191410745197e-06
992  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 10
993  3 KSP unpreconditioned resid norm 5.510932827474e-13 true resid norm 5.511665167379e-13 ||r(i)||/||b|| 2.447170162386e-10
994Linear solve converged due to CONVERGED_ATOL iterations 3
9951 SNES Function norm 5.511916500930e-13
996```
997
998We can make the problem harder by increasing the wave number and size of the viscosity perturbation. If we set the $B$ parameter to 6.9, we have a factor of one million increase in viscosity across the cell. At this scale, we see that we lose enough accuracy in our Jacobian calculation to defeat our Taylor test, but we are still able to solve the problem efficiently.
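
Assuming a SolKx-style viscosity profile of the form $\mu(x) = e^{2Bx}$ on the unit cell (the exact form is defined in the example source), the contrast between the two sides of the cell is $e^{2B}$, which for $B = 6.9$ is indeed about a factor of one million:

```python
import math

# Assuming mu(x) = exp(2*B*x) on the unit cell (a hypothetical form for
# illustration; see ex69.c for the actual profile), the viscosity contrast
# between x = 0 and x = 1 is exp(2*B).
B = 6.9
contrast = math.exp(2 * B)
assert 9.8e5 < contrast < 1.0e6   # roughly a factor of one million
```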
999
1000```console
1001$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1_gmg" EXTRA_OPTIONS="-dm_refine_hierarchy 2 -snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -fieldsplit_pressure_ksp_converged_reason -fieldsplit_pressure_pc_type ilu -m 2 -n 2 -B 6.9"
1002L_2 Error: [4.07817e-06, 0.0104694]
1003L_2 Residual: 0.0145403
1004Taylor approximation converging at order 1.00
10050 SNES Function norm 3.421266970274e-02
1006  0 KSP unpreconditioned resid norm 3.421266970274e-02 true resid norm 3.421266970274e-02 ||r(i)||/||b|| 1.000000000000e+00
1007  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 21
1008  1 KSP unpreconditioned resid norm 2.066264276201e-05 true resid norm 2.066264276201e-05 ||r(i)||/||b|| 6.039471032675e-04
1009  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 20
1010  2 KSP unpreconditioned resid norm 1.295461366009e-10 true resid norm 1.295461419342e-10 ||r(i)||/||b|| 3.786496144842e-09
1011  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 20
1012  3 KSP unpreconditioned resid norm 1.954355290546e-15 true resid norm 1.954135246291e-15 ||r(i)||/||b|| 5.711729786858e-14
1013Linear solve converged due to CONVERGED_ATOL iterations 3
10141 SNES Function norm 1.946196473520e-15
1015$ make -f ./gmakefile test globsearch="snes_tutorials-ex69_p2p1_gmg" EXTRA_OPTIONS="-dm_refine_hierarchy 6 -snes_monitor -ksp_monitor_true_residual -ksp_converged_reason -fieldsplit_pressure_ksp_converged_reason -fieldsplit_pressure_pc_type ilu -m 2 -n 2 -B 6.9"
1016L_2 Error: [1.52905e-09, 4.72606e-05]
1017L_2 Residual: 7.18836e-06
1018Taylor approximation converging at order 1.00
10190 SNES Function norm 2.252034794902e-03
1020  0 KSP unpreconditioned resid norm 2.252034794902e-03 true resid norm 2.252034794902e-03 ||r(i)||/||b|| 1.000000000000e+00
1021  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 19
1022  1 KSP unpreconditioned resid norm 1.843225742581e-05 true resid norm 1.843225742582e-05 ||r(i)||/||b|| 8.184712539768e-03
1023  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 19
1024  2 KSP unpreconditioned resid norm 1.410472862037e-09 true resid norm 1.410472860342e-09 ||r(i)||/||b|| 6.263104209294e-07
1025  Linear fieldsplit_pressure_ solve converged due to CONVERGED_RTOL iterations 19
1026  3 KSP unpreconditioned resid norm 1.051996270409e-14 true resid norm 1.064465321443e-14 ||r(i)||/||b|| 4.726682393419e-12
1027Linear solve converged due to CONVERGED_ATOL iterations 3
10281 SNES Function norm 1.063917948054e-14
1029```
1030
1031## Bibliography
1032
1033```{eval-rst}
1034.. bibliography:: /petsc.bib
1035   :filter: docname in docnames
1036```
1037