xref: /honee/doc/theory.md (revision 0fb1909ebd09dbfe42c9063e98290f6c2d28dbb1)
1# Theory and Background
2
3HONEE solves the time-dependent Navier-Stokes equations of compressible gas dynamics in a static Eulerian three-dimensional frame using unstructured high-order finite/spectral element spatial discretizations and explicit or implicit high-order time-stepping (available in PETSc).
4Moreover, the Navier-Stokes example has been developed using PETSc, so that the pointwise physics (defined at quadrature points) is separated from the parallelization and meshing concerns.
5
6## The Navier-Stokes Equations
7
8The mathematical formulation (from {cite}`shakib1991femcfd`) is given in what follows.
9The compressible Navier-Stokes equations in conservative form are
10
11$$
12\begin{aligned}
13\frac{\partial \rho}{\partial t} + \nabla \cdot \bm{U} &= 0 \\
14\frac{\partial \bm{U}}{\partial t} + \nabla \cdot \left( \frac{\bm{U} \otimes \bm{U}}{\rho} + P \bm{I}_3 -\bm\sigma \right) - \rho \bm{b}  &= 0 \\
15\frac{\partial E}{\partial t} + \nabla \cdot \left( \frac{(E + P)\bm{U}}{\rho} -\bm{u} \cdot \bm{\sigma} - k \nabla T \right) - \rho \bm{b} \cdot \bm{u} &= 0 \, , \\
16\end{aligned}
17$$ (eq-ns)
18
19where $\bm{\sigma} = \mu(\nabla \bm{u} + (\nabla \bm{u})^T + \lambda (\nabla \cdot \bm{u})\bm{I}_3)$ is the Cauchy (symmetric) stress tensor, with $\mu$ the dynamic viscosity coefficient, and $\lambda = - 2/3$ the Stokes hypothesis constant.
20In equations {eq}`eq-ns`, $\rho$ represents the volume mass density, $U$ the momentum density (defined as $\bm{U}=\rho \bm{u}$, where $\bm{u}$ is the vector velocity field), $E$ the total energy density (defined as $E = \rho e$, where $e$ is the total energy including thermal and kinetic but not potential energy), $\bm{I}_3$ represents the $3 \times 3$ identity matrix, $\bm{b}$ is a body force vector (e.g., gravity vector $\bm{g}$),  $k$ the thermal conductivity constant, $T$ represents the temperature, and $P$ the pressure, given by the following equation of state
21
22$$
23P = \left( {c_p}/{c_v} -1\right) \left( E - {\bm{U}\cdot\bm{U}}/{(2 \rho)} \right) \, ,
24$$ (eq-state)
25
26where $c_p$ is the specific heat at constant pressure and $c_v$ is the specific heat at constant volume (that define $\gamma = c_p / c_v$, the specific heat ratio).
27
28The system {eq}`eq-ns` can be rewritten in vector form
29
30$$
31\frac{\partial \bm{q}}{\partial t} + \nabla \cdot \bm{F}(\bm{q}) -S(\bm{q}) = 0 \, ,
32$$ (eq-vector-ns)
33
34for the state variables 5-dimensional vector
35
36$$
37\bm{q} =
38\begin{pmatrix}
39    \rho \\
40    \bm{U} \equiv \rho \bm{ u }\\
41    E \equiv \rho e
42\end{pmatrix}
43\begin{array}{l}
44    \leftarrow\textrm{ volume mass density}\\
45    \leftarrow\textrm{ momentum density}\\
46    \leftarrow\textrm{ energy density}
47\end{array}
48$$
49
50where the flux and the source terms, respectively, are given by
51
52$$
53\begin{aligned}
54\bm{F}(\bm{q}) &=
55\underbrace{\begin{pmatrix}
56    \bm{U}\\
57    {(\bm{U} \otimes \bm{U})}/{\rho} + P \bm{I}_3 \\
58    {(E + P)\bm{U}}/{\rho}
59\end{pmatrix}}_{\bm F_{\text{adv}}} +
60\underbrace{\begin{pmatrix}
610 \\
62-  \bm{\sigma} \\
63 - \bm{u}  \cdot \bm{\sigma} - k \nabla T
64\end{pmatrix}}_{\bm F_{\text{diff}}},\\
65S(\bm{q}) &=
66 \begin{pmatrix}
67    0\\
68    \rho \bm{b}\\
69    \rho \bm{b}\cdot \bm{u}
70\end{pmatrix}.
71\end{aligned}
72$$ (eq-ns-flux)
73
74### Finite Element Formulation (Spatial Discretization)
75
76Let the discrete solution be
77
78$$
79\bm{q}_N (\bm{x},t)^{(e)} = \sum_{k=1}^{P}\psi_k (\bm{x})\bm{q}_k^{(e)}
80$$
81
82with $P=p+1$ the number of nodes in the element $e$.
83We use tensor-product bases $\psi_{kji} = h_i(X_0)h_j(X_1)h_k(X_2)$.
84
85To obtain a finite element discretization, we first multiply the strong form {eq}`eq-vector-ns` by a test function $\bm v \in H^1(\Omega)$ and integrate,
86
87$$
88\int_{\Omega} \bm v \cdot \left(\frac{\partial \bm{q}_N}{\partial t} + \nabla \cdot \bm{F}(\bm{q}_N) - \bm{S}(\bm{q}_N) \right) \,dV = 0 \, , \; \forall \bm v \in \mathcal{V}_p\,,
89$$
90
91with $\mathcal{V}_p = \{ \bm v(\bm x) \in H^{1}(\Omega_e) \,|\, \bm v(\bm x_e(\bm X)) \in P_p(\bm{I}), e=1,\ldots,N_e \}$ a mapped space of polynomials containing at least polynomials of degree $p$ (with or without the higher mixed terms that appear in tensor product spaces).
92
93Integrating by parts on the divergence term, we arrive at the weak form,
94
95$$
96\begin{aligned}
97\int_{\Omega} \bm v \cdot \left( \frac{\partial \bm{q}_N}{\partial t} - \bm{S}(\bm{q}_N) \right)  \,dV
98- \int_{\Omega} \nabla \bm v \!:\! \bm{F}(\bm{q}_N)\,dV & \\
99+ \int_{\partial \Omega} \bm v \cdot \bm{F}(\bm q_N) \cdot \widehat{\bm{n}} \,dS
100  &= 0 \, , \; \forall \bm v \in \mathcal{V}_p \,,
101\end{aligned}
102$$ (eq-weak-vector-ns)
103
104where $\bm{F}(\bm q_N) \cdot \widehat{\bm{n}}$ is typically replaced with a boundary condition.
105
106:::{note}
107The notation $\nabla \bm v \!:\! \bm F$ represents contraction over both fields and spatial dimensions while a single dot represents contraction in just one, which should be clear from context, e.g., $\bm v \cdot \bm S$ contracts over fields while $\bm F \cdot \widehat{\bm n}$ contracts over spatial dimensions.
108:::
109
110### Time Discretization
111<!-- TODO: This should be reframed in terms of PETSc TS's F(t, u, \dot u) = G(t, u) rather than specifically about Explicit RK -->
112For the time discretization, we use two types of time stepping schemes through PETSc.
113
114
115#### Explicit Time-Stepping Method
116<!-- TODO:  This should talk about the mass operator and the options associated with it (i.e. lumped mass matrix )-->
117The following explicit formulation is solved with the adaptive Runge-Kutta-Fehlberg (RKF4-5) method by default (any explicit time-stepping scheme available in PETSc can be chosen at runtime)
118
119$$
120\bm{q}_N^{n+1} = \bm{q}_N^n + \Delta t \sum_{i=1}^{s} b_i k_i \, ,
121$$
122
123where
124
125$$
126\begin{aligned}
127    k_1 &= f(t^n, \bm{q}_N^n)\\
128    k_2 &= f(t^n + c_2 \Delta t, \bm{q}_N^n + \Delta t (a_{21} k_1))\\
129    k_3 &= f(t^n + c_3 \Delta t, \bm{q}_N^n + \Delta t (a_{31} k_1 + a_{32} k_2))\\
130    \vdots&\\
131    k_i &= f\left(t^n + c_i \Delta t, \bm{q}_N^n + \Delta t \sum_{j=1}^s a_{ij} k_j \right)\\
132\end{aligned}
133$$
134
135and with
136
137$$
138f(t^n, \bm{q}_N^n) = - [\nabla \cdot \bm{F}(\bm{q}_N)]^n + [S(\bm{q}_N)]^n \, .
139$$
140
141#### Implicit Time-Stepping Method
142
143This time stepping method which can be selected using the option `-implicit` is solved with Backward Differentiation Formula (BDF) method by default (similarly, any implicit time-stepping scheme available in PETSc can be chosen at runtime).
144The implicit formulation solves nonlinear systems for $\bm q_N$:
145
146$$
147\bm f(\bm q_N) \equiv \bm g(t^{n+1}, \bm{q}_N, \bm{\dot{q}}_N) = 0 \, ,
148$$ (eq-ts-implicit-ns)
149
150where the time derivative $\bm{\dot q}_N$ is defined by
151
152$$
153\bm{\dot{q}}_N(\bm q_N) = \alpha \bm q_N + \bm z_N
154$$
155
156in terms of $\bm z_N$ from prior state and $\alpha > 0$, both of which depend on the specific time integration scheme (backward difference formulas, generalized alpha, implicit Runge-Kutta, etc.).
157Each nonlinear system {eq}`eq-ts-implicit-ns` will correspond to a weak form, as explained below.
158In determining how difficult a given problem is to solve, we consider the Jacobian of {eq}`eq-ts-implicit-ns`,
159
160$$
161\frac{\partial \bm f}{\partial \bm q_N} = \frac{\partial \bm g}{\partial \bm q_N} + \alpha \frac{\partial \bm g}{\partial \bm{\dot q}_N}.
162$$
163
164The scalar "shift" $\alpha$ scales inversely with the time step $\Delta t$, so small time steps result in the Jacobian being dominated by the second term, which is a sort of "mass matrix", and typically well-conditioned independent of grid resolution with a simple preconditioner (such as Jacobi).
165In contrast, the first term dominates for large time steps, with a condition number that grows with the diameter of the domain and polynomial degree of the approximation space.
166Both terms are significant for time-accurate simulation and the setup costs of strong preconditioners must be balanced with the convergence rate of Krylov methods using weak preconditioners.
167
168More details of PETSc's time stepping solvers can be found in the [TS User Guide](https://petsc.org/release/docs/manual/ts/).
169
170### Stabilization
171We solve {eq}`eq-weak-vector-ns` using a Galerkin discretization (default) or a stabilized method, as is necessary for most real-world flows.
172
173Galerkin methods produce oscillations for transport-dominated problems (any time the cell Péclet number is larger than 1), and those tend to blow up for nonlinear problems such as the Euler equations and (low-viscosity/poorly resolved) Navier-Stokes, in which case stabilization is necessary.
174Our formulation follows {cite}`hughesetal2010`, which offers a comprehensive review of stabilization and shock-capturing methods for continuous finite element discretization of compressible flows.
175
176- **SUPG** (streamline-upwind/Petrov-Galerkin)
177
178  In this method, the weighted residual of the strong form {eq}`eq-vector-ns` is added to the Galerkin formulation {eq}`eq-weak-vector-ns`.
179  The weak form for this method is given as
180
181  $$
182  \begin{aligned}
183  \int_{\Omega} \bm v \cdot \left( \frac{\partial \bm{q}_N}{\partial t} - \bm{S}(\bm{q}_N) \right)  \,dV
184  - \int_{\Omega} \nabla \bm v \!:\! \bm{F}(\bm{q}_N)\,dV & \\
185  + \int_{\partial \Omega} \bm v \cdot \bm{F}(\bm{q}_N) \cdot \widehat{\bm{n}} \,dS & \\
186  + \int_{\Omega} \nabla\bm v \tcolon\left(\frac{\partial \bm F_{\text{adv}}}{\partial \bm q}\right) \bm\tau \left( \frac{\partial \bm{q}_N}{\partial t} \, + \,
187  \nabla \cdot \bm{F} \, (\bm{q}_N) - \bm{S}(\bm{q}_N) \right) \,dV &= 0
188  \, , \; \forall \bm v \in \mathcal{V}_p
189  \end{aligned}
190  $$ (eq-weak-vector-ns-supg)
191
192  This stabilization technique can be selected using the option `-stab supg`.
193
194- **SU** (streamline-upwind)
195
196  This method is a simplified version of *SUPG* {eq}`eq-weak-vector-ns-supg` which is developed for debugging/comparison purposes. The weak form for this method is
197
198  $$
199  \begin{aligned}
200  \int_{\Omega} \bm v \cdot \left( \frac{\partial \bm{q}_N}{\partial t} - \bm{S}(\bm{q}_N) \right)  \,dV
201  - \int_{\Omega} \nabla \bm v \!:\! \bm{F}(\bm{q}_N)\,dV & \\
202  + \int_{\partial \Omega} \bm v \cdot \bm{F}(\bm{q}_N) \cdot \widehat{\bm{n}} \,dS & \\
203  + \int_{\Omega} \nabla\bm v \tcolon\left(\frac{\partial \bm F_{\text{adv}}}{\partial \bm q}\right) \bm\tau \nabla \cdot \bm{F} \, (\bm{q}_N) \,dV
204  & = 0 \, , \; \forall \bm v \in \mathcal{V}_p
205  \end{aligned}
206  $$ (eq-weak-vector-ns-su)
207
208  This stabilization technique can be selected using the option `-stab su`.
209
210In both {eq}`eq-weak-vector-ns-su` and {eq}`eq-weak-vector-ns-supg`, $\bm\tau \in \mathbb R^{5\times 5}$ (field indices) is an intrinsic time scale matrix.
211The SUPG technique and the operator $\frac{\partial \bm F_{\text{adv}}}{\partial \bm q}$ (rather than its transpose) can be explained via an ansatz for subgrid state fluctuations $\tilde{\bm q} = -\bm\tau \bm r$ where $\bm r$ is a strong form residual.
212The forward variational form can be readily expressed by differentiating $\bm F_{\text{adv}}$ of {eq}`eq-ns-flux`
213
214$$
215\begin{aligned}
216\diff\bm F_{\text{adv}}(\diff\bm q; \bm q) &= \frac{\partial \bm F_{\text{adv}}}{\partial \bm q} \diff\bm q \\
217&= \begin{pmatrix}
218\diff\bm U \\
219(\diff\bm U \otimes \bm U + \bm U \otimes \diff\bm U)/\rho - (\bm U \otimes \bm U)/\rho^2 \diff\rho + \diff P \bm I_3 \\
220(E + P)\diff\bm U/\rho + (\diff E + \diff P)\bm U/\rho - (E + P) \bm U/\rho^2 \diff\rho
221\end{pmatrix},
222\end{aligned}
223$$
224
225where $\diff P$ is defined by differentiating {eq}`eq-state`.
226
227:::{dropdown} Stabilization scale $\bm\tau$
228A velocity vector $\bm u$ can be pulled back to the reference element as $\bm u_{\bm X} = \nabla_{\bm x}\bm X \cdot \bm u$, with units of reference length (non-dimensional) per second.
229To build intuition, consider a boundary layer element of dimension $(1, \epsilon)$, for which $\nabla_{\bm x} \bm X = \bigl(\begin{smallmatrix} 2 & \\ & 2/\epsilon \end{smallmatrix}\bigr)$.
230So a small normal component of velocity will be amplified (by a factor of the aspect ratio $1/\epsilon$) in this transformation.
231The ratio $\lVert \bm u \rVert / \lVert \bm u_{\bm X} \rVert$ is a covariant measure of (half) the element length in the direction of the velocity.
232A contravariant measure of element length in the direction of a unit vector $\hat{\bm n}$ is given by $\lVert \bigl(\nabla_{\bm X} \bm x\bigr)^T \hat{\bm n} \rVert$.
233While $\nabla_{\bm X} \bm x$ is readily computable, its inverse $\nabla_{\bm x} \bm X$ is needed directly in finite element methods and thus more convenient for our use.
234If we consider a parallelogram, the covariant measure is larger than the contravariant measure for vectors pointing between acute corners and the opposite holds for vectors between oblique corners.
235
236The cell Péclet number is classically defined by $\mathrm{Pe}_h = \lVert \bm u \rVert h / (2 \kappa)$ where $\kappa$ is the diffusivity (units of $m^2/s$).
237This can be generalized to arbitrary grids by defining the local Péclet number
238
239$$
240\mathrm{Pe} = \frac{\lVert \bm u \rVert^2}{\lVert \bm u_{\bm X} \rVert \kappa}.
241$$ (eq-peclet)
242
243For scalar advection-diffusion, the stabilization is a scalar
244
245$$
246\tau = \frac{\xi(\mathrm{Pe})}{\lVert \bm u_{\bm X} \rVert},
247$$ (eq-tau-advdiff)
248
249where $\xi(\mathrm{Pe}) = \coth \mathrm{Pe} - 1/\mathrm{Pe}$ approaches 1 at large local Péclet number.
250Note that $\tau$ has units of time and, in the transport-dominated limit, is proportional to element transit time in the direction of the propagating wave.
251For advection-diffusion, $\bm F(q) = \bm u q$, and thus the SU stabilization term is
252
253$$
254\nabla v \cdot \bm u \tau \bm u \cdot \nabla q = \nabla_{\bm X} v \cdot (\bm u_{\bm X} \tau \bm u_{\bm X}) \cdot \nabla_{\bm X} q .
255$$ (eq-su-stabilize-advdiff)
256
257where the term in parentheses is a rank-1 diffusivity tensor that has been pulled back to the reference element.
258See {cite}`hughesetal2010` equations 15-17 and 34-36 for further discussion of this formulation.
259
260#### Navier-Stokes $\tau$ definition
261
262For the Navier-Stokes and Euler equations, {cite}`whiting2003hierarchical` defines a $5\times 5$ diagonal stabilization $\mathrm{diag}(\tau_c, \tau_m, \tau_m, \tau_m, \tau_E)$ consisting of
2631. continuity stabilization $\tau_c$
2642. momentum stabilization $\tau_m$
2653. energy stabilization $\tau_E$
266
267The Navier-Stokes code in this example uses the following formulation for $\tau_c$, $\tau_m$, $\tau_E$:
268
269$$
270\begin{aligned}
271
272\tau_c &= \frac{C_c \mathcal{F}}{8\rho \trace(\bm g)} \\
273\tau_m &= \frac{C_m}{\mathcal{F}} \\
274\tau_E &= \frac{C_E}{\mathcal{F} c_v} \\
275\end{aligned}
276$$
277
278$$
279\mathcal{F} = \sqrt{ \rho^2 \left [ \left(\frac{2C_t}{\Delta t}\right)^2
280+ \bm u \cdot (\bm u \cdot  \bm g)\right]
281+ C_v \mu^2 \Vert \bm g \Vert_F ^2}
282$$
283
284where $\bm g = \nabla_{\bm x} \bm{X}^T \cdot \nabla_{\bm x} \bm{X}$ is the metric tensor and $\Vert \cdot \Vert_F$ is the Frobenius norm.
285This formulation is currently not available in the Euler code.
286
287#### Advection-Diffusion $\tau$ definition
288
289For Advection-Diffusion, we first examine a 1D definition given by:
290
291$$
292    \tau = \textrm{minreg}_2 \left\{\frac{\Delta t}{2 C_t},\ \frac{h}{aC_a}, \ \frac{h^2}{\kappa C_d} \right\}
293$$
294
295for $C_t$, $C_a$, $C_d$ being some scaling coefficients, $h$ is the element length, and $\textrm{minreg}_n \{x_j\} = (\sum_j x_j^{-n})^{-1/n}$.
296To make this definition compatible with higher dimensional domains, we use a similar system from the Navier-Stokes equations.
297This results in the following definition:
298
299$$
300\begin{aligned}
301    \tau &= \textrm{minreg}_2 \left \{ \frac{\Delta t}{2 C_t}, \frac{1}{C_a \sqrt{\bm u \cdot (\bm u \cdot  \bm g)}}, \frac{1}{C_d \kappa \Vert \bm g \Vert_F} \right\} \\
302         &= \left [ \left(\frac{2 C_t}{\Delta t}\right)^2 + C_a^2 \bm u \cdot (\bm u \cdot  \bm g) + \left(C_d \kappa\right)^2 \Vert \bm g \Vert_F^2\right]^{-1/2}
303\end{aligned}
304$$
305
306Note that $\bm g$ is scaled so that it is identity for a unit square, keeping this definition aligned with the traditional 1D definition, which uses the element length directly.
307The scaling coefficients are set via `-Ctau_t`, `-Ctau_a`, and `-Ctau_d`, respectively.
308
309#### Euler $\tau$ definition
310In the Euler code, we follow {cite}`hughesetal2010` in defining a $3\times 3$ diagonal stabilization according to spatial criterion 2 (equation 27) as follows.
311
312$$
313\tau_{ii} = c_{\tau} \frac{2 \xi(\mathrm{Pe})}{(\lambda_{\max \text{abs}})_i \lVert \nabla_{x_i} \bm X \rVert}
314$$ (eq-tau-conservative)
315
316where $c_{\tau}$ is a multiplicative constant reported to be optimal at 0.5 for linear elements, $\hat{\bm n}_i$ is a unit vector in direction $i$, and $\nabla_{x_i} = \hat{\bm n}_i \cdot \nabla_{\bm x}$ is the derivative in direction $i$.
317The flux Jacobian $\frac{\partial \bm F_{\text{adv}}}{\partial \bm q} \cdot \hat{\bm n}_i$ in each direction $i$ is a $5\times 5$ matrix with spectral radius $(\lambda_{\max \text{abs}})_i$ equal to the fastest wave speed.
318The complete set of eigenvalues of the Euler flux Jacobian in direction $i$ are (e.g., {cite}`toro2009`)
319
320$$
321\Lambda_i = [u_i - a, u_i, u_i, u_i, u_i+a],
322$$ (eq-eigval-advdiff)
323
324where $u_i = \bm u \cdot \hat{\bm n}_i$ is the velocity component in direction $i$ and $a = \sqrt{\gamma P/\rho}$ is the sound speed for ideal gasses.
325Note that the first and last eigenvalues represent nonlinear acoustic waves while the middle three are linearly degenerate, carrying a contact wave (temperature) and transverse components of momentum.
326The fastest wave speed in direction $i$ is thus
327
328$$
329\lambda_{\max \text{abs}} \Bigl( \frac{\partial \bm F_{\text{adv}}}{\partial \bm q} \cdot \hat{\bm n}_i \Bigr) = |u_i| + a
330$$ (eq-wavespeed)
331
332Note that this wave speed is specific to ideal gases as $\gamma$ is an ideal gas parameter; other equations of state will yield a different acoustic wave speed.
333
334:::
335
336Currently, this demo provides three types of problems/physical models that can be selected at run time via the option `-problem`.
337{ref}`problem-advection`, the problem of the transport of energy in a uniform vector velocity field, {ref}`problem-euler-vortex`, the exact solution to the Euler equations, and the so called {ref}`example-density-current` problem.
338
339### Divergence of Diffusive Flux Projection
340
341The strong residual in the SUPG operator in {eq}`eq-weak-vector-ns-supg` and {eq}`eq-weak-vector-ns-su` features the term $\nabla \cdot \bm F_{\text{diff}}(\bm{q}_N)$, the divergence of the diffusive flux.
342This term requires a second derivative to evaluate; first to evaluate $\bm \sigma$ and $\nabla T$ for $F_{\text{diff}}$, the second for the divergence of the flux.
343For linear elements, the flux is constant within each element so the second derivative is zero, leading to accuracy issues.
344Additionally, libCEED does not currently support calculating double-derivatives.
345To circumvent these issues, we (optionally) perform a projection operation to get $\nabla \cdot \bm F_{\text{diff}}(\bm{q}_N)$ at quadrature points.
346This was first proposed in {cite}`jansenDiffFluxProjection1999`.
347There are two methods of achieving this implemented in HONEE, denoted as the direct and indirect methods.
348
349#### Indirect Projection
350
351Indirect projection is the method presented in {cite}`jansenDiffFluxProjection1999`.
352Here, $\bm F_{\text{diff}}$ is $L^2$ projected onto the finite element space and then the divergence is taken from that FEM function.
353For linear basis functions, this leads to constant values of $\nabla \cdot \bm F_{\text{diff}}$ within each element.
354
355For compressible Navier-Stokes, this requires projecting 12 scalars-per-node: 4 conserved scalars (mass conservation does not have a diffusive flux term) in 3 dimensional directions.
356These 12 scalar finite element functions' derivatives are then evaluated at quadrature points and the divergence is calculated.
357This method can be selected with `-div_diff_flux_projection_method indirect`.
358
359#### Direct Projection
360In the direct projection method, we perform an $L^2$ projection of the divergence of the diffusive flux itself, $\nabla \cdot \bm F_{\text{diff}}(\bm{q}_N)$.
361Then $\nabla \cdot \bm F_{\text{diff}}$ itself is a function on the finite element space and can be interpolated onto quadrature points.
362
363To do this, look at the RHS of the $L^2$ projection of $\nabla \cdot \bm F_{\text{diff}}(\bm{q}_N)$:
364
365$$
366\int_{\Omega} \bm v \cdot \nabla \cdot \bm F_{\text{diff}}(\bm{q}_N) \,dV
367$$
368
369As noted, we can't calculate $\nabla \cdot \bm F_{\text{diff}}(\bm{q}_N)$ at quadrature points, so we apply integration-by-parts to achieve a calculable RHS:
370
371$$
372\int_{\partial \Omega} \bm v \cdot \bm{F}_{\text{diff}}(\bm q_N) \cdot \widehat{\bm{n}} \,dS
373- \int_{\Omega} \nabla \bm v \!:\! \bm{F}_{\text{diff}}(\bm{q}_N)\,dV
374$$
375
376This form is what is used for calculating the RHS of the projection.
377After the projection, $\nabla \cdot \bm F_{\text{diff}}(\bm{q}_N)$ is interpolated directly to quadrature points without any extra calculations necessary.
378For compressible Navier-Stokes, this means only projecting only 4 scalars-per-node.
379
380The projection can be enabled using `-div_diff_flux_projection_method direct`.
381
382#### General Information
383The $L^2$ projection in either method uses the standard mass matrix, which is rowsum lumped for performance by default.
384The linear solve for the projection can be controlled via `-div_diff_flux_projection_ksp*` flags.
385
386### Subgrid Stress Modeling
387
388When a fluid simulation is under-resolved (the smallest length scale resolved by the grid is much larger than the smallest physical scale, the [Kolmogorov length scale](https://en.wikipedia.org/wiki/Kolmogorov_microscales)), this is mathematically interpreted as filtering the Navier-Stokes equations.
389This is known as large-eddy simulation (LES), as only the "large" scales of turbulence are resolved.
390This filtering operation results in an extra stress-like term, $\bm{\tau}^r$, representing the effect of unresolved (or "subgrid" scale) structures in the flow.
391Denoting the filtering operation by $\overline \cdot$, the LES governing equations are:
392
393$$
394\frac{\partial \bm{\overline q}}{\partial t} + \nabla \cdot \bm{\overline F}(\bm{\overline q}) -S(\bm{\overline q}) = 0 \, ,
395$$ (eq-vector-les)
396
397where
398
399$$
400\bm{\overline F}(\bm{\overline q}) =
401\bm{F} (\bm{\overline q}) +
402\begin{pmatrix}
403    0\\
404     \bm{\tau}^r \\
405     \bm{u}  \cdot \bm{\tau}^r
406\end{pmatrix}
407$$ (eq-les-flux)
408
409More details on deriving the above expression, filtering, and large eddy simulation can be found in {cite}`popeTurbulentFlows2000`.
410To close the problem, the subgrid stress must be defined.
411For implicit LES, the subgrid stress is set to zero and the numerical properties of the discretized system are assumed to account for the effect of subgrid scale structures on the filtered solution field.
412For explicit LES, it is defined by a subgrid stress model.
413
414:::{list-table} SGS Model Options
415:header-rows: 1
416
417* - Option
418  - Description
419  - Default value
420  - Unit
421
422* - `-sgs_model_type`
423  - Type of subgrid stress model to use. Currently only `data_driven` is available
424  - `none`
425  - string
426:::
427
428(sgs-dd-model)=
429#### Data-Driven SGS Model
430
431The data-driven SGS model implemented here uses a small neural network to compute the SGS term.
432The SGS tensor is calculated at nodes using an $L^2$ projection of the velocity gradient and grid anisotropy tensor, and then interpolated onto quadrature points.
433More details regarding the theoretical background of the model can be found in {cite}`prakashDDSGS2022` and {cite}`prakashDDSGSAnisotropic2022`.
434
435The neural network itself consists of 1 hidden layer and 20 neurons, using Leaky ReLU as its activation function.
436The slope parameter for the Leaky ReLU function is set via `-sgs_model_dd_leakyrelu_alpha`.
437The outputs of the network are assumed to be normalized on a min-max scale, so they must be rescaled by the original min-max bounds.
438Parameters for the neural network are put into files in a directory found in `-sgs_model_dd_parameter_dir`.
439These files store the network weights (`w1.dat` and `w2.dat`), biases (`b1.dat` and `b2.dat`), and scaling parameters (`OutScaling.dat`).
440The first row of each files stores the number of columns and rows in each file.
441Note that the weight coefficients are assumed to be in column-major order.
442This is done to keep consistent with legacy file compatibility.
443
444:::{note}
445The data-driven model parameters in the examples directory are not accurate and are for regression testing only.
446:::
447
448##### Data-Driven Model Using External Libraries
449
450There are two different modes for using the data-driven model: fused and sequential.
451
452In fused mode, the input processing, model inference, and output handling were all done in a single CeedOperator.
453Fused mode is generally faster than the sequential mode, however fused mode requires that the model architecture be manually implemented into a libCEED QFunction.
454To use the fused mode, set `-sgs_model_dd_implementation fused`.
455
456Sequential mode has separate function calls/CeedOperators for input creation, model inference, and output handling.
457By separating the three steps of the model evaluation, the sequential mode allows for functions calling external libraries to be used for the model inference step.
458The use of these external libraries allows us to leverage the flexibility of those external libraries in their model architectures.
459
460PyTorch is currently the only external library implemented with the sequential mode.
461This is enabled with `USE_TORCH=1` during the build process, which will use the PyTorch accessible from the build environment's Python interpreter.
462To specify the path to the PyTorch model file, use `-sgs_model_dd_torch_model_path`.
463The hardware used to run the model inference is determined automatically from the libCEED backend chosen, but can be overridden with `-sgs_model_dd_torch_model_device`.
464Note that if you chose to run the inference on host while using a GPU libCEED backend (e.g. `/gpu/cuda`), then host-to-device transfers (and vice versa) will be done automatically.
465
466The sequential mode is available using a libCEED based inference evaluation via `-sgs_model_dd_implementation sequential_ceed`, but it is only for verification purposes.
467
468:::{list-table} Data-driven SGS Model Options
469:header-rows: 1
470
471* - Option
472  - Description
473  - Default value
474  - Unit
475
476* - `-sgs_model_dd_leakyrelu_alpha`
477  - Slope parameter for Leaky ReLU activation function. `0` corresponds to normal ReLU
478  - 0
479  -
480
481* - `-sgs_model_dd_parameter_dir`
482  - Path to directory with data-driven model parameters (weights, biases, etc.)
483  - `./dd_sgs_parameters`
484  - string
485
486* - `-sgs_model_dd_model_implementation`
487  - Which computational implementation to use for SGS DD model (`fused`, `sequential_ceed`, `sequential_torch`)
488  - `fused`
489  - string
490
491* - `-sgs_model_dd_torch_model_path`
492  - Path to the PyTorch `*.pt` file containing the DD inference model
493  -
494  - string
495
496* - `-sgs_model_dd_torch_model_device`
497  - What hardware to perform the model inference on (`cpu`, `cuda`, `hip`, `xpu`)
498  - Default matches the libCEED backend
499  - string
500:::
501
502
503## Boundary Conditions
504
505
506(bc-stg)=
507### Synthetic Turbulence Generation (STG)
508
509We use the STG method described in {cite}`shurSTG2014`.
510Below follows a re-description of the formulation to match the present notation, and then a description of the implementation and usage.
511
512#### Equation Formulation
513
514$$
515\bm{u}(\bm{x}, t) = \bm{\overline{u}}(\bm{x}) + \bm{C}(\bm{x}) \cdot \bm{v}'
516$$
517
518$$
519\begin{aligned}
520\bm{v}' &= 2 \sqrt{3/2} \sum^N_{n=1} \sqrt{q^n(\bm{x})} \bm{\sigma}^n \cos(\kappa^n \bm{d}^n \cdot \bm{\hat{x}}^n(\bm{x}, t) + \phi^n ) \\
521\bm{\hat{x}}^n &= \left[(x - U_0 t)\max(2\kappa_{\min}/\kappa^n, 0.1) , y, z  \right]^T
522\end{aligned}
523$$
524
525Here, we define the number of wavemodes $N$, set of random numbers $ \{\bm{\sigma}^n, \bm{d}^n, \phi^n\}_{n=1}^N$, the Cholesky decomposition of the Reynolds stress tensor $\bm{C}$ (such that $\bm{R} = \bm{CC}^T$ ), bulk velocity $U_0$, wavemode amplitude $q^n$, wavemode frequency $\kappa^n$, and $\kappa_{\min} = 0.5 \min_{\bm{x}} (\kappa_e)$.
526
527$$
528\kappa_e = \frac{2\pi}{\min(2d_w, 3.0 l_t)}
529$$
530
531where $l_t$ is the turbulence length scale, and $d_w$ is the distance to the nearest wall.
532
533
534The set of wavemode frequencies is defined by a geometric distribution:
535
536$$
537\kappa^n = \kappa_{\min} (1 + \alpha)^{n-1} \ , \quad \forall n=1, 2, ... , N
538$$
539
540The wavemode amplitudes $q^n$ are defined by a model energy spectrum $E(\kappa)$:
541
542$$
543q^n = \frac{E(\kappa^n) \Delta \kappa^n}{\sum^N_{n=1} E(\kappa^n)\Delta \kappa^n} \ ,\quad \Delta \kappa^n = \kappa^n - \kappa^{n-1}
544$$
545
546$$ E(\kappa) = \frac{(\kappa/\kappa_e)^4}{[1 + 2.4(\kappa/\kappa_e)^2]^{17/6}} f_\eta f_{\mathrm{cut}} $$
547
548$$
549f_\eta = \exp \left[-(12\kappa /\kappa_\eta)^2 \right], \quad
550f_\mathrm{cut} = \exp \left( - \left [ \frac{4\max(\kappa-0.9\kappa_\mathrm{cut}, 0)}{\kappa_\mathrm{cut}} \right]^3 \right)
551$$
552
553$\kappa_\eta$ represents turbulent dissipation frequency, and is given as $2\pi (\nu^3/\varepsilon)^{-1/4}$ with $\nu$ the kinematic viscosity and $\varepsilon$ the turbulent dissipation.
554$\kappa_\mathrm{cut}$ approximates the effective cutoff frequency of the mesh (viewing the mesh as a filter on solution over $\Omega$) and is given by:
555
556$$
557\kappa_\mathrm{cut} = \frac{2\pi}{ 2\min\{ [\max(h_y, h_z, 0.3h_{\max}) + 0.1 d_w], h_{\max} \} }
558$$
559
560The enforcement of the boundary condition is identical to the blasius inflow; it weakly enforces velocity, with the option of weakly enforcing either density or temperature using the `-weakT` flag.
561
562#### Initialization Data Flow
563
564Data flow for initializing function (which creates the context data struct) is given below:
565```{mermaid}
566flowchart LR
567    subgraph STGInflow.dat
568    y
569    lt[l_t]
570    eps
571    Rij[R_ij]
572    ubar
573    end
574
575    subgraph STGRand.dat
576    rand[RN Set];
577    end
578
579    subgraph User Input
580    u0[U0];
581    end
582
583    subgraph init[Create Context Function]
584    ke[k_e]
585    N;
586    end
587    lt --Calc-->ke --Calc-->kn
588    y --Calc-->ke
589
590    subgraph context[Context Data]
591    yC[y]
592    randC[RN Set]
593    Cij[C_ij]
594    u0 --Copy--> u0C[U0]
595    kn[k^n];
596    ubarC[ubar]
597    ltC[l_t]
598    epsC[eps]
599    end
600    ubar --Copy--> ubarC;
601    y --Copy--> yC;
602    lt --Copy--> ltC;
603    eps --Copy--> epsC;
604
605    rand --Copy--> randC;
606    rand --> N --Calc--> kn;
607    Rij --Calc--> Cij[C_ij]
608```
609
610This is done once at runtime.
611The spatially-varying terms are then evaluated at each quadrature point on-the-fly, either by interpolation (for $l_t$, $\varepsilon$, $C_{ij}$, and $\overline{\bm u}$) or by calculation (for $q^n$).
612
613The `STGInflow.dat` file is a table of values at given distances from the wall.
614These values are then interpolated to a physical location (node or quadrature point). It has the following format:
615```
616[Total number of locations] 14
617[d_w] [u_1] [u_2] [u_3] [R_11] [R_22] [R_33] [R_12] [R_13] [R_23] [sclr_1] [sclr_2] [l_t] [eps]
618```
619where each `[  ]` item is a number in scientific notation (ie. `3.1415E0`), and `sclr_1` and `sclr_2` are reserved for turbulence modeling variables.
620They are not used in this example.
621
622The `STGRand.dat` file is the table of the random number set, $\{\bm{\sigma}^n, \bm{d}^n, \phi^n\}_{n=1}^N$.
623It has the format:
624```
625[Number of wavemodes] 7
626[d_1] [d_2] [d_3] [phi] [sigma_1] [sigma_2] [sigma_3]
627```
628
629The following table is presented to help clarify the dimensionality of the numerous terms in the STG formulation.
630
631| Math                                           | Label    | $f(\bm{x})$?   | $f(n)$?   |
632| -----------------                              | -------- | -------------- | --------- |
633| $ \{\bm{\sigma}^n, \bm{d}^n, \phi^n\}_{n=1}^N$ | RN Set   | No             | Yes       |
634| $\bm{\overline{u}}$                            | ubar     | Yes            | No        |
635| $U_0$                                          | U0       | No             | No        |
636| $l_t$                                          | l_t      | Yes            | No        |
637| $\varepsilon$                                  | eps      | Yes            | No        |
638| $\bm{R}$                                       | R_ij     | Yes            | No        |
639| $\bm{C}$                                       | C_ij     | Yes            | No        |
640| $q^n$                                          | q^n      | Yes            | Yes       |
641| $\{\kappa^n\}_{n=1}^N$                         | k^n      | No             | Yes       |
642| $h_i$                                          | h_i      | Yes            | No        |
643| $d_w$                                          | d_w      | Yes            | No        |
644
645#### Runtime Options
646
647To use the STG boundary condition, the `-bc_inflow` option should be set to the boundary faces that need the inflow (see {ref}`bc-flags`).
648The `-stg_use` flag is then used to enable/disable applying STG to those faces.
649
650:::{list-table} STG Runtime Options
651:header-rows: 1
652
653* - Option
654  - Description
655  - Default value
656  - Unit
657
658* - `-stg_use`
659  - Enable STG for `bc_inflow` faces
660  - `false`
661  -
662
663* - `-stg_inflow_path`
664  - Path to the STGInflow file
665  - `./STGInflow.dat`
666  -
667
668* - `-stg_rand_path`
669  - Path to the STGRand file
670  - `./STGRand.dat`
671  -
672
673* - `-stg_alpha`
674  - Growth rate of the wavemodes
675  - `1.01`
676  -
677
678* - `-stg_u0`
679  - Convective velocity, $U_0$
680  - `0.0`
681  - `m/s`
682
683* - `-stg_mean_only`
684  - Only impose the mean velocity (no fluctutations)
685  - `false`
686  -
687
688* - `-stg_strong`
689  - Strongly enforce the STG inflow boundary condition
690  - `false`
691  -
692
693* - `-stg_fluctuating_IC`
694  - "Extrude" the fluctuations through the domain as an initial condition
695  - `false`
696  -
697
698* - `-stg_dx`
699  - Set the element size in the x direction. Default is calculated for box meshes, assuming equispaced elements.
700  -
701  - `m`
702
703* - `-stg_h_scale_factor`
704  - Scale element size for cutoff frequency calculation
705  - $1/p$
706  -
707:::
708
709### Internal Damping Layer (IDL)
710:::{note}
711IDL is not a boundary condition, but it's primary application is for use with STG.
712:::
713The STG inflow boundary condition creates large amplitude acoustic waves.
714We use an internal damping layer (IDL) to damp them out without disrupting the synthetic structures developing into natural turbulent structures.
715This implementation was inspired by {cite}`shurSTG2014`, but is implemented here as a ramped volumetric forcing term, similar to a sponge layer (see 8.4.2.4 in {cite}`colonius2023turbBC` for example).
716It takes the following form:
717
718$$
719S(\bm{q}) = -\sigma(\bm{x})\left.\frac{\partial \bm{q}}{\partial \bm{Y}}\right\rvert_{\bm{q}} \bm{Y}'
720$$
721
722where $\bm{Y}' = [P - P_\mathrm{ref}, \bm{0}, 0]^T$, and $\sigma(\bm{x})$ is a linear ramp starting at `-idl_start` with length `-idl_length` and an amplitude of inverse `-idl_decay_rate`.
723The damping is defined in terms of a pressure-primitive anomaly $\bm Y'$ converted to conservative source using $\partial \bm{q}/\partial \bm{Y}\rvert_{\bm{q}}$, which is linearized about the current flow state.
724$P_\mathrm{ref}$ has a default value equal to `-reference_pressure` flag, with an optional flag `-idl_pressure` to set it to a different value.
725
726:::{list-table} IDL Runtime Options
727:header-rows: 1
728
729* - Option
730  - Description
731  - Default value
732  - Unit
733
734* - `-idl_decay_time`
735  - Characteristic timescale of the pressure deviance decay. The timestep is good starting point
736  - `-1` (disabled)
737  - `s`
738
739* - `-idl_start`
740  - Start of IDL in the x direction
741  - `0`
742  - `m`
743
744* - `-idl_length`
745  - Length of IDL in the positive x direction
746  - `0`
747  - `m`
748
749* - `-idl_pressure`
750  - Pressure used for IDL reference pressure
751  -  `-reference_pressure`
752  - `Pa`
753:::
754
755
756<!-- TODO:  Move the Riemann/freestream description here-->
757