xref: /honee/doc/auxiliary.md (revision 0c608af58d6c66f4fcbe6fff5e0f907584234584)
# Auxiliary Functionality
This section documents functionality that is not part of the core PDE solver, but is used for other miscellaneous tasks, such as statistics collection or in-situ machine learning.

(aux-statistics)=
## Statistics Collection
For scale-resolving simulations (such as LES and DNS), statistics of a simulation are often more useful than time-instantaneous snapshots of the simulation itself.
To make this process more computationally efficient, averaging in the spanwise direction, when physically justified, can reduce the amount of simulation time needed to obtain converged statistics.

First, let's define more precisely what we mean by spanwise average.
Denote $\langle \phi \rangle$ as the Reynolds average of $\phi$, which in this case is an average over the spanwise direction and time:

$$
\langle \phi \rangle(x,y) = \frac{1}{L_z (T_f - T_0)}\int_0^{L_z} \int_{T_0}^{T_f} \phi(x, y, z, t) \,\mathrm{d}t \,\mathrm{d}z
$$

where $z$ is the spanwise direction, the domain has extent $[0, L_z]$ in the spanwise direction, and $[T_0, T_f]$ is the time interval being averaged over.
Note that here and in the code, **we assume the spanwise direction to be the $z$ direction**.
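
As a purely illustrative sketch (not HONEE code), on a uniform grid with uniformly spaced time samples the prefactor $1/(L_z (T_f - T_0))$ cancels the uniform quadrature weights, and the discrete average reduces to a plain mean over the $t$ and $z$ indices:

```python
import numpy as np

# Illustrative only: phi sampled on a uniform (x, y, z) grid at several times.
rng = np.random.default_rng(0)
nt, nx, ny, nz = 5, 4, 3, 8
phi = rng.standard_normal((nt, nx, ny, nz))

phi_avg = phi.mean(axis=(0, 3))  # <phi>(x, y): mean over time (axis 0) and z (axis 3)
assert phi_avg.shape == (nx, ny)

# Sanity check: averaging a t- and z-independent field returns it unchanged.
base = np.arange(nx * ny, dtype=float).reshape(nx, ny)
phi_const = np.broadcast_to(base[None, :, :, None], (nt, nx, ny, nz))
assert np.allclose(phi_const.mean(axis=(0, 3)), base)
```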

To discuss the details of the implementation, we'll first cover the spanwise integral, then the temporal integral, and lastly the statistics themselves.

### Spanwise Integral
The function $\langle \phi \rangle (x,y)$ is represented on a 2-D finite element grid, taken from the full domain mesh itself.
If isoperiodicity is set, the periodic face is extracted as the spanwise statistics mesh.
Otherwise, the negative $z$ face is used.
We'll refer to this mesh as the *parent grid*, as for every "parent" point in the parent grid, there are many "child" points in the full domain.
Define a function space on the parent grid as $\mathcal{V}_p^\mathrm{parent} = \{ \bm v(\bm x) \in H^{1}(\Omega_e^\mathrm{parent}) \,|\, \bm v(\bm x_e(\bm X)) \in P_p(\bm{I}), e=1,\ldots,N_e \}$.
We enforce that the order of the parent FEM space is equal to the full domain's order.

Many statistics are the product of two or more solution functions, which results in functions of degree higher than the parent FEM space, $\mathcal{V}_p^\mathrm{parent}$, can represent.
To represent these higher-degree functions on the parent FEM space, we perform an $L^2$ projection.
Define the spanwise-averaged function as:

$$
\langle \phi \rangle_z(x,y,t) = \frac{1}{L_z} \int_0^{L_z} \phi(x, y, z, t) \,\mathrm{d}z
$$

where the function $\phi$ may be the product of multiple solution functions and $\langle \phi \rangle_z$ denotes the spanwise average.
The projection of a function $u$ onto the parent FEM space is:

$$
\bm M u_N = \int_0^{L_x} \int_0^{L_y} u \psi^\mathrm{parent}_N \,\mathrm{d}y \,\mathrm{d}x
$$
where $\bm M$ is the mass matrix for $\mathcal{V}_p^\mathrm{parent}$, $u_N$ are the coefficients of the projected function, and $\psi^\mathrm{parent}_N$ are the basis functions of the parent FEM space.
Substituting the spanwise average of $\phi$ for $u$, we get:

$$
\bm M [\langle \phi \rangle_z]_N = \int_0^{L_x} \int_0^{L_y} \left [\frac{1}{L_z} \int_0^{L_z} \phi(x,y,z,t) \,\mathrm{d}z \right ] \psi^\mathrm{parent}_N(x,y) \,\mathrm{d}y \,\mathrm{d}x
$$

The triple integral on the right-hand side is just an integral over the full domain:

$$
\bm M [\langle \phi \rangle_z]_N = \frac{1}{L_z} \int_\Omega \phi(x,y,z,t) \psi^\mathrm{parent}_N(x,y) \,\mathrm{d}\Omega
$$

We need to evaluate $\psi^\mathrm{parent}_N$ at quadrature points in the full domain.
To do this efficiently, **we assume, and exploit the fact, that the full domain grid is a tensor product in the spanwise direction**.
This assumption means quadrature points in the full domain have the same $(x,y)$ coordinate location as quadrature points in the parent domain.
It also allows the use of the full domain quadrature weights for the triple integral.
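
The $L^2$ projection itself can be sketched in one dimension (a simplification of the 2-D parent-grid projection; the mesh, quadrature, and target function here are made up for the example):

```python
import numpy as np

# Minimal 1-D sketch of an L^2 projection onto a piecewise-linear FEM space.
n_el = 8
nodes = np.linspace(0.0, 1.0, n_el + 1)
f = lambda x: x**2                      # "higher-degree" function to project

# 2-point Gauss quadrature on the reference element [-1, 1]
qp = np.array([-1.0, 1.0]) / np.sqrt(3.0)
qw = np.array([1.0, 1.0])

M = np.zeros((n_el + 1, n_el + 1))      # mass matrix
b = np.zeros(n_el + 1)                  # right-hand side, int f * psi_N
for e in range(n_el):
    x0, x1 = nodes[e], nodes[e + 1]
    J = (x1 - x0) / 2.0                 # mapping Jacobian
    xq = (x0 + x1) / 2.0 + J * qp       # quadrature points in physical space
    shp = np.array([(1 - qp) / 2.0, (1 + qp) / 2.0])  # linear shape functions
    for q in range(2):
        M[e:e + 2, e:e + 2] += qw[q] * J * np.outer(shp[:, q], shp[:, q])
        b[e:e + 2] += qw[q] * J * f(xq[q]) * shp[:, q]

u = np.linalg.solve(M, b)               # coefficients of the projected function

# Constants lie in the FEM space, so the projection conserves the integral of f.
assert np.isclose(u @ (M @ np.ones(n_el + 1)), 1.0 / 3.0)
```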

### Temporal Integral/Averaging
To calculate the temporal integral, we keep a running sum using the left-rectangle rule.
At the beginning of each simulation, the time integral of a statistic is set to 0, $\overline{\phi} = 0$.
Periodically, the integral is updated using the left-rectangle rule:

$$\overline{\phi}_\mathrm{new} = \overline{\phi}_{\mathrm{old}} + \phi(t_\mathrm{new}) \Delta T$$
where $\phi(t_\mathrm{new})$ is the statistic at the current time and $\Delta T$ is the time since the last update.
When statistics are written out to file, this running sum is divided by $T_f - T_0$ to get the time average.

With this method of calculating the running time average, we can plug it into the $L^2$ projection of the spanwise integral:

$$
\bm M [\langle \phi \rangle]_N = \frac{1}{L_z (T_f - T_0)} \int_\Omega \int_{T_0}^{T_f} \phi(x,y,z,t) \psi^\mathrm{parent}_N \,\mathrm{d}t \,\mathrm{d}\Omega
$$
where the integral $\int_{T_0}^{T_f} \phi(x,y,z,t) \,\mathrm{d}t$ is calculated on a running basis.
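
A sketch of this running average (the function names and test signal are illustrative, not HONEE's):

```python
import numpy as np

# Running rectangle-rule time integral, mirroring the update formula above:
# running_sum += phi(t_new) * (time since last update).
def update(running_sum, phi_new, dt):
    return running_sum + phi_new * dt

T0, Tf, n = 0.0, 2.0, 1000
ts = np.linspace(T0, Tf, n + 1)
running = 0.0
for t_prev, t_new in zip(ts[:-1], ts[1:]):
    running = update(running, np.sin(t_new) ** 2, t_new - t_prev)

time_avg = running / (Tf - T0)  # division happens only when writing to file
# Converges to the exact average of sin^2 over [0, 2]: 1/2 - sin(4)/8
assert abs(time_avg - (0.5 - np.sin(4.0) / 8.0)) < 2e-3
```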

### Running
The command-line options below are given for each `<statsname>`, depending on the statistics being collected.
For example, the turbulent statistics use `turbulence`.

As the simulation runs, a running time average of the statistics is taken at the full domain quadrature points.
This running average is only updated at the interval specified by `-ts_monitor_spanstats_<statsname>_collect_interval`, given as a number of timesteps.
The $L^2$ projection problem is only solved when statistics are written to file, which is controlled by `-ts_monitor_spanstats_<statsname>_interval`.
Note that the averaging is not reset after each file write.
The average is always over the bounds $[T_0, T_f]$, where $T_f$ in this case is the time at which the file was written and $T_0$ is the solution time at the beginning of the run.

:::{list-table} Spanwise Statistics Runtime Options
:header-rows: 1

* - Option
  - Description
  - Default value

* - `-ts_monitor_spanstats_<statsname>`
  - Sets the `PetscViewer` for the statistics file writing, such as `cgns:output-%d.cgns` (requires PETSc `--download-cgns`). Also turns the statistics collection on.
  -

* - `-ts_monitor_spanstats_<statsname>_collect_interval`
  - Number of timesteps between statistics collection
  - `1`

* - `-ts_monitor_spanstats_<statsname>_interval`
  - Number of timesteps between statistics file writing (`-1` means only at end of run)
  - `-1`

* - `-ts_monitor_spanstats_<statsname>_cgns_batch_size`
  - Number of frames written per CGNS file if the CGNS file name includes a format specifier (`%d`)
  - `20`
:::
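
For example, these options might combine as follows on a run (the executable name and options file below are placeholders, not from this document; the `spanstats` flags are the documented ones):

```shell
# Hypothetical invocation: collect turbulence statistics every 10 timesteps
# and write them to a batched CGNS file every 1000 timesteps.
./honee -options_file channel.yaml \
  -ts_monitor_spanstats_turbulence cgns:stats-%d.cgns \
  -ts_monitor_spanstats_turbulence_collect_interval 10 \
  -ts_monitor_spanstats_turbulence_interval 1000
```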

### Turbulent Statistics

The command-line prefix `turbulence` (e.g. `-ts_monitor_spanstats_turbulence`) collects statistics relevant to turbulent flow.
The terms collected are listed below, with the mathematical definition on the left and the label (present in CGNS output files) on the right.

| Math                  | Label                           |
| --------------------- | ------------------------------- |
| $\mean{\rho}$         | MeanDensity                     |
| $\mean{p}$            | MeanPressure                    |
| $\mean{p^2}$          | MeanPressureSquared             |
| $\mean{p u_i}$        | MeanPressureVelocity[$i$]       |
| $\mean{\rho T}$       | MeanDensityTemperature          |
| $\mean{\rho T u_i}$   | MeanDensityTemperatureFlux[$i$] |
| $\mean{\rho u_i}$     | MeanMomentum[$i$]               |
| $\mean{\rho u_i u_j}$ | MeanMomentumFlux[$ij$]          |
| $\mean{u_i}$          | MeanVelocity[$i$]               |

where [$i$] and [$ij$] are suffixes to the labels; for example, $\mean{\rho u_x u_y}$ corresponds to MeanMomentumFluxXY.
This naming convention is chosen to align with the CGNS standard naming style.

To get second-order statistics from these terms, simply use the identity:

$$
\mean{\phi' \theta'} = \mean{\phi \theta} - \mean{\phi} \mean{\theta}
$$
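
This identity holds exactly for any averaging operator that is linear and idempotent, which a quick numerical check illustrates (the synthetic fields below stand in for the simulation data):

```python
import numpy as np

# Check <phi' theta'> = <phi theta> - <phi><theta>, using plain sample means
# over synthetic data as the Reynolds average (illustrative only).
rng = np.random.default_rng(1)
phi = rng.standard_normal(10_000)
theta = 0.5 * phi + rng.standard_normal(10_000)  # correlated with phi

fluct_form = np.mean((phi - phi.mean()) * (theta - theta.mean()))
moment_form = np.mean(phi * theta) - phi.mean() * theta.mean()
assert np.isclose(fluct_form, moment_form)  # identity holds exactly
```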

### Numerics ($\mathrm{Pe}$, $\mathrm{CFL}$)

The command-line prefix `cflpe` (e.g. `-ts_monitor_spanstats_cflpe`) collects statistics for the CFL and Péclet numbers.
These quantities have agreed-upon definitions in 1D, but in multiple dimensions their definitions depend on the definition of the grid length scale.
Here, we define them as

$$
\Pe = \frac{\sqrt{\gbar{jk}^{-1} u_j u_k}}{\kappa}
$$

and

$$
\CFL = \dt\sqrt{\gbar{jk} u_j u_k}
$$

where $u_j$ is the (advection) velocity, $\kappa$ is the diffusion coefficient, and $\gbar{jk}$ is the scaled element metric tensor.
Note that these quantities do *not* account for the polynomial order of an element.
The following statistics are computed:

| Math              | Label          |
| ----------------- | -------------- |
| $\mean{\CFL}$     | MeanCFL        |
| $\mean{\CFL^2}$   | MeanCFLSquared |
| $\mean{\CFL^3}$   | MeanCFLCubed   |
| $\mean{\Pe}$      | MeanPe         |
| $\mean{\Pe^2}$    | MeanPeSquared  |
| $\mean{\Pe^3}$    | MeanPeCubed    |
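
These definitions can be evaluated directly for a single element. The sketch below assumes, for illustration only, an axis-aligned hexahedral element with spacings $(h_x, h_y, h_z)$ whose scaled metric tensor is $\mathrm{diag}(1/h_x^2, 1/h_y^2, 1/h_z^2)$; the numbers are made up:

```python
import numpy as np

# Evaluate CFL = dt * sqrt(gbar_jk u_j u_k) and
# Pe = sqrt(gbar^-1_jk u_j u_k) / kappa for one element.
h = np.array([0.1, 0.02, 0.1])     # assumed element spacings
gbar = np.diag(1.0 / h**2)         # assumed scaled metric tensor
u = np.array([1.0, 0.5, 0.0])      # advection velocity
dt, kappa = 1e-3, 1e-2             # timestep and diffusion coefficient

cfl = dt * np.sqrt(u @ gbar @ u)
pe = np.sqrt(u @ np.linalg.inv(gbar) @ u) / kappa

# For a diagonal metric these reduce to per-direction sums:
assert np.isclose(cfl, dt * np.sqrt((u[0] / h[0])**2 + (u[1] / h[1])**2))
assert np.isclose(pe, np.sqrt((u[0] * h[0])**2 + (u[1] * h[1])**2) / kappa)
```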

(aux-differential-filtering)=
## Differential Filtering

There is the option to filter the solution field using differential filtering.
This was first proposed in {cite}`germanoDiffFilterLES1986`, using an inverse Helmholtz operator.
The strong form of the differential equation is

$$
\overline{\phi} - \nabla \cdot (\beta (\bm{D}\bm{\Delta})^2 \nabla \overline{\phi} ) = \phi
$$

for $\phi$ the scalar solution field we want to filter, $\overline \phi$ the filtered scalar solution field, $\bm{\Delta} \in \mathbb{R}^{3 \times 3}$ a symmetric positive-definite rank 2 tensor defining the width of the filter, $\bm{D}$ the filter width scaling tensor (also a rank 2 SPD tensor), and $\beta$ a kernel scaling factor on the filter tensor.
This admits the weak form:

$$
\int_\Omega \left( v \overline \phi + \beta \nabla v \cdot (\bm{D}\bm{\Delta})^2 \nabla \overline \phi \right) \,d\Omega
- \cancel{\int_{\partial \Omega} \beta v \nabla \overline \phi \cdot (\bm{D}\bm{\Delta})^2 \bm{\hat{n}} \,d\partial\Omega} =
\int_\Omega v \phi \,d\Omega \, , \; \forall v \in \mathcal{V}_p
$$

The boundary integral resulting from integration by parts is crossed out, as we assume that $(\bm{D}\bm{\Delta})^2 = \bm{0} \Leftrightarrow \overline \phi = \phi$ at boundaries (this is physically reasonable at walls, and assumed for convenience elsewhere).
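
A 1-D finite-difference sketch shows the effect of this filter (this stands in for the FEM weak form; the signal, filter width, and discretization are made up for the example):

```python
import numpy as np

# Solve (I - beta * Delta^2 d^2/dx^2) phibar = phi with phibar = phi at the
# boundaries, using second-order central differences.
n = 201
x = np.linspace(0.0, 2 * np.pi, n)
dx = x[1] - x[0]
beta, width = 1.0 / 6.0, 0.5            # Gaussian-matched beta, filter width

phi = np.sin(x) + 0.3 * np.sin(20 * x)  # smooth signal plus high-frequency noise

c = beta * width**2 / dx**2
A = np.eye(n)                           # boundary rows stay identity: phibar = phi
for i in range(1, n - 1):
    A[i, i - 1], A[i, i], A[i, i + 1] = -c, 1.0 + 2.0 * c, -c
phibar = np.linalg.solve(A, phi)

# The k = 20 mode is strongly damped (transfer ~ 1/(1 + beta width^2 k^2)),
# while the k = 1 mode passes nearly untouched.
assert np.std(phibar - np.sin(x)) < 0.5 * np.std(phi - np.sin(x))
```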

### Filter Width Tensor, Δ
For homogeneous filtering, $\bm{\Delta}$ is defined as the identity matrix.

:::{note}
It is common to define a filter width relative to the radial distance of the filter kernel.
Note that here we use the filter *diameter* instead, as that feels more natural (albeit mathematically less convenient).
For example, under this definition a box filter would be defined as:

$$
B(\Delta; \bm{r}) =
\begin{cases}
1 & \Vert \bm{r} \Vert \leq \Delta/2 \\
0 & \Vert \bm{r} \Vert > \Delta/2
\end{cases}
$$
:::

For inhomogeneous anisotropic filtering, we use the finite element grid itself to define $\bm{\Delta}$.
This is set via `-diff_filter_grid_based_width`.
Specifically, we use the filter width tensor defined in {cite}`prakashDDSGSAnisotropic2022`.
For finite element grids, the filter width tensor is most conveniently defined by $\bm{\Delta} = \bm{g}^{-1/2}$, where $\bm g = \nabla_{\bm x} \bm{X} \cdot \nabla_{\bm x} \bm{X}$ is the metric tensor.

### Filter Width Scaling Tensor, $\bm{D}$
The filter width tensor $\bm{\Delta}$, be it defined from grid-based sources or homogeneous filtering, can be scaled anisotropically.
The coefficients for that anisotropic scaling are given by `-diff_filter_width_scaling`, denoted here by $c_1, c_2, c_3$.
The definition of $\bm{D}$ then becomes

$$
\bm{D} =
\begin{bmatrix}
    c_1 & 0   & 0   \\
    0   & c_2 & 0   \\
    0   & 0   & c_3 \\
\end{bmatrix}
$$

In the case of $\bm{\Delta}$ being homogeneous, $\bm{D}\bm{\Delta}$ means that $\bm{D}$ effectively sets the filter width.

The filtering at the wall may also be damped, to smoothly meet the $\overline \phi = \phi$ boundary condition at the wall.
The selected damping function for this is the van Driest function {cite}`vandriestWallDamping1956`:

$$
\zeta = 1 - \exp\left(-\frac{y^+}{A^+}\right)
$$

where $y^+$ is the wall-friction scaled wall distance ($y^+ = y u_\tau / \nu = y/\delta_\nu$), $A^+$ is a wall-friction scaled scale factor, and $\zeta$ is the damping coefficient.
For this implementation, we assume that $\delta_\nu$ is constant along the wall and is defined by `-diff_filter_friction_length`.
$A^+$ is defined by `-diff_filter_wall_damping_constant`.

To apply this scalar damping coefficient to the filter width tensor, we construct the wall-damping tensor from it.
The construction currently implemented limits damping in the wall-parallel directions so that the filter width is no less than the original filter width defined by $\bm{\Delta}$.
The wall-normal filter width is allowed to be damped to a zero filter width.
It is currently assumed that the second component of the filter width tensor is in the wall-normal direction.
Under these assumptions, $\bm{D}$ then becomes:

$$
\bm{D} =
\begin{bmatrix}
    \max(1, \zeta c_1) & 0         & 0                  \\
    0                  & \zeta c_2 & 0                  \\
    0                  & 0         & \max(1, \zeta c_3) \\
\end{bmatrix}
$$
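
A sketch of this wall-damped tensor (names are illustrative; `delta_nu` and `a_plus` mirror `-diff_filter_friction_length` and `-diff_filter_wall_damping_constant`):

```python
import numpy as np

def wall_damped_D(y, delta_nu=1e-3, a_plus=25.0, c=(1.0, 1.0, 1.0)):
    zeta = 1.0 - np.exp(-(y / delta_nu) / a_plus)  # van Driest damping
    c1, c2, c3 = c
    # Wall-parallel widths never shrink below the original Delta (the max
    # with 1); the wall-normal width (second component) may damp to zero.
    return np.diag([max(1.0, zeta * c1), zeta * c2, max(1.0, zeta * c3)])

D_wall = wall_damped_D(0.0)  # at the wall: zeta = 0
D_far = wall_damped_D(1.0)   # far from the wall: zeta -> 1

assert np.allclose(np.diag(D_wall), [1.0, 0.0, 1.0])
assert np.allclose(np.diag(D_far), [1.0, 1.0, 1.0])
```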

### Filter Kernel Scaling, β
While we define $\bm{D}\bm{\Delta}$ to be of a certain physical filter width, the actual width of the implied filter kernel is considerably larger than that of "normal" kernels.
To account for this, we use $\beta$ to scale the filter tensor to the appropriate size, as is done in {cite}`bullExplicitFilteringExact2016`.
To match the "size" of a normal kernel to our differential kernel, we attempt to have them match second-order moments with respect to the prescribed filter width.
To match the box and Gaussian filter "sizes", we use $\beta = 1/10$ and $\beta = 1/6$, respectively.
$\beta$ can be set via `-diff_filter_kernel_scaling`.

### Runtime Options

:::{list-table} Differential Filtering Runtime Options
:header-rows: 1

* - Option
  - Description
  - Default value
  - Unit

* - `-diff_filter_monitor`
  - Enable differential filter TSMonitor
  - `false`
  - boolean

* - `-diff_filter_grid_based_width`
  - Use filter width based on the grid size
  - `false`
  - boolean

* - `-diff_filter_width_scaling`
  - Anisotropic scaling for filter width in wall-aligned coordinates (snz)
  - `1,1,1`
  - `m`

* - `-diff_filter_kernel_scaling`
  - Scaling to make differential kernel size equivalent to other filter kernels
  - `0.1`
  - `m^2`

* - `-diff_filter_wall_damping_function`
  - Damping function to use at the wall for anisotropic filtering (`none`, `van_driest`)
  - `none`
  - string

* - `-diff_filter_wall_damping_constant`
  - Constant for the wall-damping function. $A^+$ for the `van_driest` damping function.
  - `25`
  -

* - `-diff_filter_friction_length`
  - Friction length associated with the flow, $\delta_\nu$. Used in wall-damping functions.
  - `0`
  - `m`
:::

(aux-smartsim)=
## *In Situ* Data Analysis - SmartSim
Data analysis, including training of machine-learning models, normally uses *a priori* (already gathered) data stored on disk.
This is computationally inefficient, particularly as the scale of the problem grows and the data saved to disk becomes a small percentage of the total data generated by a simulation.
One way of working around this is to perform the data analysis while the simulation is actively running.
This is known as *in situ* (in place) data analysis.

HONEE can facilitate *in situ* data analysis using [SmartSim](https://www.craylabs.org/docs/overview.html).
HONEE will periodically place data into an in-memory database, and a separate process can then read data from this database and perform analysis on it.
SmartSim is responsible for orchestrating the running of HONEE and the data-analysis processes.
The database used by SmartSim is [Redis](https://redis.com/modules/redis-ai/), and the library used to connect to the database is called [SmartRedis](https://www.craylabs.org/docs/smartredis.html).
More information about how to utilize this code in a SmartSim configuration can be found on [SmartSim's website](https://www.craylabs.org/docs/overview.html).
While the primary intention of SmartSim is to enable machine-learning interaction with normal HPC applications, any data-analysis process can be used, e.g. data compression, post-processing, etc.

To use HONEE in a SmartSim *in situ* setup, HONEE must be built with SmartRedis enabled.
This is done by specifying the installation directory of SmartRedis using the `SMARTREDIS_DIR` environment variable when building:

```
make SMARTREDIS_DIR=~/software/smartredis/install
```

SmartSim has the option of having the database either be collocated with the compute hardware or located on separate nodes of a compute cluster.
This is controlled by SmartSim, but HONEE needs to know how many MPI ranks there are per database.
This is set by `-smartsim_collocated_num_ranks`.

### Runtime Options
:::{list-table} SmartSim Runtime Options
:header-rows: 1

* - Option
  - Description
  - Default value
  - Unit

* - `-smartsim_collocated_num_ranks`
  - Number of MPI ranks associated with each collocated database (i.e. ranks per node)
  - `1`
  -
:::

### Solution Data Analysis

The most basic functionality for *in situ* data analysis is to simply place the solution vector into the database.
The solution vector added includes all periodic and strong boundaries, i.e. it includes the entire grid of the problem.
This is enabled using the `-ts_monitor_smartsim_solution` flag.

#### Runtime Options
:::{list-table} Solution Data-Analysis Runtime Options
:header-rows: 1

* - Option
  - Description
  - Default value
  - Unit

* - `-ts_monitor_smartsim_solution`
  - Place solution data into the SmartSim database
  -
  -

* - `-ts_monitor_smartsim_solution_interval`
  - Number of timesteps between writing solution data into the SmartRedis database
  - `1`
  -

* - `-ts_monitor_smartsim_solution_overwrite_data`
  - Whether new solution data should overwrite old data in the database
  - `true`
  - boolean
:::

(aux-in-situ-ml)=
### SGS Data-Driven Model *In Situ* Training
Currently, the code is only set up to do *in situ* training for the SGS data-driven model.
Training data is split into the model inputs and outputs.
The model inputs are calculated in the same way as the inputs of the SGS data-driven model described in {ref}`sgs-dd-model`.
The model outputs (or targets, in the case of training) are the subgrid stresses.
Both the inputs and outputs are computed from a filtered velocity field, which is calculated via {ref}`aux-differential-filtering`.
The settings for the differential filtering used during training are described in {ref}`aux-differential-filtering`.
The training will create a set of data for each filter width defined in `-sgs_train_filter_widths`.
Each scalar filter width $c$ sets the scaling $\bm{D} = c \bm{I}$.

The SGS *in situ* training can be enabled using the `-sgs_train_enable` flag.
Data can be processed and placed into the database periodically.
The interval between writes is controlled by `-sgs_train_write_data_interval`.
There's also the choice of whether to add new training data on each database write or to overwrite the old data with new data.
This is controlled by `-sgs_train_overwrite_data`.

The database may also be located on the same node as an MPI rank (collocated) or located on a separate node (distributed).
It's necessary to know how many ranks are associated with each collocated database, which is set by `-smartsim_collocated_num_ranks`.

#### Runtime Options
:::{list-table} *In Situ* Machine-Learning Training Runtime Options
:header-rows: 1

* - Option
  - Description
  - Default value
  - Unit

* - `-sgs_train_enable`
  - Whether to enable *in situ* training of the data-driven SGS model. Requires building with SmartRedis.
  - `false`
  - boolean

* - `-sgs_train_write_data_interval`
  - Number of timesteps between writing training data into the SmartRedis database
  - `1`
  -

* - `-sgs_train_overwrite_data`
  - Whether new training data should overwrite old data in the database
  - `true`
  - boolean

* - `-sgs_train_filter_widths`
  - List of scalar values for different filter widths to calculate for training data
  -
  - `m`
:::