(ch_regressor)=

# PetscRegressor: Regression Solvers

The `PetscRegressor` component provides some basic infrastructure and a general API for supervised
machine learning tasks at a higher level of abstraction than a purely algebraic "solvers" view.
Methods are currently available for

- {any}`sec_regressor_linear`

Note that by "regressor" we mean an algorithm or implementation used to fit and apply a regression
model, following standard parlance in the machine-learning community.
Regressor here does NOT mean an independent (or predictor) variable, as it often does in the
statistics community.

(sec_regressor_usage)=

## Basic Regressor Usage

`PetscRegressor` supports supervised learning tasks:
Given a matrix of observed data $X$ with size $n_{samples}$ by $n_{features}$,
predict a vector of "target" values $y$ (of size $n_{samples}$), where the $i$th entry of $y$
corresponds to the observation (or "sample") stored in the $i$th row of $X$.
Traditionally, when the target consists of continuous values this task is called "regression",
and when it consists of discrete values (or "labels") it is called "classification";
we use `PetscRegressor` to support both of these cases.

Before a regressor can be used to make predictions, the model must be fitted using an initial set of training data.
Once a fitted model has been obtained, it can be used to predict target values for new observations.
Every `PetscRegressor` implementation provides a `Fit()` and a `Predict()` method to support this workflow.
Fitting (or "training") a model is a relatively computationally intensive task that generally involves solving an
optimization problem (often using `TAO` solvers) to determine the model parameters, whereas making predictions
(or performing "inference") is generally much simpler.

Here, we introduce a simple example to demonstrate `PetscRegressor` usage.
Please read {any}`sec_regressor_solvers` for more in-depth discussion.
The code presented {any}`below <regressor-ex3>` solves an ordinary linear
regression problem, with various options for regularization.

In the simplest usage of a regressor, the user provides a training (or "design") matrix
(`Mat`) and a target vector (`Vec`) against which to fit the model.
Once the regressor is fitted, the user can then obtain a vector of predicted values for a set of new observations.

PETSc's default method for solving regression problems is ordinary least squares,
`REGRESSOR_LINEAR_OLS`, which is a sub-type of the linear regressor,
`PETSCREGRESSORLINEAR`.
By "linear" we mean that the model $f(x, \theta)$ is linear in its coefficients $\theta$
but not necessarily linear in its features $x$.
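For instance, a model with quadratic features is still "linear" in this sense:

$$
f(x, \theta) = \theta_0 + \theta_1 x + \theta_2 x^2
$$

This model is quadratic in the feature $x$ but linear in the coefficients $\theta_0, \theta_1, \theta_2$, so fitting it remains a linear least-squares problem.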

Note that the data creation, option parsing, and cleanup stages are omitted here for
clarity. The complete code is available in {ref}`ex3.c <regressor-ex3>`.

(regressor-ex3)=
:::{admonition} Listing: `src/ml/regressor/tests/ex3.c`
```{literalinclude} /../src/ml/regressor/tests/ex3.c
:prepend: '#include <petscregressor.h>'
:start-at: int main
:end-at: PetscFinalize
:append: return 0;}
```
:::

To create a `PetscRegressor` instance, one must first call `PetscRegressorCreate()`:

```
PetscRegressorCreate(MPI_Comm comm, PetscRegressor *regressor);
```

To choose a regressor type, the user can either call

```
PetscRegressorSetType(PetscRegressor regressor, PetscRegressorType type);
```

or use the command-line option `-regressor_type <method>`; details regarding the
available methods are presented in {any}`sec_regressor_solvers`.
The application code can specify options for the underlying linear,
nonlinear, and optimization solvers used in fitting the model
by calling

```
PetscRegressorSetFromOptions(regressor);
```

which interfaces with the PETSc options database and enables convenient
runtime selection of the regression algorithm and the setting of various
solver and problem parameters.
This routine also controls the options of the inner solvers in the `KSP` and `Tao`
modules, as discussed in {any}`ch_ksp` and {any}`ch_tao`.

After setting these options, the user can fit (or "train") the regressor
by calling

```
PetscRegressorFit(PetscRegressor regressor, Mat X, Vec y);
```

where `X` contains the training data and `y` the target values.
After fitting the regressor, the user can compute model
predictions (that is, perform inference) for a data matrix of
unlabeled observations:

```
PetscRegressorPredict(PetscRegressor regressor, Mat X, Vec y_predicted);
```
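The caller creates the vector that receives the predictions. One way to do this (a sketch; the matrix of new observations is here assumed to be called `X_new`) is to use `MatCreateVecs()` so that the vector's parallel layout matches the row layout of the data matrix:

```
Vec y_predicted;

/* Create a vector whose layout matches the rows of X_new to hold the predictions */
MatCreateVecs(X_new, NULL, &y_predicted);
PetscRegressorPredict(regressor, X_new, y_predicted);
```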

Finally, when the regressor is no longer needed,
the user should destroy the `PetscRegressor` context with

```
PetscRegressorDestroy(PetscRegressor *regressor);
```
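
Putting these calls together, the basic lifecycle of a regressor looks like the following sketch, where the training data `X`, targets `y`, new observations `X_new`, and output vector `y_predicted` are assumed to have been created by the application, and error checking with `PetscCall()` is omitted for brevity:

```
PetscRegressor regressor;

PetscRegressorCreate(PETSC_COMM_WORLD, &regressor);
PetscRegressorSetType(regressor, PETSCREGRESSORLINEAR);
PetscRegressorSetFromOptions(regressor);              /* allow runtime customization */
PetscRegressorFit(regressor, X, y);                   /* train the model */
PetscRegressorPredict(regressor, X_new, y_predicted); /* inference on new data */
PetscRegressorDestroy(&regressor);
```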

(sec_regressor_solvers)=

## Regression Solvers

One can see the list of regressor types in Table
{any}`tab-regressordefaults`. Currently, we support only one type,
`PETSCREGRESSORLINEAR`, although we plan to add several others in the near future.

```{eval-rst}
.. list-table:: PETSc Regressor
   :name: tab-regressordefaults
   :header-rows: 1

   * - Method
     - PetscRegressorType
     - Options Name
   * - Linear
     - ``PETSCREGRESSORLINEAR``
     - ``linear``
```

If the particular method being employed supports regularization,
the user can set the regularizer's weight via

```
PetscRegressorSetRegularizerWeight(PetscRegressor regressor, PetscReal weight);
```

or with the option `-regressor_regularizer_weight <weight>`.

(sec_regressor_linear)=

## Linear regressor

The `PETSCREGRESSORLINEAR` (`-regressor_type linear`) implementation
constructs a linear model that minimizes the sum of squared differences
between the actual target values ("observations") in the dataset and the target
values estimated by the fitted model.
By default, the bound-constrained regularized Gauss-Newton solver `TAOBRGN` is used to solve the underlying optimization problem.

Currently, the linear regressor has three types, which are described
in Table {any}`tab-lineartypes`.

```{eval-rst}
.. list-table:: Linear Regressor types
   :name: tab-lineartypes
   :header-rows: 1

   * - Linear method
     - ``PetscRegressorLinearType``
     - Options Name
   * - Ordinary
     - ``REGRESSOR_LINEAR_OLS``
     - ``ols``
   * - Lasso
     - ``REGRESSOR_LINEAR_LASSO``
     - ``lasso``
   * - Ridge
     - ``REGRESSOR_LINEAR_RIDGE``
     - ``ridge``
```

When appropriate, the user can solve the underlying problem with `KSP` instead of `Tao`
via

```
PetscRegressorLinearSetUseKSP(PetscRegressor regressor, PetscBool flg);
```

or with the option `-regressor_linear_use_ksp <true,false>`.

Calculation of the intercept (also known as the "bias" or "offset") is performed
separately from the rest of the model fitting process, because data sets are often
already mean-centered and because it is generally undesirable to regularize the
intercept term.
By default, this step is omitted; if the user wishes to compute the intercept,
this can be done by calling

```
PetscRegressorLinearSetFitIntercept(PetscRegressor regressor, PetscBool flg);
```

or by specifying the option `-regressor_linear_fit_intercept <true,false>`.
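
As an illustration, the runtime options described in this chapter can be combined on the command line of an application such as `ex3`; the flag values below are merely examples:

```
./ex3 -regressor_type linear -regressor_linear_fit_intercept true \
      -regressor_regularizer_weight 0.1
```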

For a fitted linear regression model, one can obtain the intercept and
the vector of model coefficients via

```
PetscRegressorLinearGetCoefficients(PetscRegressor regressor, Vec *coefficients);
PetscRegressorLinearGetIntercept(PetscRegressor regressor, PetscScalar *intercept);
```

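For example, the fitted parameters could be inspected as in the following sketch, assuming `regressor` has already been fitted:

```
Vec         coefficients;
PetscScalar intercept;

PetscRegressorLinearGetCoefficients(regressor, &coefficients);
PetscRegressorLinearGetIntercept(regressor, &intercept);

/* Print the fitted coefficients */
VecView(coefficients, PETSC_VIEWER_STDOUT_WORLD);
```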