(ch_regressor)=

# PetscRegressor: Regression Solvers

The `PetscRegressor` component provides basic infrastructure and a general API for supervised
machine learning tasks at a higher level of abstraction than a purely algebraic "solvers" view.
Methods are currently available for

- {any}`sec_regressor_linear`

Note that by "regressor" we mean an algorithm or implementation used to fit and apply a regression
model, following standard parlance in the machine-learning community.
"Regressor" here does NOT mean an independent (or predictor) variable, as it often does in the
statistics community.

(sec_regressor_usage)=

## Basic Regressor Usage

`PetscRegressor` supports supervised learning tasks:
given a matrix of observed data $X$ of size $n_{samples} \times n_{features}$,
predict a vector of "target" values $y$ (of size $n_{samples}$), where the $i$th entry of $y$
corresponds to the observation (or "sample") stored in the $i$th row of $X$.
Traditionally, when the target consists of continuous values this task is called "regression",
and when it consists of discrete values (or "labels") it is called "classification";
we use `PetscRegressor` to support both of these cases.

Before a regressor can be used to make predictions, the model must be fitted using an initial set of training data.
Once a fitted model has been obtained, it can be used to predict target values for new observations.
Every `PetscRegressor` implementation provides a `Fit()` and a `Predict()` method to support this workflow.
Fitting (or "training") a model is a relatively computationally intensive task that generally involves solving an
optimization problem (often using `TAO` solvers) to determine the model parameters, whereas making predictions
(or performing "inference") is generally much simpler.

Here, we introduce a simple example to demonstrate `PetscRegressor` usage.
Please read {any}`sec_regressor_solvers` for a more in-depth discussion.
The code presented {any}`below <regressor-ex3>` solves an ordinary linear
regression problem, with various options for regularization.

In the simplest usage of a regressor, the user provides a training (or "design") matrix
(`Mat`) and a target vector (`Vec`) against which to fit the model.
Once the regressor is fitted, the user can then obtain a vector of predicted values for a set of new observations.

PETSc's default method for solving regression problems is ordinary least squares,
`REGRESSOR_LINEAR_OLS`, which is a sub-type of the linear regressor,
`PETSCREGRESSORLINEAR`.
By "linear" we mean that the model $f(x, \theta)$ is linear in its coefficients $\theta$
but not necessarily linear in its features $x$.

Note that the data creation, option parsing, and cleanup stages are omitted here for
clarity. The complete code is available in {ref}`ex3.c <regressor-ex3>`.

(regressor-ex3)=
:::{admonition} Listing: `src/ml/regressor/tests/ex3.c`
```{literalinclude} /../src/ml/regressor/tests/ex3.c
:prepend: '#include <petscregressor.h>'
:start-at: int main
:end-at: PetscFinalize
:append: return 0;}
```
:::

To create a `PetscRegressor` instance, one must first call `PetscRegressorCreate()`:

```
PetscRegressorCreate(MPI_Comm comm, PetscRegressor *regressor);
```

To choose a regressor type, the user can either call

```
PetscRegressorSetType(PetscRegressor regressor, PetscRegressorType type);
```

or use the command-line option `-regressor_type <method>`; details regarding the
available methods are presented in {any}`sec_regressor_solvers`.
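For reference, the ordinary least squares problem that `REGRESSOR_LINEAR_OLS` solves can be written (in standard textbook notation, not PETSc-specific symbols) as

$$
\min_{\theta} \; \frac{1}{2} \left\| X \theta - y \right\|_2^2 ,
$$

and the regularized variants (e.g., lasso and ridge) add a penalty term $\lambda R(\theta)$ to this objective, where $\lambda$ is the regularizer weight, $R(\theta) = \|\theta\|_1$ for lasso, and $R(\theta) = \tfrac{1}{2}\|\theta\|_2^2$ for ridge.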
The application code can specify the options used by the underlying linear,
nonlinear, and optimization solver methods used in fitting the model
by calling

```
PetscRegressorSetFromOptions(regressor);
```

which interfaces with the PETSc options database and enables convenient
runtime selection of the type of regression algorithm and the setting of
various solver or problem parameters.
This routine can also control all inner solver options in the `KSP` and `Tao`
modules, as discussed in {any}`ch_ksp` and {any}`ch_tao`.

After setting these options, the user can fit (or "train") the regressor
by calling

```
PetscRegressorFit(PetscRegressor regressor, Mat X, Vec y);
```

where `X` is the matrix of training data and `y` is the vector of target values.
After fitting the regressor, the user can compute model
predictions, that is, perform inference, for a data matrix of unlabeled observations
using the fitted regressor:

```
PetscRegressorPredict(PetscRegressor regressor, Mat X, Vec y_predicted);
```

Finally, when the user is done with the regressor,
its `PetscRegressor` context should be destroyed with

```
PetscRegressorDestroy(PetscRegressor *regressor);
```

(sec_regressor_solvers)=

## Regression Solvers

The list of regressor types is given in Table
{any}`tab-regressordefaults`. Currently, only one type,
`PETSCREGRESSORLINEAR`, is supported, although we plan to add several others in the near future.

```{eval-rst}
.. list-table:: PETSc Regressor Types
   :name: tab-regressordefaults
   :header-rows: 1

   * - Method
     - PetscRegressorType
     - Options Name
   * - Linear
     - ``PETSCREGRESSORLINEAR``
     - ``linear``
```

If the particular method being employed is one that supports regularization,
the user can set the regularizer's weight via

```
PetscRegressorSetRegularizerWeight(PetscRegressor regressor, PetscReal weight);
```

or with the option `-regressor_regularizer_weight <weight>`.

(sec_regressor_linear)=

## Linear Regressor

The `PETSCREGRESSORLINEAR` (`-regressor_type linear`) implementation
constructs a linear model that minimizes the sum of squared differences
between the actual target values ("observations") in the dataset and the target
values estimated by the fitted model.
By default, the bound-constrained regularized Gauss-Newton solver `TAOBRGN` is used to solve the underlying optimization problem.

Currently, the linear regressor has three types, which are described
in Table {any}`tab-lineartypes`.

```{eval-rst}
.. list-table:: Linear Regressor Types
   :name: tab-lineartypes
   :header-rows: 1

   * - Linear method
     - ``PetscRegressorLinearType``
     - Options Name
   * - Ordinary
     - ``REGRESSOR_LINEAR_OLS``
     - ``ols``
   * - Lasso
     - ``REGRESSOR_LINEAR_LASSO``
     - ``lasso``
   * - Ridge
     - ``REGRESSOR_LINEAR_RIDGE``
     - ``ridge``
```

If one wishes, the user can (when appropriate) use `KSP` to solve the problem instead of `Tao`,
via

```
PetscRegressorLinearSetUseKSP(PetscRegressor regressor, PetscBool flg);
```

or with the option `-regressor_linear_use_ksp <true,false>`.
Calculation of the intercept (also known as the "bias" or "offset") is performed
separately from the rest of the model fitting process, because data sets are often
already mean-centered and because it is generally undesirable to regularize the
intercept term.
By default, this step is omitted; if the user wishes to compute the intercept,
this can be done by calling

```
PetscRegressorLinearSetFitIntercept(PetscRegressor regressor, PetscBool flg);
```

or by specifying the option `-regressor_linear_fit_intercept <true,false>`.

For a fitted linear regression model, one can obtain the intercept and
the vector of model coefficients via

```
PetscRegressorLinearGetCoefficients(PetscRegressor regressor, Vec *coefficients);
PetscRegressorLinearGetIntercept(PetscRegressor regressor, PetscScalar *intercept);
```