(ch_regressor)=

# PetscRegressor: Regression Solvers

The `PetscRegressor` component provides some basic infrastructure and a general API for supervised
machine learning tasks at a higher level of abstraction than a purely algebraic "solvers" view.
Methods are currently available for

- {any}`sec_regressor_linear`

Note that by "regressor" we mean an algorithm or implementation used to fit and apply a regression
model, following standard parlance in the machine-learning community.
Regressor here does NOT mean an independent (or predictor) variable, as it often does in the
statistics community.

(sec_regressor_usage)=

## Basic Regressor Usage

`PetscRegressor` supports supervised learning tasks:
Given a matrix of observed data $X$ with size $n_{samples}$ by $n_{features}$,
predict a vector of "target" values $y$ (of size $n_{samples}$), where the $i$th entry of $y$
corresponds to the observation (or "sample") stored in the $i$th row of $X$.
Traditionally, when the target consists of continuous values this is called "regression",
and when it consists of discrete values (or "labels"), this task is called "classification";
we use `PetscRegressor` to support both of these cases.

Before a regressor can be used to make predictions, the model must be fitted using an initial set of training data.
Once a fitted model has been obtained, it can be used to predict target values for new observations.
Every `PetscRegressor` implementation provides a `Fit()` and a `Predict()` method to support this workflow.
Fitting (or "training") a model is a relatively computationally intensive task that generally involves solving an
optimization problem (often using `TAO` solvers) to determine the model parameters, whereas making predictions
(or performing "inference") is generally much simpler.

Here, we introduce a simple example to demonstrate `PetscRegressor` usage.
Please read {any}`sec_regressor_solvers` for more in-depth discussion.
The code presented {any}`below <regressor-ex3>` solves an ordinary linear
regression problem, with various options for regularization.

In the simplest usage of a regressor, the user provides a training (or "design") matrix
(`Mat`) and a target vector (`Vec`) against which to fit the model.
Once the regressor is fitted, the user can then obtain a vector of predicted values for a set of new observations.

PETSc's default method for solving regression problems is ordinary least squares,
`REGRESSOR_LINEAR_OLS`, which is a sub-type of linear regressor,
`PETSCREGRESSORLINEAR`.
By "linear" we mean that the model $f(x, \theta)$ is linear in its coefficients $\theta$
but not necessarily linear in its features $x$.

Note that data creation, option parsing, and cleanup stages are omitted here for
clarity. The complete code is available in {ref}`ex3.c <regressor-ex3>`.

(regressor-ex3)=
:::{admonition} Listing: `src/ml/regressor/tests/ex3.c`
```{literalinclude} /../src/ml/regressor/tests/ex3.c
:prepend: '#include <petscregressor.h>'
:start-at: int main
:end-at: PetscFinalize
:append: return 0;}
```
:::

To create a `PetscRegressor` instance, one must first call `PetscRegressorCreate()`:

```
PetscRegressorCreate(MPI_Comm comm, PetscRegressor *regressor);
```

To choose a regressor type, the user can either call

```
PetscRegressorSetType(PetscRegressor regressor, PetscRegressorType type);
```

or use the command-line option `-regressor_type <method>`; details regarding the
available methods are presented in {any}`sec_regressor_solvers`.
The application code can specify the options used by the underlying linear,
nonlinear, and optimization solver methods used in fitting the model
by calling

```
PetscRegressorSetFromOptions(PetscRegressor regressor);
```

which interfaces with the PETSc options database and enables convenient
runtime selection of the type of regression algorithm and setting of
various solver or problem parameters.
This routine can also control all inner solver options in the `KSP` and `Tao`
modules, as discussed in {any}`ch_ksp` and {any}`ch_tao`.

After setting these options, the user can fit (or "train") the regressor
by calling

```
PetscRegressorFit(PetscRegressor regressor, Mat X, Vec y);
```

where `X` is the training data and `y` is the vector of target values.
After fitting the regressor, the user can compute model
predictions, that is, perform inference, for a data matrix of unlabeled observations
using the fitted regressor:

```
PetscRegressorPredict(PetscRegressor regressor, Mat X, Vec y_predicted);
```

Finally, once the regressor is no longer needed,
the user should destroy the `PetscRegressor` context with

```
PetscRegressorDestroy(PetscRegressor *regressor);
```

(sec_regressor_solvers)=

## Regression Solvers

The available regressor types are listed in Table
{any}`tab-regressordefaults`. Currently, we only support one type,
`PETSCREGRESSORLINEAR`, although we plan to add several others in the near future.

```{eval-rst}
.. list-table:: PETSc Regressor
  :name: tab-regressordefaults
  :header-rows: 1

  * - Method
    - ``PetscRegressorType``
    - Options Name
  * - Linear
    - ``PETSCREGRESSORLINEAR``
    - ``linear``
```

If the particular method being employed is one that supports regularization,
the user can set the regularizer's weight via

```
PetscRegressorSetRegularizerWeight(PetscRegressor regressor, PetscReal weight);
```

or with the option `-regressor_regularizer_weight <weight>`.

(sec_regressor_linear)=

## Linear regressor

The `PETSCREGRESSORLINEAR` (`-regressor_type linear`) implementation
constructs a linear model to reduce the sum of squared differences
between the actual target values ("observations") in the dataset and the target
values estimated by the fitted model.
By default, bound-constrained regularized Gauss-Newton `TAOBRGN` is used to solve the underlying optimization problem.

Currently, the linear regressor has three types, which are described
in Table {any}`tab-lineartypes`.

```{eval-rst}
.. list-table:: Linear Regressor types
  :name: tab-lineartypes
  :header-rows: 1

  * - Linear method
    - ``PetscRegressorLinearType``
    - Options Name
  * - Ordinary least squares
    - ``REGRESSOR_LINEAR_OLS``
    - ``ols``
  * - Lasso
    - ``REGRESSOR_LINEAR_LASSO``
    - ``lasso``
  * - Ridge
    - ``REGRESSOR_LINEAR_RIDGE``
    - ``ridge``
```

When appropriate, the user can use `KSP` instead of `Tao` to solve the problem
via

```
PetscRegressorLinearSetUseKSP(PetscRegressor regressor, PetscBool flg);
```

or with the option `-regressor_linear_use_ksp <true,false>`.

Calculation of the intercept (also known as the "bias" or "offset") is performed
separately from the rest of the model fitting process, because data sets are often
already mean-centered and because it is generally undesirable to regularize the
intercept term.
By default, this step is omitted; if the user wishes to compute the intercept,
this can be done by calling

```
PetscRegressorLinearSetFitIntercept(PetscRegressor regressor, PetscBool flg);
```

or by specifying the option `-regressor_linear_fit_intercept <true,false>`.

For a fitted linear regression model, one can obtain the intercept and
a vector of the model coefficients via

```
PetscRegressorLinearGetCoefficients(PetscRegressor regressor, Vec *coefficients);
PetscRegressorLinearGetIntercept(PetscRegressor regressor, PetscScalar *intercept);
```