gemseo.mlearning.regression.algos.polyreg module

Polynomial regression model.

Polynomial regression is a particular case of linear regression in which the input data are transformed before the regression is applied. The transformation builds a matrix of monomials by raising the input data to successive powers up to a given degree \(D\). When there is only one input variable, the input data \((x_i)_{i=1, \dots, n}\in\mathbb{R}^n\) are transformed into the Vandermonde matrix:

\[\begin{split}\begin{pmatrix} x_1^1 & x_1^2 & \cdots & x_1^D\\ x_2^1 & x_2^2 & \cdots & x_2^D\\ \vdots & \vdots & \ddots & \vdots\\ x_n^1 & x_n^2 & \cdots & x_n^D\\ \end{pmatrix} = (x_i^d)_{i=1, \dots, n;\ d=1, \dots, D}.\end{split}\]

The output variable is expressed as a weighted sum of monomials:

\[y = w_0 + w_1 x^1 + w_2 x^2 + \dots + w_D x^D,\]

where the coefficients \(w_1, w_2, \dots, w_D\) and the intercept \(w_0\) are estimated by least-squares regression.
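The one-dimensional case can be sketched with NumPy alone. This is only an illustration of the formulas above, not the way this module is implemented; the data, the degree and all variable names are assumptions made for the example.

```python
import numpy as np

# Hypothetical one-dimensional training data generated from a known cubic.
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.5 * x**3

D = 3  # maximum polynomial degree (assumed for this example)

# Columns x^1, ..., x^D, matching the matrix above (no column of ones).
powers = np.arange(1, D + 1)
vandermonde = x[:, np.newaxis] ** powers

# Prepend a column of ones so that the intercept w_0 is estimated as well.
design = np.hstack([np.ones((x.size, 1)), vandermonde])

# Least-squares estimate of (w_0, w_1, ..., w_D).
weights, *_ = np.linalg.lstsq(design, y, rcond=None)
```

Because the synthetic data are noise-free, `weights` should recover the coefficients used to generate `y` up to rounding error.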

In the case of a multidimensional input, i.e. \(X = (x_{ij})_{i=1,\dots,n;\ j=1,\dots,m}\), where \(n\) is the number of samples and \(m\) is the number of input variables, the Vandermonde matrix is built from all the monomials of total degree \(d\), with \(1 \leq d \leq D\); e.g. for three variables \((x, y, z)\) and degree \(D=3\), the terms are \(x\), \(y\), \(z\), \(x^2\), \(xy\), \(xz\), \(y^2\), \(yz\), \(z^2\), \(x^3\), \(x^2y\), etc. More generally, for \(m\) input variables, the total number of monomials of degree \(0 \leq d \leq D\), constant monomial included, is \(P = \binom{m+D}{m} = \frac{(m+D)!}{m!D!}\). For the three input variables above, the number of monomials of degree less than or equal to three is thus \(P = \binom{6}{3} = 20\), i.e. 19 non-constant monomials plus the constant one. The linear regression has to identify the coefficients \(w_1, \dots, w_{P-1}\), in addition to the intercept \(w_0\).
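The monomial count can be checked by direct enumeration with the standard library. In this sketch the helper `monomials` is hypothetical and written only for illustration; note that the binomial coefficient \(\binom{m+D}{m}\) also counts the constant monomial, so the enumeration of degrees 1 to \(D\) yields one term fewer.

```python
from itertools import combinations_with_replacement
from math import comb


def monomials(names, degree):
    """Return the monomials of total degree 1..degree as tuples of variable names.

    For example, ("x", "x", "y") stands for x^2 * y.
    """
    terms = []
    for d in range(1, degree + 1):
        terms.extend(combinations_with_replacement(names, d))
    return terms


terms = monomials(("x", "y", "z"), 3)

# Number of non-constant monomials found by enumeration.
n_terms = len(terms)

# Binomial formula, which also counts the constant monomial.
n_with_constant = comb(3 + 3, 3)
```

Here `n_terms + 1 == n_with_constant`, the extra term being the constant monomial associated with the intercept.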

Dependence

The polynomial regression model relies on the LinearRegression and PolynomialFeatures classes of the scikit-learn library.
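For readers unfamiliar with these two scikit-learn classes, the following sketch shows one way to combine them directly. It is not taken from this module and does not claim to reproduce how PolynomialRegressor wires them together; the data, the degree and the use of `make_pipeline` are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: two inputs and one output that is exactly quadratic.
rng = np.random.default_rng(0)
inputs = rng.uniform(-1.0, 1.0, size=(50, 2))
outputs = (
    1.0
    + 2.0 * inputs[:, 0]
    - inputs[:, 0] * inputs[:, 1]
    + 3.0 * inputs[:, 1] ** 2
)

# include_bias=False leaves the constant term to LinearRegression's intercept.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(fit_intercept=True),
)
model.fit(inputs, outputs)

features = model.named_steps["polynomialfeatures"]
regression = model.named_steps["linearregression"]
```

With two inputs and degree 2, `features.n_output_features_` is 5, matching \(\binom{4}{2} - 1\) non-constant monomials, while the constant term is carried by `regression.intercept_`.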

class PolynomialRegressor(data, settings_model=None, **settings)

Bases: LinearRegressor

Polynomial regression model.

Parameters:

data (Dataset) -- The training dataset.
settings_model (BaseMLAlgoSettings | None) -- The machine learning algorithm settings as a Pydantic model. If None, use **settings.
**settings (Any) -- The machine learning algorithm settings. These arguments are ignored when settings_model is not None.

Raises:

ValueError -- When both the variable and the group it belongs to have a transformer.

Settings: alias of PolynomialRegressor_Settings

get_coefficients(as_dict=False)

Return the regression coefficients of the linear model.

Parameters:

as_dict (bool) --

If True, return the coefficients as a dictionary of NumPy arrays indexed by the names of the coefficients. Otherwise, return the coefficients as a NumPy array. For now, the only valid value is False.

By default it is set to False.

Returns:

The regression coefficients of the linear model.

Raises:

NotImplementedError -- If the coefficients are required as a dictionary.

Return type:

DataType

SHORT_ALGO_NAME: ClassVar[str] = 'PolyReg'

The short name of the machine learning algorithm, often an acronym.

Typically used for composite names, e.g. f"{algo.SHORT_ALGO_NAME}_{dataset.name}" or f"{algo.SHORT_ALGO_NAME}_{discipline.name}".