# polyreg module

## Polynomial regression

Polynomial regression is a particular case of linear regression in which the input data is transformed before the regression is applied. The transform builds a matrix of monomials (a Vandermonde matrix) by raising the input data to successive powers up to a given degree $$D$$. When there is only one input variable, the input data $$(x_i)_{i=1, \dots, n}\in\mathbb{R}^n$$ is transformed into the Vandermonde matrix

$\begin{split}\begin{pmatrix} x_1^1 & x_1^2 & \cdots & x_1^D\\ x_2^1 & x_2^2 & \cdots & x_2^D\\ \vdots & \vdots & \ddots & \vdots\\ x_n^1 & x_n^2 & \cdots & x_n^D\\ \end{pmatrix} = (x_i^d)_{i=1, \dots, n;\ d=1, \dots, D}.\end{split}$

The output is expressed as a weighted sum of monomials:

$y = w_0 + w_1 x^1 + w_2 x^2 + ... + w_D x^D,$

where the coefficients $$(w_1, w_2, ..., w_D)$$ and the intercept $$w_0$$ are estimated by least squares regression.
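The single-variable case above can be sketched with NumPy alone. This is a minimal illustration of the monomial transform followed by an ordinary least squares fit, not the code used by the module; the function name `fit_polynomial` and the sample data are assumptions made for the example.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Least squares fit of y = w_0 + w_1 x + ... + w_D x^D for one input variable.

    Hypothetical helper written for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Monomial matrix (x_i^d) for d = 1..D, as in the text (no constant column).
    powers = np.arange(1, degree + 1)
    monomials = x[:, np.newaxis] ** powers
    # Prepend a column of ones so that the intercept w_0 is estimated as well.
    design = np.hstack([np.ones((x.size, 1)), monomials])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights[0], weights[1:]


# Noise-free samples of y = 1 + 2x - 3x^2.
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2
intercept, coefficients = fit_polynomial(x, y, degree=2)
```

With noise-free data generated from a degree-2 polynomial, the fit should recover the intercept and the two coefficients up to floating-point error.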

In the case of a multidimensional input, i.e. $$X = (x_{ij})_{i=1,\dots,n; j=1,\dots,m}$$, where $$n$$ is the number of samples and $$m$$ is the number of input variables, the Vandermonde matrix collects all the monomials of total degree $$d$$ with $$1 \leq d \leq D$$; e.g. for three variables $$(x, y, z)$$ and degree $$D=3$$, the different terms are $$x$$, $$y$$, $$z$$, $$x^2$$, $$xy$$, $$xz$$, $$y^2$$, $$yz$$, $$z^2$$, $$x^3$$, $$x^2y$$ etc. More generally, for $$m$$ input variables, the number of monomials of degree at most $$D$$, constant term included, is $$\binom{m+D}{m} = \frac{(m+D)!}{m!D!}$$, so the total number of monomials of degree $$1 \leq d \leq D$$ is $$P = \binom{m+D}{m} - 1$$. In the case of 3 input variables given above, $$\binom{6}{3} = 20$$, and the total number of monomials of degree between one and three is thus $$P = 19$$. The linear regression has to identify the coefficients $$(w_1, \dots, w_P)$$, in addition to the intercept $$w_0$$.
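The monomial count can be checked by direct enumeration. The sketch below uses only the standard library; the helper name `monomials` is an assumption made for the example. Each monomial is represented as a tuple of variable names, so `("x", "x", "y")` stands for $$x^2y$$.

```python
from itertools import combinations_with_replacement
from math import comb


def monomials(variables, max_degree):
    """List the monomials of total degree 1..max_degree as tuples of variable names.

    Hypothetical helper written for illustration only.
    """
    terms = []
    for degree in range(1, max_degree + 1):
        # Each multiset of `degree` variables is one monomial of that degree.
        terms.extend(combinations_with_replacement(variables, degree))
    return terms


terms = monomials(("x", "y", "z"), 3)
n_terms = len(terms)          # 3 + 6 + 10 = 19 non-constant monomials
n_with_constant = comb(6, 3)  # 20 once the constant term is counted
```

For three variables and degree 3, the enumeration gives 19 non-constant monomials; counting the constant term as well gives $$\binom{6}{3} = 20$$.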

This concept is implemented through the PolynomialRegression class which inherits from the MLRegressionAlgo class.

### Dependence

The polynomial regression model relies on the LinearRegression and PolynomialFeatures classes of the scikit-learn library.
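As a rough illustration of how these two scikit-learn classes combine, the following sketch chains them in a pipeline. It is not the GEMSEO implementation; the sample data and variable names are assumptions made for the example, and `include_bias=False` is chosen here so that the constant term is handled by the intercept of the linear model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data: two input variables, one output with an interaction term.
rng = np.random.default_rng(0)
inputs = rng.uniform(-1.0, 1.0, size=(50, 2))
x1, x2 = inputs[:, 0], inputs[:, 1]
outputs = 0.5 + x1 - 2.0 * x2 + 3.0 * x1 * x2

# Monomials of degree 1..2 (x1, x2, x1^2, x1*x2, x2^2), then a linear fit.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(fit_intercept=True),
)
model.fit(inputs, outputs)

linear = model.named_steps["linearregression"]
prediction = model.predict(np.array([[0.5, 0.5]]))
```

After fitting, `linear.coef_` holds one weight per monomial in the order produced by `PolynomialFeatures`, and `linear.intercept_` holds $$w_0$$.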

class gemseo.mlearning.regression.polyreg.PolynomialRegression(data, degree, transformer=None, input_names=None, output_names=None, fit_intercept=True, penalty_level=0.0, l2_penalty_ratio=1.0, **parameters)

Polynomial regression.

Constructor.

Parameters
• data (Dataset) – learning dataset.

• degree (int) – degree of the polynomial. Required: the signature defines no default value.

• transformer (dict(str)) – transformation strategy for data groups. If None, do not transform data. Default: None.

• input_names (list(str)) – names of the input variables.

• output_names (list(str)) – names of the output variables.

• fit_intercept (bool) – if True, fit intercept. Default: True.

• penalty_level (float) – penalty level greater than or equal to 0. If 0, there is no penalty. Default: 0.

• l2_penalty_ratio (float) – penalty ratio related to the l2 regularization. If 1, the penalty is the Ridge penalty. If 0, this is the Lasso penalty. Between 0 and 1, the penalty is the ElasticNet penalty. Default: 1.

ABBR = 'PolyReg'
LIBRARY = 'scikit-learn'
get_coefficients(as_dict=True)

Return the regression coefficients of the linear fit as a numpy array or as a dict.

Parameters

as_dict (bool) – if True, returns coefficients as a dictionary. Default: True.

load_algo(directory)