
linreg module

Linear regression

The linear regression surrogate discipline expresses the model output as a weighted sum of the model inputs:

\[y = w_0 + w_1x_1 + w_2x_2 + ... + w_dx_d,\]

where the coefficients \((w_1, w_2, ..., w_d)\) and the intercept \(w_0\) are estimated by least squares regression, optionally penalized by the term

\[\alpha \left( \lambda \|w\|_2 + (1-\lambda) \|w\|_1 \right).\]

The coefficients and the intercept are accessible via the properties coefficients and intercept.

The penalty level \(\alpha\) is a non-negative parameter intended to prevent overfitting, while the penalty ratio \(\lambda\in [0, 1]\) expresses the ratio between \(\ell_2\)- and \(\ell_1\)-regularization. When \(\lambda=1\), there is no \(\ell_1\)-regularization, and a Ridge regression is performed. When \(\lambda=0\), there is no \(\ell_2\)-regularization, and a Lasso regression is performed. For \(\lambda\) between 0 and 1, an elastic net regression is performed.
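As a worked illustration of the penalty term as written in this page, the sketch below evaluates \(\alpha \left( \lambda \|w\|_2 + (1-\lambda) \|w\|_1 \right)\) for a small weight vector. The function name `elastic_penalty` is hypothetical and is not part of gemseo; the arguments reuse the constructor names `penalty_level` (for \(\alpha\)) and `l2_penalty_ratio` (for \(\lambda\)). The objective that scikit-learn actually minimizes may scale these terms differently.

```python
import math


def elastic_penalty(weights, penalty_level, l2_penalty_ratio):
    """Evaluate alpha * (lambda * ||w||_2 + (1 - lambda) * ||w||_1).

    Hypothetical helper: it only mirrors the formula of this page.
    """
    if penalty_level < 0:
        raise ValueError("penalty_level must be non-negative")
    if not 0 <= l2_penalty_ratio <= 1:
        raise ValueError("l2_penalty_ratio must lie in [0, 1]")
    l2_norm = math.sqrt(sum(w * w for w in weights))
    l1_norm = sum(abs(w) for w in weights)
    return penalty_level * (
        l2_penalty_ratio * l2_norm + (1 - l2_penalty_ratio) * l1_norm
    )


w = [3.0, -4.0]  # ||w||_2 = 5 and ||w||_1 = 7
ridge_like = elastic_penalty(w, 0.5, 1.0)    # only the l2 term remains
lasso_like = elastic_penalty(w, 0.5, 0.0)    # only the l1 term remains
elastic_like = elastic_penalty(w, 0.5, 0.5)  # equal mix of both terms
```

With \(\lambda=1\) only the \(\ell_2\) norm contributes, with \(\lambda=0\) only the \(\ell_1\) norm contributes, and intermediate values blend the two linearly.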

One may also choose not to penalize the regression at all, by setting \(\alpha=0\). In this case, a simple least squares regression is performed.
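The case distinction described above can be summarized in a few lines. This is only a sketch of the decision logic as stated in the text, not the gemseo implementation: the function name `select_regression_type` is hypothetical, and the returned strings are plain labels.

```python
def select_regression_type(penalty_level=0.0, l2_penalty_ratio=1.0):
    """Return the kind of regression implied by the two penalty settings.

    Hypothetical helper following the rules stated in the text.
    """
    if penalty_level < 0:
        raise ValueError("penalty_level must be non-negative")
    if not 0 <= l2_penalty_ratio <= 1:
        raise ValueError("l2_penalty_ratio must lie in [0, 1]")
    if penalty_level == 0:
        return "least squares"  # alpha = 0: no penalty at all
    if l2_penalty_ratio == 1:
        return "Ridge"          # lambda = 1: no l1 term
    if l2_penalty_ratio == 0:
        return "Lasso"          # lambda = 0: no l2 term
    return "elastic net"        # 0 < lambda < 1: both terms


default_type = select_regression_type()
ridge_type = select_regression_type(penalty_level=0.1)
lasso_type = select_regression_type(penalty_level=0.1, l2_penalty_ratio=0.0)
mixed_type = select_regression_type(penalty_level=0.1, l2_penalty_ratio=0.3)
```

Note that the check on \(\alpha\) comes first: when the penalty level is zero, the value of the penalty ratio has no effect.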

This concept is implemented through the LinearRegression class, which inherits from the MLRegressionAlgo class.

Dependence

The linear model relies on the LinearRegression, Ridge, Lasso and ElasticNet classes of the scikit-learn library.
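For orientation, the four scikit-learn estimators named above can be fitted directly on a toy dataset, as sketched below. The data, the `alpha` values and the `l1_ratio` value are arbitrary assumptions made for illustration; how gemseo forwards `penalty_level` and `l2_penalty_ratio` to these classes is not shown on this page.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# Noise-free toy data generated from y = 1 + 2*x1 - 3*x2 (assumed example).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Penalty settings below are arbitrary illustrative values.
estimators = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

# fit() returns the estimator itself, so the dict holds fitted models.
fitted = {name: est.fit(X, y) for name, est in estimators.items()}

for name, est in fitted.items():
    # coef_ holds (w_1, ..., w_d) and intercept_ holds w_0.
    print(name, est.coef_, est.intercept_)
```

On noise-free data the unpenalized fit recovers the generating coefficients, while the penalized fits shrink them toward zero.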

class gemseo.mlearning.regression.linreg.LinearRegression(data, transformer=None, input_names=None, output_names=None, fit_intercept=True, penalty_level=0.0, l2_penalty_ratio=1.0, **parameters)[source]

Bases: gemseo.mlearning.regression.regression.MLRegressionAlgo

Linear regression

Constructor.

Parameters
  • data (Dataset) – learning dataset.

  • transformer (dict(str)) – transformation strategy for data groups. If None, do not transform data. Default: None.

  • input_names (list(str)) – names of the input variables.

  • output_names (list(str)) – names of the output variables.

  • fit_intercept (bool) – if True, fit intercept. Default: True.

  • penalty_level (float) – penalty level greater than or equal to 0. If 0, there is no penalty. Default: 0.

  • l2_penalty_ratio (float) – penalty ratio related to the l2 regularization. If 1, the penalty is the Ridge penalty. If 0, this is the Lasso penalty. Between 0 and 1, the penalty is the ElasticNet penalty. Default: 1.

ABBR = 'LinReg'

LIBRARY = 'scikit-learn'

property coefficients

Return the regression coefficients of the linear fit.

get_coefficients(as_dict=True)[source]

Return the regression coefficients of the linear fit as a numpy array or as a dict.

Parameters

as_dict (bool) – if True, returns coefficients as a dictionary. Default: True.

get_intercept(as_dict=True)[source]

Return the regression intercept of the linear fit as a numpy array or as a dict.

Parameters

as_dict (bool) – if True, returns intercept as a dictionary. Default: True.

property intercept

Return the regression intercepts of the linear fit.