gemseo.mlearning.regression.algos.linreg module

Linear regression model.

The linear regression model expresses the output variables as a weighted sum of the input ones:

\[y = w_0 + w_1x_1 + w_2x_2 + ... + w_dx_d,\]

where the coefficients \((w_1, w_2, ..., w_d)\) and the intercept \(w_0\) are estimated by least squares regression, with the penalty term \(\alpha \left( \lambda \|w\|_2 + (1-\lambda) \|w\|_1 \right)\) added to the least squares objective. They are accessible via the properties coefficients and intercept.
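As a minimal sketch of the weighted sum above, the prediction for a single sample can be written in plain Python. The function name `predict_linear` and the numeric values are illustrative assumptions, not part of the GEMSEO API.

```python
def predict_linear(x, coefficients, intercept):
    """Evaluate y = w_0 + w_1*x_1 + ... + w_d*x_d for a single sample."""
    if len(x) != len(coefficients):
        raise ValueError("x and coefficients must have the same length.")
    return intercept + sum(w * x_i for w, x_i in zip(coefficients, x))


# Hypothetical model with d = 2 inputs: w_0 = 3, w_1 = 0.5, w_2 = -1.
y = predict_linear([1.0, 2.0], coefficients=[0.5, -1.0], intercept=3.0)
print(y)
```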

The penalty level \(\alpha\) is a non-negative parameter intended to prevent overfitting, while the penalty ratio \(\lambda\in [0, 1]\) expresses the ratio between \(\ell_2\)- and \(\ell_1\)-regularization. When \(\lambda=1\), there is no \(\ell_1\)-regularization, and a Ridge regression is performed. When \(\lambda=0\), there is no \(\ell_2\)-regularization, and a Lasso regression is performed. For \(\lambda\) between 0 and 1, an Elastic Net regression is performed.

One may also choose not to penalize the regression at all, by setting \(\alpha=0\). In this case, a simple least squares regression is performed.
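The roles of \(\alpha\) and \(\lambda\) can be checked numerically. The sketch below evaluates the penalty term exactly as it is written on this page; the function name `penalty` and the weights are illustrative assumptions, and the scikit-learn estimators may scale these terms differently in their own objectives.

```python
import math


def penalty(weights, alpha, lam):
    """Return alpha * (lam * ||w||_2 + (1 - lam) * ||w||_1)."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative.")
    if not 0 <= lam <= 1:
        raise ValueError("lam must lie in [0, 1].")
    l2_norm = math.sqrt(sum(w * w for w in weights))
    l1_norm = sum(abs(w) for w in weights)
    return alpha * (lam * l2_norm + (1 - lam) * l1_norm)


w = [3.0, -4.0]  # ||w||_2 = 5 and ||w||_1 = 7
ridge_like = penalty(w, alpha=2.0, lam=1.0)   # only the l2 term remains
lasso_like = penalty(w, alpha=2.0, lam=0.0)   # only the l1 term remains
elastic_net = penalty(w, alpha=2.0, lam=0.5)  # both terms contribute
unpenalized = penalty(w, alpha=0.0, lam=0.5)  # alpha = 0 removes the penalty
```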

Dependence

The linear model relies on the LinearRegression, Ridge, Lasso and ElasticNet classes of the scikit-learn library.
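The rule described above for choosing among these four classes can be sketched as follows. This illustrates the documented behavior and is not the GEMSEO source code; the function name `select_sklearn_class` is hypothetical, and it returns class names as strings so that the example runs without scikit-learn installed.

```python
def select_sklearn_class(alpha, lam):
    """Map the penalty level and penalty ratio to a scikit-learn class name."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative.")
    if not 0 <= lam <= 1:
        raise ValueError("lam must lie in [0, 1].")
    if alpha == 0:
        return "LinearRegression"  # no penalty: plain least squares
    if lam == 1:
        return "Ridge"  # l2-regularization only
    if lam == 0:
        return "Lasso"  # l1-regularization only
    return "ElasticNet"  # mix of l1- and l2-regularization


for alpha, lam in [(0.0, 0.3), (0.1, 1.0), (0.1, 0.0), (0.1, 0.5)]:
    print(alpha, lam, select_sklearn_class(alpha, lam))
```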

class LinearRegressor(data, settings_model=None, **settings)

Bases: BaseRegressor

Linear regression model.

Parameters:
  • data (Dataset) -- The learning dataset.

  • settings_model (BaseMLAlgoSettings | None) -- The machine learning algorithm settings as a Pydantic model. If None, use **settings.

  • **settings (Any) -- The machine learning algorithm settings. These arguments are ignored when settings_model is not None.

Raises:

ValueError -- When both the variable and the group it belongs to have a transformer.
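The precedence between settings_model and **settings described in the parameters can be sketched in isolation. Everything in this block is a hypothetical stand-in: `ToySettings` is a plain dataclass rather than a Pydantic model, and `resolve_settings` and the field `penalty_level` are illustrative names, not the GEMSEO implementation.

```python
from dataclasses import dataclass


@dataclass
class ToySettings:
    """Hypothetical stand-in for a settings model."""

    penalty_level: float = 0.0


def resolve_settings(settings_model=None, **settings):
    """Return settings_model if given, else build the settings from keywords."""
    if settings_model is not None:
        return settings_model  # keyword settings are ignored in this case
    return ToySettings(**settings)


from_keywords = resolve_settings(penalty_level=0.1)
from_model = resolve_settings(ToySettings(penalty_level=1.0), penalty_level=0.1)
print(from_keywords.penalty_level, from_model.penalty_level)
```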

Settings

alias of LinearRegressor_Settings

get_coefficients(as_dict=True)

Return the regression coefficients of the linear model.

Parameters:

as_dict (bool) --

If True, return the coefficients as a dictionary. Otherwise, return the coefficients as a NumPy array.

By default it is set to True.

Returns:

The regression coefficients of the linear model.

Raises:

ValueError -- If the coefficients are required as a dictionary even though the transformers change the dimensions of the variables.

Return type:

RealArray | dict[str, list[dict[str, list[float]]]]
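The dictionary return type suggests a nested layout: output name, then one entry per output component, then input name, then one coefficient per input component. The sketch below builds such a structure from a coefficient matrix under that assumption. The function `coefficients_as_dict` and the variable names are hypothetical and not part of GEMSEO. The splitting is only possible when the matrix columns and rows match the original variable dimensions, which is in line with the ValueError above.

```python
def coefficients_as_dict(matrix, input_sizes, output_sizes):
    """Split a (n_outputs, n_inputs) coefficient matrix by variable names.

    input_sizes and output_sizes map variable names to their dimensions.
    """
    result = {}
    row = 0
    for output_name, output_size in output_sizes.items():
        components = []
        for _ in range(output_size):
            column = 0
            by_input = {}
            for input_name, input_size in input_sizes.items():
                by_input[input_name] = list(matrix[row][column:column + input_size])
                column += input_size
            components.append(by_input)
            row += 1
        result[output_name] = components
    return result


# Hypothetical case: inputs "a" (size 1) and "b" (size 2), output "y" (size 2).
matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
nested = coefficients_as_dict(matrix, {"a": 1, "b": 2}, {"y": 2})
print(nested)
```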

get_intercept(as_dict=True)

Return the regression intercepts of the linear model.

Parameters:

as_dict (bool) --

If True, return the intercepts as a dictionary. Otherwise, return the intercepts as a NumPy array.

By default it is set to True.

Returns:

The regression intercepts of the linear model.

Raises:

ValueError -- If the intercepts are required as a dictionary even though the transformers change the dimensions of the variables.

Return type:

RealArray | dict[str, list[float]]

LIBRARY: ClassVar[str] = 'scikit-learn'

The name of the library of the wrapped machine learning algorithm.

SHORT_ALGO_NAME: ClassVar[str] = 'LinReg'

The short name of the machine learning algorithm, often an acronym.

Typically used for composite names, e.g. f"{algo.SHORT_ALGO_NAME}_{dataset.name}" or f"{algo.SHORT_ALGO_NAME}_{discipline.name}".
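A minimal sketch of such a composite name, using hypothetical stand-ins (`ToyAlgo`, `ToyDataset` and the dataset name are assumptions made for illustration, not GEMSEO classes):

```python
from dataclasses import dataclass


class ToyAlgo:
    """Hypothetical stand-in exposing the same class attribute."""

    SHORT_ALGO_NAME = "LinReg"


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a named dataset."""

    name: str


algo = ToyAlgo()
dataset = ToyDataset(name="rosenbrock")
composite_name = f"{algo.SHORT_ALGO_NAME}_{dataset.name}"
print(composite_name)
```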

property coefficients: RealArray

The regression coefficients of the linear model.

property intercept: RealArray

The regression intercepts of the linear model.