supervised module
Supervised machine learning algorithm
Supervised machine learning is the task of learning relationships between input and output variables from an input-output dataset. One usually distinguishes between two types of supervised machine learning algorithms, based on the nature of the outputs: for a continuous output variable, a regression is performed, while for a discrete output variable, a classification is performed.
Given a set of input variables \(x \in \mathbb{R}^{n_{\text{samples}}\times n_{\text{inputs}}}\) and a set of output variables \(y\in \mathbb{K}^{n_{\text{samples}}\times n_{\text{outputs}}}\), where \(n_{\text{inputs}}\) is the dimension of the input variable, \(n_{\text{outputs}}\) is the dimension of the output variable, \(n_{\text{samples}}\) is the number of training samples and \(\mathbb{K}\) is either \(\mathbb{R}\) or \(\mathbb{N}\) for regression and classification tasks respectively, a supervised learning algorithm seeks to find a function \(f: \mathbb{R}^{n_{\text{inputs}}} \to \mathbb{K}^{n_{\text{outputs}}}\) such that \(y=f(x)\).
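As a minimal illustration of these shapes, the NumPy sketch below builds a toy dataset with \(n_{\text{samples}}=20\), \(n_{\text{inputs}}=2\) and \(n_{\text{outputs}}=1\), with continuous outputs for a regression (\(\mathbb{K}=\mathbb{R}\)) and integer labels for a classification (\(\mathbb{K}=\mathbb{N}\)). The data-generating map and the least-squares fit of \(f\) are illustrative assumptions, not part of this module.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_inputs, n_outputs = 20, 2, 1

# Input data x: one row per sample, one column per input component.
x = rng.uniform(-1.0, 1.0, size=(n_samples, n_inputs))

# Regression data (K = R): continuous outputs from an assumed linear map.
true_coef = np.array([[3.0], [-1.0]])  # shape (n_inputs, n_outputs)
y_reg = x @ true_coef

# Classification data (K = N): integer class labels derived from the same map.
y_clf = (y_reg > 0.0).astype(int)

# Learn f for the regression task as a linear map fitted by least squares.
coef, *_ = np.linalg.lstsq(x, y_reg, rcond=None)


def f(x_new):
    """Learned f: R^{n_inputs} -> R^{n_outputs}, applied row-wise."""
    return x_new @ coef


print(x.shape, y_reg.shape, y_clf.shape)  # → (20, 2) (20, 1) (20, 1)
```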
In addition, one often wants to impose constraints on the function \(f\), mainly to ensure that it generalizes beyond the training data, i.e. that it correctly predicts the output values of new input values. This is called regularization. Assuming \(f\) is parametrized by a set of parameters \(\theta\), and denoting by \(f_\theta\) the parametrized function, one typically seeks to minimize a function of the form

\[\mu(y, f_\theta(x)) + \Omega(\theta),\]
where \(\mu\) is a distance-like measure between \(y\) and \(f_\theta(x)\), typically a mean squared error in the case of a regression, or a cross entropy in the case of a classification (which amounts to maximizing the likelihood of the observed labels), and \(\Omega\) is a regularization term that keeps the parameters from overfitting the training data, typically some norm of its argument.
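For instance, taking \(\mu\) as the mean squared error, \(\Omega(\theta)=\alpha\|\theta\|_2^2\) and a linear model \(f_\theta(x)=x\theta\) gives ridge regression, whose minimizer has a closed form. The sketch below is a plain NumPy illustration under these assumptions; the penalty weight `alpha` and the helper names `objective` and `fit_ridge` are hypothetical and do not come from this module.

```python
import numpy as np


def objective(theta, x, y, alpha):
    """mu(y, f_theta(x)) + Omega(theta) for the linear model f_theta(x) = x @ theta,
    with mu the mean squared error and Omega(theta) = alpha * ||theta||_2^2."""
    residuals = y - x @ theta
    return np.mean(residuals**2) + alpha * np.sum(theta**2)


def fit_ridge(x, y, alpha):
    """Closed-form minimizer of `objective`.

    Setting the gradient (2/n) x^T (x theta - y) + 2 alpha theta to zero
    gives the linear system (x^T x + n alpha I) theta = x^T y.
    """
    n_samples, n_inputs = x.shape
    lhs = x.T @ x + n_samples * alpha * np.eye(n_inputs)
    return np.linalg.solve(lhs, x.T @ y)


rng = np.random.default_rng(1)
x = rng.normal(size=(30, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

theta_0 = fit_ridge(x, y, alpha=0.0)  # no regularization: ordinary least squares
theta_1 = fit_ridge(x, y, alpha=1.0)  # regularized: the parameters are shrunk

print(np.linalg.norm(theta_0) > np.linalg.norm(theta_1))  # → True
```

Increasing `alpha` trades a slightly larger training error for smaller parameters, which is the mechanism by which \(\Omega\) limits overfitting.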
The supervised module implements this concept through the MLSupervisedAlgo class based on a Dataset.
class gemseo.mlearning.core.supervised.MLSupervisedAlgo(data, transformer=None, input_names=None, output_names=None, **parameters)
Bases: gemseo.mlearning.core.ml_algo.MLAlgo
Supervised machine learning algorithm.
Inheriting classes should overload the MLSupervisedAlgo._fit() and MLSupervisedAlgo._predict() methods.
Constructor.
- Parameters
data (Dataset) – learning dataset.
transformer (dict(str)) – transformation strategy for data groups. If None, do not scale data. Default: None.
input_names (list(str)) – names of the input variables.
output_names (list(str)) – names of the output variables.
parameters – algorithm parameters.
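The overloading pattern described above can be sketched as follows. `ToySupervisedAlgo` and `ToyLinearRegression` are hypothetical classes written for illustration: they are not the GEMSEO implementation, they take a plain dictionary of NumPy arrays instead of a Dataset, and they have no `transformer` argument. Only the split between a generic `learn()`/`predict()` and the overloaded `_fit()`/`_predict()` mirrors the description.

```python
import numpy as np


class ToySupervisedAlgo:
    """Hypothetical, simplified stand-in for a supervised base class (not GEMSEO code).
    `data` is assumed to be a dict mapping variable names to 2D arrays."""

    def __init__(self, data, input_names, output_names, **parameters):
        self.data = data
        self.input_names = input_names
        self.output_names = output_names
        self.parameters = parameters
        self.is_trained = False

    def learn(self, samples=None):
        # Stack the named variables column-wise into 2D input/output arrays.
        x = np.hstack([self.data[name] for name in self.input_names])
        y = np.hstack([self.data[name] for name in self.output_names])
        if samples is not None:
            x, y = x[samples], y[samples]
        self._fit(x, y)
        self.is_trained = True

    def predict(self, x):
        return self._predict(x)

    def _fit(self, x, y):
        raise NotImplementedError  # to be overloaded by subclasses

    def _predict(self, x):
        raise NotImplementedError  # to be overloaded by subclasses


class ToyLinearRegression(ToySupervisedAlgo):
    """Concrete subclass: only _fit and _predict are overloaded."""

    def _fit(self, x, y):
        # Append a column of ones so that the model has an intercept.
        x1 = np.hstack([x, np.ones((x.shape[0], 1))])
        self.coef, *_ = np.linalg.lstsq(x1, y, rcond=None)

    def _predict(self, x):
        x1 = np.hstack([x, np.ones((x.shape[0], 1))])
        return x1 @ self.coef


# Toy data generated by z = 2 a - b + 1.
data = {
    "a": np.array([[0.0], [1.0], [2.0], [3.0]]),
    "b": np.array([[1.0], [0.0], [1.0], [0.0]]),
    "z": np.array([[0.0], [3.0], [4.0], [7.0]]),
}
model = ToyLinearRegression(data, ["a", "b"], ["z"])
model.learn()
prediction = model.predict(np.array([[4.0, 1.0]]))  # 2 * 4 - 1 + 1 = 8
```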
ABBR = 'MLSupervisedAlgo'
class DataFormatters
Bases: gemseo.mlearning.core.ml_algo.MLAlgo.DataFormatters
Decorators for supervised algorithms.
classmethod format_dict(predict)
If input_data is passed as a dictionary, convert it to an ndarray and convert output_data to a dictionary. Otherwise, do nothing.
- Parameters
predict – Method whose input_data and output_data are to be formatted.
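A minimal sketch of what such a dictionary formatter could look like, written as a standalone decorator. The attributes `input_names`, `output_names` and `output_sizes`, and the `Doubler` class, are assumptions made for this example; the actual GEMSEO code may differ.

```python
from functools import wraps

import numpy as np


def format_dict(predict):
    """Hypothetical dict formatter (illustration only, not GEMSEO's code).

    If input_data is a dict, concatenate its values in self.input_names order,
    call predict on the resulting array, then split the output array into a
    dict keyed by self.output_names using the assumed self.output_sizes.
    Otherwise, call predict unchanged.
    """

    @wraps(predict)
    def wrapper(self, input_data, *args, **kwargs):
        if not isinstance(input_data, dict):
            return predict(self, input_data, *args, **kwargs)
        array_in = np.concatenate(
            [np.atleast_1d(input_data[name]) for name in self.input_names], axis=-1
        )
        array_out = predict(self, array_in, *args, **kwargs)
        # Split along the last axis at the cumulative output sizes.
        sizes = [self.output_sizes[name] for name in self.output_names]
        pieces = np.split(array_out, np.cumsum(sizes)[:-1], axis=-1)
        return dict(zip(self.output_names, pieces))

    return wrapper


class Doubler:
    """Toy model mapping 3 input components to 3 output components."""

    input_names = ["u", "v"]
    output_names = ["s", "t"]
    output_sizes = {"s": 1, "t": 2}

    @format_dict
    def predict(self, input_data):
        return 2.0 * input_data


out = Doubler().predict({"u": np.array([1.0]), "v": np.array([2.0, 3.0])})
```

Here `out` is `{"s": array([2.]), "t": array([4., 6.])}`, while an ndarray input is passed through and returned as an ndarray.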
classmethod format_input_output(predict)
Successively apply the dict, samples and transform formatting.
- Parameters
predict – Method whose input_data and output_data are to be formatted.
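To make the order of the successive formatting steps concrete, the sketch below chains three hypothetical formatters that only log when they run. It assumes that "successively" means the dict formatting is applied first to the input data, so it is the outermost decorator; the names mirror the docstrings but the code is illustrative only.

```python
from functools import wraps

calls = []  # records the order in which each formatting layer runs


def make_formatter(step):
    """Build a hypothetical no-op formatter that only logs when it runs."""

    def formatter(predict):
        @wraps(predict)
        def wrapper(self, input_data):
            calls.append(f"{step}: in")
            output = predict(self, input_data)
            calls.append(f"{step}: out")
            return output

        return wrapper

    return formatter


format_dict = make_formatter("dict")
format_samples = make_formatter("samples")
format_transform = make_formatter("transform")


def format_input_output(predict):
    """Chain the three formatters; format_dict is the outermost layer."""
    return format_dict(format_samples(format_transform(predict)))


class Model:
    @format_input_output
    def predict(self, input_data):
        calls.append("predict")
        return input_data


Model().predict([1.0])
print(calls)
```

The log reads dict, samples, transform on the way in, then predict, then transform, samples, dict on the way out: with this nesting, the conversion of the output back to a dictionary happens last.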
classmethod format_samples(predict)
If input_data has shape (n_inputs,), reshape it to (1, n_inputs), then reshape output_data from (1, n_outputs) to (n_outputs,). If input_data has shape (n_samples, n_inputs), do nothing.
- Parameters
predict – Method whose input_data and output_data are to be formatted.
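The reshaping rule above can be sketched as a decorator. This is a hypothetical illustration, not the GEMSEO source; `SumModel` is a toy model whose `predict` method only accepts 2D arrays.

```python
from functools import wraps

import numpy as np


def format_samples(predict):
    """Hypothetical samples formatter (illustration only).

    A single sample of shape (n_inputs,) is reshaped to (1, n_inputs) before
    calling predict, and the (1, n_outputs) result is returned as (n_outputs,).
    A 2D array of shape (n_samples, n_inputs) is passed through unchanged.
    """

    @wraps(predict)
    def wrapper(self, input_data, *args, **kwargs):
        input_data = np.asarray(input_data)
        if input_data.ndim == 1:
            output_data = predict(self, input_data[np.newaxis, :], *args, **kwargs)
            return output_data[0]
        return predict(self, input_data, *args, **kwargs)

    return wrapper


class SumModel:
    @format_samples
    def predict(self, input_data):
        # Only 2D arrays are accepted; the single output is the row sum.
        if input_data.ndim != 2:
            raise ValueError("expected a 2D array")
        return input_data.sum(axis=1, keepdims=True)


model = SumModel()
print(model.predict(np.array([1.0, 2.0])).shape)  # → (1,)
print(model.predict(np.array([[1.0, 2.0], [3.0, 4.0]])).shape)  # → (2, 1)
```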
classmethod
property input_shape
Dimension of the input variables before applying the transformers.
learn(samples=None)
Train the machine learning algorithm on the learning set, possibly restricted to the given samples.
- Parameters
samples (list(int)) – indices of the training samples.
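A typical use of the samples argument is to train on a subset of the learning set and keep the remaining samples for validation. The sketch below assumes a hypothetical standalone `learn()` function fitting an affine model with NumPy; it mimics only the index filtering, not the GEMSEO method.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(size=(50, 1))
y = 4.0 * x + 1.0  # noiseless affine data, for illustration

# Split the sample indices into a training subset and a held-out subset.
indices = rng.permutation(len(x))
train, test = indices[:40], indices[40:]


def learn(x, y, samples=None):
    """Hypothetical learn(): fit an affine model on the selected rows only.
    samples=None means that all the rows are used."""
    if samples is not None:
        x, y = x[samples], y[samples]
    x1 = np.hstack([x, np.ones((len(x), 1))])
    coef, *_ = np.linalg.lstsq(x1, y, rcond=None)
    return coef


coef = learn(x, y, samples=train)

# Evaluate on the held-out samples that were not used for training.
x_test1 = np.hstack([x[test], np.ones((len(test), 1))])
test_error = np.mean((x_test1 @ coef - y[test]) ** 2)
```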
property output_shape
Dimension of the output variables before applying the transformers.
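The sketch below illustrates why these dimensions are defined before the transformers are applied: a dimension-reducing transformer, here assumed to be a 2-component principal component analysis computed with an SVD, changes the number of components seen by the algorithm. The grouped dictionaries and the helper `shape_before_transform` are hypothetical, and the sketch returns a plain integer.

```python
import numpy as np

# Hypothetical grouped data: each variable is a 2D array with 10 samples.
rng = np.random.default_rng(3)
inputs = {"a": rng.normal(size=(10, 1)), "b": rng.normal(size=(10, 3))}
outputs = {"y": rng.normal(size=(10, 5))}


def shape_before_transform(variables):
    """Total number of components, summed over the variables."""
    return sum(value.shape[1] for value in variables.values())


input_shape = shape_before_transform(inputs)  # 1 + 3 = 4
output_shape = shape_before_transform(outputs)  # 5

# An assumed 2-component PCA of the outputs: the algorithm would then work
# with 2 output components, while output_shape still reports 5.
y_centered = outputs["y"] - outputs["y"].mean(axis=0)
_, _, vt = np.linalg.svd(y_centered, full_matrices=False)
y_reduced = y_centered @ vt[:2].T
```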