supervised module
Supervised machine learning algorithm
Supervised machine learning is the task of learning relationships between input and output variables from an input-output dataset. One usually distinguishes between two types of supervised machine learning algorithms, based on the nature of the outputs. For a continuous output variable, a regression is performed, while for a discrete output variable, a classification is performed.
Given a set of input variables \(x \in \mathbb{R}^{n_{\text{samples}}\times n_{\text{inputs}}}\) and a set of output variables \(y\in \mathbb{K}^{n_{\text{samples}}\times n_{\text{outputs}}}\), where \(n_{\text{inputs}}\) is the dimension of the input variable, \(n_{\text{outputs}}\) is the dimension of the output variable, \(n_{\text{samples}}\) is the number of training samples and \(\mathbb{K}\) is either \(\mathbb{R}\) or \(\mathbb{N}\) for regression and classification tasks respectively, a supervised learning algorithm seeks to find a function \(f: \mathbb{R}^{n_{\text{inputs}}} \to \mathbb{K}^{n_{\text{outputs}}}\) such that \(y=f(x)\).
We often also want to impose constraints on the function \(f\), mainly to ensure that it generalizes beyond the training data, i.e. that it correctly predicts the output values for new input values. This is called regularization. Assuming \(f\) is parametrized by a set of parameters \(\theta\), and denoting by \(f_\theta\) the parametrized function, one typically seeks to minimize a function of the form
\[\mu(y, f_\theta(x)) + \Omega(\theta),\]
where \(\mu\) is a distance-like measure, typically a mean squared error in the case of a regression, or a cross entropy or a probability to be maximized in the case of a classification, and \(\Omega\) is a regularization term that keeps the parameters from overfitting, typically some norm of its argument.
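As an illustration of a regularized objective, the following minimal sketch assumes a linear model \(f_\theta(x) = x\theta\), a mean squared error for the distance-like measure and a squared L2 norm for the regularization term. The function names and the `penalty` weight are hypothetical choices made for this example, not part of the module.

```python
import numpy as np


def regularized_loss(theta, x, y, penalty):
    """Mean squared error plus a squared L2 penalty on theta (assumed forms)."""
    residual = y - x @ theta
    return np.sum(residual ** 2) / x.shape[0] + penalty * np.sum(theta ** 2)


def fit_ridge(x, y, penalty):
    """Closed-form minimizer of regularized_loss for the linear model."""
    n_samples, n_inputs = x.shape
    # Zero gradient gives (x^T x / n + penalty * I) theta = x^T y / n.
    lhs = x.T @ x / n_samples + penalty * np.eye(n_inputs)
    rhs = x.T @ y / n_samples
    return np.linalg.solve(lhs, rhs)


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))                      # (n_samples, n_inputs)
true_theta = np.array([[1.0], [-2.0], [0.5]])
y = x @ true_theta + 0.1 * rng.normal(size=(50, 1))

theta_plain = fit_ridge(x, y, penalty=0.0)        # no regularization
theta_ridge = fit_ridge(x, y, penalty=1.0)        # penalized parameters
```

With a positive penalty, the fitted parameters are pulled towards zero, which is the sense in which the regularization term restricts the parameters.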
The supervised module implements this concept through the MLSupervisedAlgo class based on a Dataset.

class gemseo.mlearning.core.supervised.MLSupervisedAlgo(data, transformer=None, input_names=None, output_names=None, **parameters)
Bases: gemseo.mlearning.core.ml_algo.MLAlgo
Supervised machine learning algorithm.
Inheriting classes should overload the MLSupervisedAlgo._fit() and MLSupervisedAlgo._predict() methods.
Constructor.
 Parameters
data (Dataset) – learning dataset.
transformer (dict(str)) – transformation strategy for data groups. If None, do not scale data. Default: None.
input_names (list(str)) – names of the input variables.
output_names (list(str)) – names of the output variables.
parameters – algorithm parameters.
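The overloading pattern can be sketched without the library. The classes below are hypothetical stand-ins written for this example (they are not the GEMSEO classes and do not reproduce their signatures): the base class owns the public methods, and a subclass only supplies `_fit` and `_predict`.

```python
import numpy as np


class SketchSupervisedAlgo:
    """Hypothetical stand-in for a supervised base class (not the GEMSEO API)."""

    def __init__(self, inputs, outputs):
        self.inputs = np.asarray(inputs, dtype=float)
        self.outputs = np.asarray(outputs, dtype=float)

    def learn(self):
        # The public method delegates the actual training to the subclass.
        self._fit(self.inputs, self.outputs)

    def predict(self, input_data):
        # The public method delegates the actual prediction to the subclass.
        return self._predict(np.asarray(input_data, dtype=float))

    def _fit(self, input_data, output_data):
        raise NotImplementedError

    def _predict(self, input_data):
        raise NotImplementedError


class SketchLinearRegression(SketchSupervisedAlgo):
    """Least-squares linear model with an intercept."""

    def _fit(self, input_data, output_data):
        # Append a column of ones so the model has an intercept.
        design = np.hstack([input_data, np.ones((len(input_data), 1))])
        self.coefficients, *_ = np.linalg.lstsq(design, output_data, rcond=None)

    def _predict(self, input_data):
        design = np.hstack([input_data, np.ones((len(input_data), 1))])
        return design @ self.coefficients


inputs = np.array([[0.0], [1.0], [2.0], [3.0]])
outputs = 2.0 * inputs + 1.0
model = SketchLinearRegression(inputs, outputs)
model.learn()
prediction = model.predict([[4.0]])
```

Keeping the public methods in the base class means that any shared pre- and post-processing only has to be written once.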

ABBR = 'MLSupervisedAlgo'

class DataFormatters
Bases: gemseo.mlearning.core.ml_algo.MLAlgo.DataFormatters
Decorators for supervised algorithms.

classmethod format_dict(predict)
If input_data is passed as a dictionary, then convert it to ndarray, and convert output_data to dictionary. Else, do nothing.
 Parameters
predict – Method whose input_data and output_data are to be formatted.
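The dictionary handling described for format_dict can be pictured with a plain decorator. This is a minimal sketch rather than the library's implementation: the variable names, their order and the output sizes are hypothetical constants introduced for the example.

```python
import functools

import numpy as np

# Hypothetical variable names and sizes, in the order the model expects them.
INPUT_NAMES = ["x1", "x2"]
OUTPUT_SIZES = {"y1": 1, "y2": 2}


def format_dict(predict):
    """Let predict accept and return dictionaries of arrays."""

    @functools.wraps(predict)
    def wrapper(input_data):
        if not isinstance(input_data, dict):
            # Arrays are passed through untouched.
            return predict(input_data)
        # Concatenate the named inputs along the last axis.
        array = np.concatenate(
            [np.atleast_1d(input_data[name]) for name in INPUT_NAMES], axis=-1
        )
        output = predict(array)
        # Split the output array back into named pieces.
        result, start = {}, 0
        for name, size in OUTPUT_SIZES.items():
            result[name] = output[..., start:start + size]
            start += size
        return result

    return wrapper


@format_dict
def predict(input_data):
    # Toy model: three outputs computed from two inputs.
    x1, x2 = input_data[..., 0], input_data[..., 1]
    return np.stack([x1 + x2, x1 - x2, x1 * x2], axis=-1)


as_array = predict(np.array([1.0, 2.0]))
as_dict = predict({"x1": np.array([1.0]), "x2": np.array([2.0])})
```

The same decorated function then serves callers that work with raw arrays and callers that work with named variables.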

classmethod format_input_output(predict)
Format dict, samples and transform successively.
 Parameters
predict – Method whose input_data and output_data are to be formatted.

classmethod format_samples(predict)
If input_data has shape (n_inputs,), reshape input_data to (1, n_inputs), and then reshape output data from (1, n_outputs) to (n_outputs,). If input_data has shape (n_samples, n_inputs), then do nothing.
 Parameters
predict – Method whose input_data and output_data are to be formatted.
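The reshaping rule described for format_samples can be sketched as a decorator around a function that only understands 2D arrays. This is an illustrative sketch under that assumption, not the library's code, and the toy `predict` function is hypothetical.

```python
import functools

import numpy as np


def format_samples(predict):
    """Let a 2D-only predict function also accept a single 1D sample."""

    @functools.wraps(predict)
    def wrapper(input_data):
        input_data = np.asarray(input_data)
        single = input_data.ndim == 1
        if single:
            input_data = input_data[np.newaxis, :]  # (n_inputs,) -> (1, n_inputs)
        output_data = predict(input_data)
        if single:
            output_data = output_data[0]  # (1, n_outputs) -> (n_outputs,)
        return output_data

    return wrapper


@format_samples
def predict(input_data):
    # Toy model expecting shape (n_samples, 2) and returning (n_samples, 1).
    return input_data.sum(axis=1, keepdims=True)


single_output = predict([1.0, 2.0])
batch_output = predict([[1.0, 2.0], [3.0, 4.0]])
```

The output keeps the same number of dimensions as the input: a single sample yields a 1D array and a batch yields a 2D array.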

classmethod

property input_shape
Dimension of input variables before applying transformers.

learn(samples=None)
Train the machine learning algorithm on the learning set, possibly restricted to the given samples.
 Parameters
samples (list(int)) – indices of training samples.
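Training on a subset of samples amounts to selecting rows of the learning set by index before fitting. The sketch below uses plain NumPy with a least-squares line as a stand-in model. It does not call the library, and the data and variable names are made up for the example.

```python
import numpy as np

inputs = np.arange(6.0).reshape(-1, 1)   # six samples, one input
outputs = 3.0 * inputs - 1.0             # noise-free linear outputs

samples = [0, 2, 4]                      # indices of training samples
held_out = [i for i in range(len(inputs)) if i not in samples]

# Fit a line (slope and intercept) on the selected rows only.
design = np.hstack([inputs[samples], np.ones((len(samples), 1))])
coefficients, *_ = np.linalg.lstsq(design, outputs[samples], rcond=None)

# Evaluate on the rows that were left out of training.
test_design = np.hstack([inputs[held_out], np.ones((len(held_out), 1))])
max_error = np.max(np.abs(test_design @ coefficients - outputs[held_out]))
```

Rows left out of training can serve as a simple validation set for checking how well the model generalizes.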

property output_shape
Dimension of output variables before applying transformers.