Gaussian Mixtures

Load Iris dataset and create clusters.

from __future__ import annotations

from gemseo.api import configure_logger
from gemseo.api import load_dataset
from gemseo.core.dataset import Dataset
from gemseo.mlearning.api import create_clustering_model
from numpy import array

configure_logger()

<RootLogger root (INFO)>

Create dataset

We import the Iris benchmark dataset through the API.

iris = load_dataset("IrisDataset")

# Extract inputs as a new dataset
data = iris.get_data_by_group(iris.PARAMETER_GROUP)
variables = iris.get_names(iris.PARAMETER_GROUP)
print(variables)

dataset = Dataset("sepal_and_petal")
dataset.set_from_array(data, variables)
['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

Create clustering model

We know that there are three classes of Iris plants. We will thus try to identify three clusters.

model = create_clustering_model("GaussianMixture", data=dataset, n_components=3)
model.learn()
print(model)
GaussianMixture(n_components=3, var_names=None)
   based on the scikit-learn library
   built from 150 learning samples
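
To give an idea of what fitting a Gaussian mixture involves, here is a minimal sketch of the expectation-maximization (EM) iteration in one dimension, written in plain Python. It is not the code GEMSEO or scikit-learn runs; the function names, the sample values and the starting means are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(samples, means, n_iter=50):
    """Fit a 1D Gaussian mixture with EM, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in samples:
            dens = [
                w * normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights, means and variances from these posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(samples)
            means[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, samples)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid degenerate components
    return weights, means, variances


# Hypothetical values forming two well-separated groups (not the Iris data).
samples = [1.3, 1.4, 1.5, 1.6, 1.7, 4.3, 4.5, 4.7, 4.9, 5.1]
weights, means, variances = fit_gmm_1d(samples, means=[1.0, 5.0])
print(weights, means, variances)
```

On such well-separated data, each component settles on one group. The Iris data has four variables, so the model above has to estimate full covariance structure rather than a single variance per component.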

Predict output

Once the model is built, we can use it to predict the cluster of a new sample.

input_value = {
    "sepal_length": array([4.5]),
    "sepal_width": array([3.0]),
    "petal_length": array([1.0]),
    "petal_width": array([0.2]),
}
output_value = model.predict(input_value)
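
For a Gaussian mixture, prediction usually amounts to assigning the sample to the component with the highest posterior probability. The sketch below shows this rule in one dimension with plain Python. The mixture parameters and the helper `predict_cluster` are hypothetical, not the values learned above or GEMSEO's own implementation.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def predict_cluster(x, weights, means, variances):
    """Return the most probable component index and all posteriors for x."""
    dens = [
        w * normal_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(dens)
    posteriors = [d / total for d in dens]
    label = max(range(len(posteriors)), key=posteriors.__getitem__)
    return label, posteriors


# Hypothetical mixture parameters (assumed for illustration only).
weights = [0.5, 0.5]
means = [1.5, 4.7]
variances = [0.05, 0.3]

label, posteriors = predict_cluster(1.0, weights, means, variances)
print(label, posteriors)
```

The posteriors sum to one, so they can also be read as soft cluster memberships when a hard label is not needed.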

Plot clusters

Show cluster labels

dataset.add_variable(
    "gm_specy", model.labels.reshape((-1, 1)), group="labels", cache_as_input=False
)
dataset.plot("ScatterMatrix", kde=True, classifier="gm_specy")
