Load Iris dataset and create clusters.


from __future__ import annotations

from gemseo import configure_logger
from gemseo import create_benchmark_dataset
from gemseo.datasets.dataset import Dataset
from gemseo.mlearning import create_clustering_model
from gemseo.post.dataset.scatter_plot_matrix import ScatterMatrix
from numpy import array

configure_logger()
<RootLogger root (INFO)>

Create dataset

We import the Iris benchmark dataset through the API.

iris = create_benchmark_dataset("IrisDataset")

# Extract inputs as a new dataset
data = iris.get_view(group_names=iris.PARAMETER_GROUP).to_numpy()
variables = iris.get_variable_names(iris.PARAMETER_GROUP)

dataset = Dataset.from_array(data, variables)
['petal_length', 'petal_width', 'sepal_length', 'sepal_width']

Create clustering model

We know that there are three classes of Iris plants. We will thus try to identify three clusters.

model = create_clustering_model("KMeans", data=dataset, n_clusters=3)
KMeans(n_clusters=3, random_state=0, var_names=None)
   based on the scikit-learn library
   built from 150 learning samples
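
To make the clustering step less of a black box, here is a minimal pure-Python sketch of Lloyd's algorithm, the iteration usually meant by "k-means". It is not the scikit-learn implementation that the model above wraps. The helper names (`squared_distance`, `farthest_first`, `kmeans`) and the toy points are assumptions made for illustration, and the deterministic farthest-point initialisation is used only for reproducibility, whereas scikit-learn defaults to the randomised k-means++ scheme.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, n_clusters):
    """Deterministic initialisation (an assumption of this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < n_clusters:
        # Pick the point whose nearest centroid is farthest away.
        candidate = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(candidate))
    return centroids


def kmeans(points, n_clusters, n_iterations=100):
    """Simplified Lloyd's algorithm returning (centroids, labels)."""
    centroids = farthest_first(points, n_clusters)
    labels = None
    for _ in range(n_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


# Toy data made up for illustration; these are not the Iris measurements.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = kmeans(points, n_clusters=2)
print(labels)
```

The two steps inside the loop are the whole algorithm: assign every sample to its closest centroid, then recompute each centroid as the mean of its samples, and stop once the assignment no longer changes.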

Predict output

Once the model is built, we can use it to predict the cluster of a new sample.

input_value = {
    "sepal_length": array([4.5]),
    "sepal_width": array([3.0]),
    "petal_length": array([1.0]),
    "petal_width": array([0.2]),
}
output_value = model.predict(input_value)
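
For a k-means model, prediction amounts to returning the index of the centroid closest to the new sample. The sketch below shows that rule in plain Python. The function `predict_cluster` and the centroid values are hypothetical and chosen only to illustrate the mechanics; they are not the centroids fitted above, and this is not how GEMSEO implements `predict`.

```python
from math import dist

# Variable order used to turn a name-keyed sample into a vector.
VARIABLES = ["petal_length", "petal_width", "sepal_length", "sepal_width"]

# Hypothetical centroids for illustration only (not fitted values).
CENTROIDS = [
    {"petal_length": 1.5, "petal_width": 0.3, "sepal_length": 5.0, "sepal_width": 3.4},
    {"petal_length": 4.5, "petal_width": 1.4, "sepal_length": 6.0, "sepal_width": 2.8},
    {"petal_length": 5.7, "petal_width": 2.0, "sepal_length": 6.8, "sepal_width": 3.0},
]


def to_vector(values):
    """Order the named values consistently, whatever the dict order."""
    return [values[name] for name in VARIABLES]


def predict_cluster(sample, centroids=CENTROIDS):
    """Return the index of the centroid nearest to the sample."""
    point = to_vector(sample)
    distances = [dist(point, to_vector(c)) for c in centroids]
    return distances.index(min(distances))


# Same sample as above, given as plain floats keyed by variable name.
sample = {
    "sepal_length": 4.5,
    "sepal_width": 3.0,
    "petal_length": 1.0,
    "petal_width": 0.2,
}
print(predict_cluster(sample))
```

Keying the sample by variable name, as the input dictionary above does, means the order in which the variables are listed does not affect the result.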

Plot clusters

Show cluster labels

dataset.add_variable("km_specy", model.labels.reshape((-1, 1)), "labels")
ScatterMatrix(dataset, kde=True, classifier="km_specy").execute(save=False, show=True)

[<Figure size 640x480 with 16 Axes>]
