K-means
Load the Iris dataset and group its samples into clusters with the k-means algorithm.
Import
from __future__ import division, unicode_literals
from numpy import array
from gemseo.api import configure_logger, load_dataset
from gemseo.core.dataset import Dataset
from gemseo.mlearning.api import create_clustering_model
configure_logger()
Out:
<RootLogger root (INFO)>
Create dataset
We import the Iris benchmark dataset through the API.
iris = load_dataset("IrisDataset")
# Extract inputs as a new dataset
data = iris.get_data_by_group(iris.PARAMETER_GROUP)
variables = iris.get_names(iris.PARAMETER_GROUP)
print(variables)
dataset = Dataset("sepal_and_petal")
dataset.set_from_array(data, variables)
Out:
['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
Create clustering model
We know that the Iris dataset contains three classes of plants, so we look for three clusters.
model = create_clustering_model("KMeans", data=dataset, n_clusters=3)
model.learn()
print(model)
Out:
KMeans(n_clusters=3, random_state=0, var_names=None)
built from 150 learning samples
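To make the learning step less opaque, here is a minimal sketch of the classical Lloyd iteration that k-means is usually based on. It is written in plain Python and is not the code that GEMSEO runs; the function names `squared_distance` and `kmeans`, the random initialization, and the toy points are assumptions made for illustration only.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd iteration (illustrative sketch); returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize the centroids with distinct samples drawn at random.
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # No assignment changed: the iteration has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # Keep the previous centroid if a cluster is empty.
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


# Toy data (hypothetical, not the Iris measurements).
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
    (9.0, 1.0), (9.2, 0.9), (8.9, 1.2),
]
centroids, labels = kmeans(points, n_clusters=3)
print(labels)
```

The result depends on the initial centroids, so a different seed may yield a different partition. This is why a fixed random state, such as the `random_state=0` shown in the model description above, makes the clustering reproducible.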
Predict output
Once the model is built, we can use it to predict the cluster of a new sample.
input_value = {
    "sepal_length": array([4.5]),
    "sepal_width": array([3.0]),
    "petal_length": array([1.0]),
    "petal_width": array([0.2]),
}
output_value = model.predict(input_value)
print(output_value)
Out:
0
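For a k-means model, predicting the cluster of a new sample usually amounts to finding the nearest centroid. The sketch below illustrates that idea in plain Python. The helper `predict_cluster` and the centroid values are hypothetical; they are not the centroids learned by the model above, and this is not GEMSEO's implementation.

```python
def predict_cluster(sample, centroids):
    """Return the index of the centroid closest to the sample."""
    distances = [
        sum((x - c) ** 2 for x, c in zip(sample, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances))


# Hypothetical centroids, ordered as
# (sepal_length, sepal_width, petal_length, petal_width).
# These are illustrative values, not the ones learned above.
centroids = [
    (5.0, 3.4, 1.5, 0.2),
    (5.9, 2.7, 4.4, 1.4),
    (6.9, 3.1, 5.7, 2.1),
]
sample = (4.5, 3.0, 1.0, 0.2)
print(predict_cluster(sample, centroids))  # prints 0
```

With these assumed centroids, the sample is closest to the first one, so the function returns index 0.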
Plot clusters
Add the cluster labels to the dataset and show them on a scatter plot matrix.
dataset.add_variable(
    "km_specy", model.labels.reshape((-1, 1)), group="labels", cache_as_input=False
)
dataset.plot("ScatterMatrix", kde=True, classifier="km_specy")
Out:
/home/docs/checkouts/readthedocs.org/user_builds/gemseo/conda/3.2.0/lib/python3.8/site-packages/gemseo/post/dataset/scatter_plot_matrix.py:135: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only
dataframe = dataframe.drop(varname, 1)
<gemseo.post.dataset.scatter_plot_matrix.ScatterMatrix object at 0x7f618ff40820>
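As a textual complement to the plot, the cluster labels can also be summarized by counting the samples in each cluster. The sketch below uses only the standard library; the label vector is hypothetical and merely stands in for a sequence of cluster indices such as the one stored in `model.labels`.

```python
from collections import Counter

# Hypothetical label vector, one cluster index per sample.
labels = [0, 0, 1, 2, 1, 0, 2, 2, 1, 0]

# Count how many samples fall in each cluster.
sizes = Counter(labels)
for cluster in sorted(sizes):
    print("cluster {}: {} samples".format(cluster, sizes[cluster]))
```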