High-level functions

The gemseo.mlearning package includes high-level functions to create clustering models from model class names.

from __future__ import annotations

from gemseo import create_benchmark_dataset
from gemseo.mlearning import create_clustering_model
from gemseo.mlearning import get_clustering_models
from gemseo.mlearning import get_clustering_options

Available models

Use the get_clustering_models() function to list the class names of the available models:

get_clustering_models()
['GaussianMixture', 'KMeans']
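A common way to build this kind of name-based API is a registry that maps class names to classes. Listing the available models and creating one by name then both read from the same mapping. The sketch below illustrates that pattern under this assumption only. KMeans and GaussianMixture here are hypothetical stand-in dataclasses, and list_models and create_model are hypothetical helpers. None of them is gemseo's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-ins for clustering classes; NOT gemseo's classes.
@dataclass
class KMeans:
    n_clusters: int = 5


@dataclass
class GaussianMixture:
    n_clusters: int = 5


# Hypothetical registry mapping each class name to its class.
_REGISTRY: dict[str, type] = {cls.__name__: cls for cls in (KMeans, GaussianMixture)}


def list_models() -> list[str]:
    """Return the registered class names in alphabetical order."""
    return sorted(_REGISTRY)


def create_model(class_name: str, **settings):
    """Instantiate a registered class from its name and keyword settings."""
    try:
        cls = _REGISTRY[class_name]
    except KeyError:
        raise ValueError(
            f"Unknown model {class_name!r}; available: {list_models()}"
        ) from None
    return cls(**settings)


print(list_models())  # ['GaussianMixture', 'KMeans']
print(create_model("KMeans", n_clusters=3))  # KMeans(n_clusters=3)
```

Keeping a single mapping means a new model only has to be registered once to be both listed and creatable by name.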

Available model options

Use the get_clustering_options() function to get the options of a model from its class name. With pretty_print=False, the options are returned as a JSON-schema-like dictionary:

get_clustering_options("GaussianMixture", pretty_print=False)
{'additionalProperties': False, 'description': 'The settings of the Gaussian mixture model.', 'properties': {'transformer': {'additionalProperties': True, 'description': 'The strategies to transform the variables.\n\nThe values are instances of :class:`.BaseTransformer`\nwhile the keys are the names of\neither the variables\nor the groups of variables,\ne.g. ``"inputs"`` or ``"outputs"``\nin the case of the regression algorithms.\nIf a group is specified,\nthe :class:`.BaseTransformer` will be applied\nto all the variables of this group.\nIf :attr:`.IDENTITY`, do not transform the variables.', 'title': 'Transformer', 'type': 'object'}, 'parameters': {'additionalProperties': True, 'description': 'Other parameters.', 'title': 'Parameters', 'type': 'object'}, 'var_names': {'default': [], 'description': 'The names of the variables.', 'items': {'type': 'string'}, 'title': 'Var Names', 'type': 'array'}, 'n_clusters': {'default': 5, 'description': 'The number of clusters of the clustering algorithm.', 'exclusiveMinimum': 0, 'title': 'N Clusters', 'type': 'integer'}, 'random_state': {'anyOf': [{'minimum': 0, 'type': 'integer'}, {'type': 'null'}], 'default': 0, 'description': 'The random state parameter.\n\nIf ``None``, use the global random state instance from ``numpy.random``.\nCreating the model multiple times will produce different results.\nIf ``int``, use a new random number generator seeded by this integer.\nThis will produce the same results.', 'title': 'Random State'}}, 'title': 'GaussianMixture_Settings', 'type': 'object'}
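The printed options use JSON Schema keywords ("properties", "default", "additionalProperties"). Assuming that structure, a few lines of Python can read off each option's default value and flag setting names that the schema does not declare. The schema below is a trimmed copy of the GaussianMixture output above with the descriptions removed. The helpers defaults and unknown_options are hypothetical and are not part of gemseo.

```python
from __future__ import annotations

from typing import Any

# Trimmed copy of the GaussianMixture schema printed above (descriptions removed).
schema: dict[str, Any] = {
    "additionalProperties": False,
    "properties": {
        "transformer": {"additionalProperties": True, "type": "object"},
        "parameters": {"additionalProperties": True, "type": "object"},
        "var_names": {"default": [], "items": {"type": "string"}, "type": "array"},
        "n_clusters": {"default": 5, "exclusiveMinimum": 0, "type": "integer"},
        "random_state": {
            "anyOf": [{"minimum": 0, "type": "integer"}, {"type": "null"}],
            "default": 0,
        },
    },
    "title": "GaussianMixture_Settings",
    "type": "object",
}


def defaults(schema: dict[str, Any]) -> dict[str, Any]:
    """Map each option that declares a ``default`` to that default value."""
    return {
        name: spec["default"]
        for name, spec in schema["properties"].items()
        if "default" in spec
    }


def unknown_options(schema: dict[str, Any], settings: dict[str, Any]) -> list[str]:
    """List setting names not declared when additionalProperties is False."""
    if schema.get("additionalProperties", True):
        return []
    return [name for name in settings if name not in schema["properties"]]


print(defaults(schema))
# {'var_names': [], 'n_clusters': 5, 'random_state': 0}
print(unknown_options(schema, {"n_clusters": 3, "n_cluster": 3}))
# ['n_cluster']
```

Because the schema sets "additionalProperties" to False, a misspelled key such as n_cluster is not an allowed option.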

See also

The functions get_clustering_models() and get_clustering_options() are mainly useful for developers. Users may find it easier to consult this page to learn about the different algorithms and their options.

Creation

Given a training dataset, e.g.

dataset = create_benchmark_dataset("IrisDataset")

use the create_clustering_model() function to create a clustering model from its class name and settings:

model = create_clustering_model("KMeans", data=dataset, n_clusters=3)
model.learn()
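To give a rough idea of what fitting a KMeans model with n_clusters=3 involves, here is a stdlib-only sketch of Lloyd's algorithm. It alternates between assigning each sample to its nearest centroid and moving each centroid to the mean of its samples. This is an illustration under our own assumptions, not gemseo's implementation. The helpers kmeans and farthest_point_init are hypothetical. The sketch uses a deterministic farthest-point initialization instead of a randomized one, so it has no random_state, and it ignores variable transformers. The points are made-up 2D data, not the Iris dataset.

```python
from __future__ import annotations

import math


def farthest_point_init(points: list[tuple[float, ...]], n_clusters: int) -> list[tuple[float, ...]]:
    """Pick the first point, then repeatedly the point farthest from all picks."""
    centroids = [points[0]]
    while len(centroids) < n_clusters:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points: list[tuple[float, ...]], n_clusters: int, max_iter: int = 100):
    """Cluster points with Lloyd's algorithm; return (labels, centroids)."""
    centroids = farthest_point_init(points, n_clusters)
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(n_clusters), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: another iteration would change nothing
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: three well-separated groups of three points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 4.9),
          (10.0, 0.0), (10.1, 0.2), (9.9, 0.1)]
labels, centroids = kmeans(points, n_clusters=3)
print(labels)  # [0, 0, 0, 2, 2, 2, 1, 1, 1]
```

Plain Lloyd iterations can stop at a local optimum that depends on the initial centroids. This is why practical implementations typically use a randomized, seeded initialization with several restarts rather than the single deterministic start used here.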
