Random forest#

A RandomForestRegressor is a random forest model based on scikit-learn.

from __future__ import annotations

from matplotlib import pyplot as plt
from numpy import array

from gemseo import create_design_space
from gemseo import create_discipline
from gemseo import sample_disciplines
from gemseo.mlearning import create_regression_model

Problem#

In this example, we represent the function \(f(x)=(6x-2)^2\sin(12x-4)\) [FSK08] by the AnalyticDiscipline:

discipline = create_discipline(
    "AnalyticDiscipline",
    name="f",
    expressions={"y": "(6*x-2)**2*sin(12*x-4)"},
)

and seek to approximate it over the input space:

input_space = create_design_space()
input_space.add_variable("x", lower_bound=0.0, upper_bound=1.0)

To do this, we create a training dataset with 6 equispaced points:

training_dataset = sample_disciplines(
    [discipline], input_space, "y", algo_name="PYDOE_FULLFACT", n_samples=6
)

INFO - 16:16:13: *** Start Sampling execution ***
INFO - 16:16:13: Sampling
INFO - 16:16:13:    Disciplines: f
INFO - 16:16:13:    MDO formulation: MDF
INFO - 16:16:13: Running the algorithm PYDOE_FULLFACT:
INFO - 16:16:13:     17%|█▋        | 1/6 [00:00<00:00, 669.27 it/sec]
INFO - 16:16:13:     33%|███▎      | 2/6 [00:00<00:00, 1087.03 it/sec]
INFO - 16:16:13:     50%|█████     | 3/6 [00:00<00:00, 1421.96 it/sec]
INFO - 16:16:13:     67%|██████▋   | 4/6 [00:00<00:00, 1682.94 it/sec]
INFO - 16:16:13:     83%|████████▎ | 5/6 [00:00<00:00, 1913.29 it/sec]
INFO - 16:16:13:    100%|██████████| 6/6 [00:00<00:00, 2061.59 it/sec]
INFO - 16:16:13: *** End Sampling execution ***
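To see what the training data looks like without running GEMSEO, the six inputs and their outputs can be recomputed in plain Python. This is only a sketch: it assumes that the equispaced grid includes both bounds of the input space, and the names `f`, `x_train` and `y_train` are introduced here for illustration.

```python
import math


def f(x: float) -> float:
    """Evaluate f(x) = (6x - 2)^2 * sin(12x - 4)."""
    return (6.0 * x - 2.0) ** 2 * math.sin(12.0 * x - 4.0)


n_samples = 6
lower, upper = 0.0, 1.0

# Equispaced inputs, assuming the grid includes both bounds.
x_train = [lower + (upper - lower) * i / (n_samples - 1) for i in range(n_samples)]
y_train = [f(x) for x in x_train]

for x, y in zip(x_train, y_train):
    print(f"x = {x:.1f}  ->  y = {y: .4f}")
```

With only six points, the steep rise of \(f\) near \(x=1\) is described by a single training sample, which is worth keeping in mind when reading the plots of this example.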

Basics#

Training#

Then, we train a random forest regression model from these samples:

model = create_regression_model("RandomForestRegressor", training_dataset)
model.learn()

Prediction#

Once it is built, we can predict the output value of \(f\) at a new input point:

input_value = {"x": array([0.65])}
output_value = model.predict(input_value)
output_value

{'y': array([-0.88837697])}

but we cannot predict its Jacobian, because this model does not implement derivatives:

try:
    model.predict_jacobian(input_value)
except NotImplementedError:
    print("The derivatives are not available for RandomForestRegressor.")

The derivatives are not available for RandomForestRegressor.
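The example does not say why derivatives are unavailable. One plausible reading is that a regression tree returns a constant value on each leaf, so an average of trees is piecewise constant in \(x\): its slope is zero inside a leaf and undefined at a split threshold. The sketch below illustrates this with a hypothetical step predictor; the thresholds and leaf values are made up and are not taken from the trained model.

```python
from bisect import bisect_right

# Hypothetical piecewise-constant predictor: thresholds and leaf values
# are invented for illustration, not extracted from the trained model.
thresholds = [0.3, 0.7]
leaf_values = [1.0, -0.5, 4.0]


def predict(x: float) -> float:
    """Return the value of the leaf whose interval contains x."""
    return leaf_values[bisect_right(thresholds, x)]


def central_difference(x: float, step: float = 1e-6) -> float:
    """Approximate the derivative of predict at x by central differences."""
    return (predict(x + step) - predict(x - step)) / (2.0 * step)


# Inside a leaf, both evaluations fall in the same interval.
print(central_difference(0.5))
# Across a threshold, the quotient grows without bound as step shrinks.
print(central_difference(0.3))
```

A finite-difference Jacobian of such a model would therefore be either zero or dominated by the step size, which carries no useful sensitivity information.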

Plotting#

Comparing the model with the reference over 100 test points shows that the random forest follows the function reasonably well on the left of the input space but poorly on the right:

test_dataset = sample_disciplines(
    [discipline], input_space, "y", algo_name="PYDOE_FULLFACT", n_samples=100
)
input_data = test_dataset.get_view(variable_names=model.input_names).to_numpy()
reference_output_data = test_dataset.get_view(variable_names="y").to_numpy().ravel()
predicted_output_data = model.predict(input_data).ravel()
plt.plot(input_data.ravel(), reference_output_data, label="Reference")
plt.plot(input_data.ravel(), predicted_output_data, label="Regression - Basics")
plt.grid()
plt.legend()
plt.show()

INFO - 16:16:13: *** Start Sampling execution ***
INFO - 16:16:13: Sampling
INFO - 16:16:13:    Disciplines: f
INFO - 16:16:13:    MDO formulation: MDF
INFO - 16:16:13: Running the algorithm PYDOE_FULLFACT:
INFO - 16:16:13:      1%|          | 1/100 [00:00<00:00, 3679.21 it/sec]
INFO - 16:16:13:      2%|▏         | 2/100 [00:00<00:00, 3542.49 it/sec]
INFO - 16:16:13:      3%|▎         | 3/100 [00:00<00:00, 3649.34 it/sec]
INFO - 16:16:13:      4%|▍         | 4/100 [00:00<00:00, 3690.54 it/sec]
INFO - 16:16:13:      5%|▌         | 5/100 [00:00<00:00, 3789.58 it/sec]
INFO - 16:16:13:      6%|▌         | 6/100 [00:00<00:00, 3873.45 it/sec]
INFO - 16:16:13:      7%|▋         | 7/100 [00:00<00:00, 3944.13 it/sec]
INFO - 16:16:13:      8%|▊         | 8/100 [00:00<00:00, 3974.70 it/sec]
INFO - 16:16:13:      9%|▉         | 9/100 [00:00<00:00, 4015.40 it/sec]
INFO - 16:16:13:     10%|█         | 10/100 [00:00<00:00, 4052.86 it/sec]
INFO - 16:16:13:     11%|█         | 11/100 [00:00<00:00, 4087.29 it/sec]
INFO - 16:16:13:     12%|█▏        | 12/100 [00:00<00:00, 4109.71 it/sec]
INFO - 16:16:13:     13%|█▎        | 13/100 [00:00<00:00, 4094.46 it/sec]
INFO - 16:16:13:     14%|█▍        | 14/100 [00:00<00:00, 4115.23 it/sec]
INFO - 16:16:13:     15%|█▌        | 15/100 [00:00<00:00, 4145.12 it/sec]
INFO - 16:16:13:     16%|█▌        | 16/100 [00:00<00:00, 4173.96 it/sec]
INFO - 16:16:13:     17%|█▋        | 17/100 [00:00<00:00, 4176.37 it/sec]
INFO - 16:16:13:     18%|█▊        | 18/100 [00:00<00:00, 4197.80 it/sec]
INFO - 16:16:13:     19%|█▉        | 19/100 [00:00<00:00, 4212.26 it/sec]
INFO - 16:16:13:     20%|██        | 20/100 [00:00<00:00, 4230.69 it/sec]
INFO - 16:16:13:     21%|██        | 21/100 [00:00<00:00, 4245.04 it/sec]
INFO - 16:16:13:     22%|██▏       | 22/100 [00:00<00:00, 4238.42 it/sec]
INFO - 16:16:13:     23%|██▎       | 23/100 [00:00<00:00, 4254.42 it/sec]
INFO - 16:16:13:     24%|██▍       | 24/100 [00:00<00:00, 4269.55 it/sec]
INFO - 16:16:13:     25%|██▌       | 25/100 [00:00<00:00, 4284.97 it/sec]
INFO - 16:16:13:     26%|██▌       | 26/100 [00:00<00:00, 4281.41 it/sec]
INFO - 16:16:13:     27%|██▋       | 27/100 [00:00<00:00, 4290.93 it/sec]
INFO - 16:16:13:     28%|██▊       | 28/100 [00:00<00:00, 4265.91 it/sec]
INFO - 16:16:13:     29%|██▉       | 29/100 [00:00<00:00, 4274.94 it/sec]
INFO - 16:16:13:     30%|███       | 30/100 [00:00<00:00, 4267.56 it/sec]
INFO - 16:16:13:     31%|███       | 31/100 [00:00<00:00, 4278.49 it/sec]
INFO - 16:16:13:     32%|███▏      | 32/100 [00:00<00:00, 4231.86 it/sec]
INFO - 16:16:13:     33%|███▎      | 33/100 [00:00<00:00, 4236.67 it/sec]
INFO - 16:16:13:     34%|███▍      | 34/100 [00:00<00:00, 4235.66 it/sec]
INFO - 16:16:13:     35%|███▌      | 35/100 [00:00<00:00, 4239.12 it/sec]
INFO - 16:16:13:     36%|███▌      | 36/100 [00:00<00:00, 4241.31 it/sec]
INFO - 16:16:13:     37%|███▋      | 37/100 [00:00<00:00, 4247.69 it/sec]
INFO - 16:16:13:     38%|███▊      | 38/100 [00:00<00:00, 4255.56 it/sec]
INFO - 16:16:13:     39%|███▉      | 39/100 [00:00<00:00, 4253.86 it/sec]
INFO - 16:16:13:     40%|████      | 40/100 [00:00<00:00, 4262.72 it/sec]
INFO - 16:16:13:     41%|████      | 41/100 [00:00<00:00, 4272.56 it/sec]
INFO - 16:16:13:     42%|████▏     | 42/100 [00:00<00:00, 4280.32 it/sec]
INFO - 16:16:13:     43%|████▎     | 43/100 [00:00<00:00, 4281.12 it/sec]
INFO - 16:16:13:     44%|████▍     | 44/100 [00:00<00:00, 4287.76 it/sec]
INFO - 16:16:13:     45%|████▌     | 45/100 [00:00<00:00, 4297.74 it/sec]
INFO - 16:16:13:     46%|████▌     | 46/100 [00:00<00:00, 4304.54 it/sec]
INFO - 16:16:13:     47%|████▋     | 47/100 [00:00<00:00, 4312.20 it/sec]
INFO - 16:16:13:     48%|████▊     | 48/100 [00:00<00:00, 4310.05 it/sec]
INFO - 16:16:13:     49%|████▉     | 49/100 [00:00<00:00, 4316.67 it/sec]
INFO - 16:16:13:     50%|█████     | 50/100 [00:00<00:00, 4324.74 it/sec]
INFO - 16:16:13:     51%|█████     | 51/100 [00:00<00:00, 4331.91 it/sec]
INFO - 16:16:13:     52%|█████▏    | 52/100 [00:00<00:00, 4332.70 it/sec]
INFO - 16:16:13:     53%|█████▎    | 53/100 [00:00<00:00, 4335.32 it/sec]
INFO - 16:16:13:     54%|█████▍    | 54/100 [00:00<00:00, 4338.77 it/sec]
INFO - 16:16:13:     55%|█████▌    | 55/100 [00:00<00:00, 4343.07 it/sec]
INFO - 16:16:13:     56%|█████▌    | 56/100 [00:00<00:00, 4348.36 it/sec]
INFO - 16:16:13:     57%|█████▋    | 57/100 [00:00<00:00, 4348.09 it/sec]
INFO - 16:16:13:     58%|█████▊    | 58/100 [00:00<00:00, 4350.70 it/sec]
INFO - 16:16:13:     59%|█████▉    | 59/100 [00:00<00:00, 4356.45 it/sec]
INFO - 16:16:13:     60%|██████    | 60/100 [00:00<00:00, 4358.70 it/sec]
INFO - 16:16:13:     61%|██████    | 61/100 [00:00<00:00, 4355.97 it/sec]
INFO - 16:16:13:     62%|██████▏   | 62/100 [00:00<00:00, 4358.08 it/sec]
INFO - 16:16:13:     63%|██████▎   | 63/100 [00:00<00:00, 4361.42 it/sec]
INFO - 16:16:13:     64%|██████▍   | 64/100 [00:00<00:00, 4366.72 it/sec]
INFO - 16:16:13:     65%|██████▌   | 65/100 [00:00<00:00, 4370.19 it/sec]
INFO - 16:16:13:     66%|██████▌   | 66/100 [00:00<00:00, 4366.93 it/sec]
INFO - 16:16:13:     67%|██████▋   | 67/100 [00:00<00:00, 4369.81 it/sec]
INFO - 16:16:13:     68%|██████▊   | 68/100 [00:00<00:00, 4374.02 it/sec]
INFO - 16:16:13:     69%|██████▉   | 69/100 [00:00<00:00, 4376.93 it/sec]
INFO - 16:16:13:     70%|███████   | 70/100 [00:00<00:00, 4375.51 it/sec]
INFO - 16:16:13:     71%|███████   | 71/100 [00:00<00:00, 4375.61 it/sec]
INFO - 16:16:13:     72%|███████▏  | 72/100 [00:00<00:00, 4376.79 it/sec]
INFO - 16:16:13:     73%|███████▎  | 73/100 [00:00<00:00, 4379.88 it/sec]
INFO - 16:16:13:     74%|███████▍  | 74/100 [00:00<00:00, 4383.75 it/sec]
INFO - 16:16:13:     75%|███████▌  | 75/100 [00:00<00:00, 4382.58 it/sec]
INFO - 16:16:13:     76%|███████▌  | 76/100 [00:00<00:00, 4384.81 it/sec]
INFO - 16:16:13:     77%|███████▋  | 77/100 [00:00<00:00, 4389.26 it/sec]
INFO - 16:16:13:     78%|███████▊  | 78/100 [00:00<00:00, 4393.59 it/sec]
INFO - 16:16:13:     79%|███████▉  | 79/100 [00:00<00:00, 4397.36 it/sec]
INFO - 16:16:13:     80%|████████  | 80/100 [00:00<00:00, 4394.59 it/sec]
INFO - 16:16:13:     81%|████████  | 81/100 [00:00<00:00, 4398.31 it/sec]
INFO - 16:16:13:     82%|████████▏ | 82/100 [00:00<00:00, 4401.95 it/sec]
INFO - 16:16:13:     83%|████████▎ | 83/100 [00:00<00:00, 4405.22 it/sec]
INFO - 16:16:13:     84%|████████▍ | 84/100 [00:00<00:00, 4403.69 it/sec]
INFO - 16:16:13:     85%|████████▌ | 85/100 [00:00<00:00, 4406.00 it/sec]
INFO - 16:16:13:     86%|████████▌ | 86/100 [00:00<00:00, 4407.67 it/sec]
INFO - 16:16:13:     87%|████████▋ | 87/100 [00:00<00:00, 4408.82 it/sec]
INFO - 16:16:13:     88%|████████▊ | 88/100 [00:00<00:00, 4411.89 it/sec]
INFO - 16:16:13:     89%|████████▉ | 89/100 [00:00<00:00, 4409.58 it/sec]
INFO - 16:16:13:     90%|█████████ | 90/100 [00:00<00:00, 4412.73 it/sec]
INFO - 16:16:13:     91%|█████████ | 91/100 [00:00<00:00, 4414.39 it/sec]
INFO - 16:16:13:     92%|█████████▏| 92/100 [00:00<00:00, 4416.83 it/sec]
INFO - 16:16:13:     93%|█████████▎| 93/100 [00:00<00:00, 4414.96 it/sec]
INFO - 16:16:13:     94%|█████████▍| 94/100 [00:00<00:00, 4417.18 it/sec]
INFO - 16:16:13:     95%|█████████▌| 95/100 [00:00<00:00, 4420.79 it/sec]
INFO - 16:16:13:     96%|█████████▌| 96/100 [00:00<00:00, 4424.42 it/sec]
INFO - 16:16:13:     97%|█████████▋| 97/100 [00:00<00:00, 4427.84 it/sec]
INFO - 16:16:13:     98%|█████████▊| 98/100 [00:00<00:00, 4426.32 it/sec]
INFO - 16:16:13:     99%|█████████▉| 99/100 [00:00<00:00, 4429.19 it/sec]
INFO - 16:16:13:    100%|██████████| 100/100 [00:00<00:00, 4380.38 it/sec]
INFO - 16:16:13: *** End Sampling execution ***
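The visual comparison can be backed by a number, for instance the root-mean-square error computed separately on each half of the input space. The helpers `rmse` and `rmse_by_half` below are hypothetical names introduced for this sketch, and the data are stand-in values chosen only to make the block runnable.

```python
import math


def rmse(reference, predicted):
    """Root-mean-square error between two equally long sequences."""
    squared_errors = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_by_half(x, reference, predicted, split=0.5):
    """Return the RMSE for x < split and for x >= split.

    Both halves must contain at least one point.
    """
    left = [(r, p) for xi, r, p in zip(x, reference, predicted) if xi < split]
    right = [(r, p) for xi, r, p in zip(x, reference, predicted) if xi >= split]
    return rmse(*zip(*left)), rmse(*zip(*right))


# Stand-in data for illustration only; with the example above, one would pass
# input_data.ravel(), reference_output_data and predicted_output_data instead.
x = [0.1, 0.3, 0.6, 0.9]
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 2.0, 6.0]

left_rmse, right_rmse = rmse_by_half(x, reference, predicted)
print(f"RMSE for x < 0.5: {left_rmse:.4f}, RMSE for x >= 0.5: {right_rmse:.4f}")
```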

Settings#

Number of estimators#

The main hyperparameter of random forest regression is the number of trees in the forest (default: 100). The following compares the default with a smaller number of trees (10) and a larger one (1000):

model = create_regression_model(
    "RandomForestRegressor", training_dataset, n_estimators=10
)
model.learn()
predicted_output_data_1 = model.predict(input_data).ravel()
model = create_regression_model(
    "RandomForestRegressor", training_dataset, n_estimators=1000
)
model.learn()
predicted_output_data_2 = model.predict(input_data).ravel()
plt.plot(input_data.ravel(), reference_output_data, label="Reference")
plt.plot(input_data.ravel(), predicted_output_data, label="Regression - Basics")
plt.plot(input_data.ravel(), predicted_output_data_1, label="Regression - 10 trees")
plt.plot(input_data.ravel(), predicted_output_data_2, label="Regression - 1000 trees")
plt.grid()
plt.legend()
plt.show()
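The effect of the number of trees can be illustrated with a toy bagging model written in plain Python. This is a simplified stand-in, not scikit-learn's implementation: it assumes six equispaced training points including both bounds, and it uses the fact that, with a single input and distinct training points, a fully grown tree returns the output of the nearest sampled point. All names are introduced for this sketch.

```python
import math
import random
import statistics


def f(x):
    """Evaluate f(x) = (6x - 2)^2 * sin(12x - 4)."""
    return (6 * x - 2) ** 2 * math.sin(12 * x - 4)


# Assumed training set: six equispaced points on [0, 1].
X_TRAIN = [i / 5 for i in range(6)]
Y_TRAIN = [f(x) for x in X_TRAIN]


def tree_predict(sample_indices, x):
    """Predict with a fully grown 1-D tree fitted on a bootstrap sample.

    Such a tree places its splits midway between neighbouring sampled points,
    so it returns the output of the sampled point closest to x.
    """
    nearest = min(set(sample_indices), key=lambda i: abs(X_TRAIN[i] - x))
    return Y_TRAIN[nearest]


def forest_predict(x, n_trees, seed):
    """Average n_trees trees, each fitted on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(X_TRAIN)
    predictions = [
        tree_predict([rng.randrange(n) for _ in range(n)], x)
        for _ in range(n_trees)
    ]
    return statistics.fmean(predictions)


def spread(n_trees, x=0.65, n_repeats=20):
    """Standard deviation of the forest prediction over several seeds."""
    values = [forest_predict(x, n_trees, seed) for seed in range(n_repeats)]
    return statistics.stdev(values)


print(f"spread with 10 trees:  {spread(10):.4f}")
print(f"spread with 500 trees: {spread(500):.4f}")
```

In this sketch, adding trees mainly reduces the seed-to-seed variability of the prediction. It does not remove the error that comes from having only six training points.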

Others#

The RandomForestRegressor class of scikit-learn has many settings (see the scikit-learn documentation), and we have chosen to expose only n_estimators. However, any argument of RandomForestRegressor can be set using the dictionary parameters. For example, we can impose a minimum of two samples per leaf:

model = create_regression_model(
    "RandomForestRegressor", training_dataset, parameters={"min_samples_leaf": 2}
)
model.learn()
predicted_output_data_ = model.predict(input_data).ravel()
plt.plot(input_data.ravel(), reference_output_data, label="Reference")
plt.plot(input_data.ravel(), predicted_output_data, label="Regression - Basics")
plt.plot(input_data.ravel(), predicted_output_data_, label="Regression - 2 samples")
plt.grid()
plt.legend()
plt.show()
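To make the role of the minimum number of samples per leaf concrete, here is a minimal single-tree sketch in plain Python. It is not scikit-learn's algorithm: it greedily picks the split that minimizes the summed squared error, ignores bootstrapping, and assumes six equispaced training points including both bounds. The names `fit_tree` and `predict` are hypothetical.

```python
import math


def f(x):
    """Evaluate f(x) = (6x - 2)^2 * sin(12x - 4)."""
    return (6 * x - 2) ** 2 * math.sin(12 * x - 4)


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(points, min_samples_leaf=1):
    """Fit a 1-D regression tree on (x, y) pairs sorted by x.

    Returns (upper_threshold, leaf_value) pairs in increasing x order;
    the last threshold is +inf.
    """
    ys = [y for _, y in points]
    best = None
    # Only consider splits leaving at least min_samples_leaf points per side.
    for k in range(min_samples_leaf, len(points) - min_samples_leaf + 1):
        cost = sse(ys[:k]) + sse(ys[k:])
        if best is None or cost < best[0]:
            best = (cost, k)
    if best is None or sse(ys) == 0.0:
        return [(math.inf, sum(ys) / len(ys))]
    k = best[1]
    threshold = 0.5 * (points[k - 1][0] + points[k][0])
    left = fit_tree(points[:k], min_samples_leaf)
    right = fit_tree(points[k:], min_samples_leaf)
    # Close the left subtree at the split threshold.
    left[-1] = (threshold, left[-1][1])
    return left + right


def predict(tree, x):
    """Return the value of the first leaf whose upper threshold covers x."""
    for threshold, value in tree:
        if x <= threshold:
            return value


points = [(i / 5, f(i / 5)) for i in range(6)]
for min_samples_leaf in (1, 2):
    tree = fit_tree(points, min_samples_leaf)
    print(min_samples_leaf, len(tree), [round(value, 3) for _, value in tree])
```

With one sample per leaf, the tree reproduces every training output. With two samples per leaf, each leaf value is an average of at least two outputs, so the tree is coarser and no longer interpolates the training data.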
