4D Derivative-Enhanced GP with KMeans Submodel Clustering

4D Derivative-Enhanced GP with KMeans Submodel Clustering#

This tutorial demonstrates a derivative-enhanced Gaussian Process (DEGP) for a high-dimensional (4D) function. Key features include:

Automatic Submodel Generation using KMeans clustering.
Sobol Sequence Sampling for low-discrepancy coverage of the 4D input space.
Specific Derivative Structure: Only main (diagonal) derivatives are used.
2D Slice Visualization for interpretability.

Configuration#

import numpy as np

# Random seed
random_seed = 1354
np.random.seed(random_seed)

# GP configuration
n_bases = 4
n_order = 2
num_points_train = 36
num_points_test = 5000
lower_bounds = [-5] * n_bases
upper_bounds = [5] * n_bases
n_clusters = 3
kernel = "SE"
kernel_type = "isotropic"
normalize = True
n_restart_optimizer = 15
swarm_size = 250

Example Function#

We use the Styblinski-Tang function, a common benchmark for optimization:

\[f(x_1, x_2, x_3, x_4) = \frac{1}{2} \sum_{i=1}^{4} \left( x_i^4 - 16x_i^2 + 5x_i \right)\]

import pyoti.sparse as oti

def styblinski_tang_function(X, alg=oti):
    x1, x2, x3, x4 = X[:,0], X[:,1], X[:,2], X[:,3]
    return 0.5 * (x1**4 - 16*x1**2 + 5*x1 +
                  x2**4 - 16*x2**2 + 5*x2 +
                  x3**4 - 16*x3**2 + 5*x3 +
                  x4**4 - 16*x4**2 + 5*x4)

Data Generation (Sobol Sequence)#

We use a Sobol sequence for low-discrepancy sampling of the 4D input space:

from scipy.stats import qmc
import jetgp.utils as utils

def generate_sobol_points(num_points):
    sampler = qmc.Sobol(d=n_bases, scramble=True)
    samples = sampler.random_base2(m=int(np.ceil(np.log2(num_points))))
    quasi_samples = samples[:num_points]
    return utils.scale_samples(quasi_samples, lower_bounds, upper_bounds)

X_train = generate_sobol_points(num_points_train)
print("Training points shape:", X_train.shape)

Training points shape: (36, 4)

Submodel Generation (KMeans)#

We cluster the training points using KMeans to create submodels. Each cluster becomes a submodel with derivatives at all points in that cluster.

No reordering needed! Since WDEGP supports non-contiguous indices, we can use the original cluster assignments directly.

from sklearn.cluster import KMeans

def cluster_to_submodel_indices(X_train, n_clusters, n_order, n_bases):
    """
    Cluster training points and create proper submodel_indices structure.

    No reordering needed - indices can be non-contiguous!

    Returns:
        submodel_indices: Structure for WDEGP where
            submodel_indices[submodel_idx][deriv_idx] = [list of point indices]
        cluster_indices: Raw indices for each cluster (for reference)
    """
    kmeans = KMeans(n_clusters=n_clusters, random_state=random_seed, n_init='auto')
    labels = kmeans.fit_predict(X_train)

    # Group original indices by cluster (no reordering!)
    cluster_indices = [[] for _ in range(n_clusters)]
    for i, label in enumerate(labels):
        cluster_indices[label].append(i)

    # Build submodel_indices with proper structure:
    # submodel_indices[submodel_idx][deriv_idx] = [indices for that derivative]
    # For 4D with main derivatives: 4 × 2 = 8 derivatives per submodel
    num_derivatives = n_bases * n_order
    submodel_indices = [
        [indices for _ in range(num_derivatives)]
        for indices in cluster_indices
    ]

    return submodel_indices, cluster_indices

submodel_indices, cluster_indices = cluster_to_submodel_indices(
    X_train, n_clusters, n_order, n_bases
)

print(f"Number of submodels: {len(submodel_indices)}")
print(f"Points per cluster: {[len(c) for c in cluster_indices]}")
print(f"Derivatives per submodel: {n_bases * n_order}")
print(f"\nsubmodel_indices structure (one entry per derivative):")
for i, sm_idx in enumerate(submodel_indices):
    print(f"  Submodel {i}: {len(sm_idx)} derivatives, point indices = {sm_idx[0]}")

Number of submodels: 3
Points per cluster: [10, 11, 15]
Derivatives per submodel: 8

submodel_indices structure (one entry per derivative):
  Submodel 0: 8 derivatives, point indices = [3, 4, 8, 12, 15, 16, 20, 24, 31, 32]
  Submodel 1: 8 derivatives, point indices = [1, 2, 6, 10, 13, 17, 21, 22, 26, 29, 34]
  Submodel 2: 8 derivatives, point indices = [0, 5, 7, 9, 11, 14, 18, 19, 23, 25, 27, 28, 30, 33, 35]

Explanation:

No reordering required: Indices can be non-contiguous (e.g., [0, 5, 12, 23])
The submodel_indices structure has one entry per derivative: - submodel_indices[submodel_idx][deriv_idx] = list of training point indices for that derivative - For 4D with 2 orders: 4 × 2 = 8 entries per submodel

Prepare Submodel Data (Main Derivatives Only)#

For this 4D problem, we use only main (diagonal) derivatives: \(\frac{\partial f}{\partial x_i}\) and \(\frac{\partial^2 f}{\partial x_i^2}\) for \(i = 1, 2, 3, 4\).

This gives us 8 derivatives per submodel (4 dimensions × 2 orders).

import pyoti.sparse as oti
import jetgp.utils as utils

def prepare_submodel_data(X_train, submodel_indices, cluster_indices):
    """
    Prepare derivative data for WDEGP with main derivatives only.

    derivative_specs structure:
        derivative_specs[submodel_idx][deriv_idx] = derivative spec

    For 4D with main derivatives (8 total = 4 dims × 2 orders):
        [0]: [[[1,1]]]  -> df/dx1
        [1]: [[[2,1]]]  -> df/dx2
        [2]: [[[3,1]]]  -> df/dx3
        [3]: [[[4,1]]]  -> df/dx4
        [4]: [[[1,2]]]  -> d2f/dx1^2
        [5]: [[[2,2]]]  -> d2f/dx2^2
        [6]: [[[3,2]]]  -> d2f/dx3^2
        [7]: [[[4,2]]]  -> d2f/dx4^2
    """
    # Build derivative specifications for main derivatives
    # Structure: one spec per derivative (n_bases × n_order total)
    main_derivatives = []
    for order in range(n_order):
        ders = []
        for dim in range(n_bases):
            ders.append([[dim+1, order+1]])
        main_derivatives.append(ders)

    # Each submodel uses the same derivative types
    derivative_specs = [main_derivatives for _ in submodel_indices]


    # Compute function values at ALL training points (shared by all submodels)
    y_function_values = styblinski_tang_function(X_train, alg=np).reshape(-1, 1)

    print(f"Function values shape: {y_function_values.shape}")
    print(f"\nDerivative specs structure:")
    print(f"  Total derivatives per submodel: {len(main_derivatives)}")
    for i, spec in enumerate(main_derivatives):
        order = i // n_bases + 1
        dim = i % n_bases + 1
        print(f"    [{i}] Order {order}, dim {dim}: {spec}")

    # Prepare data for each submodel
    submodel_data = []

    for k, indices in enumerate(cluster_indices):
        # Create OTI array for automatic differentiation
        X_sub_oti = oti.array(X_train[indices])

        # Add dual components for each dimension
        for i in range(n_bases):
            for j in range(X_sub_oti.shape[0]):
                X_sub_oti[j, i] += oti.e(i+1, order=n_order)

        # Evaluate function with OTI to get derivatives
        y_with_derivatives = oti.array([
            styblinski_tang_function(x, alg=oti)[0]
            for x in X_sub_oti
        ])

        # Build submodel data: [func_values, deriv_0, deriv_1, ...]
        current_data = [y_function_values]

        for deriv_spec in derivative_specs[k]:
            for spec in deriv_spec:
                deriv_values = y_with_derivatives.get_deriv(spec).reshape(-1, 1)
                current_data.append(deriv_values)

        submodel_data.append(current_data)

        if k == 0:
            print(f"\nSubmodel 0 data structure:")
            print(f"  Total arrays: {len(current_data)} (1 func + {len(main_derivatives)} derivs)")
            print(f"    [0] Function values: shape {current_data[0].shape}")
            for idx, arr in enumerate(current_data[1:]):
                order = idx // n_bases + 1
                dim = idx % n_bases + 1
                print(f"    [{idx+1}] Order {order}, dim {dim}: shape {arr.shape}")

    return submodel_data, derivative_specs

submodel_data, derivative_specs = prepare_submodel_data(
    X_train, submodel_indices, cluster_indices
)
print(f"\nNumber of submodels: {len(submodel_data)}")

Function values shape: (36, 1)

Derivative specs structure:
  Total derivatives per submodel: 2
    [0] Order 1, dim 1: [[[1, 1]], [[2, 1]], [[3, 1]], [[4, 1]]]
    [1] Order 1, dim 2: [[[1, 2]], [[2, 2]], [[3, 2]], [[4, 2]]]

Submodel 0 data structure:
  Total arrays: 9 (1 func + 2 derivs)
    [0] Function values: shape (36, 1)
    [1] Order 1, dim 1: shape (10, 1)
    [2] Order 1, dim 2: shape (10, 1)
    [3] Order 1, dim 3: shape (10, 1)
    [4] Order 1, dim 4: shape (10, 1)
    [5] Order 2, dim 1: shape (10, 1)
    [6] Order 2, dim 2: shape (10, 1)
    [7] Order 2, dim 3: shape (10, 1)
    [8] Order 2, dim 4: shape (10, 1)

Number of submodels: 3

Explanation:

The data structures are:

derivative_specs: derivative_specs[submodel][deriv_idx] = derivative specification
- One entry per derivative (8 total for 4D × 2 orders)
- Example: [[[1,1]]] means derivative w.r.t. dimension 1, order 1
submodel_data: submodel_data[submodel] = [func_values, deriv_0, deriv_1, ..., deriv_7]
- First element is function values at ALL training points
- Each subsequent element corresponds to one derivative in derivative_specs

Build and Optimize GP#

from jetgp.wdegp.wdegp import wdegp

print("Building WDEGP model...")
print(f"  Training points: {X_train.shape[0]}")
print(f"  Submodels: {len(submodel_indices)}")
print(f"  Derivatives per submodel: {n_bases * n_order} (main only)")

gp_model = wdegp(
    X_train, submodel_data, n_order, n_bases,
     derivative_specs, derivative_locations = submodel_indices, normalize=normalize,
    kernel=kernel, kernel_type=kernel_type
)

print("\nOptimizing hyperparameters...")
params = gp_model.optimize_hyperparameters(
    optimizer='jade',
    pop_size=100,
    n_generations=15,
    local_opt_every=None,
    debug=True
)
print("\nGP model built and optimized.")

Building WDEGP model...
  Training points: 36
  Submodels: 3
  Derivatives per submodel: 8 (main only)

Optimizing hyperparameters...

Gen 1: best f=833.1430958969249

Gen 2: best f=833.1430958969249

Gen 3: best f=833.1430958969249

Gen 4: best f=833.1430958969249

Gen 5: best f=814.1692957094556

Gen 6: best f=785.9341142684408

Gen 7: best f=785.9341142684408

Gen 8: best f=759.816677316089

Gen 9: best f=759.816677316089

Gen 10: best f=745.4057774985336

Gen 11: best f=745.4057774985336

Gen 12: best f=734.8498681056285

Gen 13: best f=734.8498681056285

Gen 14: best f=721.6775294703954

Gen 15: best f=721.6775294703954

GP model built and optimized.

Evaluate Model Performance#

# Generate test points
X_test = generate_sobol_points(num_points_test)

# Predict
y_pred = gp_model.predict(X_test, params, calc_cov=False)
y_true = styblinski_tang_function(X_test, alg=np)

# Compute error metrics
rmse = np.sqrt(np.mean((y_true - y_pred.flatten())**2))
nrmse = rmse / (y_true.max() - y_true.min())
mae = np.mean(np.abs(y_true - y_pred.flatten()))

print(f"Test set size: {num_points_test}")
print(f"RMSE: {rmse:.6f}")
print(f"NRMSE: {nrmse:.6f}")
print(f"MAE: {mae:.6f}")

Test set size: 5000
RMSE: 28.998336
NRMSE: 0.059993
MAE: 10.629328

2D Slice Visualization#

Since we cannot visualize 4D directly, we create a 2D slice by fixing \(x_3 = x_4 = 0\):

import matplotlib.pyplot as plt

# Evaluate on 2D slice (x3=0, x4=0)
grid_points = 50
x1x2_lin = np.linspace(lower_bounds[0], upper_bounds[0], grid_points)
X1_grid, X2_grid = np.meshgrid(x1x2_lin, x1x2_lin)
X_slice = np.column_stack([X1_grid.ravel(), X2_grid.ravel()])
X_slice = np.hstack([X_slice, np.zeros((X_slice.shape[0], 2))])  # x3=0, x4=0

y_slice_pred = gp_model.predict(X_slice, params, calc_cov=False)
y_slice_true = styblinski_tang_function(X_slice, alg=np)

# Contour plots
fig, axes = plt.subplots(1, 3, figsize=(18, 5), sharex=True, sharey=True)

# Project training points to 2D (showing x1, x2 coordinates)
X_train_proj = X_train[:, :2]

# True function
c1 = axes[0].contourf(X1_grid, X2_grid, y_slice_true.reshape(X1_grid.shape),
                      levels=50, cmap="viridis")
fig.colorbar(c1, ax=axes[0])
axes[0].set_title("True Function Slice ($x_3=x_4=0$)")
axes[0].scatter(X_train_proj[:, 0], X_train_proj[:, 1],
                c="red", edgecolor="k", s=50, label="Training pts (projected)")

# GP prediction
c2 = axes[1].contourf(X1_grid, X2_grid, y_slice_pred.reshape(X1_grid.shape),
                      levels=50, cmap="viridis")
fig.colorbar(c2, ax=axes[1])
axes[1].set_title("GP Prediction Slice")
axes[1].scatter(X_train_proj[:, 0], X_train_proj[:, 1],
                c="red", edgecolor="k", s=50)

# Absolute error
error_grid = np.abs(y_slice_true - y_slice_pred.flatten())
c3 = axes[2].contourf(X1_grid, X2_grid, error_grid.reshape(X1_grid.shape),
                      levels=50, cmap="magma")
fig.colorbar(c3, ax=axes[2])
axes[2].set_title("Absolute Prediction Error")
axes[2].scatter(X_train_proj[:, 0], X_train_proj[:, 1],
                c="red", edgecolor="k", s=50)

for ax in axes:
    ax.set_xlabel("$x_1$")
    ax.set_ylabel("$x_2$")
axes[0].legend(loc='upper right')
plt.tight_layout()
plt.show()

../../_images/4d_WDEGP_sparse_ders_7_0.png

Cluster Visualization#

Visualize how KMeans partitioned the training points:

fig, ax = plt.subplots(figsize=(8, 6))

colors = plt.cm.tab10(np.linspace(0, 1, n_clusters))

for k, indices in enumerate(cluster_indices):
    X_cluster = X_train[indices, :2]
    ax.scatter(X_cluster[:, 0], X_cluster[:, 1],
               c=[colors[k]], s=100, edgecolor='k',
               label=f'Cluster {k} ({len(indices)} pts)')

ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_title("KMeans Clustering of Training Points (projected to $x_1$-$x_2$)")
ax.legend()
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()

../../_images/4d_WDEGP_sparse_ders_8_0.png

—

Summary#

This tutorial demonstrated WDEGP for a 4D problem with:

Key features:

KMeans clustering to automatically partition training points into submodels
No reordering required: Non-contiguous indices are supported directly
Main derivatives only: \(\frac{\partial f}{\partial x_i}\) and \(\frac{\partial^2 f}{\partial x_i^2}\) (8 derivatives per submodel)
Sobol sequence sampling for space-filling design in 4D

Data structures:

submodel_indices[submodel][deriv_idx] = point indices for that derivative (one entry per derivative)
derivative_specs[submodel][deriv_idx] = derivative spec (e.g., [[[1, 1]]] for df/dx1)
submodel_data[submodel] = [func_values, deriv_0, deriv_1, ..., deriv_7]

Advantages of this approach:

No data reordering: Use cluster indices directly without reorganizing training data
Scales to higher dimensions by using only main derivatives (avoiding combinatorial explosion)
KMeans provides automatic, data-driven submodel partitioning
Each submodel captures local behavior within its cluster