Sparse Derivative-Enhanced GP Tutorial: 2D Example

Sparse Derivative-Enhanced GP Tutorial: 2D Example#

This tutorial demonstrates a 2D sparse, derivative-enhanced Gaussian Process (DEGP) model using pyOTI-based automatic differentiation. It focuses on constructing a small 2D grid, strategically assigning derivative points, and visualizing model accuracy.

Key features#

2D grid-based training data generation
Non-contiguous index support: No reordering needed for derivative placement
Automatic differentiation using pyOTI
Contour visualization of the true function, GP prediction, and error
Highlighting of derivative points in the plots

Note

This example uses the Six-Hump Camelback function as a test case.

—

Import required packages#

import numpy as np
import itertools
import matplotlib.pyplot as plt
import pyoti.sparse as oti
from jetgp.wdegp.wdegp import wdegp
import jetgp.utils as utils

—

Configuration#

All configuration parameters are defined as simple variables. This includes derivative order, kernel type, and other hyperparameters.

n_order = 2
n_bases = 2
normalize_data = True
kernel = "SE"
kernel_type = "anisotropic"
n_restarts = 15
swarm_size = 200
np.random.seed(42)

Define the Six-Hump Camelback function#

def six_hump_camelback(X, alg=np):
    """
    Six-hump camelback function, a 2D benchmark function with multiple local minima.
    """
    x1 = X[:, 0]
    x2 = X[:, 1]
    return (
        (4 - 2.1 * x1**2 + (x1**4) / 3.0) * x1**2
        + x1 * x2
        + (-4 + 4 * x2**2) * x2**2
    )

—

Generate 2D training grid and derivative point assignment#

We construct a \(4 \times 4\) grid and select interior points for derivative computation.

No reordering needed! WDEGP supports non-contiguous indices, so we can use the interior point indices directly.

def create_training_grid(lb=-1.0, ub=1.0, num_points=4):
    """
    Create a 2D training grid with derivatives at interior points.

    No reordering needed - indices can be non-contiguous!
    """
    # Uniform 2D grid
    x_vals = np.linspace(lb, ub, num_points)
    y_vals = np.linspace(lb, ub, num_points)
    X = np.array(list(itertools.product(x_vals, y_vals)))

    # Interior points where we want derivatives (no reordering needed!)
    # In a 4x4 grid, interior points are [5, 6, 9, 10]
    derivative_indices = [5, 6, 9, 10]

    # Build submodel_indices structure:
    # submodel_indices[submodel][deriv_idx] = [list of point indices]
    # We have 4 derivatives total (2 dims × 2 orders), so 4 entries
    num_derivatives = n_bases * n_order  # 2 × 2 = 4
    submodel_indices = [
        [derivative_indices for _ in range(num_derivatives)]
    ]

    # Build derivative_specs structure:
    # derivative_specs[submodel][deriv_idx] = derivative spec
    # For 2D with all derivatives (4 total):
    derivative_specs = [
     [
         [[[1, 1]], [[2, 1]]],  # 1st order derivatives
         [[[1, 2]], [[2, 2]]]   # 2nd order derivatives
     ]
     ]

    return X, submodel_indices, derivative_specs, derivative_indices

X_train, submodel_indices, derivative_specs, derivative_indices = create_training_grid()

print(f"Total training points: {len(X_train)}")
print(f"Derivative indices (non-contiguous): {derivative_indices}")
print(f"\nsubmodel_indices: {len(submodel_indices[0])} entries (one per derivative)")
print(f"\nderivative_specs:")
deriv_names = ["df/dx1", "df/dx2", "d2f/dx1^2", "d2f/dx2^2"]
for i, spec in enumerate(derivative_specs[0]):
    print(f"  [{i}] {deriv_names[i]}: {spec}")

Total training points: 16
Derivative indices (non-contiguous): [5, 6, 9, 10]

submodel_indices: 4 entries (one per derivative)

derivative_specs:
  [0] df/dx1: [[[1, 1]], [[2, 1]]]
  [1] df/dx2: [[[1, 2]], [[2, 2]]]

Data structure explanation:

submodel_indices[0] has 4 entries (one per derivative)
submodel_indices[0][i] = [5, 6, 9, 10]: Point indices for derivative i
derivative_specs[0][i]: The derivative specification for derivative i

Data structure explanation:

submodel_indices[0][0] = [5, 6, 9, 10]: 1st order derivatives at these points
submodel_indices[0][1] = [5, 6, 9, 10]: 2nd order derivatives at same points
derivative_specs[0][0] = [[[1,1]], [[2,1]]]: 1st order specs (df/dx1, df/dx2)
derivative_specs[0][1] = [[[1,2]], [[2,2]]]: 2nd order specs (d²f/dx1², d²f/dx2²)

—

Visualize grid and derivative points#

def visualize_grid(X_train, derivative_indices, lb=-1.0, ub=1.0):
    """Visualize the training grid with derivative points highlighted."""
    fig, ax = plt.subplots(figsize=(8, 8))

    # All points
    ax.scatter(X_train[:, 0], X_train[:, 1],
               c='lightgray', s=100, edgecolor='k', label='Function only')

    # Derivative points (non-contiguous indices!)
    ax.scatter(X_train[derivative_indices, 0], X_train[derivative_indices, 1],
               c='red', s=150, edgecolor='k', label=f'Derivative pts {derivative_indices}')

    # Label all points with their index
    for i, p in enumerate(X_train):
        label = f"{i}*" if i in derivative_indices else str(i)
        ax.text(p[0] + 0.03, p[1] + 0.03, label, fontsize=10)

    ax.set_title("Training Grid\n(* = derivative points, indices are non-contiguous)")
    ax.set_xlabel("$x_1$")
    ax.set_ylabel("$x_2$")
    ax.legend(loc='upper left')
    ax.grid(True, alpha=0.3)
    ax.set_aspect('equal')

    plt.tight_layout()
    plt.show()

visualize_grid(X_train, derivative_indices)

../../_images/DEGP_2d_sparse_hybrid_4_0.png

—

Prepare training data with derivatives#

def prepare_training_data(X_train, submodel_indices, derivative_specs, func, n_order=2):
    """
    Prepare function values and derivatives for WDEGP.

    Returns:
        submodel_data: List of [func_values, deriv_0, deriv_1, ...]
        where each deriv corresponds to one entry in derivative_specs
    """
    # Function values at ALL training points
    y_function_values = func(X_train, alg=np).reshape(-1, 1)

    submodel_data = []

    for k, (sm_indices, sm_deriv_specs) in enumerate(zip(submodel_indices, derivative_specs)):
        # Get the point indices for derivatives (same for all derivs in this example)
        point_indices = sm_indices[0]

        # Create OTI array for automatic differentiation
        X_sub = oti.array(X_train[point_indices])

        # Add OTI dual components for each dimension
        for dim in range(n_bases):
            for j in range(X_sub.shape[0]):
                X_sub[j, dim] += oti.e(dim + 1, order=n_order)

        # Evaluate function with OTI to get derivatives automatically
        y_with_derivs = oti.array([
            func(x, alg=oti)[0] for x in X_sub
        ])

        # Build submodel data: [func_values, deriv_0, deriv_1, ...]
        # Each derivative corresponds to one entry in derivative_specs
        current_data = [y_function_values]

        # Extract derivatives for each order
        for order_idx, order_specs in enumerate(sm_deriv_specs):
             for deriv_spec in order_specs:
                 deriv_values = y_with_derivs.get_deriv(deriv_spec).reshape(-1, 1)
                 current_data.append(deriv_values)

        submodel_data.append(current_data)

        print(f"Submodel {k} data structure:")
        print(f"  Function values: shape {current_data[0].shape} (all {len(X_train)} points)")
        deriv_names = ["df/dx1", "df/dx2", "d2f/dx1^2", "d2f/dx2^2"]
        for idx, arr in enumerate(current_data[1:]):
            print(f"  [{idx}] {deriv_names[idx]}: shape {arr.shape} ({len(point_indices)} points)")

    return submodel_data

submodel_data = prepare_training_data(
    X_train, submodel_indices, derivative_specs, six_hump_camelback, n_order
)
print(f"\nTotal submodels: {len(submodel_data)}")
print(f"Arrays per submodel: {len(submodel_data[0])}")

Submodel 0 data structure:
  Function values: shape (16, 1) (all 16 points)
  [0] df/dx1: shape (4, 1) (4 points)
  [1] df/dx2: shape (4, 1) (4 points)
  [2] d2f/dx1^2: shape (4, 1) (4 points)
  [3] d2f/dx2^2: shape (4, 1) (4 points)

Total submodels: 1
Arrays per submodel: 5

—

Build and optimize the GP model#

print("Building WDEGP model...")
print(f"  Training points: {len(X_train)}")
print(f"  Derivative points: {len(derivative_indices)} (at indices {derivative_indices})")
print(f"  Derivatives per point: {n_bases * n_order} (2 dims × 2 orders)")

gp_model = wdegp(
    X_train,
    submodel_data,
    n_order,
    n_bases,
    derivative_specs,
    derivative_locations = submodel_indices,
    normalize=normalize_data,
    kernel=kernel,
    kernel_type=kernel_type,
)

print("\nOptimizing hyperparameters...")
params = gp_model.optimize_hyperparameters(
    optimizer='jade',
    pop_size=100,
    n_generations=15,
    local_opt_every=None,
    debug=True
)

print("\nOptimized parameters:")
print(params)

Building WDEGP model...
  Training points: 16
  Derivative points: 4 (at indices [5, 6, 9, 10])
  Derivatives per point: 4 (2 dims × 2 orders)

Optimizing hyperparameters...

Gen 1: best f=47.91385918475584
Gen 2: best f=18.045345256799862
Gen 3: best f=18.045345256799862
Gen 4: best f=18.045345256799862
Gen 5: best f=15.348355135768827
Gen 6: best f=15.348355135768827
Gen 7: best f=15.348355135768827

Gen 8: best f=13.233077630609856
Gen 9: best f=11.936236154792113
Gen 10: best f=11.936236154792113
Gen 11: best f=11.516952793850749
Gen 12: best f=11.516952793850749
Gen 13: best f=11.516952793850749
Gen 14: best f=11.516952793850749

Gen 15: best f=11.516952793850749

Optimized parameters:
[ -0.26883153  -0.09019859   0.94748321 -14.09571385]

—

Evaluate model performance on test grid#

def evaluate_gp(gp_model, func, params, lb=-1.0, ub=1.0, N=50):
    """Evaluate GP on a dense test grid."""
    x_lin = np.linspace(lb, ub, N)
    y_lin = np.linspace(lb, ub, N)
    X1, X2 = np.meshgrid(x_lin, y_lin)
    X_test = np.column_stack([X1.ravel(), X2.ravel()])

    y_pred = gp_model.predict(X_test, params, calc_cov=False)
    y_true = func(X_test, alg=np)

    # Error metrics
    rmse = np.sqrt(np.mean((y_true - y_pred.flatten())**2))
    nrmse = rmse / (y_true.max() - y_true.min())
    mae = np.mean(np.abs(y_true - y_pred.flatten()))

    return X1, X2, y_true.reshape(N, N), y_pred.reshape(N, N), nrmse, rmse, mae

X1, X2, y_true_grid, y_pred_grid, nrmse, rmse, mae = evaluate_gp(
    gp_model, six_hump_camelback, params
)
print(f"NRMSE: {nrmse:.6f}")
print(f"RMSE:  {rmse:.6f}")
print(f"MAE:   {mae:.6f}")

NRMSE: 0.012538
RMSE:  0.053467
MAE:   0.034333

—

Visualize results#

def visualize_results(X1, X2, y_true, y_pred, X_train, derivative_indices, nrmse):
    """Create contour plots of true function, prediction, and error."""
    abs_err = np.abs(y_true - y_pred)
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))

    titles = ["True Function", "GP Prediction", f"Absolute Error (NRMSE={nrmse:.4f})"]
    data = [y_true, y_pred, abs_err]
    cmaps = ["viridis", "viridis", "magma"]

    for ax, Z, title, cmap in zip(axes, data, titles, cmaps):
        c = ax.contourf(X1, X2, Z, levels=50, cmap=cmap)
        fig.colorbar(c, ax=ax)

        # All training points
        ax.scatter(X_train[:, 0], X_train[:, 1],
                   c='red', edgecolor='k', s=40, label='Training pts')

        # Derivative points (highlighted)
        ax.scatter(X_train[derivative_indices, 0], X_train[derivative_indices, 1],
                   c='white', edgecolor='k', s=100, marker='s',
                   label=f'Deriv pts {derivative_indices}')

        ax.set_title(title)
        ax.set_xlabel("$x_1$")
        ax.set_ylabel("$x_2$")
        ax.legend(loc='upper right', fontsize=8)
        ax.grid(alpha=0.3)

    plt.tight_layout()
    plt.show()

visualize_results(X1, X2, y_true_grid, y_pred_grid, X_train, derivative_indices, nrmse)

../../_images/DEGP_2d_sparse_hybrid_8_0.png

—

Verify derivative interpolation#

Since this is a single submodel, return_deriv=True is available for direct derivative predictions.

print("=" * 70)
print("Derivative verification using return_deriv=True")
print("=" * 70)

# Get predictions with derivatives at the derivative training points
X_deriv_pts = X_train[derivative_indices]
y_pred_with_derivs = gp_model.predict(X_deriv_pts, params, calc_cov=False, return_deriv=True)

print(f"\nPrediction shape: {y_pred_with_derivs.shape}")
print("  Row 0: function values")
print("  Row 1: df/dx1 (1st order, dim 1)")
print("  Row 2: df/dx2 (1st order, dim 2)")
print("  Row 3: d²f/dx1² (2nd order, dim 1)")
print("  Row 4: d²f/dx2² (2nd order, dim 2)")

# Compare with training data
print("\n1st order derivatives (df/dx1):")
pred_d1_x1 = y_pred_with_derivs[1, :]
true_d1_x1 = submodel_data[0][1].flatten()  # 1st deriv array
for i, idx in enumerate(derivative_indices):
    rel_err = abs(pred_d1_x1[i] - true_d1_x1[i]) / abs(true_d1_x1[i]) if true_d1_x1[i] != 0 else 0
    print(f"  Point {idx}: Pred={pred_d1_x1[i]:.6f}, True={true_d1_x1[i]:.6f}, RelErr={rel_err:.2e}")

print("\n1st order derivatives (df/dx2):")
pred_d1_x2 = y_pred_with_derivs[2, :]
true_d1_x2 = submodel_data[0][2].flatten()
for i, idx in enumerate(derivative_indices):
    rel_err = abs(pred_d1_x2[i] - true_d1_x2[i]) / abs(true_d1_x2[i]) if true_d1_x2[i] != 0 else 0
    print(f"  Point {idx}: Pred={pred_d1_x2[i]:.6f}, True={true_d1_x2[i]:.6f}, RelErr={rel_err:.2e}")

======================================================================
Derivative verification using return_deriv=True
======================================================================
Making predictions for all derivatives that are common among submodels

Prediction shape: (5, 4)
  Row 0: function values
  Row 1: df/dx1 (1st order, dim 1)
  Row 2: df/dx2 (1st order, dim 2)
  Row 3: d²f/dx1² (2nd order, dim 1)
  Row 4: d²f/dx2² (2nd order, dim 2)

1st order derivatives (df/dx1):
  Point 5: Pred=5.323457, True=-2.697119, RelErr=2.97e+00
  Point 6: Pred=5.323457, True=-2.030453, RelErr=3.62e+00
  Point 9: Pred=5.323457, True=2.030453, RelErr=1.62e+00
  Point 10: Pred=5.323457, True=2.697119, RelErr=9.74e-01

1st order derivatives (df/dx2):
  Point 5: Pred=-2.697119, True=1.740741, RelErr=2.55e+00
  Point 6: Pred=-2.030453, True=-2.407407, RelErr=1.57e-01
  Point 9: Pred=2.030453, True=2.407407, RelErr=1.57e-01
  Point 10: Pred=2.697119, True=-1.740741, RelErr=2.55e+00

—

Summary#

This tutorial demonstrated a 2D sparse DEGP model with selective derivative placement.

Key features:

No reordering needed: Non-contiguous indices ([5, 6, 9, 10]) are used directly
Interior derivative points: Derivatives at interior grid points improve accuracy in nonlinear regions
Direct derivative verification: return_deriv=True provides predicted derivatives

Data structures used:

# submodel_indices[submodel][deriv_idx] = [point indices]
# One entry per derivative (4 total for 2D × 2 orders)
submodel_indices = [
    [d_indices, d_indices, d_indices, d_indices]
]

# derivative_specs[submodel][deriv_idx] = derivative spec
derivative_specs = [
    [
        [[[1, 1]]],  # df/dx1
        [[[2, 1]]],  # df/dx2
        [[[1, 2]]],  # d²f/dx1²
        [[[2, 2]]]   # d²f/dx2²
    ]
]

Advantages:

Efficient use of derivative information at strategic locations
No data reorganization required
Direct access to derivative predictions via return_deriv=True