Derivative-Enhanced Gaussian Process (DEGP)

Contents

Derivative-Enhanced Gaussian Process (DEGP)#

Overview#

The Derivative-Enhanced Gaussian Process (DEGP) extends standard Gaussian Process (GP) regression by incorporating derivative information at training points. This enables the model to:

  • Capture local nonlinear behavior more accurately

  • Improve predictions in regions where function samples are sparse

  • Refine uncertainty estimates and predictive gradients using higher-order information

This tutorial demonstrates DEGP on 1D and 2D functions, first using first-order derivatives only, and then including second-order derivatives for improved curvature representation.

The examples also illustrate how DEGP predictions can be validated at training points and visualized over a range of inputs.

Example 1: 1D First-Order Derivatives Only#

Overview#

Learn a 1D function \(f(x) = \sin(x)\) using both the function values and its first derivative \(f'(x) = \cos(x)\). Incorporating derivatives allows the GP to respect the slope of the function, leading to more accurate interpolation.

Step 1: Import required packages#

import numpy as np
from jetgp.full_degp.degp import degp
import matplotlib.pyplot as plt

print("Modules imported successfully.")
Modules imported successfully.

Explanation: We import numpy for numerical operations, matplotlib.pyplot for visualization, and the DEGP class from the full_degp package.

Step 2: Define training data#

X_train = np.array([[0.0], [0.5], [1.0]])
y_func = np.sin(X_train)
y_deriv1 = np.cos(X_train)
y_train = [y_func, y_deriv1]

print("X_train:", X_train)
print("y_train:", y_train)
X_train: [[0. ]
 [0.5]
 [1. ]]
y_train: [array([[0.        ],
       [0.47942554],
       [0.84147098]]), array([[1.        ],
       [0.87758256],
       [0.54030231]])]

Explanation: Here, we define three training points and compute their function values and first derivatives. y_train is a list containing both values and derivatives, which the DEGP model will use for training.

Step 3: Define derivative indices and locations#

der_indices = [[[[1, 1]]]]

# derivative_locations must be provided - one entry per derivative
derivative_locations = []
for i in range(len(der_indices)):
   for j in range(len(der_indices[i])):
         derivative_locations.append([k for k in range(len(X_train))])

print("der_indices:", der_indices)
print("derivative_locations:", derivative_locations)
der_indices: [[[[1, 1]]]]
derivative_locations: [[0, 1, 2]]

Explanation: der_indices tells the DEGP model which derivatives to include. In 1D, the first order derivative is labeled as [[1,1]]. derivative_locations specifies which training points have each derivative—here, all points have the first derivative.

Step 4: Initialize DEGP model#

model = degp(X_train, y_train, n_order=1, n_bases=1,
             der_indices=der_indices,
             derivative_locations=derivative_locations,
             normalize=True,
             kernel="SE", kernel_type="anisotropic")

print("DEGP model initialized.")
DEGP model initialized.

Explanation: - n_order=1 specifies first-order derivatives - kernel="SE" uses a Squared Exponential kernel - kernel_type="anisotropic" allows the model to learn separate length scales for each dimension - normalize=True ensures that both function values and derivatives are scaled for numerical stability

Step 5: Optimize hyperparameters#

params =  model.optimize_hyperparameters(
     optimizer='pso',
     pop_size = 100,
     n_generations = 15,
     local_opt_every = 15,
     debug = False
     )
print("Optimized hyperparameters:", params)
Stopping: maximum iterations reached --> 15
Optimized hyperparameters: [-0.79592036  0.75114571 -8.35496602]

Explanation: Hyperparameters of the kernel (length scale, variance, noise, etc.) are optimized by maximizing the marginal log likelihood. Particle Swarm Optimization (PSO) is used here for robustness.

Step 6: Validate predictions at training points#

y_train_pred = model.predict(X_train, params, calc_cov=False, return_deriv=True)

# Output shape is [n_derivatives + 1, n_points]
# Row 0: function values, Row 1: first derivative
y_func_pred = y_train_pred[0, :]
y_deriv_pred = y_train_pred[1, :]

abs_error_func = np.abs(y_func_pred.flatten() - y_func.flatten())
abs_error_deriv = np.abs(y_deriv_pred.flatten() - y_deriv1.flatten())

print("Absolute error (function) at training points:", abs_error_func)
print("Absolute error (derivative) at training points:", abs_error_deriv)
Note: derivs_to_predict is None. Predictions will include all derivatives used in training: [[[1, 1]]]
Absolute error (function) at training points: [8.64208705e-12 3.13638004e-14 1.59205982e-12]
Absolute error (derivative) at training points: [1.26565425e-14 9.01723141e-13 1.78046466e-12]

Explanation: We first check predictions at training points to ensure that the DEGP model exactly interpolates function values and derivatives, as expected for noiseless training data. The output has shape [n_derivatives + 1, n_points] where row 0 contains function predictions and subsequent rows contain derivative predictions.

Step 7: Visualize predictions#

X_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test_pred = model.predict(X_test, params, calc_cov=False, return_deriv=True)

# Row 0: function, Row 1: first derivative
y_func_test = y_test_pred[0, :]
y_deriv_test = y_test_pred[1, :]

plt.figure(figsize=(12,5))

plt.subplot(1,2,1)
plt.plot(X_test, np.sin(X_test), 'k--', label='True f(x)')
plt.plot(X_test, y_func_test, 'r-', label='GP prediction')
plt.scatter(X_train, y_func, c='b', s=50, label='Training points')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Function and GP Prediction')
plt.legend()

plt.subplot(1,2,2)
plt.plot(X_test, np.cos(X_test), 'k--', label="True f'(x)")
plt.plot(X_test, y_deriv_test, 'r-', label='GP derivative prediction')
plt.scatter(X_train, y_deriv1, c='b', s=50, label='Training points')
plt.xlabel('x')
plt.ylabel("f'(x)")
plt.title('Derivative and GP Prediction')
plt.legend()

plt.tight_layout()
plt.show()
Note: derivs_to_predict is None. Predictions will include all derivatives used in training: [[[1, 1]]]
../_images/degp_getting_started_6_1.png

Explanation: The plots compare DEGP predictions with the true function and derivatives.

Summary#

This example demonstrates first-order DEGP on a 1D function. By incorporating derivative information at training points, the GP achieves:

  • Exact interpolation of both function values and slopes at training locations

  • Improved accuracy between training points

  • Better extrapolation behavior near boundaries

Example 2: 1D First- and Second-Order Derivatives#

Overview#

Improve the GP model by incorporating second-order derivatives. This gives the GP information about curvature, enhancing interpolation and derivative predictions.

Step 1: Define training data#

X_train = np.array([[0.0], [0.5], [1.0]])
y_func = np.sin(X_train)
y_deriv1 = np.cos(X_train)
y_deriv2 = -np.sin(X_train)
y_train = [y_func, y_deriv1, y_deriv2]

print("X_train:", X_train)
print("y_train:", y_train)
X_train: [[0. ]
 [0.5]
 [1. ]]
y_train: [array([[0.        ],
       [0.47942554],
       [0.84147098]]), array([[1.        ],
       [0.87758256],
       [0.54030231]]), array([[-0.        ],
       [-0.47942554],
       [-0.84147098]])]

Explanation: The second derivative is included in the training data. DEGP can now enforce both slope and curvature.

Step 2: Define derivative indices and locations#

der_indices = [[[[1, 1]], [[1, 2]]]]

# All derivatives at all training points
derivative_locations = []
for i in range(len(der_indices)):
   for j in range(len(der_indices[i])):
         derivative_locations.append([k for k in range(len(X_train))])

print("der_indices:", der_indices)
print("derivative_locations:", derivative_locations)
der_indices: [[[[1, 1]], [[1, 2]]]]
derivative_locations: [[0, 1, 2], [0, 1, 2]]

Explanation: The first order derivative is labeled [[1,1]] and the second order derivative [[1,2]]. derivative_locations specifies that both derivatives are available at all training points. —

Step 3: Initialize DEGP model for second-order derivatives#

model = degp(X_train, y_train, n_order=2, n_bases=1,
             der_indices=der_indices,
             derivative_locations=derivative_locations,
             normalize=True,
             kernel="SE", kernel_type="anisotropic")

print("DEGP model (2nd order) initialized.")
DEGP model (2nd order) initialized.

Explanation: Setting n_order=2 instructs the model to incorporate both first- and second-order derivatives.

Step 4: Optimize hyperparameters#

params = model.optimize_hyperparameters(
     optimizer='pso',
     pop_size = 100,
     n_generations = 15,
     local_opt_every = 15,
     debug = True
     )
print("Optimized hyperparameters:", params)
New best for swarm at iteration 1: [-0.60812005  0.34549871 -6.32207447] -34.006062561208275
Best after iteration 1: [-0.60812005  0.34549871 -6.32207447] -34.006062561208275
Best after iteration 2: [-0.60812005  0.34549871 -6.32207447] -34.006062561208275
New best for swarm at iteration 3: [-0.86011993  1.37439028 -7.12325797] -36.28910006317677
Best after iteration 3: [-0.86011993  1.37439028 -7.12325797] -36.28910006317677
New best for swarm at iteration 4: [-0.77097232  0.61780118 -6.30445074] -39.14169411769196
Best after iteration 4: [-0.77097232  0.61780118 -6.30445074] -39.14169411769196
New best for swarm at iteration 5: [-0.86584625  1.06557634 -8.50315447] -40.858224603571486
New best for swarm at iteration 5: [-0.77207861  0.63088356 -7.54244312] -42.00212158203552
Best after iteration 5: [-0.77207861  0.63088356 -7.54244312] -42.00212158203552
Best after iteration 6: [-0.77207861  0.63088356 -7.54244312] -42.00212158203552
Best after iteration 7: [-0.77207861  0.63088356 -7.54244312] -42.00212158203552
New best for swarm at iteration 8: [-0.79554057  0.65481256 -7.92234433] -42.127663086996684
Best after iteration 8: [-0.79554057  0.65481256 -7.92234433] -42.127663086996684
New best for swarm at iteration 9: [-0.7950314   0.71319527 -7.52673956] -43.308113707113925
New best for swarm at iteration 9: [-0.79121866  0.67783742 -7.50496889] -43.53304629142166
Best after iteration 9: [-0.79121866  0.67783742 -7.50496889] -43.53304629142166
Best after iteration 10: [-0.79121866  0.67783742 -7.50496889] -43.53304629142166
Best after iteration 11: [-0.79121866  0.67783742 -7.50496889] -43.53304629142166
New best for swarm at iteration 12: [-0.79496483  0.65926043 -7.55058814] -44.18980550636026
Best after iteration 12: [-0.79496483  0.65926043 -7.55058814] -44.18980550636026
Best after iteration 13: [-0.79496483  0.65926043 -7.55058814] -44.18980550636026
Best after iteration 14: [-0.79496483  0.65926043 -7.55058814] -44.18980550636026
Best after iteration 15: [-0.79496483  0.65926043 -7.55058814] -44.18980550636026
Stopping: maximum iterations reached --> 15
Optimized hyperparameters: [-0.79496483  0.65926043 -7.55058814]

Explanation: Hyperparameters of the kernel are optimized by maximizing the marginal log likelihood. Particle Swarm Optimization (PSO) is used here for robustness.

Step 5: Validate predictions at training points#

y_train_pred = model.predict(X_train, params, calc_cov=False, return_deriv=True)

# Output shape is [n_derivatives + 1, n_points]
# Row 0: function, Row 1: 1st derivative, Row 2: 2nd derivative
y_func_pred = y_train_pred[0, :]
y_deriv1_pred = y_train_pred[1, :]
y_deriv2_pred = y_train_pred[2, :]

abs_error_func = np.abs(y_func_pred.flatten() - y_func.flatten())
abs_error_deriv1 = np.abs(y_deriv1_pred.flatten() - y_deriv1.flatten())
abs_error_deriv2 = np.abs(y_deriv2_pred.flatten() - y_deriv2.flatten())

print("Absolute error (function) at training points:", abs_error_func)
print("Absolute error (1st derivative) at training points:", abs_error_deriv1)
print("Absolute error (2nd derivative) at training points:", abs_error_deriv2)
Note: derivs_to_predict is None. Predictions will include all derivatives used in training: [[[1, 1]], [[1, 2]]]
Warning: Cholesky decomposition failed via scipy, using standard np solve instead.
Absolute error (function) at training points: [2.39336595e-09 1.44918683e-09 5.23176102e-09]
Absolute error (1st derivative) at training points: [5.27460076e-10 9.93388150e-10 1.51192014e-09]
Absolute error (2nd derivative) at training points: [1.27135123e-09 2.15520962e-09 2.29917752e-10]

Explanation: With second derivatives included, DEGP predictions should match both slopes and curvature at training points.

Step 6: Visualize predictions#

X_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test_pred = model.predict(X_test, params, calc_cov=False, return_deriv=True)

# Row 0: function, Row 1: 1st derivative, Row 2: 2nd derivative
y_func_test = y_test_pred[0, :]
y_deriv1_test = y_test_pred[1, :]
y_deriv2_test = y_test_pred[2, :]

plt.figure(figsize=(18,5))

plt.subplot(1,3,1)
plt.plot(X_test, np.sin(X_test), 'k--', label='True f(x)')
plt.plot(X_test, y_func_test, 'r-', label='GP prediction')
plt.scatter(X_train, y_func, c='b', s=50, label='Training points')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Function')
plt.legend()

plt.subplot(1,3,2)
plt.plot(X_test, np.cos(X_test), 'k--', label="True f'(x)")
plt.plot(X_test, y_deriv1_test, 'r-', label='GP derivative prediction')
plt.scatter(X_train, y_deriv1, c='b', s=50, label='Training points')
plt.xlabel('x')
plt.ylabel("f'(x)")
plt.title('1st Derivative')
plt.legend()

plt.subplot(1,3,3)
plt.plot(X_test, -np.sin(X_test), 'k--', label="True f''(x)")
plt.plot(X_test, y_deriv2_test, 'r-', label='GP 2nd derivative prediction')
plt.scatter(X_train, y_deriv2, c='b', s=50, label='Training points')
plt.xlabel('x')
plt.ylabel("f''(x)")
plt.title('2nd Derivative')
plt.legend()

plt.tight_layout()
plt.show()
Note: derivs_to_predict is None. Predictions will include all derivatives used in training: [[[1, 1]], [[1, 2]]]
Warning: Cholesky decomposition failed via scipy, using standard np solve instead.
../_images/degp_getting_started_12_1.png

Explanation: The three-panel plot shows function, first derivative, and second derivative predictions. Including higher-order derivatives improves the model’s ability to interpolate curvature between points.

Summary#

This example demonstrates second-order DEGP on a 1D function. By incorporating both first and second derivatives:

  • The GP captures local curvature information

  • Predictions are more accurate in regions with high curvature

  • Derivative predictions are smoother and more reliable

Example 3: 2D First-Order Derivatives#

Overview#

Learn a 2D function \(f(x,y) = \sin(x)\cos(y)\) using function values and first derivatives with respect to \(x\) and \(y\).

Step 1: Import required packages#

import numpy as np
from jetgp.full_degp.degp import degp
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

print("Modules imported successfully for 2D example.")
Modules imported successfully for 2D example.

Explanation: Additional imports for 3D visualization are included.

Step 2: Define training data#

X1 = np.array([0.0, 0.5, 1.0])
X2 = np.array([0.0, 0.5, 1.0])
X1_grid, X2_grid = np.meshgrid(X1, X2)
X_train = np.column_stack([X1_grid.flatten(), X2_grid.flatten()])

y_func = np.sin(X_train[:,0]) * np.cos(X_train[:,1])
y_deriv_x = np.cos(X_train[:,0]) * np.cos(X_train[:,1])
y_deriv_y = -np.sin(X_train[:,0]) * np.sin(X_train[:,1])
y_train = [y_func.reshape(-1,1), y_deriv_x.reshape(-1,1), y_deriv_y.reshape(-1,1)]

print("X_train shape:", X_train.shape)
print("y_train shapes:", [v.shape for v in y_train])
X_train shape: (9, 2)
y_train shapes: [(9, 1), (9, 1), (9, 1)]

Explanation: We define a 3×3 2D grid. Function values and first derivatives in both dimensions are included, allowing DEGP to learn the local slopes along each axis.

Step 3: Define derivative indices and locations#

der_indices = [[[[1, 1]], [[2, 1]]]]

# All derivatives at all training points
derivative_locations = []
for i in range(len(der_indices)):
   for j in range(len(der_indices[i])):
         derivative_locations.append([k for k in range(len(X_train))])

print("der_indices:", der_indices)
print("derivative_locations:", derivative_locations)
der_indices: [[[[1, 1]], [[2, 1]]]]
derivative_locations: [[0, 1, 2, 3, 4, 5, 6, 7, 8], [0, 1, 2, 3, 4, 5, 6, 7, 8]]

Explanation: The derivative indices specify first derivatives with respect to \(x\) ([[1,1]]) and \(y\) ([[2,1]]). derivative_locations indicates both partial derivatives are available at all 9 training points.

Step 4: Initialize DEGP model#

model = degp(X_train, y_train, n_order=1, n_bases=2,
             der_indices=der_indices,
             derivative_locations=derivative_locations,
             normalize=True,
             kernel="SE", kernel_type="anisotropic")

print("2D DEGP model initialized.")
2D DEGP model initialized.

Explanation: Here, (n_bases=2) which specifices the dimensionality of the problem.

Step 5: Optimize hyperparameters#

params = model.optimize_hyperparameters(
     optimizer='pso',
     pop_size = 100,
     n_generations = 15,
     local_opt_every = 15,
     debug = False
     )
print("Optimized hyperparameters:", params)
Stopping: maximum iterations reached --> 15
Optimized hyperparameters: [ -0.67306813  -0.7377549    0.68515504 -11.73930474]

Explanation: Hyperparameter optimization for 2D problems may take longer due to increased dimensionality.

Step 6: Validate predictions at training points#

y_train_pred = model.predict(X_train, params, calc_cov=False, return_deriv=True)

# Output shape is [n_derivatives + 1, n_points]
# Row 0: function, Row 1: df/dx, Row 2: df/dy
y_func_pred = y_train_pred[0, :]
y_deriv_x_pred = y_train_pred[1, :]
y_deriv_y_pred = y_train_pred[2, :]

print("Predictions at training points (function):", y_func_pred)
print("Predictions at training points (deriv x):", y_deriv_x_pred)
print("Predictions at training points (deriv y):", y_deriv_y_pred)
Note: derivs_to_predict is None. Predictions will include all derivatives used in training: [[[1, 1]], [[2, 1]]]
Predictions at training points (function): [-3.73273079e-11  4.79425538e-01  8.41470985e-01  3.07638914e-11
  4.20735492e-01  7.38460263e-01  1.52025892e-10  2.59034724e-01
  4.54648713e-01]
Predictions at training points (deriv x): [1.         0.87758256 0.54030231 0.87758256 0.77015115 0.47415988
 0.54030231 0.47415988 0.29192658]
Predictions at training points (deriv y): [-1.52834267e-11 -3.98698087e-12  6.64496811e-12  5.64822289e-12
 -2.29848847e-01 -4.03422680e-01 -1.72769171e-11 -4.03422680e-01
 -7.08073418e-01]

Explanation: Predictions are split into function values and derivatives for each dimension.

Step 7: Compute absolute errors#

abs_error_func = np.abs(y_func_pred.flatten() - y_func)
abs_error_dx = np.abs(y_deriv_x_pred.flatten() - y_deriv_x)
abs_error_dy = np.abs(y_deriv_y_pred.flatten() - y_deriv_y)

print("Absolute error (function) at training points:", abs_error_func)
print("Absolute error (deriv x) at training points:", abs_error_dx)
print("Absolute error (deriv y) at training points:", abs_error_dy)
Absolute error (function) at training points: [3.73273079e-11 1.10396636e-10 2.25108487e-10 3.07638914e-11
 8.05978617e-11 1.23871691e-10 1.52025892e-10 3.17317284e-11
 5.29657429e-11]
Absolute error (deriv x) at training points: [1.71652692e-11 9.02500297e-13 3.28419514e-11 4.80211426e-11
 3.73590048e-12 3.72767373e-11 3.61644048e-11 1.67603709e-11
 1.36312073e-11]
Absolute error (deriv y) at training points: [1.52834267e-11 3.98698087e-12 6.64496811e-12 5.64822289e-12
 2.31470398e-12 6.94777569e-12 1.72769171e-11 1.79119497e-11
 1.23127064e-11]

Explanation: Verification that the model exactly interpolates training data.

Step 8: Generate test predictions#

x1_test = np.linspace(0, 1, 50)
x2_test = np.linspace(0, 1, 50)
X1_test, X2_test = np.meshgrid(x1_test, x2_test)
X_test = np.column_stack([X1_test.flatten(), X2_test.flatten()])

y_test_pred = model.predict(X_test, params, calc_cov=False, return_deriv=True)

# Row 0: function, Row 1: df/dx, Row 2: df/dy
y_func_test = y_test_pred[0, :].reshape(50, 50)
y_deriv_x_test = y_test_pred[1, :].reshape(50, 50)
y_deriv_y_test = y_test_pred[2, :].reshape(50, 50)
Note: derivs_to_predict is None. Predictions will include all derivatives used in training: [[[1, 1]], [[2, 1]]]

Explanation: A dense 50×50 test grid is created for visualization purposes.

Step 9: Visualize 2D predictions#

import matplotlib.pyplot as plt

# Compute absolute errors for function and first derivatives
abs_error_func = np.abs(y_func_test - (np.sin(X1_test) * np.cos(X2_test)))
abs_error_dx   = np.abs(y_deriv_x_test - (np.cos(X1_test) * np.cos(X2_test)))
abs_error_dy   = np.abs(y_deriv_y_test - (-np.sin(X1_test) * np.sin(X2_test)))

# Setup figure
fig, axs = plt.subplots(1, 3, figsize=(18, 5))

# Titles and error arrays
titles = [
   r'Absolute Error: $f(x,y)$',
   r'Absolute Error: $\partial f / \partial x$',
   r'Absolute Error: $\partial f / \partial y$'
]
errors = [abs_error_func, abs_error_dx, abs_error_dy]
cmaps  = ['Reds', 'Greens', 'Blues']

# Plot each contour
for ax, err, title, cmap in zip(axs, errors, titles, cmaps):
   cs = ax.contourf(X1_test, X2_test, err, levels=50, cmap=cmap)
   ax.scatter(X_train[:,0], X_train[:,1], c='k', s=40)
   ax.set_title(title, fontsize=14)
   ax.set_xlabel('x')
   ax.set_ylabel('y')
   fig.colorbar(cs, ax=ax)

plt.tight_layout()
plt.show()
../_images/degp_getting_started_21_0.png

Explanation: The contour plots show absolute errors for the function and its first derivatives. Including derivatives along each dimension allows the GP to capture local slopes, resulting in smoother and more accurate surfaces.

Summary#

This example demonstrates 2D first-order DEGP. By incorporating partial derivatives:

  • The GP learns directional slopes in the 2D space

  • Predictions are accurate across the entire domain

  • The model captures local gradient information effectively

Example 4: 2D Second-Order (Main) Derivatives#

Overview#

Learn a 2D function \(f(x,y) = \sin(x)\cos(y)\) using function values, first derivatives, and second-order derivatives \(\frac{\partial^2 f}{\partial x^2}\) and \(\frac{\partial^2 f}{\partial y^2}\) (excluding mixed partials).

Step 1: Import required packages#

import numpy as np
from jetgp.full_degp.degp import degp
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

print("Modules imported successfully for 2D example with second-order derivatives.")
Modules imported successfully for 2D example with second-order derivatives.

Explanation: Same imports as the previous 2D example.

Step 2: Define training data#

X1 = np.array([0.0, 0.5, 1.0])
X2 = np.array([0.0, 0.5, 1.0])
X1_grid, X2_grid = np.meshgrid(X1, X2)
X_train = np.column_stack([X1_grid.flatten(), X2_grid.flatten()])

# True function and derivatives
y_func = np.sin(X_train[:,0]) * np.cos(X_train[:,1])
y_deriv_x = np.cos(X_train[:,0]) * np.cos(X_train[:,1])
y_deriv_y = -np.sin(X_train[:,0]) * np.sin(X_train[:,1])
y_deriv_xx = -np.sin(X_train[:,0]) * np.cos(X_train[:,1])
y_deriv_yy = -np.sin(X_train[:,0]) * np.cos(X_train[:,1])

y_train = [
   y_func.reshape(-1,1),
   y_deriv_x.reshape(-1,1),
   y_deriv_y.reshape(-1,1),
   y_deriv_xx.reshape(-1,1),
   y_deriv_yy.reshape(-1,1)
]

print("X_train shape:", X_train.shape)
print("y_train shapes:", [v.shape for v in y_train])
X_train shape: (9, 2)
y_train shapes: [(9, 1), (9, 1), (9, 1), (9, 1), (9, 1)]

Explanation: This dataset includes function values, first derivatives, and second-order main derivatives, allowing DEGP to leverage curvature information for improved learning accuracy.

Step 3: Define derivative indices and locations#

der_indices = [
    [ [[1, 1]], [[2, 1]] ],  # first-order derivatives
    [ [[1, 2]], [[2, 2]] ]   # second-order derivatives
]

# All derivatives at all training points
derivative_locations = []
for i in range(len(der_indices)):
   for j in range(len(der_indices[i])):
         derivative_locations.append([k for k in range(len(X_train))])

print("der_indices:", der_indices)
print("derivative_locations:", derivative_locations)
der_indices: [[[[1, 1]], [[2, 1]]], [[[1, 2]], [[2, 2]]]]
derivative_locations: [[0, 1, 2, 3, 4, 5, 6, 7, 8], [0, 1, 2, 3, 4, 5, 6, 7, 8], [0, 1, 2, 3, 4, 5, 6, 7, 8], [0, 1, 2, 3, 4, 5, 6, 7, 8]]

Explanation: The derivative indices specify both first-order and second-order partial derivatives with respect to \(x\) and \(y\). derivative_locations indicates all 4 derivative types are available at all 9 training points.

Step 4: Initialize DEGP model#

model = degp(X_train, y_train, n_order=2, n_bases=2,
             der_indices=der_indices,
             derivative_locations=derivative_locations,
             normalize=True,
             kernel="SE", kernel_type="anisotropic")

print("2D DEGP model with second-order derivatives initialized.")
2D DEGP model with second-order derivatives initialized.

Explanation: Setting n_order=2 enables the model to use both first and second-order derivative information.

Step 5: Optimize hyperparameters#

params =   model.optimize_hyperparameters(
     optimizer='pso',
     pop_size = 100,
     n_generations = 15,
     local_opt_every = 15,
     debug = False
     )
print("Optimized hyperparameters:", params)
Stopping: maximum iterations reached --> 15
Optimized hyperparameters: [ -0.68520552  -0.66361933   0.52920143 -13.18327835]

Explanation: Hyperparameters of the kernel are optimized by maximizing the marginal log likelihood.

Step 6: Validate predictions at training points#

y_train_pred = model.predict(X_train, params, calc_cov=False, return_deriv=True)

# Output shape is [n_derivatives + 1, n_points]
# Row 0: function, Row 1: df/dx, Row 2: df/dy, Row 3: d²f/dx², Row 4: d²f/dy²
y_func_pred = y_train_pred[0, :]
y_deriv_x_pred = y_train_pred[1, :]
y_deriv_y_pred = y_train_pred[2, :]
y_deriv_xx_pred = y_train_pred[3, :]
y_deriv_yy_pred = y_train_pred[4, :]
Note: derivs_to_predict is None. Predictions will include all derivatives used in training: [[[1, 1]], [[2, 1]], [[1, 2]], [[2, 2]]]

Explanation: Predictions are organized by derivative order and type.

Step 7: Compute absolute errors#

# First-order errors
abs_error_func = np.abs(y_func_pred.flatten() - y_func)
abs_error_dx   = np.abs(y_deriv_x_pred.flatten() - y_deriv_x)
abs_error_dy   = np.abs(y_deriv_y_pred.flatten() - y_deriv_y)

# Second-order main derivative errors
abs_error_dxx  = np.abs(y_deriv_xx_pred.flatten() - y_deriv_xx)
abs_error_dyy  = np.abs(y_deriv_yy_pred.flatten() - y_deriv_yy)

# Print absolute errors
print("Absolute error (function)       :", abs_error_func)
print("Absolute error (derivative x)  :", abs_error_dx)
print("Absolute error (derivative y)  :", abs_error_dy)
print("Absolute error (second x-x)    :", abs_error_dxx)
print("Absolute error (second y-y)    :", abs_error_dyy)
Absolute error (function)       : [2.52768654e-08 1.60906034e-07 1.35368510e-07 1.62180755e-08
 7.50793940e-08 1.63332514e-07 6.96886056e-08 1.01456696e-07
 1.23133645e-07]
Absolute error (derivative x)  : [1.03199227e-09 1.37350364e-09 3.28260918e-08 2.77699750e-08
 8.61559712e-10 8.69844474e-10 3.84339893e-09 7.62644808e-09
 1.02488575e-08]
Absolute error (derivative y)  : [1.17589356e-08 3.62762099e-08 8.88830934e-09 2.66649280e-08
 2.57774815e-08 1.99548687e-08 2.78982341e-08 1.12605335e-08
 4.43919046e-09]
Absolute error (second x-x)    : [1.32362785e-08 4.96790827e-08 3.60800354e-08 7.15397089e-08
 4.42814444e-08 3.76778233e-08 5.39412530e-08 3.57222632e-08
 2.48760330e-08]
Absolute error (second y-y)    : [5.19554857e-09 3.52122796e-08 1.19578457e-08 1.53652815e-09
 4.80837005e-08 3.36570827e-09 4.30748738e-08 3.52404705e-08
 3.27628785e-09]

Explanation: Verification that all derivatives are correctly interpolated at training points.

Step 8: Generate test predictions#

x1_test = np.linspace(0, 1, 50)
x2_test = np.linspace(0, 1, 50)
X1_test, X2_test = np.meshgrid(x1_test, x2_test)
X_test = np.column_stack([X1_test.flatten(), X2_test.flatten()])

y_test_pred = model.predict(X_test, params, calc_cov=False, return_deriv=True)

# Row 0: function, Row 1: df/dx, Row 2: df/dy, Row 3: d²f/dx², Row 4: d²f/dy²
y_func_test = y_test_pred[0, :].reshape(50, 50)
y_deriv_x_test = y_test_pred[1, :].reshape(50, 50)
y_deriv_y_test = y_test_pred[2, :].reshape(50, 50)
y_deriv_xx_test = y_test_pred[3, :].reshape(50, 50)
y_deriv_yy_test = y_test_pred[4, :].reshape(50, 50)
Note: derivs_to_predict is None. Predictions will include all derivatives used in training: [[[1, 1]], [[2, 1]], [[1, 2]], [[2, 2]]]

Explanation: All predictions (function and derivatives) are computed and reshaped for visualization.

Step 9: Visualize absolute errors#

import matplotlib.pyplot as plt

# Compute absolute errors
abs_error_func = np.abs(y_func_test - (np.sin(X1_test) * np.cos(X2_test)))
abs_error_dx   = np.abs(y_deriv_x_test - (np.cos(X1_test) * np.cos(X2_test)))
abs_error_dy   = np.abs(y_deriv_y_test - (-np.sin(X1_test) * np.sin(X2_test)))
abs_error_xx   = np.abs(y_deriv_xx_test - (-np.sin(X1_test) * np.cos(X2_test)))
abs_error_yy   = np.abs(y_deriv_yy_test - (-np.sin(X1_test) * np.cos(X2_test)))

# Setup figure
fig, axs = plt.subplots(1, 5, figsize=(24, 5))

# Titles and errors
titles = [
   r'Absolute Error: $f(x,y)$',
   r'Absolute Error: $\partial f / \partial x$',
   r'Absolute Error: $\partial f / \partial y$',
   r'Absolute Error: $\partial^2 f / \partial x^2$',
   r'Absolute Error: $\partial^2 f / \partial y^2$'
]

errors = [abs_error_func, abs_error_dx, abs_error_dy, abs_error_xx, abs_error_yy]
cmaps  = ['Reds', 'Greens', 'Blues', 'Greens', 'Blues']

# Plot all
for ax, err, title, cmap in zip(axs, errors, titles, cmaps):
   cs = ax.contourf(X1_test, X2_test, err, levels=50, cmap=cmap)
   ax.scatter(X_train[:,0], X_train[:,1], c='k', s=40)
   ax.set_title(title, fontsize=12)
   ax.set_xlabel('x')
   ax.set_ylabel('y')
   fig.colorbar(cs, ax=ax)

plt.tight_layout()
plt.show()
../_images/degp_getting_started_30_0.png

Explanation: Including second-order derivatives helps DEGP capture curvature and local convexity/concavity information, enhancing interpolation and extrapolation accuracy near sparse training regions.

Summary#

This example demonstrates 2D second-order DEGP. By incorporating both first and second-order partial derivatives:

  • The GP captures local curvature in multiple dimensions

  • Predictions account for surface bending and convexity

  • Accuracy is significantly improved in regions with high curvature

  • The model provides reliable derivative predictions across the domain

Key Benefits of Higher-Order Derivatives:

  • More accurate interpolation between sparse training points

  • Better extrapolation behavior

  • Improved gradient predictions for optimization applications

  • Reduced uncertainty in high-curvature regions

Example 5: 1D Heterogeneous Derivative Coverage#

Overview#

This example demonstrates how to use different derivative orders at different training points using the derivative_locations parameter. This is useful when:

  • Higher-order derivatives are expensive to compute and should only be used where needed

  • Derivative information is only available at certain locations (e.g., from sensors or simulations)

  • You want to concentrate derivative information in regions of high curvature or importance

We learn \(f(x) = \sin(2x) + 0.5\cos(5x)\) with:

  • All points: Function values

  • All points: First-order derivatives

  • Interior points only: Second-order derivatives (curvature information where it matters most)

Step 1: Import required packages#

import numpy as np
from jetgp.full_degp.degp import degp
import matplotlib.pyplot as plt

print("Modules imported successfully.")
Modules imported successfully.

Step 2: Define training data with heterogeneous derivative coverage#

# Define training points
X_train = np.linspace(0, 2, 7).reshape(-1, 1)
n_train = len(X_train)

# Identify boundary and interior points
boundary_indices = [0, n_train - 1]  # First and last points
interior_indices = list(range(1, n_train - 1))  # Middle points
all_indices = list(range(n_train))

print(f"Total training points: {n_train}")
print(f"Boundary indices: {boundary_indices}")
print(f"Interior indices: {interior_indices}")

# Define the true function and its derivatives
def f(x):
    return np.sin(2*x) + 0.5*np.cos(5*x)

def df(x):
    return 2*np.cos(2*x) - 2.5*np.sin(5*x)

def d2f(x):
    return -4*np.sin(2*x) - 12.5*np.cos(5*x)

# Compute training data
y_func = f(X_train)
y_deriv1 = df(X_train)
y_deriv2_interior = d2f(X_train[interior_indices])  # Only at interior points!

# Build y_train list
# Note: y_deriv2 only has len(interior_indices) entries, not n_train
y_train = [y_func, y_deriv1, y_deriv2_interior]

print(f"\nTraining data shapes:")
print(f"  Function values: {y_func.shape} (all {n_train} points)")
print(f"  1st derivatives: {y_deriv1.shape} (all {n_train} points)")
print(f"  2nd derivatives: {y_deriv2_interior.shape} (only {len(interior_indices)} interior points)")
Total training points: 7
Boundary indices: [0, 6]
Interior indices: [1, 2, 3, 4, 5]

Training data shapes:
  Function values: (7, 1) (all 7 points)
  1st derivatives: (7, 1) (all 7 points)
  2nd derivatives: (5, 1) (only 5 interior points)

Explanation: The key insight here is that y_deriv2_interior has fewer entries than y_func and y_deriv1. The second derivative is only computed at interior points where curvature information is most valuable.

Step 3: Define derivative indices and locations#

# Derivative specification: 1st and 2nd order derivatives
der_indices = [[[[1, 1]], [[1, 2]]]]

# Derivative locations: specify which points have each derivative
derivative_locations = [
    all_indices,       # 1st derivative (df/dx) at ALL points
    interior_indices   # 2nd derivative (d²f/dx²) at INTERIOR points only
]

print("der_indices:", der_indices)
print("derivative_locations:", derivative_locations)
der_indices: [[[[1, 1]], [[1, 2]]]]
derivative_locations: [[0, 1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5]]

Explanation: The derivative_locations list has one entry per derivative in der_indices:

  • First entry: indices where first derivative is available (all 7 points)

  • Second entry: indices where second derivative is available (5 interior points)

This structure must match the y_train list exactly.

Step 4: Initialize DEGP model with derivative_locations#

model = degp(
    X_train,
    y_train,
    n_order=2,
    n_bases=1,
    der_indices=der_indices,
    derivative_locations=derivative_locations,  # Key parameter!
    normalize=True,
    kernel="SE",
    kernel_type="anisotropic"
)

print("DEGP model with heterogeneous derivative coverage initialized.")
DEGP model with heterogeneous derivative coverage initialized.

Explanation: The derivative_locations parameter tells DEGP exactly which training points have each type of derivative information.

Step 5: Optimize hyperparameters#

params = model.optimize_hyperparameters(
    optimizer='pso',
    pop_size=100,
    n_generations=15,
    local_opt_every=15,
    debug=False
)
print("Optimized hyperparameters:", params)
Stopping: maximum iterations reached --> 15
Optimized hyperparameters: [ 8.95780680e-03  4.19292282e-01 -1.51087304e+01]

Step 6: Validate predictions at training points#

# Predict with return_deriv=True to get derivative predictions
y_train_pred = model.predict(X_train, params, calc_cov=False, return_deriv=True)

# Output shape is [n_derivatives + 1, n_points]
# Row 0: function (n_train points)
# Row 1: 1st derivative (n_train points)
# Row 2: 2nd derivative (len(interior_indices) points)
y_func_pred = y_train_pred[0, :]
y_deriv1_pred = y_train_pred[1, :]
y_deriv2_pred = y_train_pred[2, interior_indices]

# Compute errors
abs_error_func = np.abs(y_func_pred.flatten() - y_func.flatten())
abs_error_d1 = np.abs(y_deriv1_pred.flatten() - y_deriv1.flatten())
abs_error_d2 = np.abs(y_deriv2_pred.flatten() - y_deriv2_interior.flatten())

print("Absolute error (function) at all training points:")
print(f"  {abs_error_func}")
print("\nAbsolute error (1st derivative) at all training points:")
print(f"  {abs_error_d1}")
print("\nAbsolute error (2nd derivative) at interior points only:")
print(f"  {abs_error_d2}")
Note: derivs_to_predict is None. Predictions will include all derivatives used in training: [[[1, 1]], [[1, 2]]]
Warning: Cholesky decomposition failed via scipy, using standard np solve instead.
Absolute error (function) at all training points:
  [1.95660397e-08 1.61217033e-08 4.54024768e-09 2.04040038e-08
 7.53119611e-09 4.87798624e-09 1.11146570e-08]

Absolute error (1st derivative) at all training points:
  [1.08431806e-08 1.56174633e-08 2.02459838e-09 2.55145156e-08
 2.18248579e-08 1.76762427e-09 5.84725892e-09]

Absolute error (2nd derivative) at interior points only:
  [1.85020621e-08 7.92071706e-08 4.16755093e-08 1.72419519e-08
 1.22257351e-08]

Explanation: The prediction output structure matches the training data structure:

  • Row 0: Function values at all points

  • Row 1: 1st derivatives at all points

  • Row 2: 2nd derivatives only at interior points (matching derivative_locations)

Step 7: Visualize predictions and derivative coverage#

# Generate test points
X_test = np.linspace(0, 2, 200).reshape(-1, 1)
y_test_pred = model.predict(X_test, params, calc_cov=False, return_deriv=True)

n_test = len(X_test)

# Row 0: function, Row 1: 1st derivative, Row 2: 2nd derivative
y_func_test = y_test_pred[0, :]
y_d1_test = y_test_pred[1, :]
y_d2_test = y_test_pred[2, :]

# True values for comparison
y_true = f(X_test)
dy_true = df(X_test)
d2y_true = d2f(X_test)

# Create figure
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# --- Plot 1: Function ---
ax1 = axes[0, 0]
ax1.plot(X_test, y_true, 'b-', linewidth=2, label='True f(x)')
ax1.plot(X_test, y_func_test, 'r--', linewidth=2, label='GP prediction')
ax1.scatter(X_train[boundary_indices].flatten(), y_func[boundary_indices].flatten(),
            c='orange', s=150, marker='s', zorder=5, edgecolors='black',
            label='Boundary (f, df only)')
ax1.scatter(X_train[interior_indices].flatten(), y_func[interior_indices].flatten(),
            c='green', s=150, marker='o', zorder=5, edgecolors='black',
            label='Interior (f, df, d²f)')
ax1.set_xlabel('x', fontsize=12)
ax1.set_ylabel('f(x)', fontsize=12)
ax1.set_title('Function Prediction', fontsize=14)
ax1.legend(loc='best')
ax1.grid(True, alpha=0.3)

# --- Plot 2: First Derivative ---
ax2 = axes[0, 1]
ax2.plot(X_test, dy_true, 'b-', linewidth=2, label="True f'(x)")
ax2.plot(X_test, y_d1_test, 'r--', linewidth=2, label='GP prediction')
ax2.scatter(X_train[boundary_indices].flatten(), y_deriv1[boundary_indices].flatten(),
            c='orange', s=150, marker='s', zorder=5, edgecolors='black')
ax2.scatter(X_train[interior_indices].flatten(), y_deriv1[interior_indices].flatten(),
            c='green', s=150, marker='o', zorder=5, edgecolors='black')
ax2.set_xlabel('x', fontsize=12)
ax2.set_ylabel("f'(x)", fontsize=12)
ax2.set_title('First Derivative Prediction', fontsize=14)
ax2.legend(loc='best')
ax2.grid(True, alpha=0.3)

# --- Plot 3: Second Derivative ---
ax3 = axes[1, 0]
ax3.plot(X_test, d2y_true, 'b-', linewidth=2, label="True f''(x)")
ax3.plot(X_test, y_d2_test, 'r--', linewidth=2, label='GP prediction')
# Only interior points have 2nd derivative training data
ax3.scatter(X_train[interior_indices].flatten(), d2f(X_train[interior_indices]).flatten(),
            c='green', s=150, marker='o', zorder=5, edgecolors='black',
            label='Interior (has d²f)')
# Mark boundary points without 2nd derivative
ax3.axvline(x=X_train[0, 0], color='orange', linestyle=':', alpha=0.7)
ax3.axvline(x=X_train[-1, 0], color='orange', linestyle=':', alpha=0.7,
            label='Boundary (no d²f)')
ax3.set_xlabel('x', fontsize=12)
ax3.set_ylabel("f''(x)", fontsize=12)
ax3.set_title('Second Derivative Prediction', fontsize=14)
ax3.legend(loc='best')
ax3.grid(True, alpha=0.3)

# --- Plot 4: Derivative Coverage Visualization ---
ax4 = axes[1, 1]
ax4.axis('off')

coverage_text = """
Derivative Coverage Summary
===========================

Training Points: 7 total

Point Type    | f(x) | df/dx | d²f/dx²
--------------|------|-------|--------
Boundary (2)  |  ✓   |   ✓   |   ✗
Interior (5)  |  ✓   |   ✓   |   ✓

derivative_locations structure:

  der_indices = [[[[1, 1]], [[1, 2]]]]

  derivative_locations = [
      [0, 1, 2, 3, 4, 5, 6],  # df/dx at all
      [1, 2, 3, 4, 5]         # d²f/dx² interior only
  ]

Benefits:
• Reduced computational cost (fewer 2nd derivatives)
• Curvature info concentrated where needed
• Boundary regions rely on function + slope only
"""

ax4.text(0.05, 0.95, coverage_text, transform=ax4.transAxes,
         fontsize=11, verticalalignment='top', fontfamily='monospace',
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.tight_layout()
plt.show()
Note: derivs_to_predict is None. Predictions will include all derivatives used in training: [[[1, 1]], [[1, 2]]]
Warning: Cholesky decomposition failed via scipy, using standard np solve instead.
../_images/degp_getting_started_37_1.png

Explanation: The visualization shows:

  • Orange squares: Boundary points with only function values and first derivatives

  • Green circles: Interior points with full derivative coverage (1st and 2nd order)

  • The GP successfully interpolates all training data and provides smooth predictions

Step 8: Compare errors at boundary vs interior#

# Evaluate prediction accuracy in different regions
boundary_test_mask = (X_test.flatten() < 0.3) | (X_test.flatten() > 1.7)
interior_test_mask = ~boundary_test_mask

func_error_boundary = np.mean(np.abs(y_func_test[boundary_test_mask].flatten() -
                                      y_true[boundary_test_mask].flatten()))
func_error_interior = np.mean(np.abs(y_func_test[interior_test_mask].flatten() -
                                      y_true[interior_test_mask].flatten()))

d2_error_boundary = np.mean(np.abs(y_d2_test[boundary_test_mask].flatten() -
                                    d2y_true[boundary_test_mask].flatten()))
d2_error_interior = np.mean(np.abs(y_d2_test[interior_test_mask].flatten() -
                                    d2y_true[interior_test_mask].flatten()))

print("Mean Absolute Error Comparison:")
print(f"\nFunction prediction:")
print(f"  Boundary region: {func_error_boundary:.6f}")
print(f"  Interior region: {func_error_interior:.6f}")
print(f"\nSecond derivative prediction:")
print(f"  Boundary region: {d2_error_boundary:.6f}")
print(f"  Interior region: {d2_error_interior:.6f}")
Mean Absolute Error Comparison:

Function prediction:
  Boundary region: 0.000000
  Interior region: 0.000000

Second derivative prediction:
  Boundary region: 0.000013
  Interior region: 0.000000

Explanation: This comparison shows that even without second-order derivative information at the boundaries, the GP can still make reasonable predictions by leveraging the curvature information from nearby interior points.

Summary#

This example demonstrates heterogeneous derivative coverage in DEGP using derivative_locations. Key takeaways:

  • Flexible derivative specification: Different training points can have different derivative orders

  • Matching structure: y_train, der_indices, and derivative_locations must be consistent

  • Cost reduction: Compute expensive higher-order derivatives only where they provide the most value

  • Practical applications: - Sensor data with varying derivative availability - Concentrated derivative information in regions of interest - Balancing accuracy vs computational cost

When to use heterogeneous derivative coverage:

  • Higher-order derivatives are expensive to compute

  • Derivative data comes from multiple sources with different capabilities

  • You want to focus derivative information in high-curvature or high-importance regions

  • Boundary conditions provide limited derivative information

Example 6: Predicting Untrained Partial Derivatives#

Overview#

This example demonstrates that DEGP can predict partial derivatives that were not observed during training. The cross-covariance \(K_*\) is constructed directly from kernel derivatives, so any partial derivative within the n_bases-dimensional OTI space can be predicted regardless of whether training data was provided for it.

We learn \(f(x_1, x_2) = \sin(x_1) + x_2^2\) using function values and the first partial derivative with respect to \(x_1\) only. At prediction time we also request \(\partial f / \partial x_2\), which was never observed.

Step 1: Import required packages#

import numpy as np
from jetgp.full_degp.degp import degp
import matplotlib.pyplot as plt

print("Modules imported successfully.")
Modules imported successfully.

Step 2: Define the function and training data#

np.random.seed(42)

# 2D input — 6×6 training grid
x_vals = np.linspace(0, 1, 7)
X_train = np.array([[x1, x2] for x1 in x_vals for x2 in x_vals])

def f(X):      return np.sin(X[:, 0]) + X[:, 1] ** 2
def df_dx1(X): return np.cos(X[:, 0])
def df_dx2(X): return 2.0 * X[:, 1]   # NOT included in training

y_func  = f(X_train).reshape(-1, 1)
y_dx1   = df_dx1(X_train).reshape(-1, 1)
y_train = [y_func, y_dx1]   # df/dx2 deliberately omitted

print(f"Training points : {X_train.shape[0]}")
print(f"Training outputs: f, df/dx1  (df/dx2 NOT provided)")
Training points : 49
Training outputs: f, df/dx1  (df/dx2 NOT provided)

Explanation: A 6×6 grid of 2D points is used. The training list contains function values and only the first partial derivative; \(\partial f / \partial x_2\) is intentionally left out.

Step 3: Define derivative indices and locations#

# Only dx1 is in the training set
der_indices = [[[[1, 1]]]]

n_train = len(X_train)
derivative_locations = [
    list(range(n_train)),   # df/dx1 at all points
]

print("der_indices          :", der_indices)
print("derivative_locations : 1 entry covering all", n_train, "points")
der_indices          : [[[[1, 1]]]]
derivative_locations : 1 entry covering all 49 points

Explanation: der_indices lists only \(\partial f/\partial x_1\). The second coordinate axis is absent from training — but it exists in the OTI space because n_bases=2 spans both coordinate axes.

Step 4: Initialize the DEGP model#

# n_bases=2 because the input space is 2-dimensional.
# This means the OTI module includes basis units e1, e2
# for both coordinate directions, even though dx2 is not trained.
model = degp(
    X_train, y_train,
    n_order=1, n_bases=2,
    der_indices=der_indices,
    derivative_locations=derivative_locations,
    normalize=True,
    kernel="SE", kernel_type="anisotropic"
)

print("DEGP model (2D, training f+dx1 only) initialized.")
DEGP model (2D, training f+dx1 only) initialized.

Step 5: Optimize hyperparameters#

params = model.optimize_hyperparameters(
    optimizer='powell',
    n_restart_optimizer=5,
    debug=False
)
print("Optimized hyperparameters:", params)
Optimized hyperparameters: [-1.10174431 -0.61477023  1.55130939 -5.79999067]

Step 6: Predict the untrained derivative#

np.random.seed(7)
X_test = np.random.uniform(0, 1, (50, 2))

# Request df/dx2 via derivs_to_predict — it was NOT in the training set
pred = model.predict(
    X_test, params,
    calc_cov=False,
    return_deriv=True,
    derivs_to_predict=[[[2, 1]]]   # dx2 only
)

# pred shape: (2, n_test) — row 0 = f, row 1 = df/dx2
f_pred   = pred[0, :]
dx2_pred = pred[1, :]

dx2_true = df_dx2(X_test).flatten()
f_true   = f(X_test).flatten()

rmse_f   = float(np.sqrt(np.mean((f_pred - f_true) ** 2)))
rmse_dx2 = float(np.sqrt(np.mean((dx2_pred - dx2_true) ** 2)))
corr_dx2 = float(np.corrcoef(dx2_pred, dx2_true)[0, 1])

print(f"Function RMSE            : {rmse_f:.4e}")
print(f"df/dx2 RMSE (untrained)  : {rmse_dx2:.4e}")
print(f"df/dx2 correlation       : {corr_dx2:.4f}")
Function RMSE            : 1.1830e-06
df/dx2 RMSE (untrained)  : 9.2540e-06
df/dx2 correlation       : 1.0000

Explanation: derivs_to_predict=[[[2, 1]]] requests the first partial derivative along the second coordinate axis. Because n_bases=2, basis element e2 already exists in the OTI space — the model simply reads the corresponding Taylor coefficient from \(\phi_\text{exp}\) without needing any observed training data for that direction.

Step 7: Visualise the untrained derivative prediction#

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Scatter: predicted vs true df/dx2
axes[0].scatter(dx2_true, dx2_pred, alpha=0.7, edgecolors='k', linewidths=0.5)
lims = [min(dx2_true.min(), dx2_pred.min()) - 0.1,
        max(dx2_true.max(), dx2_pred.max()) + 0.1]
axes[0].plot(lims, lims, 'r--', linewidth=2, label='Perfect prediction')
axes[0].set_xlabel(r'True $\partial f / \partial x_2$')
axes[0].set_ylabel(r'Predicted $\partial f / \partial x_2$')
axes[0].set_title(f'Untrained df/dx2  (r = {corr_dx2:.3f})')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Sort by x2 for a clean line plot
sort_idx = np.argsort(X_test[:, 1])
axes[1].plot(X_test[sort_idx, 1], dx2_true[sort_idx],
             'b-', linewidth=2, label=r'True $2x_2$')
axes[1].plot(X_test[sort_idx, 1], dx2_pred[sort_idx],
             'r--', linewidth=2, label='GP prediction')
axes[1].set_xlabel(r'$x_2$')
axes[1].set_ylabel(r'$\partial f / \partial x_2$')
axes[1].set_title('Predicted vs True along x2 axis')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
../_images/degp_getting_started_45_0.png

Explanation: The scatter plot (left) and line plot (right) both show that the GP successfully recovers \(\partial f/\partial x_2 = 2x_2\) even though no training data for this derivative was provided. The information propagates through the kernel structure: the GP has learned the length scale along \(x_2\) from the function values, and the kernel’s analytic derivative with respect to \(e_2\) supplies the required cross-covariance.

Summary#

This example demonstrates predicting untrained partial derivatives in DEGP.

Key takeaways:

  • No extra setup required: simply pass derivs_to_predict at prediction time with any index within n_bases

  • Works because ``n_bases`` spans all coordinate axes: setting n_bases=2 for a 2D problem ensures \(e_1, e_2\) both exist in the OTI space, regardless of which derivatives appear in training

  • Cross-covariance from kernel derivatives: \(K_*\) is built analytically from the kernel, not from observed data, so untrained directions are always accessible

  • Accuracy depends on function structure: the prediction quality for the untrained derivative reflects how well the GP has learned the underlying function

When this feature is useful:

  • Derivative data is expensive and only a subset of directions can be observed during training

  • Post-hoc sensitivity analysis in directions not originally anticipated

  • Exploring gradient information along new axes without retraining

Example 7: 2D Function-Only Training with Derivative Predictions#

Overview#

This example demonstrates that DEGP can be trained on function values only in a 2D input space and still predict both partial derivatives with uncertainty at test points. No derivative observations are required during training; the cross-covariance \(K_*\) between \(f\) and its partial derivatives is derived analytically from the kernel.

True function: \(f(x_1,x_2) = \sin(x_1)\cos(x_2)\)

True partials: \(\partial f/\partial x_1 = \cos(x_1)\cos(x_2)\), \(\quad\partial f/\partial x_2 = -\sin(x_1)\sin(x_2)\)

Step 1: Import required packages#

import warnings
import numpy as np
import matplotlib.pyplot as plt
from jetgp.full_degp.degp import degp

print("Modules imported successfully.")
Modules imported successfully.

Step 2: Define the true function and build training data#

def f(X):
    return np.sin(X[:, 0]) * np.cos(X[:, 1])

def df_dx1(X):
    return np.cos(X[:, 0]) * np.cos(X[:, 1])

def df_dx2(X):
    return -np.sin(X[:, 0]) * np.sin(X[:, 1])

# 5×5 training grid — function values ONLY
x1_tr = np.linspace(0, 2 * np.pi, 6)
x2_tr = np.linspace(0, 2 * np.pi, 6)
G1, G2 = np.meshgrid(x1_tr, x2_tr)
X_train = np.column_stack([G1.ravel(), G2.ravel()])  # shape (36, 2)
y_func  = f(X_train).reshape(-1, 1)
y_train = [y_func]  # no derivative arrays

print(f"X_train shape : {X_train.shape}")
print(f"y_train[0] shape : {y_train[0].shape}")
X_train shape : (36, 2)
y_train[0] shape : (36, 1)

Explanation: A sparse 5×5 grid provides 25 training locations. y_train contains only function values; no derivative arrays are included. Setting der_indices=[] at construction tells DEGP that the training covariance is built from function values alone.

Step 3: Initialise DEGP model for function-only training#

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # suppress "0 derivatives" notice
    model = degp(
        X_train, y_train,
        n_order=1, n_bases=2,
        der_indices=[],
        normalize=True,
        kernel="SE", kernel_type="anisotropic",
    )

print("DEGP model (function-only, 2D) initialised.")
DEGP model (function-only, 2D) initialised.

Explanation:

  • n_order=1: enables first-order OTI arithmetic, required to form the kernel cross-derivatives

  • n_bases=2: one OTI pair per input dimension (x1, x2)

  • der_indices=[]: no derivative observations — training kernel is built from function values only

Step 4: Optimise hyperparameters#

params = model.optimize_hyperparameters(
    optimizer='pso',
    pop_size=100,
    n_generations=15,
    local_opt_every=15,
    debug=False,
)
print("Optimised hyperparameters:", params)
Stopping: maximum iterations reached --> 15
Optimised hyperparameters: [-1.39083569e-02  4.47278616e-02  4.47652758e-01 -1.51974132e+01]

Step 5: Predict f, df/dx1, and df/dx2 with uncertainty#

n_test = 20
x1_te = np.linspace(0, 2 * np.pi, n_test)
x2_te = np.linspace(0, 2 * np.pi, n_test)
G1t, G2t = np.meshgrid(x1_te, x2_te)
X_test = np.column_stack([G1t.ravel(), G2t.ravel()])

# derivs_to_predict: [[1,1]] → df/dx1  (1st-order w.r.t. basis 1)
#                    [[2,1]] → df/dx2  (1st-order w.r.t. basis 2)
mean, var = model.predict(
    X_test, params,
    calc_cov=True,
    return_deriv=True,
    derivs_to_predict=[[[1, 1]], [[2, 1]]],
)

# mean shape: (3, n_test²) — rows: [f, df/dx1, df/dx2]
shape2d = (n_test, n_test)
f_mean_grid   = mean[0, :].reshape(shape2d)
dx1_mean_grid = mean[1, :].reshape(shape2d)
dx2_mean_grid = mean[2, :].reshape(shape2d)

f_true_grid   = f(X_test).reshape(shape2d)
dx1_true_grid = df_dx1(X_test).reshape(shape2d)
dx2_true_grid = df_dx2(X_test).reshape(shape2d)

print(f"Prediction output shape: {mean.shape}")
Prediction output shape: (3, 400)

Explanation: derivs_to_predict=[[[1,1]], [[2,1]]] requests the first-order partial derivative with respect to each input dimension. Both derivatives were never observed during training; they are recovered through the kernel’s analytic cross-covariance.

Step 6: Accuracy metrics#

f_rmse   = float(np.sqrt(np.mean((mean[0, :] - f(X_test))      ** 2)))
dx1_rmse = float(np.sqrt(np.mean((mean[1, :] - df_dx1(X_test)) ** 2)))
dx2_rmse = float(np.sqrt(np.mean((mean[2, :] - df_dx2(X_test)) ** 2)))
dx1_corr = float(np.corrcoef(mean[1, :], df_dx1(X_test))[0, 1])
dx2_corr = float(np.corrcoef(mean[2, :], df_dx2(X_test))[0, 1])

print(f"f        RMSE : {f_rmse:.4e}")
print(f"df/dx1   RMSE : {dx1_rmse:.4e}   Pearson r: {dx1_corr:.3f}")
print(f"df/dx2   RMSE : {dx2_rmse:.4e}   Pearson r: {dx2_corr:.3f}")
f        RMSE : 8.8283e-03
df/dx1   RMSE : 9.9000e-03   Pearson r: 1.000
df/dx2   RMSE : 3.1993e-02   Pearson r: 0.998

Step 7: Visualise predictions#

extent = [0, 2 * np.pi, 0, 2 * np.pi]
titles_row = [r"$f(x_1,x_2)$",
              r"$\partial f/\partial x_1$",
              r"$\partial f/\partial x_2$"]
trues = [f_true_grid,   dx1_true_grid,  dx2_true_grid]
means = [f_mean_grid,   dx1_mean_grid,  dx2_mean_grid]
stds  = [np.sqrt(np.abs(var[r, :])).reshape(shape2d) for r in range(3)]

fig, axes = plt.subplots(3, 3, figsize=(14, 12))
kw = dict(origin="lower", extent=extent, aspect="auto")

for row, (label, true, gp_mean, gp_std) in enumerate(
        zip(titles_row, trues, means, stds)):
    vmin, vmax = true.min(), true.max()

    im0 = axes[row, 0].imshow(true,    **kw, vmin=vmin, vmax=vmax, cmap="RdBu_r")
    axes[row, 0].set_title(f"True {label}")
    plt.colorbar(im0, ax=axes[row, 0])

    im1 = axes[row, 1].imshow(gp_mean, **kw, vmin=vmin, vmax=vmax, cmap="RdBu_r")
    axes[row, 1].set_title(f"GP mean {label}")
    plt.colorbar(im1, ax=axes[row, 1])

    im2 = axes[row, 2].imshow(gp_std,  **kw, cmap="viridis")
    axes[row, 2].set_title(f"GP std {label}")
    plt.colorbar(im2, ax=axes[row, 2])

    for col in range(3):
        axes[row, col].set_xlabel("$x_1$")
        axes[row, col].set_ylabel("$x_2$")

for col in range(2):
    axes[0, col].scatter(X_train[:, 0], X_train[:, 1],
                         c="k", s=20, zorder=5, label="Training pts")
axes[0, 0].legend(fontsize=8, loc="upper right")

plt.suptitle(
    "DEGP 2D — function-only training, derivative predictions\n"
    r"$f(x_1,x_2) = \sin(x_1)\cos(x_2)$",
    fontsize=13,
)
plt.tight_layout()
plt.show()
../_images/degp_getting_started_52_0.png

Explanation: The 3×3 grid shows true values, GP posterior means, and posterior standard deviations for \(f\), \(\partial f/\partial x_1\), and \(\partial f/\partial x_2\). Despite training exclusively on function values, DEGP recovers the gradient structure through the kernel’s analytic cross-covariance.

Summary#

This example demonstrates 2D function-only DEGP with untrained derivative predictions.

Key takeaways:

  • Zero derivative observations required: der_indices=[] excludes all derivative data from training; partial derivatives remain predictable via derivs_to_predict at test time

  • ``n_bases`` spans all coordinate axes automatically: for a 2D problem, n_bases=2 ensures \(e_1, e_2\) both exist in the OTI space

  • Uncertainty is meaningful: the GP standard deviation captures where derivative predictions carry most uncertainty (sparse training regions)

  • Same API as untrained single-direction prediction (Example 6): the only difference is that here all directions are untrained and we predict over a 2D grid