Making Predictions#
Overview#
After optimizing hyperparameters, JetGP models can make predictions at new test locations.
The predict() method returns function value predictions and optionally their uncertainties
(predictive variances/covariances) and derivative predictions.
All GP modules share a common prediction interface, with minor variations for directional derivative models and weighted variants.
Basic Usage#
After training and optimizing a GP model, predictions are made using the predict() method:
Standard workflow:
# 1. Initialize the model
gp = degp(
X_train, y_train, n_order, n_bases=1,
der_indices=der_indices,
derivative_locations=derivative_locations,
normalize=True, kernel="SE", kernel_type="anisotropic"
)
# 2. Optimize hyperparameters
params = gp.optimize_hyperparameters(
optimizer='jade',
pop_size=100,
n_generations=15
)
# 3. Make predictions
y_pred, y_var = gp.predict(
X_test, params, calc_cov=True, return_deriv=False
)
Common Arguments#
X_test
Array of test input locations where predictions are desired.
Shape: (n_test, n_dims) where n_test is the number of test points and
n_dims is the input dimensionality.
params
Optimized hyperparameters returned by optimize_hyperparameters().
These parameters define the trained GP model’s covariance structure.
calc_cov Boolean flag determining whether to compute predictive uncertainty.
True→ Returns predictive variance at test pointsFalse→ Returns only mean predictions (faster computation)
Default: False
return_deriv Boolean flag controlling whether derivative predictions are returned alongside function predictions.
True→ Returns predictions for both function values and their derivatives in a row-wise formatFalse→ Returns only function value predictions
Default: False
Note: Not all model variants support derivative predictions. When return_deriv=True, the
output is a 2D array where each row corresponds to a different output type (function values,
then derivatives in order). See model-specific sections below for details.
derivs_to_predict
List specifying which derivatives to predict when return_deriv=True.
Must be a subset of the derivatives used during training (as defined in der_indices).
None→ Predicts all derivatives from training (default behavior)List of derivative specifications → Predicts only the specified derivatives
Default: None
Format: Same nested list format as der_indices, but flattened (no grouping by order required):
# Predict only first-order partial derivatives
derivs_to_predict = [[[1, 1]], [[2, 1]]]
# Predict first-order and one second-order derivative
derivs_to_predict = [[[1, 1]], [[2, 1]], [[1, 2]]]
# Predict only mixed second-order derivative
derivs_to_predict = [[[1, 1], [2, 1]]]
Note: The order of derivatives in derivs_to_predict determines the row order in the output.
Derivatives are returned in the same order they appear in this list.
Return Values#
The structure of returned values depends on the model type and input arguments.
Standard Models (DEGP)
For standard derivative-enhanced models, the return signature is:
y_pred, y_var = gp.predict(X_test, params, calc_cov=True, return_deriv=False)
Returns:
y_pred: array, shape(n_test,)Predicted function values at test locationsy_var: array, shape(n_test,)(ifcalc_cov=True) Predictive variance at each test location
If calc_cov=False:
y_pred = gp.predict(X_test, params, calc_cov=False, return_deriv=False)
Returns:
y_pred: array, shape(n_test,)Predicted function values at test locations (no uncertainty)
If return_deriv=True:
y_pred, y_var = gp.predict(X_test, params, calc_cov=True, return_deriv=True)
Returns:
y_pred: array, shape(num_derivs + 1, n_test)Row-wise predictions including function values and all derivatives.Structure:
Row 0: Function value predictions
Row 1: First derivative component predictions
Row 2: Second derivative component predictions
… and so on for all derivative components
where
num_derivsis the total number of derivative components specified inder_indices.y_var: array, shape(num_derivs + 1, n_test)(ifcalc_cov=True) Predictive variance corresponding to each row iny_pred, following the same structure.
Example with return_deriv=True:
# Model with first-order derivatives in 2D
# der_indices = [[[[1, 1]], [[2, 1]]]] # Two first-order derivatives
# num_derivs = 2
X_test = np.random.rand(10, 2) # 10 test points
y_pred, y_var = gp.predict(X_test, params, calc_cov=True, return_deriv=True)
# y_pred.shape = (3, 10) # (2 + 1) rows, 10 columns
# y_var.shape = (3, 10)
# Extract predictions by row:
func_pred = y_pred[0, :] # Function predictions (row 0)
deriv1_pred = y_pred[1, :] # ∂f/∂x₁ predictions (row 1)
deriv2_pred = y_pred[2, :] # ∂f/∂x₂ predictions (row 2)
# Similarly for variances:
func_var = y_var[0, :]
deriv1_var = y_var[1, :]
deriv2_var = y_var[2, :]
With higher-order derivatives:
# Model with first and second-order derivatives in 2D
# der_indices = [
# [[[1, 1]], [[2, 1]]], # 2 first-order
# [[[1, 2]], [[1,1],[2,1]], [[2, 2]]] # 3 second-order
# ]
# num_derivs = 5
X_test = np.random.rand(10, 2)
y_pred, y_var = gp.predict(X_test, params, calc_cov=True, return_deriv=True)
# y_pred.shape = (6, 10) # (5 + 1) rows, 10 columns
func_pred = y_pred[0, :] # f(x)
d1_dx1 = y_pred[1, :] # ∂f/∂x₁
d1_dx2 = y_pred[2, :] # ∂f/∂x₂
d2_dx1 = y_pred[3, :] # ∂²f/∂x₁²
d2_dx1dx2 = y_pred[4, :] # ∂²f/∂x₁∂x₂
d2_dx2 = y_pred[5, :] # ∂²f/∂x₂²
With derivs_to_predict (selective derivative predictions):
# Model trained with first and second-order derivatives in 2D
# der_indices = [
# [[[1, 1]], [[2, 1]]], # 2 first-order
# [[[1, 2]], [[1, 1], [2, 1]], [[2, 2]]] # 3 second-order
# ]
# Predict only first-order derivatives
derivs_to_predict = [[[1, 1]], [[2, 1]]]
y_pred, y_var = gp.predict(
X_test, params,
calc_cov=True,
return_deriv=True,
derivs_to_predict=derivs_to_predict
)
Returns:
y_pred: array, shape(len(derivs_to_predict) + 1, n_test)Row 0 contains function predictions, subsequent rows contain only the requested derivatives in the order specified.y_var: array, shape(len(derivs_to_predict) + 1, n_test)(ifcalc_cov=True)
Example with selective derivatives:
X_test = np.random.rand(10, 2)
# Full model has 5 derivatives, but we only want 2
derivs_to_predict = [[[1, 1]], [[2, 2]]] # ∂f/∂x₁ and ∂²f/∂x₂²
y_pred, y_var = gp.predict(
X_test, params,
calc_cov=True,
return_deriv=True,
derivs_to_predict=derivs_to_predict
)
# y_pred.shape = (3, 10) # (2 + 1) rows, not (5 + 1)
func_pred = y_pred[0, :] # Function values
d1_dx1 = y_pred[1, :] # ∂f/∂x₁ (first in derivs_to_predict)
d2_dx2 = y_pred[2, :] # ∂²f/∂x₂² (second in derivs_to_predict)
Reordering derivatives:
The output order matches derivs_to_predict, not the original der_indices order:
# Original training order: ∂f/∂x₁, ∂f/∂x₂, ∂²f/∂x₁², ...
# Request in different order
derivs_to_predict = [[[2, 1]], [[1, 1]]] # ∂f/∂x₂ first, then ∂f/∂x₁
y_pred, _ = gp.predict(X_test, params, return_deriv=True, derivs_to_predict=derivs_to_predict)
# y_pred[1, :] is now ∂f/∂x₂ (not ∂f/∂x₁)
# y_pred[2, :] is now ∂f/∂x₁ (not ∂f/∂x₂)
Model-Specific Prediction Interfaces#
1. DEGP (Derivative-Enhanced GP)
Standard interface with full support for function and derivative predictions.
Example:
# Function predictions with uncertainty
y_pred, y_var = gp.predict(
X_test, params, calc_cov=True, return_deriv=False
)
# Function and derivative predictions with uncertainty
y_pred, y_var = gp.predict(
X_test, params, calc_cov=True, return_deriv=True
)
# Selective derivative predictions
derivs_to_predict = [[[1, 1]]] # Only ∂f/∂x₁
y_pred, y_var = gp.predict(
X_test, params, calc_cov=True, return_deriv=True,
derivs_to_predict=derivs_to_predict
)
Returned shapes:
y_pred:(n_test,)ifreturn_deriv=Falsey_pred:(num_derivs + 1, n_test)ifreturn_deriv=Trueandderivs_to_predict=Noney_pred:(len(derivs_to_predict) + 1, n_test)ifreturn_deriv=Truewithderivs_to_predicty_var: matchesy_predshape whencalc_cov=True
2. DDEGP (Directional Derivative-Enhanced GP)
Uses directional derivatives along global directions (same direction at all training points).
Standard interface (same as DEGP):
y_pred, y_var = gp.predict(
X_test, params, calc_cov=True, return_deriv=False
)
Returns:
y_pred: array, shape(n_test,)Predicted function valuesy_var: array, shape(n_test,)(ifcalc_cov=True) Predictive variance
With derivative predictions:
y_pred, y_var = gp.predict(
X_test, params, calc_cov=True, return_deriv=True
)
Returns:
y_pred: array, shape(num_derivs + 1, n_test)Row-wise predictions including function values and directional derivatives. Row 0 contains function predictions, subsequent rows contain directional derivative predictions for each direction/order specified inder_indices.y_var: array, shape(num_derivs + 1, n_test)(ifcalc_cov=True) Predictive variance for each prediction component
When return_deriv=True, predictions include directional derivatives along the
same global directions used during training.
With derivs_to_predict:
# Predict only specific directional derivatives
derivs_to_predict = [[[1, 1]]] # Only first direction, first order
y_pred, y_var = gp.predict(
X_test, params, calc_cov=True, return_deriv=True,
derivs_to_predict=derivs_to_predict
)
Note
DDEGP: rays define the full prediction vocabulary.
Unlike DEGP (where the OTI space always spans the coordinate axes),
DDEGP builds its internal OTI space from the rays array passed at
construction — specifically n_rays = rays.shape[1] determines the
OTI dimension. A consequence is that derivs_to_predict can only
reference ray indices that exist in rays.
This means you can predict derivatives along directions not present
in the training data, but those directions must still appear as columns
in rays at construction time. For example, to later predict along
a 4th direction [[4, 1]], rays must have been defined with at
least 4 columns — even if no training data was provided for ray 4:
# rays defines 4 directions; training data only covers 3
rays = np.column_stack([ray1, ray2, ray3, ray4]) # shape (d, 4)
der_indices = [[[[1,1]], [[2,1]], [[3,1]]]] # train with 3
model = ddegp(X_train, y_train, n_order=1,
der_indices=der_indices, rays=rays, ...)
# Valid: ray 4 was declared upfront even though it wasn't trained
model.predict(X_test, params, return_deriv=True,
derivs_to_predict=[[[4, 1]]])
If rays had only 3 columns, requesting [[4, 1]] would fail
because e4 does not exist in the 3-dimensional OTI space.
3. GDDEGP (Generalized Directional Derivative-Enhanced GP)
Allows point-specific directional derivatives (different directions at each training point).
Function predictions only:
y_pred, y_var = gp.predict(
X_test, params, calc_cov=True, return_deriv=False
)
Returns:
y_pred: array, shape(n_test,)Predicted function valuesy_var: array, shape(n_test,)(ifcalc_cov=True) Predictive variance
With derivative predictions (requires rays_predict):
When return_deriv=True, you must provide rays_predict specifying the directional
vectors at each test point:
y_pred, y_var = gp.predict(
X_test, params,
rays_predict=rays_predict,
calc_cov=True,
return_deriv=True
)
Additional argument:
rays_predict: list of arrays Required whenreturn_deriv=True. A list of 2D arrays specifying directional vectors at each test point. Structure:rays_predict[direction_idx]has shape(d, n_test)where columnjis the direction vector for test pointj.
Returns:
y_pred: array, shape(num_derivs + 1, n_test)Row-wise predictions including function values and directional derivatives along the directions specified inrays_predicty_var: array, shape(num_derivs + 1, n_test)(ifcalc_cov=True) Predictive variance for each prediction component
With derivs_to_predict:
Selective derivative predictions work with rays_predict. The number of directions
in rays_predict should match the number of unique direction indices referenced
in derivs_to_predict:
# Training used 2 directions with 1st and 2nd order derivatives
# der_indices = [[[[1,1]], [[2,1]]], [[[1,2]], [[2,2]]]]
# Predict only 1st-order derivatives along both directions
derivs_to_predict = [[[1, 1]], [[2, 1]]]
rays_predict = [rays_dir1, rays_dir2] # Both directions needed
y_pred, _ = gp.predict(
X_test, params,
rays_predict=rays_predict,
return_deriv=True,
derivs_to_predict=derivs_to_predict
)
Flexibility of GDDEGP Predictions:
A key advantage of GDDEGP is that you can predict directional derivatives in any direction at test points, not just the directions used during training. The only restriction is that the derivative order must have been included in training.
# Training used gradient-aligned directions
# But predictions can use ANY direction:
# Predict along x-axis
rays_x = np.array([[1.0] * n_test, [0.0] * n_test])
y_pred_x, _ = gp.predict(X_test, params, rays_predict=[rays_x], return_deriv=True)
# Predict along y-axis
rays_y = np.array([[0.0] * n_test, [1.0] * n_test])
y_pred_y, _ = gp.predict(X_test, params, rays_predict=[rays_y], return_deriv=True)
# Predict along 45° diagonal
rays_diag = np.array([[0.707] * n_test, [0.707] * n_test])
y_pred_diag, _ = gp.predict(X_test, params, rays_predict=[rays_diag], return_deriv=True)
Example:
import numpy as np
# Define test points
X_test = np.array([[0.5, 0.5], [1.0, 1.0], [1.5, 1.5]])
n_test = len(X_test)
# Define directional vectors for predictions at each test point
# rays_predict[direction_idx] has shape (d, n_test)
rays_dir1 = np.array([
[1.0, 0.707, 0.0], # x-components for 3 test points
[0.0, 0.707, 1.0] # y-components for 3 test points
]) # Shape: (2, 3)
rays_predict = [rays_dir1] # List with one direction
# Function predictions only (no rays_predict needed)
y_pred, y_var = gp.predict(
X_test, params, calc_cov=True, return_deriv=False
)
# Function + derivative predictions (rays_predict required)
y_pred, y_var = gp.predict(
X_test, params,
rays_predict=rays_predict,
calc_cov=True,
return_deriv=True
)
# Extract components
func_pred = y_pred[0, :] # Function values
deriv1_pred = y_pred[1, :] # Directional derivative along rays_dir1
Example with multiple directions:
# Two directional derivatives at each test point
rays_dir1 = np.zeros((2, n_test)) # Gradient direction
rays_dir2 = np.zeros((2, n_test)) # Perpendicular direction
for i in range(n_test):
grad = 2 * X_test[i] # For f = x² + y²
grad_norm = np.linalg.norm(grad)
rays_dir1[:, i] = grad / grad_norm
rays_dir2[:, i] = [-rays_dir1[1, i], rays_dir1[0, i]]
rays_predict = [rays_dir1, rays_dir2]
y_pred, y_var = gp.predict(
X_test, params,
rays_predict=rays_predict,
calc_cov=True,
return_deriv=True
)
func_pred = y_pred[0, :] # Function values
deriv1_pred = y_pred[1, :] # Derivative along gradient
deriv2_pred = y_pred[2, :] # Derivative perpendicular to gradient
4. WDEGP (Weighted Derivative-Enhanced GP)
Weighted models combine predictions from multiple submodels and provide additional
diagnostics for each submodel. WDEGP supports three submodel types via the
submodel_type parameter: 'degp', 'ddegp', or 'gddegp'.
Standard prediction:
y_pred, y_cov = gp.predict(
X_test, params, calc_cov=True
)
Returns:
y_pred: array, shape(n_test,)Weighted combination of all submodel predictionsy_cov: array, shape(n_test,)(ifcalc_cov=True) Predictive variance from weighted model
With derivative predictions:
y_pred, y_cov = gp.predict(
X_test, params, calc_cov=True, return_deriv=True
)
Returns when return_deriv=True:
y_pred: array, shape(n_shared_derivs + 1, n_test)Row-wise predictions: row 0 is function values, subsequent rows are derivatives. Only includes derivatives that are shared across all submodels (see restrictions below).y_cov: array, shape(n_shared_derivs + 1, n_test)(ifcalc_cov=True) Row-wise predictive variances corresponding toy_pred
With derivs_to_predict:
For weighted models, derivs_to_predict can specify any valid derivative index —
it is no longer restricted to derivatives that are common to all submodels. Each
submodel constructs its own K_* from kernel derivatives directly, so a submodel
that was not trained on a particular derivative can still contribute a cross-covariance
prediction for it (informed by its function values and any trained derivatives):
# WDEGP with two submodels
# Submodel 1 has: ∂f/∂x₁, ∂f/∂x₂
# Submodel 2 has: ∂f/∂x₁ only
# Common: ∂f/∂x₁ — but ∂f/∂x₂ can still be requested
# Predict the common derivative
derivs_to_predict = [[[1, 1]]] # Only ∂f/∂x₁
y_pred, y_cov = gp.predict(
X_test, params, calc_cov=True, return_deriv=True,
derivs_to_predict=derivs_to_predict
)
# Also valid: request ∂f/∂x₂ even though submodel 2 wasn't trained on it
derivs_to_predict = [[[1, 1]], [[2, 1]]]
y_pred, y_cov = gp.predict(
X_test, params, calc_cov=True, return_deriv=True,
derivs_to_predict=derivs_to_predict
)
With submodel outputs:
y_pred, y_cov, submodel_vals, submodel_cov = gp.predict(
X_test, params, calc_cov=True, return_submodels=True
)
Additional arguments:
return_submodels– Boolean flag to return individual submodel predictions (default:False)
Additional returns when return_submodels=True:
submodel_vals: list of arrays Predictions from each individual submodelsubmodel_cov: list of arrays (ifcalc_cov=True) Predictive variances from each individual submodel
WDEGP with GDDEGP Submodels (rays_predict required for derivatives):
When using submodel_type='gddegp' and return_deriv=True, you must provide
rays_predict specifying directional vectors at each test point:
# WDEGP with GDDEGP submodels - derivative predictions
y_pred, y_cov = gp.predict(
X_test, params,
rays_predict=rays_predict,
calc_cov=True,
return_deriv=True
)
The rays_predict structure is the same as for standalone GDDEGP:
rays_predict[direction_idx] has shape (d, n_test).
Example with GDDEGP submodels:
# Build rays_predict for all test points
rays_dir1 = np.zeros((2, n_test))
rays_dir2 = np.zeros((2, n_test))
for i in range(n_test):
grad = 2 * X_test[i]
grad_norm = np.linalg.norm(grad)
if grad_norm > 1e-10:
rays_dir1[:, i] = grad / grad_norm
rays_dir2[:, i] = [-rays_dir1[1, i], rays_dir1[0, i]]
else:
rays_dir1[:, i] = [1, 0]
rays_dir2[:, i] = [0, 1]
rays_predict = [rays_dir1, rays_dir2]
# Predict with derivatives
y_pred, y_cov = gp.predict(
X_test, params,
rays_predict=rays_predict,
calc_cov=True,
return_deriv=True
)
Derivative Prediction Restrictions:
Derivative predictions in WDEGP have different restrictions depending on the submodel type.
For DEGP and DDEGP submodels (direction-specific):
Derivatives can only be predicted if they appear in all submodels, because predictions are tied to specific coordinate directions (DEGP) or global ray directions (DDEGP).
# Example: Two DEGP submodels
# Submodel 1 has derivatives: [[[1,1]]] (∂f/∂x only)
# Submodel 2 has derivatives: [[[1,1]], [[2,1]]] (∂f/∂x and ∂f/∂y)
# Global predictions can ONLY be made for [[1,1]] (∂f/∂x)
# [[2,1]] (∂f/∂y) is not shared, so it's excluded from weighted predictions
For GDDEGP submodels (order-specific):
WDEGP with GDDEGP submodels is a special case. Since you can specify any direction
via rays_predict, the restriction is on derivative order, not direction.
# Example: Two GDDEGP submodels
# Submodel 1 trains on: 1st-order directional derivatives only
# Submodel 2 trains on: 1st-order AND 2nd-order directional derivatives
# Shared order: 1st-order only
# You can predict 1st-order directional derivatives in ANY direction
# You CANNOT predict 2nd-order derivatives (not shared across all submodels)
This means WDEGP with GDDEGP submodels offers maximum flexibility for derivative predictions:
Direction flexibility: Predict directional derivatives along any direction you choose
Order restriction: Only derivative orders present in ALL submodels can be predicted
# Training used different gradient-aligned directions in each submodel
# But at prediction time, you can use completely different directions:
# Predict 1st-order derivatives along coordinate axes
rays_x = np.array([[1.0] * n_test, [0.0] * n_test])
rays_y = np.array([[0.0] * n_test, [1.0] * n_test])
y_pred, _ = gp.predict(
X_test, params,
rays_predict=[rays_x, rays_y], # Any directions work!
return_deriv=True
)
# This works because 1st-order is shared across submodels
# The specific directions don't need to match training
Additional restrictions (all submodel types):
Training derivatives only: Predictions can only be made for derivative orders that were explicitly included in training.
# Example: Single submodel with 1st-order derivatives # Can predict: f(x) and 1st-order derivatives # Cannot predict: 2nd-order derivatives (not used in training)
Point-level availability: As long as at least one training point includes a specific derivative order, that order can be predicted anywhere in the input space via the GP.
# Example: 10 training points # Only points [0, 2, 5] have 1st-order derivative information # GP can still predict 1st-order derivatives at any test point
Alternative for unavailable derivatives: If a desired derivative order was not used in training, you can approximate it using finite differences on the GP function predictions:
# Approximate 2nd-order derivative using finite differences h = 1e-5 X_test_plus = X_test + h X_test_minus = X_test - h f_plus = gp.predict(X_test_plus, params, calc_cov=False) f_center = gp.predict(X_test, params, calc_cov=False) f_minus = gp.predict(X_test_minus, params, calc_cov=False) d2f_approx = (f_plus - 2*f_center + f_minus) / (h**2)
Summary of WDEGP Derivative Restrictions:
Submodel Type |
Restriction Type |
What Can Be Predicted |
|---|---|---|
|
Direction-specific |
Only coordinate derivatives (∂f/∂xᵢ) shared by all submodels |
|
Direction-specific |
Only global ray directions shared by all submodels |
|
Order-specific |
Any direction, but only orders shared by all submodels |
Use case for return_submodels:
Examining individual submodel predictions is useful for:
Visualization: Understanding how different derivative subsets contribute to final predictions
Diagnostics: Identifying if certain submodels dominate or perform poorly
Analysis: Studying sensitivity to different derivative information
Example:
# Get weighted prediction and individual submodel contributions
y_pred, y_var, submodel_preds, submodel_vars = gp.predict(
X_test, params, calc_cov=True, return_submodels=True
)
# Visualize submodel contributions
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
for i, sm_pred in enumerate(submodel_preds):
plt.plot(X_test, sm_pred.flatten(), alpha=0.3, label=f'Submodel {i+1}')
plt.plot(X_test, y_pred.flatten(), 'k-', linewidth=2, label='Weighted prediction')
plt.legend()
plt.show()
# Get derivative predictions (shared derivatives/orders only)
y_with_derivs, cov_with_derivs = gp.predict(
X_test, params, calc_cov=True, return_deriv=True
)
# Extract function and derivative values by row
func_pred = y_with_derivs[0, :] # f(x)
deriv1_pred = y_with_derivs[1, :] # First derivative (if shared)
Predictive Uncertainty#
Understanding Predictive Variance
The predictive variance quantifies uncertainty in the GP predictions:
High variance indicates regions with:
Sparse or no nearby training data
High observation noise
Poor model fit
Low variance indicates regions with:
Dense training data coverage
Low observation noise
High model confidence
Computing variance:
y_pred, y_var = gp.predict(X_test, params, calc_cov=True)
# Standard deviation (uncertainty in same units as prediction)
y_std = np.sqrt(y_var)
# 95% confidence interval (assuming Gaussian predictive distribution)
confidence_interval = 1.96 * y_std
Covariance vs. Variance:
When
calc_cov=Truewith single test points or independent predictions, returns variance (scalar per test point)Some implementations may return full covariance matrix between test points for joint predictions
Check specific model documentation for covariance matrix support
Performance Considerations#
Computational Cost
Prediction cost depends on:
Number of training points (n_train): Dominant factor in covariance computation
Number of test points (n_test): Linear scaling
Dimensionality (n_dims): Affects kernel evaluations
Uncertainty computation (calc_cov): Adds computational overhead
Derivative predictions (return_deriv): Increases complexity
Number of derivatives (derivs_to_predict): Fewer derivatives = faster computation
Approximate scaling:
Without variance: O(n_train × n_test)
With variance: O(n_train² × n_test) due to matrix operations
With derivatives: Scales with number of derivative components
With derivs_to_predict: Scales with len(derivs_to_predict), not total training derivatives
Optimization tips:
Disable uncertainty when not needed:
y_pred = gp.predict(X_test, params, calc_cov=False)
Use derivs_to_predict to compute only needed derivatives:
# Instead of computing all derivatives y_pred, _ = gp.predict(X_test, params, return_deriv=True) # Compute only what you need derivs_to_predict = [[[1, 1]]] # Just one derivative y_pred, _ = gp.predict(X_test, params, return_deriv=True, derivs_to_predict=derivs_to_predict)
Batch predictions instead of sequential calls:
# Efficient: single call y_pred = gp.predict(X_test_large, params) # Inefficient: multiple calls for x in X_test_large: y = gp.predict(x.reshape(1, -1), params)
For large test sets, consider splitting:
# Split large test set into chunks chunk_size = 1000 predictions = [] for i in range(0, len(X_test), chunk_size): X_chunk = X_test[i:i+chunk_size] y_chunk = gp.predict(X_chunk, params, calc_cov=False) predictions.append(y_chunk) y_pred = np.concatenate(predictions)
Common Issues and Troubleshooting#
Issue 1: Predictions far from training data have high uncertainty
This is expected behavior. GP predictions extrapolate poorly beyond the training data range.
Solution:
Ensure test points are within or near the training data domain
Add training data in regions where predictions are needed
Issue 2: Negative variances
Should not occur with properly implemented GPs, but can arise from:
Numerical instability in covariance matrix inversion
Ill-conditioned covariance matrices
Solution:
Enable normalization:
normalize=TrueCheck for duplicate training points
Reduce derivative order if using high-order derivatives
Verify hyperparameter optimization converged properly
Issue 3: Very large or very small predictions
Can occur when normalization is disabled and data scales vary widely.
Solution:
Always use
normalize=Truefor numerical stabilityManually scale data if custom scaling is required
Issue 4: Slow predictions
Solution:
Disable uncertainty computation if not needed (
calc_cov=False)Reduce training set size if feasible (consider sparse GP methods)
Disable derivative predictions if not required (
return_deriv=False)Use
derivs_to_predictto compute only the derivatives you need
Issue 5: GDDEGP derivative predictions require rays_predict
When using GDDEGP with return_deriv=True, you must provide rays_predict.
Solution:
# Define directional vectors at each test point
# rays_predict[direction_idx] has shape (d, n_test)
rays_dir1 = np.zeros((d, n_test))
# ... populate with appropriate directions ...
rays_predict = [rays_dir1] # Add more if multiple directions
y_pred, y_var = gp.predict(
X_test, params,
rays_predict=rays_predict,
calc_cov=True,
return_deriv=True
)
Note: rays_predict is only required when return_deriv=True. For function-only
predictions, simply omit it:
# Function predictions only - no rays_predict needed
y_pred, y_var = gp.predict(X_test, params, calc_cov=True, return_deriv=False)
Issue 6: Understanding the row-wise derivative output format
When return_deriv=True, predictions are returned with shape (num_derivs + 1, n_test).
Solution:
Remember the structure: rows = output types, columns = test points
Row 0 is always function values
Subsequent rows are derivatives in the order specified by
der_indices(orderivs_to_predictif specified)Use row indexing to extract components:
func_pred = y_pred[0, :] # Function values deriv1_pred = y_pred[1, :] # First derivative component deriv2_pred = y_pred[2, :] # Second derivative component # And so on for each derivative component
Issue 7: WDEGP derivative predictions missing some derivatives
For DEGP/DDEGP submodels, WDEGP can only predict derivatives that are shared across all submodels. For GDDEGP submodels, WDEGP can only predict derivative orders shared across all submodels.
Solution:
For DEGP/DDEGP: Check that all submodels include the desired derivative in their
der_indicesFor GDDEGP: Check that all submodels include the desired derivative order
If a derivative/order is only in some submodels, it cannot be predicted globally
Consider restructuring submodels to share common derivatives or orders
Issue 8: WDEGP with GDDEGP - can I predict in different directions than training?
Yes! This is a key advantage of GDDEGP submodels.
Solution:
# Training used gradient-aligned directions, but you can predict ANY direction:
# Predict along x-axis (even if training never used this direction)
rays_x = np.array([[1.0] * n_test, [0.0] * n_test])
y_pred, _ = gp.predict(
X_test, params,
rays_predict=[rays_x],
return_deriv=True
)
# Works as long as the derivative ORDER was used in training
Issue 9: derivs_to_predict contains derivatives not in training
derivs_to_predict must be a subset of the derivatives used during training.
Solution:
Verify each entry in
derivs_to_predictexists in the flattenedder_indicesCheck that derivative specifications match exactly (including order of variable-order pairs)
# Training derivatives
der_indices = [[[[1, 1]], [[2, 1]]], [[[1, 2]]]]
# Flattened: [[1,1]], [[2,1]], [[1,2]]
# Valid derivs_to_predict
derivs_to_predict = [[[1, 1]], [[1, 2]]] # Both exist in training
# Invalid: [[2,2]] was not in training
derivs_to_predict = [[[1, 1]], [[2, 2]]] # Error!
Issue 10: Output shape unexpected when using derivs_to_predict
When derivs_to_predict is specified, output shape is (len(derivs_to_predict) + 1, n_test),
not (num_all_derivs + 1, n_test).
Solution:
Remember that row count equals
len(derivs_to_predict) + 1Row order matches the order in
derivs_to_predict, notder_indices
# Training had 5 derivatives
der_indices = [
[[[1, 1]], [[2, 1]]],
[[[1, 2]], [[1, 1], [2, 1]], [[2, 2]]]
]
# Request only 2 derivatives in specific order
derivs_to_predict = [[[2, 1]], [[1, 2]]] # ∂f/∂x₂, then ∂²f/∂x₁²
y_pred, _ = gp.predict(X_test, params, return_deriv=True, derivs_to_predict=derivs_to_predict)
# y_pred.shape = (3, n_test), NOT (6, n_test)
# y_pred[0, :] = function values
# y_pred[1, :] = ∂f/∂x₂ (first in derivs_to_predict)
# y_pred[2, :] = ∂²f/∂x₁² (second in derivs_to_predict)
Best Practices#
Always enable normalization during model initialization for numerical stability:
gp = degp(..., normalize=True)
Always provide derivative_locations specifying which points have each derivative:
# All derivatives at all points derivative_locations = [] for i in range(len(der_indices)): for j in range(len(der_indices[i])): derivative_locations.append([k for k in range(len(X_train))]) gp = degp(..., derivative_locations=derivative_locations)
Compute uncertainty for critical predictions to quantify confidence:
y_pred, y_var = gp.predict(X_test, params, calc_cov=True)
Visualize predictions with confidence intervals to understand model behavior:
y_std = np.sqrt(y_var) plt.fill_between(X_test.flatten(), y_pred - 2*y_std, y_pred + 2*y_std, alpha=0.3)
For weighted models, examine submodel predictions to diagnose model behavior:
y_pred, y_var, sub_vals, sub_cov = gp.predict(..., return_submodels=True)
Use appropriate prediction settings based on application:
Real-time applications:
calc_cov=False,return_deriv=FalseUncertainty quantification:
calc_cov=TrueSensitivity analysis:
return_deriv=True
Validate predictions against held-out test data when possible
Check prediction uncertainty - high uncertainty may indicate:
Insufficient training data
Extrapolation beyond training domain
Poor hyperparameter optimization
When using return_deriv=True, extract components using row indexing:
# Function values (row 0) func_pred = y_pred[0, :] # Each derivative component (rows 1, 2, ...) deriv_i = y_pred[i + 1, :]
For GDDEGP/WDEGP-GDDEGP with derivative predictions, always normalize directional vectors:
for i in range(n_test): rays_dir1[:, i] = rays_dir1[:, i] / np.linalg.norm(rays_dir1[:, i])
Leverage GDDEGP flexibility - predict directional derivatives in any direction:
# Don't feel constrained to training directions # Use whatever directions are most meaningful for your analysis
Profile prediction performance for large-scale applications and optimize accordingly
Use derivs_to_predict for efficiency when only specific derivatives are needed:
# Instead of computing all 10 derivatives and discarding 8: y_pred, _ = gp.predict(X_test, params, return_deriv=True) needed = y_pred[[0, 3, 7], :] # Extract rows manually # Compute only what you need: derivs_to_predict = [[[1, 2]], [[2, 2]]] # Just these two y_pred, _ = gp.predict( X_test, params, return_deriv=True, derivs_to_predict=derivs_to_predict )
Match derivs_to_predict order to your analysis needs - output rows follow your specified order:
# If you need ∂f/∂x₂ before ∂f/∂x₁ for downstream processing: derivs_to_predict = [[[2, 1]], [[1, 1]]] # Specify desired order
Summary#
JetGP provides a flexible and powerful prediction interface across all model variants:
Key Features:
Unified interface for function and derivative predictions
Optional uncertainty quantification via
calc_covparameterSelective derivative predictions via
derivs_to_predictparameterRow-wise output format when
return_deriv=True: shape(num_derivs + 1, n_test)Automatic denormalization when
normalize=Trueduring trainingEfficient vectorized operations for batch predictions
Output Format Summary:
Configuration |
Output Shape |
|---|---|
|
|
|
|
|
|
Row Structure when return_deriv=True:
Row 0: Function value predictions
Row 1: First derivative component (per
der_indicesorderivs_to_predictorder)Row 2: Second derivative component
… (following specified order)
Model-Specific Notes:
Model |
|
Derivative Restriction |
|---|---|---|
DEGP |
N/A |
Specific coordinates |
DDEGP |
N/A (uses global |
Specific global directions |
GDDEGP |
Only when |
Any direction, trained order |
WDEGP (DEGP) |
N/A |
Shared coordinates only |
WDEGP (DDEGP) |
N/A |
Shared directions only |
WDEGP (GDDEGP) |
Only when |
Any direction, shared order |
Best Practices:
Always use
normalize=Trueduring model initializationAlways provide
derivative_locationsspecifying point-derivative associationsCompute uncertainty (
calc_cov=True) for critical predictionsVisualize predictions with confidence intervals
Extract derivative components using row indexing when
return_deriv=TrueUse
return_submodels=Truefor weighted models to diagnose performanceNormalize directional vectors in GDDEGP when using
rays_predictLeverage GDDEGP/WDEGP-GDDEGP flexibility to predict in any direction
Use
derivs_to_predictfor efficiency and custom derivative orderingBatch large prediction sets for memory efficiency
Typical Workflow:
Initialize and train model with appropriate configuration (including
derivative_locations)Optimize hyperparameters thoroughly
Make predictions with uncertainty quantification
Use
derivs_to_predictto select specific derivatives if neededExtract relevant components using row indexing (function, derivatives, submodels)
Visualize results with confidence bands
Validate against test data when available
For more information on model initialization and hyperparameter optimization, see the respective documentation sections.