


Landscape of advanced materials development companies & startups
By the end of this lecture, students can:

Example — Alloy yield strength: A neural net trained on composition → yield-strength data predicts \(\sigma_y < 0\) for a novel alloy. The test MSE is low, but the prediction is physically meaningless (\(\sigma_y \geq 0\)). Encoding this constraint (e.g. softplus output) eliminates impossible predictions and reduces the data needed because the model no longer wastes capacity on the infeasible region.
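A minimal sketch of this fix (PyTorch; the network size and the 10 composition features are illustrative assumptions): passing the final layer through a softplus makes \(\sigma_y \geq 0\) hold by construction.

```python
import torch
import torch.nn as nn

# Minimal sketch: a regression head whose output goes through softplus,
# so the predicted yield strength is non-negative by construction.
class YieldStrengthNet(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )
        self.positive = nn.Softplus()   # maps any raw output to (0, inf)

    def forward(self, x):
        return self.positive(self.body(x))

# Hypothetical usage: 10 composition features, batch of 4 alloys
model = YieldStrengthNet(n_features=10)
x = torch.rand(4, 10)
sigma_y = model(x)                      # guaranteed non-negative
assert (sigma_y >= 0).all()
```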
Example — Melt-pool dynamics in additive manufacturing: We can formulate the partial differential equations (PDEs) for laser-melting metal powder, but simulating them for a whole part is computationally too expensive for real-time control. Internal defects, moreover, can only be partially observed in practice. A hybrid strategy trains an ML model on high-fidelity simulations and sensor data to predict defects in real time, bypassing the computational bottleneck of pure physics while retaining physical validity (Meng et al. 2020).
graph LR
Model[Model Type] --> WB[White-box]
Model --> GB[Grey-box]
Model --> BB[Black-box]
WB --- WBdesc[Physics-based]
GB --- GBdesc[Hybrid]
BB --- BBdesc[Data-driven]
Example — Grey-box: crystal-plasticity with a learned hardening law: In crystal-plasticity finite-element modeling (CPFEM) the kinematics (deformation gradient decomposition \(\mathbf{F} = \mathbf{F}^e \mathbf{F}^p\), slip-system geometry) are well understood and kept as white-box components. The strain-hardening law \(\dot{\tau}_c = h(\gamma, \dot{\gamma}, T)\), however, encodes complex dislocation interactions that are expensive to derive from first principles. A grey-box strategy replaces only this hardening function with a small neural network trained on experimental stress–strain curves, while the surrounding finite-element equilibrium and crystallographic slip rules remain physics-based. The result: physically consistent deformation fields and accurate hardening behavior without a full empirical constitutive model (Neuer et al. 2024).
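A schematic sketch of this split (the network size, inputs, and the explicit time step are illustrative assumptions, not the CPFEM implementation of Neuer et al. 2024): the slip-resistance update stays hand-written physics, and only the hardening rate is learned.

```python
import torch
import torch.nn as nn

# Illustrative only: a tiny MLP stands in for the hardening law
# tau_c_dot = h(gamma, gamma_dot, T); everything around it stays white-box.
class HardeningNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 16), nn.Tanh(),
            nn.Linear(16, 1), nn.Softplus(),   # keep the hardening rate non-negative
        )

    def forward(self, gamma, gamma_dot, T):
        x = torch.stack([gamma, gamma_dot, T], dim=-1)
        return self.net(x).squeeze(-1)

def update_slip_resistance(tau_c, gamma, gamma_dot, T, dt, hardening):
    """White-box explicit time step with a learned hardening rate (the grey-box part)."""
    return tau_c + hardening(gamma, gamma_dot, T) * dt

# Hypothetical usage for a batch of material points
hardening = HardeningNet()
tau_c = torch.full((8,), 50.0)                         # current slip resistances (MPa)
gamma, gamma_dot = torch.rand(8), torch.rand(8)
T = torch.full((8,), 293.0)
tau_c_new = update_slip_resistance(tau_c, gamma, gamma_dot, T, dt=1e-3, hardening=hardening)
```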
Example — Automated weld inspection: A deep CNN classifies radiographic weld images as accept / reject with 97 % accuracy. When a batch of welds is rejected, the production engineer asks: “Is the defect porosity, lack of fusion, or a crack?” The model cannot answer — it was trained end-to-end on a binary label. Without an interpretable intermediate representation, the team must repeat expensive manual inspection to identify the root cause, negating the deployment benefit.
Example — Fatigue-life prediction with SHAP analysis: A gradient-boosted tree predicts fatigue life \(N_f\) of welded joints from geometry, load ratio, and material grade. Global explainability (SHAP summary plot) reveals that stress range \(\Delta\sigma\) and weld-toe radius \(r\) dominate predictions across the dataset — confirming known fracture-mechanics drivers. Local explainability (SHAP waterfall for a single joint) shows that for one anomalous prediction the model relied heavily on an unusual surface-roughness value, flagging a possible measurement error. The same model, two explainability scopes, two different actionable insights.
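A minimal sketch of the two scopes, assuming the shap package and a scikit-learn gradient-boosted tree; the feature set and target are synthetic stand-ins, not the fatigue dataset described above.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data: columns are hypothetical [stress range, weld-toe radius,
# load ratio, surface roughness]; the target mimics a fatigue-life-like trend.
rng = np.random.default_rng(0)
X = rng.random((300, 4))
y = 1e6 / (X[:, 0] + 0.1) * (1.0 + X[:, 1]) + rng.normal(0, 1e3, 300)

model = GradientBoostingRegressor().fit(X, y)

explainer = shap.TreeExplainer(model)
sv = explainer(X)                 # SHAP values for the whole dataset

shap.plots.beeswarm(sv)           # global scope: which features dominate overall
shap.plots.waterfall(sv[0])       # local scope: why this single joint got its prediction
```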
Example — Thermal-barrier coating lifetime: A turbine-blade thermal-barrier coating (TBC) degrades through oxide-layer growth and thermal cycling. The oxidation kinetics follow a well-known parabolic rate law \(h_{\text{ox}} \propto \sqrt{t}\) (trusted physics). However, the spallation failure also depends on interface roughness, coating microstructure, and thermal-cycle profile — couplings that are poorly modelled analytically.
Hybrid strategy:
- Physics module: parabolic oxidation model computes oxide thickness \(h_{\text{ox}}(t, T)\).
- ML module: a small network takes \(h_{\text{ox}}\), cycle count, roughness, and porosity as inputs and predicts remaining useful life (RUL).
- Explicit interface: the physics module outputs \(h_{\text{ox}}\) in μm; the ML module ingests it as a feature alongside microstructural descriptors. If the oxidation model is updated (e.g., different alloy), only the physics module changes; the ML module is retrained on new residuals.
This is exactly the pattern: trusted physics → learned residual → explicit interface.
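A minimal sketch of this architecture; the rate constants, features, and the RUL relation are illustrative placeholders, not calibrated TBC physics.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def oxide_thickness_um(t_hours, temperature_K, k0=2.0e-2, Q_over_R=1.5e4):
    """Physics module: parabolic rate law h_ox ∝ sqrt(t) with an Arrhenius-type
    rate constant. Constants here are illustrative, not calibrated."""
    k = k0 * np.exp(-Q_over_R / temperature_K)
    return np.sqrt(k * t_hours)

def build_features(t_hours, temperature_K, cycles, roughness, porosity):
    """Explicit interface: the physics output h_ox enters the ML module as one
    feature next to the microstructural descriptors."""
    h_ox = oxide_thickness_um(t_hours, temperature_K)
    return np.column_stack([h_ox, cycles, roughness, porosity])

# ML module trained on (features -> remaining useful life); the data is synthetic.
rng = np.random.default_rng(1)
t = rng.uniform(10, 5000, 500)
T = rng.uniform(1200, 1500, 500)
cycles = rng.integers(1, 2000, 500)
roughness, porosity = rng.random(500), rng.random(500)

X = build_features(t, T, cycles, roughness, porosity)
rul = 10_000 - 500 * X[:, 0] - 2 * cycles + rng.normal(0, 100, 500)   # toy RUL target

ml_module = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, rul)
# If the oxidation model changes (different alloy), only oxide_thickness_um is swapped;
# the ML module is simply retrained on features built from the new physics output.
```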
Learning with labeled data. Includes regression (continuous targets) and classification (discrete categories).
Example: Predicting alloy yield strength from chemical composition [Bhandari et al., 2020]. 
Finding hidden structure in unlabeled data (clustering, dimensionality reduction, embeddings).
Example: Clustering unlabeled microscopy images to discover distinct phases [Stender et al., 4D-STEM phase mapping].
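A minimal sketch of the idea, assuming each micrograph patch has already been reduced to a fixed-length descriptor; the data below is random and purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Random stand-in data: in practice each row would be a descriptor of one
# micrograph patch (e.g. a flattened diffraction pattern or learned embedding).
rng = np.random.default_rng(0)
features = rng.random((500, 64))     # 500 patches, 64-dimensional descriptors

embedded = PCA(n_components=8).fit_transform(features)                # dimensionality reduction
phase_labels = KMeans(n_clusters=3, n_init=10).fit_predict(embedded)  # clustering
# phase_labels groups patches without any labels; each cluster is a candidate phase.
```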
Learning optimal actions through trial and error to maximize a reward signal.
Example: An autonomous agent controlling a laser-melting process to minimize defects [Wang et al., 2021]. 
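A toy sketch of the trial-and-error idea as an \(\varepsilon\)-greedy bandit: the actions are discrete laser-power settings and the reward is a synthetic negative defect rate; a real agent would interact with the process or a simulator.

```python
import numpy as np

# Epsilon-greedy bandit: estimate the expected reward of each action from trials.
rng = np.random.default_rng(0)
powers = np.array([150.0, 200.0, 250.0, 300.0])     # candidate actions (W), illustrative
q_values = np.zeros(len(powers))                    # estimated reward per action
counts = np.zeros(len(powers))
epsilon = 0.1

def reward(action_idx):
    # Synthetic process model: defect rate grows away from an (assumed) optimum of 250 W.
    defect_rate = 0.05 * abs(powers[action_idx] - 250.0) / 50.0 + rng.normal(0, 0.02)
    return -defect_rate                             # fewer defects -> higher reward

for step in range(2000):
    a = rng.integers(len(powers)) if rng.random() < epsilon else int(np.argmax(q_values))
    r = reward(a)
    counts[a] += 1
    q_values[a] += (r - q_values[a]) / counts[a]    # incremental mean update

print("Learned best power:", powers[int(np.argmax(q_values))])   # approaches 250 W
```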
\[ \hat{\theta} = \arg\min_\theta \frac{1}{N}\sum_{i=1}^{N}\ell\big(f_\theta(\mathbf{x}_i), y_i\big) \]
\[ \hat{\theta} = \arg\min_\theta \frac{1}{N}\sum_{i=1}^{N}\ell\big(f_\theta(\mathbf{x}_i), y_i\big) + \lambda\Omega(\theta) \]
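A minimal sketch of this objective for a linear model \(f_\theta(\mathbf{x}) = \mathbf{x}^\top\theta\) with squared-error loss and \(\Omega(\theta) = \lVert\theta\rVert^2\), minimized by plain gradient descent on synthetic data.

```python
import numpy as np

# Synthetic supervised data for a linear model f_theta(x) = x @ theta.
rng = np.random.default_rng(0)
X = rng.random((100, 5))
true_theta = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
y = X @ true_theta + rng.normal(0, 0.1, 100)

lam, lr = 0.1, 0.05                 # regularization strength lambda, learning rate
theta = np.zeros(5)
for _ in range(2000):
    residual = X @ theta - y
    # gradient of (1/N) * sum (x_i @ theta - y_i)^2 + lam * ||theta||^2
    grad = 2.0 * X.T @ residual / len(y) + 2.0 * lam * theta
    theta -= lr * grad
# theta now approximately minimizes the regularized empirical risk above.
```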
\[\text{MSE} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2\]
\[\text{MAE} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|\]
\[L_{0-1} = \begin{cases} 0 & \text{if } \hat{y} = y \\ 1 & \text{if } \hat{y} \neq y \end{cases}\]
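A short worked example of all three losses on toy values:

```python
import numpy as np

# Regression losses on toy yield-strength values (MPa).
y_true = np.array([300.0, 450.0, 500.0])
y_pred = np.array([310.0, 430.0, 505.0])

mse = np.mean((y_true - y_pred) ** 2)           # 175.0: large errors penalized quadratically
mae = np.mean(np.abs(y_true - y_pred))          # 11.67: linear penalty, robust to outliers

# Zero-one loss on toy class labels (defect types).
labels_true = np.array([0, 1, 2])
labels_pred = np.array([0, 2, 2])
zero_one = np.mean(labels_true != labels_pred)  # 0.333: fraction of misclassified samples
```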
Represent labels as vectors where the correct category is \(1\) and others are \(0\):
- Vacancy = \([1, 0, 0]^T\)
- Dislocation = \([0, 1, 0]^T\)
- Precipitate = \([0, 0, 1]^T\)
Our model outputs a raw score (logit) \(o_i\) for each class. But these can be negative and don’t sum to 1! How do we convert them to probabilities \(\hat{y}\)?
\[\hat{y}_i = \frac{\exp(o_i)}{\sum_{j} \exp(o_j)}\]
Now that we have probability predictions \(\hat{y}\), how do we penalize bad ones? \[L = -\sum_{c=1}^{C} y_c \log(\hat{y}_c)\] Since \(y\) is one-hot, this simplifies to \(-\log(\hat{y}_{\text{true}})\).
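A short worked example of the softmax and cross-entropy for the three defect classes above; the logit values are arbitrary.

```python
import numpy as np

# Raw logits for (vacancy, dislocation, precipitate) and a one-hot label for "vacancy".
logits = np.array([2.0, -1.0, 0.5])
y_true = np.array([1, 0, 0])

probs = np.exp(logits) / np.sum(np.exp(logits))    # softmax: positive, sums to 1
cross_entropy = -np.sum(y_true * np.log(probs))    # = -log(p_vacancy)

print(probs.round(3))        # [0.786, 0.039, 0.175]
print(cross_entropy.round(3))  # 0.241
```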
\[ p(\theta\mid\mathcal{D}) \propto p(\mathcal{D}\mid\theta)\,p(\theta) \]
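A minimal numeric illustration of this proportionality on a 1D parameter grid (the Gaussian prior and likelihood are chosen purely for illustration): multiply likelihood by prior, then normalize.

```python
import numpy as np

# Observations D: 20 noisy measurements around an unknown parameter theta.
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=20)

theta_grid = np.linspace(0.0, 6.0, 601)                   # candidate theta values
prior = np.exp(-0.5 * (theta_grid - 2.0) ** 2 / 4.0)      # Gaussian prior (mean 2, var 4)
log_lik = np.array([-0.5 * np.sum((data - t) ** 2) for t in theta_grid])
likelihood = np.exp(log_lik - log_lik.max())              # rescaled for numerical stability

posterior = likelihood * prior                            # p(theta|D) ∝ p(D|theta) p(theta)
posterior /= posterior.sum() * (theta_grid[1] - theta_grid[0])   # normalize on the grid
```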


\[\text{Total Expected Error} = \text{Bias}^2 + \text{Variance} + \text{Noise}\]
How do we turn the dials to hit the sweet spot?
Note for engineers: adding a hard physical equation as a constraint deliberately increases bias in order to drastically reduce variance; in small-data regimes, this trade-off is often what makes a model deployable.
flowchart LR
Data[(Full Dataset)] --> Train[Training Set]
Data --> Val[Validation Set]
Data --> Test[Test Set]
Train --->|Fit parameters| Model((Model))
Val --->|Tune hyperparameters| Model
Model -.->|Iterative tuning| Val
Model ===>|One-shot evaluation| Test
style Train fill:#d4edda,stroke:#28a745,color:#155724,stroke-width:2px
style Val fill:#fff3cd,stroke:#ffc107,color:#856404,stroke-width:2px
style Test fill:#f8d7da,stroke:#dc3545,color:#721c24,stroke-width:2px,stroke-dasharray: 5 5
style Model fill:#cce5ff,stroke:#004085,color:#004085,stroke-width:2px
style Data fill:#e2e3e5,stroke:#383d41,color:#383d41,stroke-width:2px
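A minimal sketch of this split with scikit-learn on synthetic data; the 60/20/20 ratios are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Carve off the test set first, then split the remainder into training and validation.
rng = np.random.default_rng(0)
X, y = rng.random((100, 5)), rng.random(100)

X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)   # 0.25 of 80 % = 20 %

# Training set fits parameters, validation set tunes hyperparameters iteratively,
# and the test set is used exactly once for the final evaluation.
```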
Aleatoric uncertainty: irreducible noise 🎲
Epistemic uncertainty: reducible ignorance 📚
\(\rightarrow\) Key takeaway: Different mitigation actions are required. You cannot “smooth out” epistemic ignorance, nor can you “gather more data” to fix aleatoric noise (Neuer et al. 2024).
Question: You train a deep neural network to predict the fatigue strength of an alloy. The training MSE is nearly zero, but the test MSE is very high. Adding a physical constraint (e.g., non-negative stiffness) slightly increases training MSE but significantly lowers test MSE. Why?
Answer: B. The domain constraint restricts the model from fitting spurious, physically impossible correlations in the training data.

© Philipp Pelz - Mathematical Foundations of AI & ML