AI 4 Materials / KI-Materialtechnologie
FAU Erlangen-Nürnberg
| Method | One-line summary | When to reach for it |
|---|---|---|
| Gaussian Process | Closed-form Bayesian regression over functions | Small \(n\) (\(\lesssim 10^3\)), tabular, smooth response, need calibrated CI |
| MC Dropout | Keep dropout on at inference, sample \(T\) passes | Big NN already trained, cheap epistemic estimate per pixel/voxel |
| Deep ensembles | Train \(M\) independent NNs, use disagreement | Best-calibrated NN UQ; budget for \(M\times\) training |
| MDN | NN outputs \((\pi_k, \mu_k, \sigma_k)\) of a Gaussian mixture | Multi-modal output (phase A or phase B from the same input) |
| Calibration | Reliability diagram + temperature scaling | Mandatory before any deployed model |
Note
We are not re-deriving the math. See MFML W12 for posteriors, ELBO, marginal likelihood. Today: which tool, on which lab task, with which numbers.
Note
Trust = prediction + calibrated confidence. The word “calibrated” is the part most published materials-ML papers skip.
Two lab realities make calibrated UQ non-optional:
(a) tool / operator / coating domain shift;
(b) experiments that cost €1k+/h.
| Lab task | Recommended UQ | Rationale | Cost driver |
|---|---|---|---|
| Tabular regression, \(n \in [10, 300]\) (composition \(\to\) property) | GP, RBF or Matérn \(\nu{=}5/2\) | Closed-form CI, smooth response, hyperparams interpretable | \(O(n^3)\) once; fine for \(n \lesssim 10^3\) |
| Pixel-wise segmentation of microscopy (CNN, U-Net) | MC Dropout, \(T \approx 30\) | Reuse trained net, get per-pixel variance map | \(T\times\) inference per image |
| High-stakes property regression with budget for retraining | Deep ensemble, \(M \in [5, 10]\) | Best calibration in literature (Lakshminarayanan et al. 2017) | \(M\times\) training |
| Multi-modal output (one input, two phases possible) | MDN, \(K \in \{2,3\}\) | Bimodal \(p(y\|x)\) — mean is meaningless | One training, harder to fit |
| Any deployed model | Reliability diagram + temp scaling | Free, post-hoc, on a held-out cal set | Trivial |
The training-time vs inference-time tradeoff is more loaded in a lab than in web-scale ML: deep ensembles pay \(M\times\) at training time, MC Dropout pays \(T\times\) at every inference.
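A deep-ensemble aggregation sketch in PyTorch; training of the \(M\) members is assumed done elsewhere, and `models` is an illustrative name for the list of trained nets:

```python
import torch

def ensemble_predict(models, x):
    """Aggregate M independently trained nets: ensemble mean as the
    prediction, member disagreement (variance) as epistemic uncertainty."""
    with torch.no_grad():
        preds = torch.stack([m(x) for m in models])  # (M, batch, ...)
    return preds.mean(dim=0), preds.var(dim=0)
```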
Note
MC Dropout’s variance estimate degrades with very deep nets and very low dropout rates — it can collapse to near-zero variance and look overconfident. Always validate with a held-out reliability diagram.
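A minimal MC-dropout sketch in PyTorch, assuming an already-trained net with `torch.nn.Dropout` layers: only the dropout layers go back to train mode, and \(T\) stochastic passes are aggregated into the per-pixel variance map the note above warns about.

```python
import torch

def mc_dropout_predict(model, x, T=30):
    """T stochastic forward passes with dropout active.
    Returns per-output (e.g. per-pixel) predictive mean and variance."""
    model.eval()  # keep batch norm etc. frozen
    for m in model.modules():  # re-enable only the dropout layers
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d)):
            m.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(T)])  # (T, batch, ...)
    return preds.mean(dim=0), preds.var(dim=0)
```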
Why this is a textbook GP problem: small \(n\), a response that is smooth in tempering temperature, and a noise level \(\sigma_n\) estimable from replicates.
\[ k_{\text{RBF}}(T, T') = \sigma_f^2 \exp\!\left(-\frac{(T - T')^2}{2\,\ell^2}\right) \]
Note
For metallurgical responses with regime changes, prefer Matérn \(\nu{=}5/2\) over RBF: it is twice mean-square differentiable instead of \(C^\infty\), which matches the physics better.
Numerical example. With \(n{=}30\), \(\sigma_n \approx 0.8\) HRC (from replicates), \(\ell \approx 60\) °C: the GP posterior at \(T{=}500\) °C (a held-out point) gives \(\hat{\text{HRC}} = 38.2 \pm 1.6\) (2\(\sigma\)). Spec sheet says 36–40 — we just skipped a destructive test.
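A sketch of that fit with scikit-learn; the 30-point tempering table below is a synthetic stand-in for the real data, and \(\sigma_n\) enters through the `alpha` (noise-variance) term:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern

# Synthetic stand-in for the n=30 tempering table (T in degC, hardness in HRC)
rng = np.random.default_rng(0)
T_train = rng.uniform(400.0, 650.0, size=(30, 1))
y_train = 55.0 - 0.03 * (T_train[:, 0] - 400.0) + rng.normal(0.0, 0.8, size=30)

sigma_n = 0.8  # HRC, noise level estimated from replicate measurements
kernel = ConstantKernel(1.0) * Matern(length_scale=60.0, nu=2.5)
gp = GaussianProcessRegressor(kernel=kernel, alpha=sigma_n**2)
gp.fit(T_train, y_train)

mean, std = gp.predict(np.array([[500.0]]), return_std=True)
print(f"HRC at 500 degC: {mean[0]:.1f} +/- {2 * std[0]:.1f} (2 sigma)")
print(gp.kernel_)  # fitted length scale ell and signal variance sigma_f^2
```

After fitting, `gp.kernel_` exposes the learned \(\ell\) and \(\sigma_f\), the interpretability argument that returns in the TabPFN comparison below.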
Note
The CI growth is only honest if the kernel is correct. A too-long \(\ell\) will make the GP overconfident outside the data. Always cross-check with a held-out CV reliability diagram before you trust extrapolation.
TabPFN (v2) handles up to \(\sim 10\,000\) rows and is competitive with tuned XGBoost on small-tabular benchmarks. When to reach for TabPFN over a GP on the 21CrMoV5-7 task:
When the GP still wins:
Note
On the 21CrMoV5-7 task TabPFN matches the GP’s leave-one-out RMSE within \(\sim 0.3\) HRC; the GP wins on interpretability of \(\ell\) and \(\sigma_f\). Use whichever your stakeholder will sign off on.
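A minimal TabPFN sketch for the same table, assuming the `tabpfn` (v2) package and its scikit-learn-style API, and reusing `T_train`, `y_train` from the GP sketch above:

```python
# pip install tabpfn   (v2; pretrained weights are downloaded on first use)
import numpy as np
from tabpfn import TabPFNRegressor

reg = TabPFNRegressor()      # pretrained tabular transformer, no HP tuning
reg.fit(T_train, y_train)    # same 30-row tempering table as the GP sketch
print(reg.predict(np.array([[500.0]])))
```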
Note
“My segmentation accuracy dropped after the chamber vent” is a calibration failure as often as a model failure. Diagnose with a reliability diagram before retraining.
Practical detectors:
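One practical detector (it reappears in the model-card paragraph at the end) is a Mahalanobis score on penultimate-layer features; a minimal numpy sketch, with the feature extractor and the threshold choice as assumptions:

```python
import numpy as np

def fit_mahalanobis(feats_id):
    """Fit mean and regularized precision of in-distribution
    penultimate-layer features (rows = samples)."""
    mu = feats_id.mean(axis=0)
    cov = np.cov(feats_id, rowvar=False) + 1e-6 * np.eye(feats_id.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_score(f, mu, prec):
    d = f - mu
    return float(np.sqrt(d @ prec @ d))

# Deployment rule (cf. the model card below): refuse to predict when
# score >= tau_ood, with tau_ood set on held-out in-distribution data,
# e.g. its 99th percentile.
```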
Split conformal (Angelopoulos and Bates 2023). Given any pre-trained predictor \(\hat{f}\): score a held-out calibration set with \(s_i = |y_i - \hat{f}(x_i)|\), take \(\hat{q}\) as the \(\lceil (n{+}1)(1{-}\alpha) \rceil / n\) empirical quantile of the \(s_i\), and return \(C(x) = [\hat{f}(x) - \hat{q},\ \hat{f}(x) + \hat{q}]\). With no distributional assumptions:
\[ \mathbb{P}\big[\,Y_{\text{test}} \in C(X_{\text{test}})\,\big] \;\geq\; 1 - \alpha \]
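That recipe as a numpy sketch; `predict` is any pre-trained point predictor (the GP mean, a boosted tree, a net):

```python
import numpy as np

def split_conformal(predict, X_cal, y_cal, alpha=0.1):
    """Interval function C(x) with >= 1 - alpha marginal coverage
    under exchangeability of calibration and test points."""
    scores = np.abs(y_cal - predict(X_cal))  # conformity scores s_i
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
    q_hat = np.quantile(scores, level, method="higher")

    def interval(X):
        mu = predict(X)
        return mu - q_hat, mu + q_hat  # same width everywhere

    return interval
```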
Why this lands in materials labs
Note
The only assumption is exchangeability of calibration and test data. Under distribution shift (new alloy family, new microscope) this breaks — width must grow or coverage drops silently.
Conformalized Quantile Regression (CQR) (Romano et al. 2019): fit lower and upper quantile regressors \(\hat{q}_{\alpha/2}, \hat{q}_{1-\alpha/2}\), then conformalize the band with the score \(E_i = \max\big(\hat{q}_{\alpha/2}(x_i) - y_i,\ y_i - \hat{q}_{1-\alpha/2}(x_i)\big)\), so the interval width adapts to input-dependent noise.
Materials picture. On a LPBF process map (laser power vs scan velocity), CQR widens the predicted-hardness interval inside the keyhole-onset band and tightens it in the safe interior — automatically. Vanilla split conformal would put the same interval everywhere.
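A CQR sketch with gradient-boosted quantile regressors; the feature matrix standing in for the (power, velocity) process map is an assumption:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_cqr(X_train, y_train, X_cal, y_cal, alpha=0.1):
    """Conformalized quantile regression (Romano et al. 2019)."""
    q_lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2)
    q_hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2)
    q_lo.fit(X_train, y_train)
    q_hi.fit(X_train, y_train)
    # score: distance by which y_cal escapes the predicted quantile band
    s = np.maximum(q_lo.predict(X_cal) - y_cal, y_cal - q_hi.predict(X_cal))
    n = len(s)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q_hat = np.quantile(s, level, method="higher")

    def interval(X):  # width now varies with x via the quantile models
        return q_lo.predict(X) - q_hat, q_hi.predict(X) + q_hat

    return interval
```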
When to use which
| Setting | Use |
|---|---|
| Homoscedastic, in-control process | Split conformal (cheapest) |
| Heteroscedastic / regime-dependent noise | CQR |
| Online streaming with drift | Adaptive conformal (Gibbs & Candès 2021) |
| Safety-critical, regulator-facing | Any conformal + held-out coverage report |
The one-paragraph “uncertainty section” of a model card:
“Uncertainty is reported as 95% CIs from \(T{=}30\) MC-dropout passes. The model is calibrated by temperature scaling on a 200-image held-out set; expected calibration error 0.03. The model is in-distribution iff Mahalanobis score \(<\tau_{\text{OOD}} = 14.2\); outside that, predictions are not returned.”
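A temperature-scaling sketch in PyTorch; `logits` and `labels` come from the held-out calibration set, and the single learned scalar divides all future logits:

```python
import torch

def fit_temperature(logits, labels):
    """Learn one scalar T > 0 minimizing NLL on the held-out set;
    divide all future logits by T. Argmax predictions are unchanged."""
    log_t = torch.zeros(1, requires_grad=True)  # parameterize T = exp(log_t) > 0
    opt = torch.optim.LBFGS([log_t], max_iter=100)
    nll = torch.nn.CrossEntropyLoss()

    def closure():
        opt.zero_grad()
        loss = nll(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()
```

Re-check the reliability diagram and ECE after scaling; temperature scaling changes confidences, never the predicted class.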
Note
If you cannot write that paragraph for your model, you cannot deploy it.
Pointer slide. If you want the math behind today’s tools, MFML W12 has it: posteriors, ELBO, marginal likelihood.

© Philipp Pelz - ML for Characterization and Processing