AI 4 Materials / KI-Materialtechnologie
FAU Erlangen-Nürnberg
| Method | One-line summary | When to reach for it |
|---|---|---|
| Gaussian Process | Closed-form Bayesian regression over functions | Small \(n\) (\(\lesssim 10^3\)), tabular, smooth response, need calibrated CI |
| MC Dropout | Keep dropout on at inference, sample \(T\) passes | Big NN already trained, cheap epistemic estimate per pixel/voxel |
| Deep ensembles | Train \(M\) independent NNs, use disagreement | Best-calibrated NN UQ; budget for \(M\times\) training |
| MDN | NN outputs \((\pi_k, \mu_k, \sigma_k)\) of a Gaussian mixture | Multi-modal output (phase A or phase B from the same input) |
| Calibration | Reliability diagram + temperature scaling | Mandatory before any deployed model |
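The deep-ensembles row says "use disagreement". A minimal sketch of what that usually means follows. It assumes each of the \(M\) members outputs a Gaussian mean and variance and that members are weighted equally, as in Lakshminarayanan et al. 2017. The combined prediction is then the moment-matched uniform mixture, whose variance splits into an averaged aleatoric part and a between-member (epistemic) part. The function name `ensemble_predict` is illustrative, not a library API.

```python
import numpy as np

def ensemble_predict(means, variances):
    """Combine M Gaussian ensemble members into one predictive Gaussian.

    means, variances: arrays of shape (M, N), one row per independently trained member.
    Returns (mean, total variance, epistemic part) of the equally weighted mixture.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mu = means.mean(axis=0)
    aleatoric = variances.mean(axis=0)   # average noise variance predicted by the members
    epistemic = means.var(axis=0)        # disagreement between member means (ddof=0)
    return mu, aleatoric + epistemic, epistemic
```

The total equals the exact mixture variance \(\frac{1}{M}\sum_m(\sigma_m^2+\mu_m^2)-\bar\mu^2\). Reporting the epistemic term separately shows how much of the CI comes from member disagreement rather than measurement noise.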
Note
We are not re-deriving the math. See MFML W12 for posteriors, ELBO, marginal likelihood. Today: which tool, on which lab task, with which numbers.
Note
Trust = prediction + calibrated confidence. The word “calibrated” is the part most published materials-ML papers skip.
(a) Tool / operator / coating domain shift
(b) Experiments cost €1k+/h
| Lab task | Recommended UQ | Rationale | Cost driver |
|---|---|---|---|
| Tabular regression, \(n \in [10, 300]\) (composition \(\to\) property) | GP, RBF or Matérn \(\nu{=}5/2\) | Closed-form CI, smooth response, hyperparams interpretable | \(O(n^3)\) once, fine for \(n \lesssim 10^3\) |
| Pixel-wise segmentation of microscopy (CNN, U-Net) | MC Dropout, \(T \approx 30\) | Reuse trained net, get per-pixel variance map | \(T\times\) inference per image |
| High-stakes property regression with budget for retraining | Deep ensemble, \(M \in [5, 10]\) | Best calibration in literature (Lakshminarayanan et al. 2017) | \(M\times\) training |
| Multi-modal output (one input, two phases possible) | MDN, \(K \in \{2,3\}\) | Bimodal \(p(y\|x)\) — mean is meaningless | One training, harder to fit |
| Any deployed model | Reliability diagram + temp scaling | Free, post-hoc, on a held-out cal set | Trivial |
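Temperature scaling, the last row, fits one scalar \(T\) on a held-out calibration set and divides every logit by it. Below is a minimal NumPy sketch for a \(K\)-class classifier, such as per-pixel phase labels flattened to rows. It assumes you already have calibration-set logits and integer labels. A plain grid search over \(T\) stands in for the gradient-based optimiser often used in practice, and the function names are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Mean negative log-likelihood of the true labels after dividing the logits by T."""
    p = softmax(logits / T)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def fit_temperature(logits, labels, grid=None):
    """Grid-search the single scalar T > 0 that minimises NLL on the calibration set."""
    if grid is None:
        grid = np.logspace(-1, 1, 201)     # T from 0.1 to 10
    losses = [nll(logits, labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])
```

Usage: `T_hat = fit_temperature(cal_logits, cal_labels)`, then report `softmax(test_logits / T_hat)`. Dividing by a positive scalar never changes the argmax, so accuracy is untouched. Only the confidences move, which is why the method is "free".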
The training-time vs. inference-time tradeoff is more loaded in a lab than in web-scale ML:
Note
MC Dropout’s variance estimate degrades with very deep nets and very low dropout rates — it can collapse to near-zero variance and look overconfident. Always validate with a held-out reliability diagram.
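To make "keep dropout on at inference" concrete, here is a minimal NumPy sketch of MC Dropout on a toy one-hidden-layer network. The random weights are a hypothetical stand-in for a trained model such as a U-Net. Each of the \(T\) passes draws a fresh Bernoulli mask with inverted-dropout scaling, and the per-output variance across passes is the epistemic estimate. In PyTorch the same effect typically comes from putting only the `nn.Dropout` modules back into `train()` mode, leaving BatchNorm in eval mode.

```python
import numpy as np

def init_toy_mlp(n_hidden=64, seed=0):
    """Hypothetical stand-in for a trained network: random weights, 1 input, 1 output."""
    rng = np.random.default_rng(seed)
    return (rng.normal(0, 1, (1, n_hidden)), rng.normal(0, 1, n_hidden),
            rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, 1)), np.zeros(1))

def forward_with_dropout(x, params, p_drop, rng):
    """One stochastic forward pass with dropout kept ON (inverted dropout)."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, x @ W1 + b1)                 # ReLU hidden layer
    keep = rng.random(h.shape) >= p_drop             # fresh Bernoulli mask for every pass
    h = h * keep / (1.0 - p_drop)                    # rescale so E[h] matches no-dropout
    return h @ W2 + b2

def mc_dropout_predict(x, params, p_drop=0.1, T=30, seed=0):
    """Average T stochastic passes; the spread across passes is the epistemic estimate."""
    rng = np.random.default_rng(seed)
    samples = np.stack([forward_with_dropout(x, params, p_drop, rng) for _ in range(T)])
    return samples.mean(axis=0), samples.var(axis=0)
```

Usage: `mean, var = mc_dropout_predict(np.linspace(-3, 3, 50)[:, None], init_toy_mlp(), p_drop=0.2)`. With `p_drop=0` all passes are identical and the variance is zero. That is the extreme case of the collapse described in the Note, and it is why a held-out reliability check is still needed.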
Why this is a textbook GP problem:
\[ k_{\text{RBF}}(T, T') = \sigma_f^2 \exp\!\left(-\frac{(T - T')^2}{2\,\ell^2}\right) \]
Note
For metallurgical responses with regime changes, prefer Matérn \(\nu{=}5/2\) over RBF: its sample paths are only twice (mean-square) differentiable instead of \(C^\infty\), which matches the physics better.
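For comparison with the RBF kernel, here is the standard Matérn \(\nu{=}5/2\) kernel in the same notation, with \(r = |T - T'|\):

\[ k_{\text{M52}}(T, T') = \sigma_f^2 \left(1 + \frac{\sqrt{5}\,r}{\ell} + \frac{5\,r^2}{3\,\ell^2}\right) \exp\!\left(-\frac{\sqrt{5}\,r}{\ell}\right) \]

Both kernels share \(\sigma_f\) and \(\ell\). Only the shape of the decay with distance differs: polynomial-times-exponential here, Gaussian for RBF.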
Numerical example. With \(n{=}30\), \(\sigma_n \approx 0.8\) HRC (from replicates), \(\ell \approx 60\) °C: the GP posterior at \(T{=}500\) °C (a held-out point) gives \(\hat{\text{HRC}} = 38.2 \pm 1.6\) (2\(\sigma\)). Spec sheet says 36–40 — we just skipped a destructive test.
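The closed-form posterior behind a number like this fits in a few lines of NumPy. The sketch below uses the Matérn \(\nu{=}5/2\) kernel with \(\ell = 60\) °C and \(\sigma_n = 0.8\) HRC from the example. The signal scale \(\sigma_f = 5\) HRC and the tempering curve are hypothetical synthetic values, not the 21CrMoV5-7 data, so the printed numbers will not reproduce \(38.2 \pm 1.6\). Hyperparameters are fixed rather than optimised on the marginal likelihood.

```python
import numpy as np

def matern52(a, b, ell=60.0, sigma_f=5.0):
    """Matérn nu=5/2 kernel between 1-D input arrays a (n,) and b (m,)."""
    r = np.abs(a[:, None] - b[None, :])
    s = np.sqrt(5.0) * r / ell
    return sigma_f**2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_posterior(T_train, y_train, T_test, sigma_n=0.8, ell=60.0, sigma_f=5.0):
    """Closed-form GP posterior for the latent response f(T).

    Uses a constant prior mean equal to the training mean. Returns the posterior
    mean and standard deviation of f; add sigma_n**2 to the variance for the
    spread of a new noisy measurement.
    """
    y0 = y_train.mean()
    K = matern52(T_train, T_train, ell, sigma_f) + sigma_n**2 * np.eye(len(T_train))
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y0))
    K_s = matern52(T_train, T_test, ell, sigma_f)   # shape (n_train, n_test)
    mu = y0 + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = sigma_f**2 - np.sum(v**2, axis=0)         # prior variance k(T, T) = sigma_f**2
    return mu, np.sqrt(np.maximum(var, 0.0))

# Hypothetical tempering curve: 30 noisy points between 200 and 650 °C (synthetic, not real data).
rng = np.random.default_rng(0)
T_train = np.linspace(200.0, 650.0, 30)
y_train = 50.0 - 0.03 * (T_train - 200.0) + rng.normal(0.0, 0.8, T_train.size)

mu, sd = gp_posterior(T_train, y_train, np.array([500.0, 900.0]))
for T, m, s in zip([500, 900], mu, sd):
    print(f"T={T} °C: {m:.1f} ± {2 * s:.1f} HRC (2 sigma)")
```

At 900 °C, far outside the data, the standard deviation reverts towards the prior \(\sigma_f\). That is the "CI growth" the next Note warns is only as honest as the chosen \(\ell\).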
Note
The CI growth is only honest if the kernel is correct. A too-long \(\ell\) will make the GP overconfident outside the data. Always cross-check with a held-out CV reliability diagram before you trust extrapolation.
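For regression, the held-out CV reliability check reduces to asking whether the nominal intervals cover the held-out points at the nominal rate. For a GP, leave-one-out predictions have a closed form, so no refitting is needed. The sketch below uses the RBF kernel from this slide with fixed, hypothetical \(\sigma_f = 5\) and \(\sigma_n = 0.8\). It returns standardised LOO residuals and their coverage at the 95% level (\(|z| \le 1.96\)). The function names are illustrative.

```python
import numpy as np

def rbf(a, b, ell, sigma_f):
    """RBF kernel from the slide, for 1-D inputs a (n,) and b (m,)."""
    return sigma_f**2 * np.exp(-(a[:, None] - b[None, :])**2 / (2.0 * ell**2))

def loo_z_scores(T, y, ell, sigma_f=5.0, sigma_n=0.8):
    """Closed-form leave-one-out standardised residuals of a GP, without refitting n times."""
    yc = y - y.mean()              # constant prior mean (estimated on all points: a small leak)
    K_inv = np.linalg.inv(rbf(T, T, ell, sigma_f) + sigma_n**2 * np.eye(len(T)))
    alpha = K_inv @ yc
    d = np.diag(K_inv)
    # LOO predictive mean is yc - alpha/d and LOO predictive variance (noise included) is 1/d,
    # so the standardised residual (yc - mean) / sd simplifies to alpha / sqrt(d).
    return alpha / np.sqrt(d)

def coverage(z, z_crit=1.96):
    """Fraction of left-out points inside the nominal 95% interval (z_crit = 1.96)."""
    return float(np.mean(np.abs(z) <= z_crit))
```

Comparing `coverage(loo_z_scores(T, y, ell=60.0))` against a much longer `ell` can reveal a length scale that makes the intervals too narrow. LOO only probes interpolation, though. To test extrapolation, hold out a contiguous temperature block instead.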
TabPFN v2 handles up to \(\sim 10\,000\) rows and is competitive with tuned XGBoost on small-tabular benchmarks.
Code: github.com/PriorLabs/TabPFN (pip install tabpfn, scikit-learn-compatible API).
When the GP still wins:
Note
On the 21CrMoV5-7 task TabPFN matches the GP’s leave-one-out RMSE within \(\sim 0.3\) HRC; the GP wins on interpretability of \(\ell\) and \(\sigma_f\). Use whichever your stakeholder will sign off on.
Note
“My segmentation accuracy dropped after the chamber vent” is a calibration failure as often as a model failure. Diagnose with a reliability diagram before retraining.
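A reliability diagram needs only two arrays per prediction: its confidence (e.g. the max softmax probability of a pixel) and whether it was correct. The minimal NumPy sketch below bins those arrays into equal-width confidence bins. It returns per-bin mean confidence and accuracy, which you plot against the diagonal, plus the expected calibration error (ECE), the count-weighted mean gap. The 10-bin default and the function name are assumptions, not a fixed standard.

```python
import numpy as np

def reliability_bins(confidence, correct, n_bins=10):
    """Per-bin mean confidence, accuracy and count for a reliability diagram, plus ECE."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    idx = np.minimum((confidence * n_bins).astype(int), n_bins - 1)  # equal-width bins on [0, 1]
    count = np.bincount(idx, minlength=n_bins)
    conf_sum = np.bincount(idx, weights=confidence, minlength=n_bins)
    acc_sum = np.bincount(idx, weights=correct, minlength=n_bins)
    nonempty = count > 0
    mean_conf = np.where(nonempty, conf_sum / np.maximum(count, 1), np.nan)
    accuracy = np.where(nonempty, acc_sum / np.maximum(count, 1), np.nan)
    # ECE: count-weighted average |accuracy - confidence| over non-empty bins
    gap = np.abs(accuracy[nonempty] - mean_conf[nonempty])
    ece = float(np.sum(count[nonempty] * gap) / count.sum())
    return mean_conf, accuracy, count, ece
```

For segmentation, flatten all pixels of the calibration images first: `confidence = probs.max(axis=-1).ravel()`, `correct = (probs.argmax(axis=-1) == labels).ravel()`. Running this before and after the chamber vent separates a calibration shift from an accuracy drop.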
Practical detectors:
Materials deployment defaults:
Why we promote it from “footnote” to “default”:
Warning
Materials-specific failure mode. Tool drift (§4 SEM #1 → #2) breaks exchangeability silently. Always re-run coverage on a per-tool calibration set, or accept that the guarantee is gone.
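The Warning concerns the coverage guarantee of conformal prediction. We assume split conformal prediction is the method promoted on this slide, since exchangeability and coverage both point to it. Below is a minimal NumPy sketch of split-conformal regression intervals \(\hat y \pm q\), with one threshold \(q\) per tool computed only from that tool's calibration residuals. The tool labels and function names are hypothetical.

```python
import numpy as np

def conformal_quantile(residuals, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest |residual|."""
    r = np.sort(np.abs(np.asarray(residuals, dtype=float)))
    n = r.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf                 # too few calibration points for this alpha
    return float(r[k - 1])

def per_tool_thresholds(residuals, tool_ids, alpha=0.1):
    """One threshold per tool (e.g. per SEM), using only that tool's calibration residuals."""
    residuals = np.asarray(residuals, dtype=float)
    tool_ids = np.asarray(tool_ids)
    return {t.item(): conformal_quantile(residuals[tool_ids == t], alpha)
            for t in np.unique(tool_ids)}

def empirical_coverage(y_true, y_pred, q):
    """Fraction of points with |y_true - y_pred| <= q (should be >= 1 - alpha)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)) <= q))
```

Under exchangeability within a tool, \(\hat y \pm q_{\text{tool}}\) covers at least \(1-\alpha\) of new points from that tool. Reusing another tool's \(q\) carries no such guarantee, and `empirical_coverage` on fresh data from the new tool is the cheap way to see how far it has fallen.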
The one-paragraph “uncertainty section” of a model card:
“Uncertainty is reported as 95% CIs from \(T{=}30\) MC-dropout passes. The model is calibrated by temperature scaling on a 200-image held-out set; expected calibration error 0.03. The model is in-distribution iff Mahalanobis score \(<\tau_{\text{OOD}} = 14.2\); outside that, predictions are not returned.”
Note
If you cannot write that paragraph for your model, you cannot deploy it.
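The model-card paragraph gates predictions on a Mahalanobis score. The minimal NumPy sketch below shows such a gate. It assumes the score is computed on fixed-length feature vectors, for example penultimate-layer embeddings, and that \(\tau_{\text{OOD}}\) is set as a high quantile of scores on held-out in-distribution data. Both choices are assumptions. This sketch uses the squared distance, and the model card does not say whether its 14.2 refers to the squared or the plain distance. The class name is illustrative.

```python
import numpy as np

class MahalanobisGate:
    """OOD gate: Gaussian fit to in-distribution features, squared Mahalanobis score."""

    def __init__(self, train_features, ridge=1e-6):
        X = np.asarray(train_features, dtype=float)
        self.mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])  # ridge keeps cov invertible
        self.precision = np.linalg.inv(cov)
        self.tau = np.inf

    def score(self, features):
        d = np.atleast_2d(np.asarray(features, dtype=float)) - self.mu
        return np.einsum("ij,jk,ik->i", d, self.precision, d)      # squared distance per row

    def calibrate(self, heldout_features, quantile=0.99):
        """Set tau as a high quantile of held-out in-distribution scores (an assumption)."""
        self.tau = float(np.quantile(self.score(heldout_features), quantile))
        return self.tau

    def in_distribution(self, features):
        return self.score(features) < self.tau
```

At deployment, return a prediction only where `gate.in_distribution(features)` is true, which mirrors the "outside that, predictions are not returned" clause.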
Pointer slide. If you want the math behind today’s tools:
MFML W7 (probabilistic view of learning):
MFML W12 (uncertainty in predictions):

© Philipp Pelz - ML for Characterization and Processing