Machine Learning in Materials Processing & Characterization
Unit 2: Physics of Data Formation
FAU Erlangen-Nürnberg
Prerequisite (MFML Unit 2): feature matrix \(\mathbf{X}\), SVD, PCA, scree plots, standardization, low-rank approximation, eigen-microstructures.
By the end of this unit you can:
To understand data, we first ask: What is probing the sample?

Physics of the Probe:
Information & Data Format:

Physics of the Probe:
Information & Data Format:
Electromagnetic spectrum
Ions:

Neutrons:
Physics of the Probe:
Information & Data Format:
In Characterization:
To fully capture a signal with maximum frequency \(\nu_{max}\), we must sample at least twice as fast: \[\nu_S \ge 2\nu_{max}\]
Nyquist Frequency: \(\nu_{Nyquist} = \frac{1}{2} \nu_S\).
Frequencies above \(\nu_{Nyquist}\) cannot be resolved: they are aliased to lower frequencies and show up as artifacts in the sampled data.
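A minimal numerical sketch of the sampling condition above, assuming a pure sine wave and NumPy's FFT. The helper names `apparent_frequency` and `dominant_frequency` and the 3 Hz / 9 Hz / 10 Hz values are illustrative choices, not taken from the lecture.

```python
import numpy as np

def apparent_frequency(nu_signal, nu_sample):
    """Frequency at which a sinusoid appears after sampling at nu_sample.

    The true frequency is folded back into the band [0, nu_sample / 2].
    """
    nu_folded = nu_signal % nu_sample
    return min(nu_folded, nu_sample - nu_folded)

def dominant_frequency(nu_signal, nu_sample, duration=10.0):
    """Sample a sine wave and return the strongest frequency in its FFT."""
    n = int(round(duration * nu_sample))
    t = np.arange(n) / nu_sample
    samples = np.sin(2 * np.pi * nu_signal * t)
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(n, d=1.0 / nu_sample)
    return float(freqs[np.argmax(spectrum)])

nu_sample = 10.0  # hypothetical sampling frequency in Hz, so nu_Nyquist = 5 Hz
for nu_signal in (3.0, 9.0):
    print(
        f"true {nu_signal:.1f} Hz | predicted {apparent_frequency(nu_signal, nu_sample):.1f} Hz"
        f" | measured {dominant_frequency(nu_signal, nu_sample):.1f} Hz"
    )
```

The 3 Hz signal lies below the Nyquist frequency and is recovered at 3 Hz, while the 9 Hz signal lies above it and is folded down to 1 Hz.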
Aleatory (Statistical):
Epistemic (Knowledge-based):
\[p(x; k, \lambda) = \frac{k}{\lambda}\!\left(\frac{x}{\lambda}\right)^{k-1}\!e^{-(x/\lambda)^k}\]
for \(x \ge 0\), with shape parameter \(k > 0\) and scale parameter \(\lambda > 0\).
Materials use cases:
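ML consequence: MSE implicitly assumes Gaussian noise, so on skewed, Weibull-distributed targets it gives miscalibrated confidence intervals. Use a Weibull negative log-likelihood (NLL) as the loss, or transform the target to log-space.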
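A minimal sketch of a Weibull NLL, written directly from the density above. The synthetic "strength" data, the values \(k = 8\), \(\lambda = 300\), and the coarse grid search are assumptions for illustration; a real workflow would more likely use a gradient-based optimizer or a statistics library.

```python
import numpy as np

def weibull_nll(x, k, lam):
    """Mean negative log-likelihood of samples x > 0 under Weibull(k, lam)."""
    x = np.asarray(x, dtype=float)
    z = x / lam
    # log p(x) = log(k / lam) + (k - 1) * log(x / lam) - (x / lam)^k
    log_pdf = np.log(k / lam) + (k - 1.0) * np.log(z) - z ** k
    return float(-np.mean(log_pdf))

# Synthetic data with hypothetical parameters (not from the lecture).
rng = np.random.default_rng(0)
k_true, lam_true = 8.0, 300.0
strengths = lam_true * rng.weibull(k_true, size=2000)

# Coarse grid search for the parameters that minimize the NLL.
k_grid = np.linspace(2.0, 15.0, 53)
lam_grid = np.linspace(200.0, 400.0, 81)
nll = np.array([[weibull_nll(strengths, k, lam) for lam in lam_grid] for k in k_grid])
i, j = np.unravel_index(np.argmin(nll), nll.shape)
k_hat, lam_hat = float(k_grid[i]), float(lam_grid[j])

print(f"estimated k = {k_hat:.2f}, estimated lambda = {lam_hat:.1f}")
```

The same `weibull_nll` expression can serve as a training loss when a model predicts \(k\) and \(\lambda\) per sample instead of a single point estimate.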
You already have the mathematical machinery:
In this unit we focus on the physics sitting on top of that math:
See MFML Unit 2 for derivations; here we apply them.
Mean — the signal level: \[\mu = E[x] = \sum_i x_i\, p(x_i)\]
Variance — the noise level: \[\sigma^2 = E\!\left[(x - \mu)^2\right]\]
For many detectors \(\text{SNR} \approx \mu / \sigma\).
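A short sketch of these three quantities on simulated detector counts. Poisson (shot-noise) statistics are assumed here purely for illustration; under that assumption the variance equals the mean, so the SNR grows as \(\sqrt{\mu}\). The function name `moments_and_snr` and the count levels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def moments_and_snr(samples):
    """Return sample mean, sample variance and SNR = mean / std."""
    samples = np.asarray(samples, dtype=float)
    mu = samples.mean()
    sigma = samples.std(ddof=1)  # unbiased estimate of the noise level
    return mu, sigma ** 2, mu / sigma

# Assumption: counts per pixel follow Poisson statistics.
for expected_counts in (10, 100, 1000):
    counts = rng.poisson(expected_counts, size=50_000)
    mu, var, snr = moments_and_snr(counts)
    print(f"mean = {mu:8.2f} | variance = {var:8.2f} | SNR = {snr:6.2f}")
```

Comparing the printed variance with the mean is a quick check of whether measured data is consistent with a pure counting-noise model.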
Noise-model constraints on moments:
PCA components often correspond to physical modes:
Warning: PCA components are mathematical (orthogonal), not physical. Two physical effects can be mixed into one PC, or one effect can be spread across several PCs — always validate against domain knowledge.
When PCA fails: PCA is a linear method, but data generated by rotations, deformations and phase transitions lie on non-linear manifolds. Consider NMF, autoencoders, or kernel PCA (see MFML Unit 2 / 9).
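The warning about mathematical versus physical components can be illustrated with a small sketch. It assumes two hypothetical, overlapping Gaussian "signatures" with random amplitudes and computes PCA via the SVD of the centered feature matrix; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_features = 500, 50
axis = np.linspace(0.0, 1.0, n_features)

# Two hypothetical physical signatures that overlap (they are NOT orthogonal).
sig_a = np.exp(-((axis - 0.45) / 0.10) ** 2)
sig_b = np.exp(-((axis - 0.55) / 0.10) ** 2)

# Each sample is a random mixture of both signatures plus weak noise.
amp = rng.normal(size=(n_samples, 2))
X = amp[:, :1] * sig_a + amp[:, 1:] * sig_b
X = X + 0.01 * rng.normal(size=(n_samples, n_features))

# PCA via SVD of the centered feature matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print("overlap of the two signatures:", round(cosine(sig_a, sig_b), 3))
print("overlap of PC1 and PC2:      ", round(float(Vt[0] @ Vt[1]), 12))
print("variance explained by 2 PCs: ", round(float(explained[:2].sum()), 4))
```

Two components capture almost all of the variance, yet they are orthogonal by construction while the generating signatures are not, so neither component can coincide with both physical effects at once.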
Hohlraum laser pulse simulation:
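\[L = \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k} \|\mathbf{x}_i - \boldsymbol{\mu}_k\|^2\]
where \(C_k\) is the set of points assigned to cluster \(k\) and \(\boldsymbol{\mu}_k\) is its centroid.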
In materials, prior knowledge often constrains \(K\):
Note: When the elbow is ambiguous, combine quantitative criteria with domain knowledge. For example, do \(K=4\) clusters correspond to known phases?
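A minimal sketch of the loss \(L\) as a function of \(K\), using a plain NumPy implementation of Lloyd's algorithm with random restarts. The three synthetic 2D blobs stand in for three hypothetical phases; the function names, blob positions and number of restarts are illustrative assumptions.

```python
import numpy as np

def kmeans(X, K, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns labels, centroids and the loss L."""
    rng = np.random.default_rng(seed)
    # Initialize centroids at K distinct, randomly chosen data points.
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centers[k] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    loss = float(d2[np.arange(len(X)), labels].sum())
    return labels, centers, loss

def best_loss(X, K, n_init=20):
    """Lowest loss over several random initializations."""
    return min(kmeans(X, K, seed=s)[2] for s in range(n_init))

# Three synthetic blobs standing in for three hypothetical phases.
rng = np.random.default_rng(3)
blob_centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(100, 2)) for c in blob_centers])

losses = {K: best_loss(X, K) for K in range(1, 7)}
for K, loss in losses.items():
    print(f"K = {K}: L = {loss:9.1f}")
```

For this well-separated toy data the loss drops steeply up to \(K=3\) and flattens afterwards; real feature spaces rarely give such a clean elbow, which is where the domain knowledge mentioned above comes in.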
t-Distributed Stochastic Neighbor Embedding
Fashion-MNIST example (McClarren 2021):
Materials analogue: apply t-SNE to a stack of micrographs and check whether microstructure types separate into distinct groups.
Important: t-SNE is stochastic and depends on perplexity. Run it multiple times and vary perplexity before trusting any visual cluster.

© Philipp Pelz - Machine Learning in Materials Processing & Characterization