FAU Erlangen-Nürnberg
This unit is not a methods unit. The methods were derived elsewhere; here we ask what the signal physics demands of them.
| Method | Owned by (derived in) |
|---|---|
| PCA / SVD | MFML u02 · ML-PC u02 |
| Clustering, autoencoders | MFML u05 · ML-PC u05 |
| NMF | MFML u02 · ML-PC u05 |
| t-SNE / UMAP, latent spaces | MFML u09 |
| MAE / SSL (DINOv2, I-JEPA) | MFML u09 · ML-PC u09b |
Important
We reference these methods. We do not re-derive them. Unit 9 is about background subtraction, calibration transfer, quantification, rotational ambiguity, operando streaming — the things the signal physics forces on you.
Learning outcomes. After 90 minutes you can:
Note
A spectrum with \(N\) channels is a vector \(\mathbf{x} \in \mathbb{R}^N\). Every linear-algebra / ML tool applies directly — but each “dimension” is a physical energy channel, and that is what makes this unit different from generic vector ML.
Important
The noise is Poisson: variance equals the mean. A peak with 100 counts has \(\pm 10\) noise; a background of 10 000 counts has \(\pm 100\), as large as that entire peak. This is why dose, normalization, and background all couple: you cannot treat them independently.
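A quick numerical check of the variance-equals-mean rule, drawing simulated counts with NumPy. The two count levels are the ones quoted above; the number of draws, the seed and the function name are illustrative choices:

```python
import numpy as np

# Sketch; the helper name is illustrative, not a library API.


def poisson_noise(mean_counts: float, n_draws: int = 200_000, seed: int = 0):
    """Empirical std and relative noise of repeated Poisson measurements of one channel."""
    rng = np.random.default_rng(seed)
    samples = rng.poisson(mean_counts, size=n_draws)
    std = samples.std()
    return std, std / mean_counts


for mean_counts in (100, 10_000):
    std, rel = poisson_noise(mean_counts)
    # Expect std close to sqrt(mean): about 10 for 100 counts, about 100 for 10 000 counts
    print(f"mean={mean_counts}: std~{std:.1f}, relative noise~{rel:.3f}")
```

A 100-count peak sitting on that 10 000-count background is therefore no larger than the background's own ±100 fluctuation.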
The signal’s structure is dictated by the physics of the probe — every method in §2–§4 must respect it.
Note
Different physics → different background model, different noise, different invariances. There is no universal preprocessing. The pipeline must be chosen per modality.
[FIGURE: five mini-panels, one per modality, each showing the characteristic peak/edge shape sitting on its characteristic background, annotated with the physical origin of each component]
Important
Pointer. The methods that solve these — PCA/SVD (MFML u02, ML-PC u02), autoencoders / VAE / conv-AE (MFML u05, ML-PC u05), t-SNE/UMAP & latent spaces & MAE/DINOv2/I-JEPA (MFML u09) — were all derived there. From here we apply them to what the signal physics demands.
Important
The dominant error term in any spectral-ML result is almost never the model — it is the preprocessing. A 2% baseline error swamps the difference between PCA and a transformer. §2 is the unit’s real content because almost none of it exists in the methods units.
Algorithmic (model-free)
Physical (model-based)
Important
A wrong background biases every downstream feature — peak areas, ratios, latent coordinates. It is a systematic, not random, error: averaging more spectra does not remove it.
A real Raman spectrum (black) with a strong rising fluorescence background; several baseline estimates (dotted/dashed) and the resulting baseline-corrected spectra. Background subtraction is the measurement here, not housekeeping (Xu et al. 2021).
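The algorithmic route can be sketched with SNIP, one of the model-free baselines this unit lists. This is a minimal version assuming the common log-log-square-root (LLS) pre-transform and an increasing clipping window; the function name and its `max_half_width` parameter are illustrative, the window must exceed the half-width of the widest peak, and the spectrum is synthetic:

```python
import numpy as np

# Sketch; the helper name and parameters are illustrative, not a library API.


def snip_baseline(y, max_half_width=40):
    """SNIP background estimate (increasing-window variant with the LLS transform).

    Each channel is repeatedly clipped to the mean of its two neighbours at distance p,
    for p = 1 .. max_half_width, which removes peaks narrower than the window.
    """
    y = np.asarray(y, dtype=float)
    # LLS (log-log-sqrt) operator compresses the dynamic range of the counts
    v = np.log(np.log(np.sqrt(y + 1.0) + 1.0) + 1.0)
    n = v.size
    for p in range(1, max_half_width + 1):
        if 2 * p >= n:
            break
        neighbour_mean = 0.5 * (v[: n - 2 * p] + v[2 * p :])
        v = v.copy()
        v[p : n - p] = np.minimum(v[p : n - p], neighbour_mean)
    # Invert the LLS operator back to counts
    return (np.exp(np.exp(v) - 1.0) - 1.0) ** 2 - 1.0


# Synthetic spectrum: smooth decaying background plus one Gaussian peak at channel 250
x = np.arange(500.0)
background = 200.0 * np.exp(-x / 300.0) + 50.0
spectrum = background + 500.0 * np.exp(-0.5 * ((x - 250.0) / 5.0) ** 2)
baseline = snip_baseline(spectrum, max_half_width=30)
corrected = spectrum - baseline
```

Because a channel can only ever be lowered, the baseline never exceeds the data; the systematic risk is a window that is too small, which leaves part of a broad peak in the baseline so that it is later subtracted as background.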

The STEM-specific problem. At atomic resolution and low dose, the power-law fit is itself dominated by Poisson noise — a noisy background becomes a systematic error in every extracted edge (slide 03).
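The physical model for the EELS background is the power law \(B(E) = A E^{-r}\), fitted in a pre-edge window and extrapolated under the edge. A minimal sketch using an ordinary least-squares line in log-log space, on noise-free synthetic data; the function name, energies, exponent and fit window are illustrative assumptions:

```python
import numpy as np

# Sketch; the helper name is illustrative, not a library API.


def powerlaw_background(energy, counts, fit_window):
    """Fit B(E) = A * E**(-r) in a pre-edge window and extrapolate it over the full axis.

    Ordinary least squares of log(counts) on log(E); every count in the window must be > 0.
    """
    lo, hi = fit_window
    sel = (energy >= lo) & (energy <= hi)
    slope, intercept = np.polyfit(np.log(energy[sel]), np.log(counts[sel]), 1)
    r, A = -slope, np.exp(intercept)
    return A * energy ** (-r), A, r


# Noise-free example: power-law background plus a step-like edge at 700 eV
E = np.linspace(400.0, 800.0, 401)
edge = np.where(E >= 700.0, 50.0, 0.0)
I = 1e9 * E ** -3.0 + edge
background, A, r = powerlaw_background(E, I, fit_window=(600.0, 690.0))
edge_signal = I - background   # recovers the step exactly in this noise-free case
```

At low dose the logarithm of near-zero Poisson counts is both noisy and biased, so this plain log-log fit is exactly where the systematic error described above enters; Poisson-weighted or maximum-likelihood fits are common alternatives.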
Important
Exploiting dataset redundancy improves background estimation and chemical sensitivity — the basis of the Cornell Spectrum Imager (Cueva et al. 2012). Better background craft, not a bigger model, is what lowers detection limits.
Peak-referenced calibration. Anchor to known features:
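For example, with two features at known energies (such as the zero-loss peak at 0 eV and a reference peak), a linear energy axis follows from their measured channel positions. A minimal sketch assuming Gaussian-like peaks, an intensity-weighted centroid as the position estimate and a purely linear dispersion; the function names, the spectrum, the channel positions and the 400 eV reference energy are hypothetical:

```python
import numpy as np

# Sketch; helper names and all numbers are illustrative.


def peak_centroid(counts, guess, half_width=5):
    """Sub-channel peak position: intensity-weighted centroid in a window around `guess`."""
    lo, hi = guess - half_width, guess + half_width + 1
    channels = np.arange(lo, hi)
    weights = counts[lo:hi]
    return float(np.sum(channels * weights) / np.sum(weights))


def calibrate_axis(measured_channels, known_energies):
    """Least-squares linear map E(channel) = offset + dispersion * channel."""
    dispersion, offset = np.polyfit(measured_channels, known_energies, 1)
    return offset, dispersion


# Hypothetical spectrum: zero-loss peak at channel 100.4 and a 400 eV reference peak at
# channel 1700.6 (the calibration only sees the counts, not these true positions)
ch = np.arange(2048)
spectrum = (1e4 * np.exp(-0.5 * ((ch - 100.4) / 2.0) ** 2)
            + 1e3 * np.exp(-0.5 * ((ch - 1700.6) / 2.0) ** 2))
positions = [peak_centroid(spectrum, 100), peak_centroid(spectrum, 1701)]
offset, dispersion = calibrate_axis(positions, [0.0, 400.0])
energy_axis = offset + dispersion * ch
```

With more than two reference features, `np.polyfit` becomes a genuine least-squares fit and exposes any non-linearity in the dispersion, which is where warping takes over.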
Warping for non-rigid misalignment.
Important
Misalignment is the silent killer of cross-instrument ML: PCA “discovers” a component that is just the shift; a classifier learns the instrument, not the chemistry. Always align before reduction.
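Rigid shifts can be removed before any decomposition by cross-correlating each spectrum with a reference. A minimal sketch assuming a pure shift (no warping) and a reference that shares the feature; parabolic interpolation of the correlation maximum gives sub-channel precision. Function names and the synthetic peaks are illustrative:

```python
import numpy as np

# Sketch; helper names are illustrative, not a library API.


def estimate_shift(spectrum, reference):
    """Channel shift of `spectrum` relative to `reference` from the cross-correlation peak."""
    a = spectrum - spectrum.mean()
    b = reference - reference.mean()
    xc = np.correlate(a, b, mode="full")          # index i <-> lag i - (len(b) - 1)
    i = int(np.argmax(xc))
    peak = float(i)
    if 0 < i < xc.size - 1:
        # Parabolic interpolation of the correlation maximum for sub-channel precision
        y0, y1, y2 = xc[i - 1], xc[i], xc[i + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0:
            peak = i + 0.5 * (y0 - y2) / denom
    return peak - (reference.size - 1)


def apply_shift(spectrum, shift):
    """Move the spectrum by -shift channels (linear interpolation; ends are clamped)."""
    ch = np.arange(spectrum.size, dtype=float)
    return np.interp(ch + shift, ch, spectrum)


ch = np.arange(512)
reference = np.exp(-0.5 * ((ch - 200.0) / 5.0) ** 2)
misaligned = np.exp(-0.5 * ((ch - 207.3) / 5.0) ** 2)   # same feature, 7.3 channels later
shift = estimate_shift(misaligned, reference)
aligned = apply_shift(misaligned, shift)
```

`np.interp` clamps at the ends, so channels shifted in from outside the range repeat the edge value; crop them before PCA.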
| Strategy | Formula | Physical assumption |
|---|---|---|
| Total count / area | \(\mathbf{x}' = \mathbf{x}/\sum_j x_j\) | All variation in total intensity is dose/thickness, not chemistry |
| Max-peak height | \(\mathbf{x}' = \mathbf{x}/\max_j x_j\) | A reference peak is composition-invariant |
| SNV | \(\mathbf{x}' = (\mathbf{x}-\bar x)/s_x\) | Per-spectrum mean & std are nuisance scatter/offset (chemometrics) |
| Reference-peak ratio | \(\mathbf{x}' = \mathbf{x}/x_{\text{ref}}\) | An internal-standard line is constant (e.g. a matrix element) |
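The four strategies as NumPy functions, each acting on the last (energy) axis so they accept one spectrum or a stack. The function names, the sample standard deviation (`ddof=1`) for SNV and the `ref_channel` index are assumptions:

```python
import numpy as np

# Sketch; function names are illustrative. Each normalizes along the last (energy) axis,
# so it accepts one spectrum or a stack of spectra with shape (..., channels).


def total_count(x):
    return x / x.sum(axis=-1, keepdims=True)


def max_peak(x):
    return x / x.max(axis=-1, keepdims=True)


def snv(x):
    # Standard normal variate: subtract each spectrum's mean, divide by its std
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, ddof=1, keepdims=True)
    return (x - mean) / std


def reference_ratio(x, ref_channel):
    # ref_channel: index of the internal-standard line (hypothetical choice)
    return x / x[..., ref_channel : ref_channel + 1]


spectra = np.array([[1.0, 2.0, 3.0, 4.0],
                    [2.0, 4.0, 6.0, 8.0]])   # second row: same sample at twice the dose
```

The second row is the first at twice the dose, so every strategy maps both rows to the same vector; what differs is which genuine chemical variation each would also remove, as the last column of the table states.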
Important
Every normalization changes the noise model. Total-count normalization correlates channels and breaks the clean Poisson assumption — do it after any variance-stabilizing / Poisson-aware step, not before.
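One common variance-stabilizing choice is the Anscombe transform \(2\sqrt{x + 3/8}\), which maps Poisson counts to values of approximately unit variance. A minimal check on simulated counts; the text does not prescribe this particular transform, and the function names are illustrative:

```python
import numpy as np

# Sketch; function names are illustrative.


def anscombe(counts):
    """Anscombe transform: Poisson counts -> approximately unit-variance values."""
    return 2.0 * np.sqrt(np.asarray(counts, dtype=float) + 3.0 / 8.0)


def inverse_anscombe(z):
    """Plain algebraic inverse; biased at very low counts (unbiased inverses exist)."""
    return (np.asarray(z, dtype=float) / 2.0) ** 2 - 3.0 / 8.0


rng = np.random.default_rng(1)
raw_var, stabilized_var = {}, {}
for lam in (20, 200, 2000):
    counts = rng.poisson(lam, size=100_000)
    raw_var[lam] = counts.var()                    # grows with the mean (Poisson)
    stabilized_var[lam] = anscombe(counts).var()   # stays close to 1
```

In the order recommended above, one would stabilize, run the Poisson-sensitive step (denoising, decomposition), invert, and only then normalize.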
Note
Applied callout — DAE-assisted EELS. At low dose the Fe-L₂,₃ fine structure separating Fe²⁺/Fe³⁺ is buried in Poisson noise. A denoising autoencoder trained on simulated Fe-L edges (the method is ML-PC u05 §D) is used here purely as a preprocessing denoiser feeding the constrained fit — not as the analysis. Result: clean white-line ratios at ~10× lower dose. The AE is a tool inside box 1–4, not the deliverable.
[FIGURE: two overlapping EELS L₃/L₂ white lines, raw noisy data, the constrained Voigt-multiplet fit with shared width and fixed branching ratio, and residuals]
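A constrained white-line fit of this kind can be sketched with SciPy's `curve_fit` and `voigt_profile`. Here the L₃ and L₂ lines share one Gaussian and one Lorentzian width, and their separation is held fixed at an assumed 13 eV; unlike the figure, the branching ratio is left free so that the L₃/L₂ area ratio is the fitted output. There is no continuum step or background term, the function names are illustrative, and the data are synthetic:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import voigt_profile

# Sketch; helper names are illustrative, and the splitting is an assumed value.
SPLITTING_EV = 13.0  # L3-L2 separation, held fixed in the fit


def white_lines(E, a3, a2, e3, sigma, gamma):
    """L3 + L2 Voigt pair sharing one Gaussian (sigma) and one Lorentzian (gamma) width.

    voigt_profile is area-normalized, so a3 and a2 are integrated line intensities.
    """
    l3 = voigt_profile(E - e3, sigma, gamma)
    l2 = voigt_profile(E - (e3 + SPLITTING_EV), sigma, gamma)
    return a3 * l3 + a2 * l2


def fit_white_line_ratio(E, counts, p0):
    """Poisson-weighted constrained fit; returns the L3/L2 ratio, parameters, covariance."""
    sigma_counts = np.sqrt(np.maximum(counts, 1.0))   # Poisson error bars, floored at 1
    lower = [0.0, 0.0, -np.inf, 1e-3, 1e-3]            # non-negative areas and widths
    popt, pcov = curve_fit(white_lines, E, counts, p0=p0, sigma=sigma_counts,
                           absolute_sigma=True, bounds=(lower, np.inf))
    return popt[0] / popt[1], popt, pcov


# Hypothetical background-subtracted L2,3 edge with Poisson noise
E = np.linspace(695.0, 735.0, 401)
clean = white_lines(E, 1000.0, 400.0, 708.5, 0.8, 0.5)
noisy = np.random.default_rng(0).poisson(clean).astype(float)
ratio, params, cov = fit_white_line_ratio(E, noisy, p0=[800.0, 300.0, 708.0, 1.0, 0.3])
```

Sharing the widths and fixing the splitting helps keep the ratio identifiable at low counts; with every parameter free, widths and amplitudes can trade off against each other in the noise.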
Piecewise Direct Standardization (PDS) (Wang et al. 1991)
Simpler & more general
Important
PDS needs only a handful of standards measured on both instruments — orders of magnitude cheaper than re-labelling. The difference between a model that ships to one lab and one that ships to a fleet.
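A minimal PDS sketch: each master channel is regressed on a small window of slave channels over the transfer standards, giving a banded transfer matrix. The ridge-regularized least squares (in place of the local PCR/PLS models often used), the missing offset term, the window width, the function name and the synthetic instruments are all assumptions:

```python
import numpy as np

# Sketch; the helper name, instruments and standards are illustrative.


def fit_pds(X_slave, X_master, half_window=2, ridge=1e-9):
    """Piecewise Direct Standardization: banded F such that X_slave @ F ~ X_master.

    Rows are transfer standards measured on both instruments. Each master channel j is
    regressed on slave channels j-h .. j+h (ridge least squares, no offset term).
    """
    n_channels = X_slave.shape[1]
    F = np.zeros((n_channels, n_channels))
    for j in range(n_channels):
        lo, hi = max(0, j - half_window), min(n_channels, j + half_window + 1)
        A = X_slave[:, lo:hi]                  # (n_standards, window)
        y = X_master[:, j]                     # (n_standards,)
        F[lo:hi, j] = np.linalg.solve(A.T @ A + ridge * np.eye(hi - lo), A.T @ y)
    return F


# Synthetic: 3 pure components; the "slave" instrument shifts peaks by 1.5 channels
# and scales intensity by 0.8
ch = np.arange(200.0)
centres = (50.0, 100.0, 150.0)
master_pure = np.array([np.exp(-0.5 * ((ch - c) / 4.0) ** 2) for c in centres])
slave_pure = np.array([0.8 * np.exp(-0.5 * ((ch - c - 1.5) / 4.0) ** 2) for c in centres])

rng = np.random.default_rng(0)
C_std = rng.uniform(0.2, 1.0, size=(8, 3))     # 8 transfer standards
F = fit_pds(C_std @ slave_pure, C_std @ master_pure)

c_new = rng.uniform(0.2, 1.0, size=3)          # an unseen sample
x_slave, x_master = c_new @ slave_pure, c_new @ master_pure
error_before = np.linalg.norm(x_slave - x_master)
error_after = np.linalg.norm(x_slave @ F - x_master)
```

Eight standards and a 5-channel window suffice here only because the synthetic spectra span three components; real transfer needs standards that cover the chemical variation the model will meet.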
Important
Trade-off. Spatial smoothing blurs sharp interfaces and can invent mixed spectra at boundaries — choose the receptive field to match the smallest real feature, not the noise.
A spectrum image is an \((x,y,E)\) datacube — every pixel holds a full spectrum. Spatial-spectral models borrow strength from neighbouring (near-identical) pixels to denoise each spectrum.
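The simplest spatial-spectral model is a plain neighbourhood average over \((x,y)\) that leaves the energy axis alone. A minimal sketch with `scipy.ndimage.uniform_filter`; the function name, the 3×3 window and the edge handling are assumptions:

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Sketch; the helper name and window size are illustrative.


def neighbourhood_average(cube, size=3):
    """Average each spectrum of an (ny, nx, ne) cube with its size x size spatial neighbours.

    The filter has size 1 along energy, so channels are never mixed; image borders are
    handled by repeating the edge pixels.
    """
    return uniform_filter(np.asarray(cube, dtype=float), size=(size, size, 1), mode="nearest")


rng = np.random.default_rng(0)
flat = rng.poisson(100.0, size=(64, 64, 50)).astype(float)   # uniform sample, std ~ 10
smoothed = neighbourhood_average(flat)                         # interior std ~ 10 / 3
```

For independent noise a 3×3 average cuts the standard deviation by about 3; the same window also smears a sharp interface over one pixel on each side, which is the trade-off noted above.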
| Decomposition | Factorization | Spectral interpretation |
|---|---|---|
| PCA / SVD | \(\mathbf{X}\approx \bar{\mathbf{x}} + \mathbf{C}\mathbf{V}^\top\), \(\mathbf{V}\) orthonormal | Eigenspectra = orthogonal variation directions (can be negative — not physical phases) |
| NMF | \(\mathbf{X}\approx \mathbf{W}\mathbf{H}\), \(\mathbf{W},\mathbf{H}\ge 0\) | \(\mathbf{H}\) = end-member spectra (≈ pure phases), \(\mathbf{W}\) = abundance maps |
| AE / conv-AE | \(\mathbf{x}\!\to\!\mathbf{z}\!\to\!\hat{\mathbf{x}}\) | Non-linear latent; handles peak shift, not just mixing |
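The NMF row can be sketched in a few lines with multiplicative updates for the Frobenius loss (the Lee-Seung rule, chosen here for brevity; a Kullback-Leibler variant is often preferred for Poisson counts). The datacube is unfolded to pixels × channels and the abundances are folded back into maps; the function name and the noise-free synthetic cube are illustrative:

```python
import numpy as np

# Sketch; the helper name and the synthetic cube are illustrative.


def nmf(X, k, n_iter=1000, eps=1e-12, seed=0):
    """X (pixels x channels, non-negative) ~ W @ H with W, H >= 0.

    Multiplicative updates for the Frobenius loss; W holds abundances (one column per
    end-member), H holds end-member spectra.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, d)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Unfold a synthetic (ny, nx, ne) cube into a (pixels x channels) matrix and back
ny, nx, ne = 20, 20, 100
ch = np.arange(ne)
pure = np.array([np.exp(-0.5 * ((ch - 30) / 4.0) ** 2),
                 np.exp(-0.5 * ((ch - 70) / 4.0) ** 2)])
abundance = np.random.default_rng(1).random((ny * nx, 2))
cube = (abundance @ pure).reshape(ny, nx, ne)

W, H = nmf(cube.reshape(-1, ne), k=2)
maps = W.reshape(ny, nx, 2)                    # one abundance map per component
rel_err = np.linalg.norm(cube.reshape(-1, ne) - W @ H) / np.linalg.norm(cube)
```

Any positive rescaling of a column of `W` against the matching row of `H` fits equally well, so component intensities only become meaningful once a normalization convention is fixed.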


The seminal EM result. PCA of an EEL spectrum image extracts chemically relevant components — score maps localize phases, loadings are interpretable spectra (Bosman et al. 2006).
Note
This is slide 13’s PCA row in the STEM: the method is MFML u02; the reading of the loadings as chemistry is the materials content.

Composition and bonding. With weighted / two-way-scaled PCA, the same decomposition recovers near-edge fine structure (ELNES) — bonding and orientation, not only which elements are present (Bosman et al. 2006).
Important
Get the weighting wrong and PCA “denoises” the bonding signal away. Weighting is not a detail — it decides whether the chemistry survives (slide 20a).
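One common weighting for Poisson data divides each entry by the square root of its pixel mean times its channel mean before an uncentered SVD, then undoes the scaling after truncation. A minimal sketch of that two-way scaling on synthetic counts; the function name, rank, peak positions and count levels are assumptions, and this is not necessarily the exact weighting of the cited work:

```python
import numpy as np

# Sketch; the helper name and the synthetic data are illustrative.


def weighted_pca_denoise(X, k):
    """Poisson-weighted ('two-way scaled') PCA of raw counts X (pixels x channels).

    X_ij is divided by sqrt(g_i * h_j), with g the mean count of each pixel and h the mean
    spectrum, so Poisson noise is roughly uniform before the (uncentered) SVD.
    Returns the rank-k reconstruction in count units and the scaled singular values.
    """
    X = np.asarray(X, dtype=float)
    sg = np.sqrt(X.mean(axis=1, keepdims=True) + 1e-12)   # (pixels, 1)
    sh = np.sqrt(X.mean(axis=0, keepdims=True) + 1e-12)   # (1, channels)
    U, s, Vt = np.linalg.svd(X / sg / sh, full_matrices=False)
    X_k = (U[:, :k] * s[:k]) @ Vt[:k] * sg * sh          # undo the scaling
    return X_k, s


rng = np.random.default_rng(0)
ch = np.arange(200)
pure = np.array([5.0 + 40.0 * np.exp(-0.5 * ((ch - 60) / 3.0) ** 2),
                 5.0 + 40.0 * np.exp(-0.5 * ((ch - 140) / 3.0) ** 2)])
truth = rng.random((900, 2)) @ pure                     # expected counts, rank 2
counts = rng.poisson(truth).astype(float)
denoised, s = weighted_pca_denoise(counts, k=2)
```

Without the scaling, the highest-count channels tend to dominate the SVD, and weak near-edge detail can fall below the truncation.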
MCR-ALS — Multivariate Curve Resolution by Alternating Least Squares (Juan et al. 2014): solve \(\mathbf{X}=\mathbf{C}\mathbf{S}^\top+\mathbf{E}\) by alternating non-negative least squares for concentrations \(\mathbf{C}\) and spectra \(\mathbf{S}\). In the STEM it unmixes raw EDS/EELS spectrum images into pure-component spectra + abundance maps (Kotula and Keenan 2006) (slide 14a).
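The alternating scheme can be sketched with SciPy's `nnls`, one small non-negative least-squares problem per pixel and per channel. Only non-negativity is imposed (no closure, unimodality or fixed end-members), the initial spectra are simply two measured pixels, the function name is illustrative, and the data are synthetic and noise-free:

```python
import numpy as np
from scipy.optimize import nnls

# Sketch; the helper name and the synthetic data are illustrative.


def mcr_als(X, S_init, n_iter=100):
    """MCR-ALS with non-negativity only: X (pixels x channels) ~ C @ S.T, C >= 0, S >= 0.

    S_init (channels x k) is an initial estimate of the pure-component spectra.
    """
    S = np.array(S_init, dtype=float)
    for _ in range(n_iter):
        # Concentrations: one non-negative least-squares problem per pixel
        C = np.array([nnls(S, x)[0] for x in X])
        # Spectra: one non-negative least-squares problem per channel
        S = np.array([nnls(C, X[:, j])[0] for j in range(X.shape[1])])
    lack_of_fit = np.linalg.norm(X - C @ S.T) / np.linalg.norm(X)
    return C, S, lack_of_fit


rng = np.random.default_rng(0)
ch = np.arange(80)
S_true = np.stack([np.exp(-0.5 * ((ch - 25) / 4.0) ** 2),
                   np.exp(-0.5 * ((ch - 55) / 4.0) ** 2)], axis=1)   # (channels, 2)
C_true = rng.random((150, 2))
X = C_true @ S_true.T
# Initial spectra: two measured pixels (mixtures), a simple data-driven start
S0 = X[rng.choice(X.shape[0], size=2, replace=False)].T
C, S, lack_of_fit = mcr_als(X, S0)
```

A small lack of fit does not by itself mean the recovered spectra are the true ones; that is the rotational-ambiguity problem.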
Rotational ambiguity. For any invertible \(\mathbf{T}\): \[\mathbf{X} = (\mathbf{C}\mathbf{T})(\mathbf{T}^{-1}\mathbf{S}^\top)\] fits equally well. Non-negativity alone leaves a feasible band of solutions, not a unique answer.
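The ambiguity is easy to reproduce numerically. In this sketch the spectra have a non-zero floor everywhere (as an unsubtracted shared background would), so a small non-orthogonal \(\mathbf{T}\) keeps both factors non-negative while changing the "pure" spectra; the random matrices and the particular \(\mathbf{T}\) are arbitrary choices:

```python
import numpy as np

# Sketch with arbitrary random factors and an arbitrary invertible T.
rng = np.random.default_rng(0)
C = rng.random((50, 2))                  # concentrations (pixels x components), >= 0
S = 0.5 + 0.5 * rng.random((80, 2))      # spectra with a non-zero floor (channels x components)
X = C @ S.T

T = np.array([[1.0, 0.1],
              [0.05, 1.0]])              # small invertible, non-orthogonal mixing
C_rot = C @ T
S_rot_T = np.linalg.inv(T) @ S.T

same_fit = np.allclose(X, C_rot @ S_rot_T)                      # True: identical reconstruction
still_feasible = (C_rot >= 0).all() and (S_rot_T >= 0).all()    # True: both factors non-negative
different_spectra = not np.allclose(S.T, S_rot_T)               # True: the "pure" spectra changed
```

Non-negativity mainly constrains the solution where some component is (near) zero; spectra without such selective regions leave room for exactly this kind of rotation.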
Constraints collapse the band:
Important
Genuinely not in the methods units. NMF is taught there; MCR-ALS + rotational ambiguity + how physical constraints resolve it is the new content.
[FIGURE: feasible band of resolved spectra under non-negativity only (a fan of curves) collapsing to a single curve as closure + unimodality + a known end-member are added]

The EDS instantiation of slide 14. MCR applied to a raw STEM-EDS spectrum image returns a high-contrast set of component spectra + maps — no element-by-element windowing (Kotula and Keenan 2006).
Important
Sandia’s MSA-for-EM line of work is why MCR/PCA are the production default in EDS labs — decades before deep learning. Reach for a network only when these break (slide 24).

A geometric view of slide 14. Under linear mixing each spectrum is a convex combination of end-members → the data fill a simplex; its vertices are the pure phases.
Note
The end-member/abundance picture is borrowed from hyperspectral remote sensing, spectroscopy’s cousin discipline. Same math, different photons.
Cliff-Lorimer (EDX, thin film) (Cliff and Lorimer 1975) \[\frac{C_A}{C_B} = k_{AB}\,\frac{I_A}{I_B}\] Ratio of background-subtracted line intensities × a known \(k\)-factor. Needs the thin-film approximation (no absorption).
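For a binary A–B film, adding the closure \(C_A + C_B = 1\) turns the ratio into compositions. A minimal sketch that also handles more elements by taking all k-factors relative to one reference element; the function name, the intensities and \(k_{AB} = 1.2\) are hypothetical:

```python
import numpy as np

# Sketch; the helper name and all numbers are illustrative.


def cliff_lorimer(intensities, k_factors):
    """Thin-film compositions from background-subtracted line intensities.

    k_factors are relative to a common reference element B (k_B = 1), so that
    C_A / C_B = k_AB * I_A / I_B for every pair; compositions are normalized to sum to 1.
    """
    weighted = np.asarray(k_factors, dtype=float) * np.asarray(intensities, dtype=float)
    return weighted / weighted.sum()


# Binary A-B example with a hypothetical k_AB = 1.2:
# C_A / C_B = 1.2 * 3000 / 2000 = 1.8, so C_A = 1.8 / 2.8 ~ 0.643 and C_B ~ 0.357
C_A, C_B = cliff_lorimer([3000.0, 2000.0], [1.2, 1.0])
```

No absorption or fluorescence correction is applied, consistent with the thin-film approximation; compositions come out in whatever units the k-factors were calibrated for (usually weight fractions).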
ζ-factor method (Watanabe and Williams 2006) \[C_A = \zeta_A \frac{I_A}{\rho t}\cdot(\dots)\] Absolute quantification from first principles; folds in mass-thickness, handles absorption self-consistently — the modern standard.
Important
ML can predict \(I_A\) robustly (denoising, deconvolution); the \(k\)/ζ-factor step is physics. Skipping it and regressing concentration end-to-end discards the physical audit trail a lab/certifier requires.
Important
The often-dominant term is not counting statistics — it is the background-model systematic (slide 07). Reporting only \(\sqrt{N}\) Poisson error bars is the most common honest-looking lie in the field.
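A worked version of that claim, using the 2% background error quoted earlier in the unit. The sketch assumes the background comes from a smooth model fitted over many channels (its own counting noise neglected) and that the peak window holds net plus background counts; the function name and count levels are illustrative:

```python
import math

# Sketch; the helper name and the count levels are illustrative.


def peak_errors(net_counts, background_counts, bg_model_rel_error):
    """Relative statistical (Poisson) and systematic (background-model) errors on a net peak."""
    total = net_counts + background_counts
    statistical = math.sqrt(total) / net_counts
    systematic = bg_model_rel_error * background_counts / net_counts
    return statistical, systematic


# Net peak of 1000 counts on a 10 000-count background with a 2 % background-model error,
# then the same measurement summed over 100x more spectra
for scale in (1, 100):
    stat, systematic = peak_errors(1000 * scale, 10_000 * scale, 0.02)
    print(f"x{scale}: statistical {stat:.1%}, systematic {systematic:.1%}")
# x1: statistical 10.5%, systematic 20.0%
# x100: statistical 1.0%, systematic 20.0%
```

Summing 100× more spectra shrinks the Poisson term tenfold while the 20% systematic term does not move, which is why averaging cannot fix a wrong background.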
Important
A non-linear method fitting a non-linear mixture does not prove it recovered the physics. Always check against a sample of known fractional composition.

When PCA/ICA struggle. If end-member abundances are statistically dependent — the usual case in real maps — PCA/ICA unmix poorly. A Bayesian model with the simplex priors (slide 14b) estimates end-members + abundances and their uncertainty (Dobigeon and Brun 2012).
Important
A principled prior beats a generic decomposition when the physics (non-negativity, closure, dependence) is known. Encode the physics; don’t hope the SVD stumbles onto it.

What the EM field actually deploys. Not a giant foundation model — a compact convolutional denoising autoencoder mapping noisy → clean spectra. RapidEELS (Pate et al. 2021) is the canonical case.
Note
The conv-DAE is the workhorse because it is small, fast, trainable on simulation, and auditable: it drops in as a preprocessing denoiser feeding the physical fit (slide 10), not an end-to-end black box.
Denoised EELS at 25/100/200/400 FPS vs ground truth: the conv-DAE recovers the O-K / Fe-L edge shape even at ~15 counts/channel (shot-noise SNR ≈ 3.8), and beats 5- and 7-component PCA on fine-feature MSE (Pate et al. 2021).
Important
Denoising here is a preprocessing step validated against ground truth (slide 10) — never a latent read as the answer (slide 17).

Important
But a softmax class is not a calibrated valence. For the continuous mixed-valence gradient, read the fitted white-line ratio (slide 23), not the latent (slide 17). Classification is fast triage; the physical fit is the deliverable.
Important
Self-supervision is the right tool in the STEM because clean references rarely exist and dose is capped by beam damage — you learn from the noisy data you can afford to acquire (slides 03, 26).
Note
Method lineage: UDVD is a blind-spot CV denoiser adapted to EELS — the generic architecture is MFML u05/u09; the contribution is the physics-aware adaptation + simulation benchmarking.
Important
Reality check. There is no validated EELS/EDS foundation model yet. The field’s deployed deep learning is the conv denoiser (slide 18) and decades-old multivariate statistics (slides 13–14a). Treat “spectral foundation model” as a research direction, not a tool to reach for.
Question. PCA denoising lets you cut acquisition dose 10×. Why not always do it?
Hint. Think about a feature present in only one or two pixels of a million-pixel map.
Consider.
Important
Denoising vs. discovery is a fundamental tension, not a tuning issue. Always inspect the residual \(\mathbf{x}-\hat{\mathbf{x}}\) spatially — a rare phase hides there, not in the reconstruction. In EELS this is documented: PCA filtering introduces a systematic bias (Lichtert and Verbeeck 2013) and artifacts when peak-to-background is poor (Cueva et al. 2012) (slide 20b).
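The residual check can be sketched on a synthetic map: a smooth A/B gradient (a single direction after mean-centring, so one component is kept) plus a rare phase C present in only two pixels, with Poisson noise. The function names, sizes, peak positions, count levels and \(k = 1\) are illustrative assumptions:

```python
import numpy as np

# Sketch; helper names and the synthetic map are illustrative.


def pca_denoise(cube, k):
    """Top-k PCA reconstruction of an (ny, nx, ne) cube and its per-pixel residual energy."""
    ny, nx, ne = cube.shape
    X = cube.reshape(-1, ne).astype(float)
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    X_hat = mean + (U[:, :k] * s[:k]) @ Vt[:k]
    residual_map = ((X - X_hat) ** 2).sum(axis=1).reshape(ny, nx)
    return X_hat.reshape(cube.shape), residual_map


def peak(channels, centre, width=3.0):
    return np.exp(-0.5 * ((channels - centre) / width) ** 2)


rng = np.random.default_rng(0)
ny, nx, ne = 64, 64, 100
e = np.arange(ne)
f = np.tile(np.linspace(0.0, 1.0, nx), (ny, 1))[..., None]     # phase-A fraction, left to right
expected = 10.0 + 100.0 * (f * peak(e, 30) + (1.0 - f) * peak(e, 60))
rare_pixels = {(5, 7), (40, 50)}
for i, j in rare_pixels:
    expected[i, j] += 100.0 * peak(e, 85)                       # rare phase C
cube = rng.poisson(expected).astype(float)

denoised, residual_map = pca_denoise(cube, k=1)
top2 = np.argsort(residual_map, axis=None)[-2:]
flagged = {tuple(int(v) for v in np.unravel_index(idx, residual_map.shape)) for idx in top2}
```

The reconstruction at the two rare pixels carries essentially none of phase C, yet they are the two brightest pixels of the residual map, which is where to look.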

PCA denoising must be done right.


The cautionary literature — take it seriously.
Important
“Looks clean” ≠ “is correct”. A denoised map can be precise and biased at once. Validate against unfiltered fits / known references (slide 16) before trusting PCA-denoised numbers.
The latent can rediscover the physics.
Seed it from simulation when labels are scarce.
Important
The sim must capture the right physics (peak shapes, backgrounds, artifacts) or you transfer a simulation accent the real instrument never speaks.
Problem. Identify crystallographic phases in a noisy multi-phase XRD pattern; ICDD peak-matching is manual and brittle to mixtures, preferred orientation, and broadening.
Pipeline (all methods referenced). §2 preprocessing (SNIP background, LaB₆ \(2\theta\) calibration) → AE/classifier pretrained on simulated patterns (slide 21) → nearest-neighbour in latent space; high reconstruction error → amorphous/unknown (slide 20).
Note
Here only briefly. XRD is not a (S)TEM modality — a contrast case. The same recipe (sim-pretrain + latent + residual) carries straight over to the EELS/EDS cases that follow (23–24a).

Problem. Map Fe oxidation state at nm resolution. The Fe-L₂,₃ white-line ratio (\(L_3/L_2\)) and onset shift (~0.3 eV, slide 04) separate the valences, but both signatures are buried in Poisson noise at usable dose.
Pipeline. Power-law background (07/07a) → ZLP calibration (08) → DAE denoise pretrained on simulated Fe-L edges (Pate et al. 2021); weak signal → self-supervised UDVD (Wang et al. 2025) (18c) → constrained white-line fit (10) → continuous valence from the fitted ratio, not a raw latent (17).
Impact. Oxidation-state mapping at ~5–10× lower dose; resolves continuous mixed-valence gradients; validated against valence reference standards (16).
The scale challenge (the point of the slide). A \(512{\times}512\) scan with 2048 energy channels holds \(512^2 \approx 2.6\times10^5\) spectra (\(\approx 5\times10^8\) values) per field; multiple fields → \(10^6\)–\(10^7\) spectra per sample. The bottleneck is engineering, not method novelty.
Why PCA, specifically, here. Linear, one-pass, \(O(ND\min(N,D))\) time for \(N\) spectra of \(D\) channels and \(O(\min(N,D)^2)\) memory for the Gram/covariance matrix, deterministic, streamable (incremental SVD). Denoise via top-\(K\) reconstruction → K-means in score space → map clusters back to \((x,y)\) → phase map in minutes on a workstation. (Method: ML-PC u02; noise-weighted optimal truncation (Potapov and Lubk 2019); MCR for physical components (Kotula and Keenan 2006), slide 14a.)
Engineering reality. Memory-mapped datacubes, chunked/out-of-core SVD, GPU only where it pays. The win is throughput at fixed accuracy, not a better model.
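A chunked, one-pass route to the loadings: accumulate channel sums and the channels × channels cross-product over blocks of spectra, then eigendecompose the covariance. This swaps in a streaming-covariance method rather than the incremental SVD named above; the function name, chunking and data are illustrative, and for very large counts a Welford-style running-mean update is numerically safer than the raw second moments used here:

```python
import numpy as np

# Sketch; the helper name and the synthetic data are illustrative.


def streaming_pca(chunks, n_channels, k):
    """PCA loadings from one pass over spectrum chunks (e.g. slices of a memory-mapped cube).

    Accumulates channel sums and the (n_channels x n_channels) cross-product matrix, so
    memory does not grow with the number of spectra; then eigendecomposes the covariance.
    """
    n = 0
    s1 = np.zeros(n_channels)
    s2 = np.zeros((n_channels, n_channels))
    for X in chunks:
        X = np.asarray(X, dtype=float)
        n += X.shape[0]
        s1 += X.sum(axis=0)
        s2 += X.T @ X
    mean = s1 / n
    cov = s2 / n - np.outer(mean, mean)
    evals, evecs = np.linalg.eigh(cov)             # ascending eigenvalues
    order = np.argsort(evals)[::-1][:k]
    return mean, evals[order], evecs[:, order]


# In practice `chunks` would iterate over row blocks of a memory-mapped array
# (for example one created with np.memmap); here an in-memory array is split instead
rng = np.random.default_rng(0)
scales = np.array([20.0, 10.0, 5.0] + [1.0] * 17)
data = 50.0 + rng.normal(size=(5000, 20)) * scales
mean, variances, loadings = streaming_pca(np.array_split(data, 10), n_channels=20, k=3)
scores = (data - mean) @ loadings       # a second pass gives the score maps
```

Memory is set by the \(D \times D\) covariance (2048² doubles, about 34 MB), not by the \(10^6\)–\(10^7\) spectra; the scores need a second pass over the chunks.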
Important
And it inherits slide 20’s caveat: trace elements below the noise floor and rare phases can be denoised away. Keep a residual/anomaly pass alongside the phase map.

The end-to-end story. Raw EDS datacube → noise-weighted PCA denoise + optimal truncation (Potapov and Lubk 2019) → MCR for physical components (Kotula and Keenan 2006) → abundance maps registered to \((x,y)\) → interfacial chemistry of the device.
Important
The deliverable is which phase is where, with what spectrum — auditable components, not a latent embedding. Match the method to the question (throughput + interpretability), not to fashion.
Workflow:
Important
The AE-anomaly method is ML-PC u05 §E (threshold from nominal validation error, never from anomalies). This slide is the materials discovery workflow wrapped around it.
New problems the stream creates
Beam-damage / dose-fractionation
Important
Methods referenced: AE-anomaly ML-PC u05 §E; Poisson noise ML-PC u02. New here: time — non-stationarity, online thresholds, dose as a budgeted resource.
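A minimal sketch of the two rules above: a threshold set from nominal validation data only, and an online threshold for a non-stationary stream. The anomaly score stands in for an AE reconstruction error; the class name, the exponentially weighted reference level, the quantile and `alpha` are hypothetical design choices, not the ML-PC u05 §E method:

```python
import numpy as np

# Sketch; the class name, parameters and the simulated stream are illustrative.


class StreamingDetector:
    """Flags frames whose anomaly score (e.g. AE reconstruction error) is too high.

    The alarm ratio is fixed once from known-nominal validation errors; the reference
    level then follows slow drift through an exponentially weighted mean that is updated
    only with frames that were not flagged, so anomalies never raise their own bar.
    """

    def __init__(self, nominal_val_errors, quantile=0.995, alpha=0.01):
        val = np.asarray(nominal_val_errors, dtype=float)
        self.level = float(val.mean())
        self.ratio = float(np.quantile(val, quantile)) / self.level
        self.alpha = alpha

    @property
    def threshold(self):
        return self.ratio * self.level

    def update(self, error):
        flagged = bool(error > self.threshold)
        if not flagged:
            self.level += self.alpha * (error - self.level)
        return flagged


# Simulated stream: nominal errors slowly drift up by 30 %, three injected anomalies
rng = np.random.default_rng(0)
detector = StreamingDetector(rng.gamma(5.0, 1.0, size=5000))
n_frames = 2000
errors = rng.gamma(5.0, 1.0, size=n_frames) * np.linspace(1.0, 1.3, n_frames)
anomalies = [300, 900, 1500]
errors[anomalies] = 40.0
flags = np.array([detector.update(e) for e in errors])
```

A slow anomaly that creeps in gradually would still be absorbed into the reference level, so the drift-tracking level itself is worth logging and reviewing.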
Important
The unit’s recurring lesson, a third time. Ill-posedness here = rotational ambiguity in MCR (14) = non-identifiable peaks (10). The cure is always the same: physical constraints (path filtering, known coordination chemistry). EXAFS is a synchrotron method, kept only to make that pattern unmistakable.
Important
The methods live in MFML u02/u05/u09 and ML-PC u02/u05. This unit was about what the signal physics demands. Next: Unit 10 — Transformers for materials.
| Task | Classical (still the default) | Deep / modern |
|---|---|---|
| Background | power law, SNIP, AsLS; LCPL + local averaging (Cueva et al. 2012) | — |
| Denoising | weighted / optimal PCA (Bosman et al. 2006; Potapov and Lubk 2019) | conv-DAE (Pate et al. 2021); self-supervised UDVD (Wang et al. 2025) |
| Decomposition / unmixing | PCA (Bosman et al. 2006), MCR (Kotula and Keenan 2006), Bayesian LU (Dobigeon and Brun 2012) | non-linear AE decoders (u05) |
| Quantification | Cliff-Lorimer (Cliff and Lorimer 1975), ζ-factor (Watanabe and Williams 2006) | ML for intensity extraction only |
| Valence / ELNES | reference-spectrum fitting (Bosman et al. 2006) | latent classifier (Pate et al. 2021) |
Important
Two honest truths. (1) Multivariate statistics (2006–2019) is still the production workhorse. (2) Deep learning’s clearest EELS/EDS win so far is denoising for low dose — the enabler, not a replacement for the physics.

© Philipp Pelz - Machine Learning in Materials Processing & Characterization