Variability: Peak positions shift with composition; peak shapes change with bonding
Spectra as Vectors
A spectrum with \(N\) channels is simply a vector \(\mathbf{x} \in \mathbb{R}^N\). All the linear algebra and ML tools we have learned apply directly — but the physical meaning of each “dimension” (an energy channel) matters for interpretation.
04. The Need for ML
Traditional Analysis Is Breaking Down
Manual peak fitting is slow, subjective, and does not scale
Fitting 10 peaks in 1 spectrum: feasible
Fitting 10 peaks in 1,000,000 spectra: impossible
Overlapping peaks make decomposition ambiguous
The Mn-L\(_{2,3}\) (~640 eV) and Fe-L\(_{2,3}\) (~708 eV) edges in EELS lie close together, so their fine structures overlap
Ti-K\(\alpha\) and Ba-L\(\alpha\) overlap in EDS at ~4.5 keV
Subtle spectral changes encode critical physics
The Fe-L\(_{2,3}\) edge shape distinguishes Fe\(^{2+}\) from Fe\(^{3+}\)
A peak shift of 0.3 eV can indicate a change in oxidation state
\(\bar{\mathbf{x}}\): mean spectrum (average over all \(N\) spectra)
\(\mathbf{v}_k\): the \(k\)-th eigenvector of the covariance matrix (the \(k\)-th eigenspectrum)
\(c_{ik}\): the score (coefficient) of spectrum \(i\) on component \(k\)
Compression: Store only the \(K\) scores \(c_{ik}\) instead of the full \(D\) channels
07. The Basis Function View: Eigenspectra
```mermaid
flowchart LR
    A["Raw Spectrum<br>x ∈ ℝ²⁰⁴⁸"] --> B["Center<br>x - x̄"]
    B --> C["Project onto<br>Eigenspectra V"]
    C --> D["Score Vector<br>c ∈ ℝᴷ"]
    D --> E["Reconstruct<br>x̂ = x̄ + Vc"]
    E --> F["Denoised<br>Spectrum"]
    style A fill:#2d5016,stroke:#4a8c2a,color:#fff
    style D fill:#1a3a5c,stroke:#2a6a9c,color:#fff
    style F fill:#5c1a1a,stroke:#9c2a2a,color:#fff
```
The eigenspectra \(\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_K\}\) form an orthonormal basis
Each eigenspectrum is a “building block” — a spectral shape that varies across the dataset
The scores \(c_{ik} = \mathbf{v}_k^\top (\mathbf{x}_i - \bar{\mathbf{x}})\) are computed by simple inner products
This is computationally cheap: once the eigenspectra are computed, projecting a new spectrum is a matrix-vector multiply
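As a concrete sketch with scikit-learn: the random counts below stand in for a real \((N, D)\) spectrum array, and `pca.components_` holds the eigenspectra \(\mathbf{v}_k\).

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: N = 500 spectra with D = 2048 channels (hypothetical)
rng = np.random.default_rng(0)
X = rng.poisson(100, size=(500, 2048)).astype(float)

K = 5                              # number of retained components
pca = PCA(n_components=K).fit(X)   # eigenspectra learned from the whole dataset

# Scores: inner products of centered spectra with the eigenspectra
C = pca.transform(X)               # (N, K), c_ik = v_k . (x_i - x_bar)

# Projecting a *new* spectrum is just a matrix-vector multiply
x_new = rng.poisson(100, size=2048).astype(float)
c_new = pca.components_ @ (x_new - pca.mean_)    # (K,)

# Reconstruction: x_hat = x_bar + V^T c
x_hat = pca.mean_ + pca.components_.T @ c_new
```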
08. Interpreting Eigenspectra
PC1 (first eigenspectrum): Typically resembles the mean spectrum
Captures the dominant overall shape (background + major peaks)
PC2, PC3: Capture the dominant variations
Often correspond to specific chemical differences between phases
Example: PC2 might show positive Fe peaks and negative Cr peaks (Fe vs. Cr variation)
Higher PCs (noise components): Appear as random oscillations with no physical meaning
09. Intrinsic Dimensionality
A 2048-channel spectrum lives in \(\mathbb{R}^{2048}\) — but how many dimensions does the data actually use?
If the sample has 4 distinct phases, the spectra approximately span a 4-dimensional subspace
The intrinsic dimensionality \(K\) is the number of independent spectral variations
For most materials datasets: \(K \sim 3\text{--}20 \ll D = 2048\)
Rule of Thumb
The intrinsic dimensionality of a spectral dataset is typically close to the number of distinct phases or chemical environments in the sample, plus a few components for background variations and thickness effects.
10. Scree Plots: Choosing the Number of Components
The reconstruction \(\hat{\mathbf{x}}_i\) retains only the signal subspace — noise is discarded
In favorable cases this is equivalent to reducing acquisition time by ~10x while maintaining chemical sensitivity
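A minimal sketch of the scree plot and the rank-\(K\) reconstruction, reusing the \((N, D)\) array `X` from the sketch above:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA().fit(X)   # X: (N, D) spectra, as before

# Scree plot: explained variance per component on a log scale
ks = np.arange(1, len(pca.explained_variance_ratio_) + 1)
plt.semilogy(ks, pca.explained_variance_ratio_, "o-")
plt.xlabel("Component k")
plt.ylabel("Explained variance ratio")
plt.xlim(0, 30)      # the elbow usually appears within the first few tens
plt.show()

# Denoise by keeping only the first K components (signal subspace)
K = 5
C = pca.transform(X)[:, :K]
X_hat = pca.mean_ + C @ pca.components_[:K]
```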
12. Think About This…
PCA Denoising: Free Lunch?
Question: If PCA denoising is so effective, why not always reduce acquisition time by 10x and denoise afterwards?
Hint: Think about what happens when a spectrum contains a feature that is unique — present in only one or two pixels of the map.
Consider:
PCA eigenspectra are computed from the entire dataset
Rare features may not contribute enough variance to be captured by the top-\(K\) components
PCA denoising can erase rare phases — minority signals get projected away
This is a fundamental tension: denoising vs. discovery
Key Insight
PCA denoising is excellent for known, common features but can suppress rare or unexpected signals. Always inspect the residuals \(\mathbf{x}_i - \hat{\mathbf{x}}_i\) for signs of discarded information.
13. PCA Limitations
The Linearity Constraint
PCA assumes that spectral variations can be expressed as linear combinations
PCA works well when: phases mix additively (Beer-Lambert absorbance, EDS from thin sections)
PCA fails when:
Peak positions shift with composition (e.g., XRD peak shift with lattice parameter; demonstrated in the sketch after this list)
Peak shapes change non-linearly (e.g., EELS fine structure with oxidation state)
Backgrounds are multiplicative rather than additive
PCs are orthogonal — but physical phases are not orthogonal
PCs can have negative values, but spectra are non-negative
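The failure mode is easy to demonstrate: a single Gaussian peak whose position drifts has one physical degree of freedom, yet PCA needs many components to represent it. A synthetic sketch:

```python
import numpy as np
from sklearn.decomposition import PCA

# One Gaussian peak whose *position* varies across 200 spectra.
# Linear mixing of two fixed endmembers would need ~2 components;
# a continuously shifting peak does not.
E = np.linspace(0, 100, 1024)                 # energy axis (arb. units)
centers = np.linspace(40, 60, 200)            # peak position drifts
X_shift = np.exp(-(E[None, :] - centers[:, None])**2 / (2 * 2.0**2))

pca = PCA().fit(X_shift)
cum = np.cumsum(pca.explained_variance_ratio_)
print("components for 99% variance:", np.searchsorted(cum, 0.99) + 1)
# Many components, even though the data has ONE physical degree of freedom.
```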
Poisson: Resample from \(\text{Poisson}(\mathbf{x}_i^{\text{clean}})\)
Masking: Randomly zero out spectral channels
The DAE learns to project noisy spectra back onto the clean data manifold (see the sketch after the note below)
DAE vs. PCA Denoising
Both discard noise by projecting onto a low-dimensional representation. But a DAE explicitly trains on noisy-clean pairs, making it more robust when the noise model is known. PCA denoising is implicit — it assumes noise is in the discarded components.
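To make the training loop concrete, here is a minimal PyTorch sketch of a fully connected DAE with Poisson noise injection. The layer sizes, latent dimension, and learning rate are illustrative assumptions, not values from the lecture.

```python
import torch
import torch.nn as nn

D, K = 2048, 16   # channels, latent size (assumed)

dae = nn.Sequential(             # encoder-decoder with a K-dim bottleneck
    nn.Linear(D, 256), nn.ReLU(),
    nn.Linear(256, K), nn.ReLU(),
    nn.Linear(K, 256), nn.ReLU(),
    nn.Linear(256, D),
)
opt = torch.optim.Adam(dae.parameters(), lr=1e-3)

def train_step(x_clean):
    # Poisson noise injection: resample counts from the clean spectrum
    x_noisy = torch.poisson(x_clean.clamp(min=0))
    x_hat = dae(x_noisy)
    loss = nn.functional.mse_loss(x_hat, x_clean)  # target is the CLEAN spectrum
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```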
22. DAE Example: EELS Spectra
Denoising the Fe-L\(_{2,3}\) Edge
Problem: At low dose, the EELS fine structure that distinguishes Fe\(^{2+}\) from Fe\(^{3+}\) is buried in noise
Approach:
Train a DAE on simulated Fe-L edge spectra with known oxidation states
Add Poisson noise at realistic dose levels during training
Apply trained DAE to experimental spectra pixel-by-pixel
Result: Clear Fe\(^{2+}\)/Fe\(^{3+}\) discrimination at 10x lower dose than conventional analysis
23. Convolutional Autoencoders for Spectra
Standard AEs use fully connected (dense) layers — every channel connects to every neuron
Problem: Dense layers do not know that nearby channels are related
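A minimal sketch of a 1-D convolutional autoencoder in PyTorch: each filter slides along the energy axis, so the same peak detector applies at every channel position. Filter counts and kernel sizes here are illustrative assumptions.

```python
import torch.nn as nn

# Input shape: (batch, 1, 2048) — one channel per spectrum
cae = nn.Sequential(
    # Encoder: convolutions share weights across the energy axis,
    # making learned peak detectors shift-invariant
    nn.Conv1d(1, 8, kernel_size=9, stride=2, padding=4), nn.ReLU(),
    nn.Conv1d(8, 16, kernel_size=9, stride=2, padding=4), nn.ReLU(),
    # Decoder: transposed convolutions upsample back to 2048 channels
    nn.ConvTranspose1d(16, 8, kernel_size=9, stride=2,
                       padding=4, output_padding=1), nn.ReLU(),
    nn.ConvTranspose1d(8, 1, kernel_size=9, stride=2,
                       padding=4, output_padding=1),
)
```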
Start with PCA — it is fast, unique, and gives you a baseline
Switch to AE when PCA reconstruction error plateaus or non-linear effects are important
Use DAE when you have a good noise model
Use VAE when you need generative capabilities or uncertainty
26. Think About This…
The Latent Space as a Chemical Compass
Question: You train an autoencoder with a 2D latent space on EDS spectra from a ternary alloy (Fe-Cr-Ni). You plot the latent coordinates for each pixel and color them by known composition. What do you expect to see?
Hint: The alloy has three independent compositional degrees of freedom, but they sum to 100%.
Consider:
Three elements with the constraint \(c_{\text{Fe}} + c_{\text{Cr}} + c_{\text{Ni}} = 1\) means two independent compositional variables
A 2D latent space should be sufficient to capture this variation
You would expect the latent space to look like a triangle (the ternary phase diagram!)
The autoencoder has rediscovered the Gibbs triangle — without being told anything about thermodynamics
Follow-up: What if you used a 1D latent space? What would be lost?
27. Anomaly Detection with Autoencoders
Train the AE on “normal” spectra from the expected phases
For a new spectrum \(\mathbf{x}\), compute the reconstruction error: \(e = \|\mathbf{x} - \hat{\mathbf{x}}\|^2\)
If \(e > \tau\) (threshold), the spectrum is anomalous — it does not fit the learned representation
Unlike supervised anomaly detection, this requires no labels — only knowledge of what “normal” looks like
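As a sketch, assuming a trained reconstruction function `ae_reconstruct` (any model mapping an \((N, D)\) array to its reconstruction) and a set `X_train` of normal spectra; setting \(\tau\) from a quantile of the training errors is a common heuristic, not something the slide prescribes:

```python
import numpy as np

def recon_error(reconstruct, X):
    """Per-spectrum squared error ||x - x_hat||^2 for any reconstruction fn."""
    return ((X - reconstruct(X)) ** 2).sum(axis=-1)

# Threshold from "normal" spectra, e.g. the 99th percentile of training errors
e_train = recon_error(ae_reconstruct, X_train)   # ae_reconstruct, X_train assumed
tau = np.quantile(e_train, 0.99)

is_anomaly = recon_error(ae_reconstruct, X_new) > tau   # flag suspect spectra
```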
28. Summary: Autoencoders for Spectra
Autoencoders are the non-linear generalization of PCA for spectral data
Key variants:
Standard AE: Compression and feature extraction
DAE: Denoising with explicit noise models
CAE: 1D convolutions for shift-invariant peak detection
VAE: Probabilistic latent space for generation and uncertainty
The latent space is the central concept — it provides:
A compressed representation for storage
A denoised representation for analysis
A feature space for clustering and classification
A basis for anomaly detection
Best practice: Always compare AE results against PCA as a baseline
29. MAE Pretraining on Unlabelled Spectra
The masking trick. Masked Autoencoder (He et al. 2022) adapted to 1-D spectra.
Split each spectrum into 64 patches (e.g., 32 bins each on a 2048-channel spectrum).
Randomly mask 75% — keep only 16 patches as visible input.
Encode the visible patches with a small transformer (4 layers, 128 dim).
A lightweight decoder sees encoder outputs + mask tokens, predicts the masked bins.
Loss. MSE on masked bins only: \[\mathcal{L}_{\text{MAE}} = \frac{1}{|\mathcal{M}|}\sum_{j \in \mathcal{M}} (x_j - \hat{x}_j)^2\] The visible bins do not contribute — the model is forced to reconstruct what it has not seen.
Pretrain on all unlabelled spectra the lab has — EELS, Raman, XRD pooled together. No labels needed.
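A minimal sketch of the patchify-and-mask step and the masked-bin loss for a 2048-channel spectrum split into 64 patches, as above. The transformer encoder and decoder are omitted, and the function names are illustrative.

```python
import torch

def mask_spectrum(x, n_patches=64, mask_ratio=0.75):
    """Split a (D,) spectrum into patches and hide 75% of them."""
    D = x.shape[0]
    patches = x.view(n_patches, D // n_patches)   # (64, 32) for D = 2048
    n_keep = int(n_patches * (1 - mask_ratio))    # 16 visible patches
    perm = torch.randperm(n_patches)
    visible, masked = perm[:n_keep], perm[n_keep:]
    return patches[visible], visible, masked      # encoder sees only these

def mae_loss(x_hat, x, masked, n_patches=64):
    """MSE on masked bins only; visible bins do not contribute."""
    p = x.view(n_patches, -1)
    p_hat = x_hat.view(n_patches, -1)
    return ((p_hat[masked] - p[masked]) ** 2).mean()
```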
Why this is the natural materials move.
Lab inventory: \(10^4\)–\(10^5\) unlabelled spectra accumulating on the file server, \(10^2\) labelled ones (peak areas, phase tags, valence).
Classical pipeline wastes the unlabelled bulk — PCA/AE use it only for compression, not for downstream task transfer.
MAE turns the unlabelled bulk into a feature extractor: freeze the encoder, attach a linear probe or LoRA head, fine-tune on the 100 labelled spectra.
Downstream wins.
Peak-area regression (small-data).
Phase classification (few-shot).
Anomaly detection in latent space (no labels at all).
Note
Numerical example. A 200k-spectrum pretraining corpus + 100-labelled-spectrum fine-tuning typically beats a from-scratch CNN by 5–15 points on a small-data benchmark, depending on signal-to-noise.
Section 4: Practical Signal Processing
Pipelines, Hyperspectral Data, and Transfer Learning
30. Signal Processing Workflow
```mermaid
flowchart TD
    A["Raw Spectra<br>(N × D)"] --> B["Preprocessing"]
    B --> B1["Background Subtraction"]
    B --> B2["Normalization"]
    B --> B3["Alignment / Calibration"]
    B1 & B2 & B3 --> C["Dimensionality Reduction"]
    C --> C1["PCA"]
    C --> C2["Autoencoder"]
    C1 & C2 --> D["Latent Representation<br>(N × K)"]
    D --> E1["Clustering<br>(Phase Maps)"]
    D --> E2["Anomaly Detection<br>(Rare Phases)"]
    D --> E3["Classification<br>(Known Phases)"]
    style A fill:#2d5016,stroke:#4a8c2a,color:#fff
    style D fill:#1a3a5c,stroke:#2a6a9c,color:#fff
    style E1 fill:#5c1a1a,stroke:#9c2a2a,color:#fff
    style E2 fill:#5c1a1a,stroke:#9c2a2a,color:#fff
    style E3 fill:#5c1a1a,stroke:#9c2a2a,color:#fff
```
Preprocessing: Prepare spectra for ML (remove artifacts, normalize, align)
Dimensionality Reduction: PCA or AE to extract the latent representation
Downstream Analysis: Clustering, anomaly detection, or classification in latent space
31. Normalization Strategies
Why Normalize?
Raw spectral intensities depend on dose, dwell time, specimen thickness — not just chemistry
Without normalization, PCA/AE will learn intensity variations, not chemical variations
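Three common choices, as a NumPy sketch over an \((N, D)\) count array `X`; which one is appropriate depends on the technique and the downstream model:

```python
import numpy as np

# X: (N, D) raw spectra, one per row (assumed)
X_area = X / X.sum(axis=1, keepdims=True)   # total counts: removes dose/dwell scaling
X_max  = X / X.max(axis=1, keepdims=True)   # tallest peak scaled to 1

mu, sd = X.mean(axis=0), X.std(axis=0)
X_std  = (X - mu) / (sd + 1e-12)            # per-channel standardization
```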
After dimensionality reduction (PCA or AE), each spectrum is a point \(\mathbf{z}_i \in \mathbb{R}^K\)
UMAP (Uniform Manifold Approximation and Projection) projects \(K\)-dimensional latent codes to 2D for visualization
Preserves both local and global structure (better than t-SNE for this)
Clusters in UMAP space correspond to distinct material phases
Bridges between clusters indicate transition regions (interfaces, diffusion zones)
Outliers flag anomalies or rare phases
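A minimal sketch using the umap-learn package, assuming `Z` holds the \((N, K)\) latent codes from PCA or an autoencoder:

```python
import umap  # pip install umap-learn

reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1)
Z_2d = reducer.fit_transform(Z)   # (N, 2) embedding for plotting

# For a spectrum image, reshape Z_2d back to (H, W, 2) and color by cluster
```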
44. Discovering New Phases with Anomaly Detection
When the Model Says “I Don’t Know”
Train an autoencoder on spectra from known phases
Apply it to a new sample — most spectra reconstruct well
High reconstruction error pixels indicate spectra that differ from all known phases
Discovery workflow:
Map reconstruction error spatially → anomaly map
Extract spectra from high-error regions
Inspect these spectra manually → identify the unknown phase
Add to the training set and retrain
Example: Discovery of an unexpected intermetallic compound at a diffusion couple interface
The AE was trained on the two base metals
High reconstruction error at the interface revealed a third phase
45. Real-time Signal Monitoring
Closing the Loop: ML During Acquisition
Conventional workflow: Acquire → Store → Analyze offline
ML-enabled workflow: Acquire → Analyze in real time → Adjust acquisition
Applications:
Monitor reconstruction error during scanning — stop early when all phases are mapped
Detect beam damage by tracking spectral changes in real time
Adaptive acquisition: Spend more time on interesting regions, less on uniform areas
Requirements:
Model must be fast enough for real-time inference (PCA: trivially fast; small AE: ~ms per spectrum)
Must handle streaming data (incremental PCA, online AE updates)
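A minimal streaming sketch with scikit-learn's `IncrementalPCA`; `spectrum_stream()` is a hypothetical generator yielding batches of spectra as they are acquired:

```python
from sklearn.decomposition import IncrementalPCA

ipca = IncrementalPCA(n_components=10)

# During acquisition: update the model one batch at a time.
# Each batch must contain at least n_components spectra.
for batch in spectrum_stream():   # hypothetical generator of (B, D) arrays
    ipca.partial_fit(batch)
    X_hat = ipca.inverse_transform(ipca.transform(batch))
    err = ((batch - X_hat) ** 2).sum(axis=1)
    # err can drive live decisions: stop early, flag beam damage, steer the scan
```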
The Future Is Adaptive
Combining real-time ML with instrument control creates self-driving microscopes that autonomously explore materials, guided by the scientific questions encoded in the ML model. This is the topic of Unit 10.
Part 1: PCA
Compute PCA and plot the scree plot — determine the intrinsic dimensionality
Reconstruct the data using \(K = 3, 5, 10\) components — compare denoised spectra
Plot PCA score maps and identify the three phases
Part 2: Autoencoder
Build a fully-connected autoencoder with bottleneck size 4
Train on the same dataset — compare reconstruction loss vs. PCA
Visualize the latent space and compare clustering to PCA scores
Introduce an “unknown” phase in a few pixels — use reconstruction error to detect it
Part 3: Denoising
Add Poisson noise at various dose levels — compare PCA and DAE denoising
47. Summary and Key Takeaways
What We Learned Today
Characterization signals (XRD, EELS, EDS, XPS, Raman) are high-dimensional but lie on low-dimensional manifolds
PCA provides fast, linear dimensionality reduction — excellent for denoising and compression, but limited to linear variations
Autoencoders extend PCA to non-linear settings — handling peak shifts, shape changes, and complex backgrounds
Denoising autoencoders explicitly learn to map noisy spectra to clean ones — enabling lower-dose characterization
Masked Autoencoders (He et al. 2022) turn the lab's \(10^4\)–\(10^5\) unlabelled spectra into a pretrained feature extractor — small-data property prediction with linear probing or LoRA
The latent space is the universal interface: it enables compression, denoising, clustering, anomaly detection, and visualization
Practical pipelines require careful normalization, validation, and robustness to instrumental shifts
Next unit: Unit 10 — Automation in Microscopy (closing the loop)
Neuer et al. (2024): Ch. 5 (Unsupervised Learning)
Further Reading
Kingma & Welling (2014): “Auto-Encoding Variational Bayes” — the original VAE paper
Lee & Seung (1999): “Learning the parts of objects by non-negative matrix factorization” — NMF
McInnes et al. (2018): “UMAP: Uniform Manifold Approximation and Projection” — dimensionality reduction for visualization
Software
HyperSpy: Open-source Python library for multi-dimensional spectral data analysis
scikit-learn: PCA, NMF, K-means implementations
PyTorch / TensorFlow: Autoencoder implementations
He, Kaiming, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. 2022. “Masked Autoencoders Are Scalable Vision Learners.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16000–16009.
McClarren, Ryan G. 2021. Machine Learning for Engineers: Using Data to Solve Problems for Physical Systems. Springer.
Neuer, Michael et al. 2024. Machine Learning for Engineers: Introduction to Physics-Informed, Explainable Learning Methods for AI in Engineering Applications. Springer Nature.
Sandfeld, Stefan et al. 2024. Materials Data Science. Springer.