FAU Erlangen-Nürnberg
Which pretrained crystal representation do we trust, on which downstream task?
What this unit is not.
Recap — what we already have
Today — Unit 10 in one line
By the end of these 90 minutes, you will be able to:
Restated from MFML W9
The materials-specific question for today
Restated from MFML W9
Why this matters for §F
Restated from MFML W9
The materials-specific question for §D
Restated from MFML W9
The materials-specific question for §E
An image
A crystal
Consequence: copy-pasting a vision encoder onto crystals discards the inductive biases that make crystals tractable. Every crystal-specific architecture (CGCNN, SchNet, M3GNet, MACE) is built around the right priors instead.
What a learned crystal embedding should know
Where this knowledge enters
What a learned crystal embedding should know
Connection to MG U6
The PBC requirement
The free positive pair
Symmetries to respect
Where it comes from in 2026
Default in 2026: equivariant architecture for the encoder backbone; augmentations on top for contrastive pretraining. Both routes, in the same model.
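The symmetry requirement can be checked mechanically: encode a structure, rotate it, and compare the outputs. A minimal sketch below uses a toy sorted-distance descriptor as a stand-in for a real invariant encoder; the names `toy_descriptor` and `random_rotation` are illustrative, not from any library.

```python
import numpy as np

def toy_descriptor(positions):
    """Sorted pairwise distances: invariant to rotation and translation
    by construction (a stand-in for a real invariant encoder)."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(len(positions), k=1)
    return np.sort(dists[upper])

def random_rotation(rng):
    """Random orthogonal 3x3 matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(0)
pos = rng.random((6, 3))            # toy atomic positions
R = random_rotation(rng)
same = np.allclose(toy_descriptor(pos), toy_descriptor(pos @ R.T))
```

The same three-line check (encode, transform, compare) applies to any claimed-invariant encoder; for an equivariant backbone you would instead verify that the output transforms with the rotation.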
Database snapshot, SS26
| Database | Structures (approx.) | Labels |
|---|---|---|
| Materials Project (Jain et al. 2013) | 1.5 M | DFT energies; some band gaps, elastic |
| OQMD (Saal et al. 2013) | 1.0 M | Formation energies |
| AFLOW (Curtarolo et al. 2012) | 3.5 M | DFT energies, mostly intermetallics |
| NOMAD (Draxl & Scheffler 2018) | 19 M (entries) | Heterogeneous, multi-source |
The asymmetry that matters
Three knobs
Choosing \(T\) defines the pretext task. §C3–C6 walk through the four standard choices for crystal data.
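One concrete choice of \(T\) can be sketched in a few lines: small Gaussian jitter of atomic positions plus a random rigid rotation, two independent draws of which form a positive pair. This is an illustrative sketch (`sample_view` is a hypothetical name); a real crystal pipeline would also handle periodic boundary wrapping.

```python
import numpy as np

def sample_view(positions, rng, sigma=0.02):
    """One draw from the augmentation distribution T: small Gaussian
    jitter of atomic positions followed by a random rigid rotation.
    (A real pipeline would also wrap positions under PBC.)"""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    rotation = q * np.sign(np.diag(r))
    jittered = positions + rng.normal(scale=sigma, size=positions.shape)
    return jittered @ rotation.T

rng = np.random.default_rng(1)
anchor = rng.random((8, 3))                       # toy structure
view_a, view_b = sample_view(anchor, rng), sample_view(anchor, rng)
# (view_a, view_b) is a positive pair; other structures in the batch act as negatives
```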
The task
What it teaches the encoder
The task
When this is the right pretext
The task
The pleasant surprise
The task
Where this gets interesting
The 2018 baseline
Why it matters in 2026
M3GNet (Chen & Ong 2022)
MACE-MP-0 (Batatia et al. 2024)
The standard 2026 recipe
What to compare against
Pretraining helps most when
Pretraining helps least when
Empirical pattern (2023–2026 literature): the small-data, in-distribution regime is where foundation embeddings dominate. Outside that regime, Magpie or SOAP often catch up — and they cost orders of magnitude less.
The pattern, restated
What is materials-specific
Four standard moves
Why each is valid
Three ways to get this wrong
Why this matters
In-batch negatives
Batch size matters
The signal asymmetry
Mining strategies
InfoNCE
\[\mathcal{L} = -\log \dfrac{\exp(\text{sim}(z_a, z_p)/\tau)}{\sum_{j} \exp(\text{sim}(z_a, z_j)/\tau)}\]
Triplet
\[\mathcal{L} = \max\bigl(0, \, d(z_a, z_p) - d(z_a, z_n) + m\bigr)\]
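Both objectives fit in a few lines of numpy. A minimal sketch, matching the formulas above (function names are illustrative; `info_nce` uses in-batch negatives as described earlier):

```python
import numpy as np

def info_nce(z_anchor, z_positive, tau=0.1):
    """InfoNCE with in-batch negatives: for row i the positive is
    z_positive[i]; every other row of z_positive serves as a negative."""
    za = z_anchor / np.linalg.norm(z_anchor, axis=1, keepdims=True)
    zp = z_positive / np.linalg.norm(z_positive, axis=1, keepdims=True)
    logits = za @ zp.T / tau                      # cosine similarity / temperature
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def triplet_loss(z_a, z_p, z_n, margin=1.0):
    """max(0, d(a,p) - d(a,n) + m), averaged over the batch."""
    d_ap = np.linalg.norm(z_a - z_p, axis=1)
    d_an = np.linalg.norm(z_a - z_n, axis=1)
    return np.mean(np.maximum(0.0, d_ap - d_an + margin))

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))  # true positive pairs
shuffled = info_nce(z, rng.normal(size=z.shape))            # random "pairs"
```

Aligned pairs give a much lower InfoNCE loss than random pairings, which is the sanity check worth running before any long pretraining job.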
Notable systems (2022–2025)
Headline results
The retrieval task
Why retrieval beats t-SNE for diagnostics
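Unlike a t-SNE picture, retrieval gives a number you can compare across encoders: embed everything, find nearest neighbours, and measure how often they share the property of interest. A sketch with hypothetical helper names (`nearest_neighbors`, `label_agreement_at_k`) and synthetic stand-in embeddings:

```python
import numpy as np

def nearest_neighbors(query, bank, k=5):
    """Indices of the k most cosine-similar rows of `bank` to `query`."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return np.argsort(-(b @ q))[:k]

def label_agreement_at_k(embeddings, labels, k=5):
    """Average fraction of each item's k nearest neighbours (self excluded)
    that share its label -- a scalar diagnostic, comparable across encoders."""
    hits = []
    for i in range(len(embeddings)):
        idx = nearest_neighbors(embeddings[i], embeddings, k=k + 1)
        idx = idx[idx != i][:k]
        hits.append(np.mean(labels[idx] == labels[i]))
    return float(np.mean(hits))

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
emb = 5.0 * np.eye(8)[labels] + rng.normal(size=(40, 8))  # two separated groups
score = label_agreement_at_k(emb, labels, k=5)
```

For a real diagnostic, replace `emb` with encoder outputs and `labels` with a held-out property (e.g. metal vs. insulator); a score near chance level means the embedding does not organise that property.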
Borrowed from NLP/Vision
The materials version, in 2026
Three lanes
No lane subsumes the others
MatBERT in one slide
Strengths and weaknesses
Models
Use cases
M3GNet (Chen & Ong 2022)
Strengths and weaknesses
MACE-MP-0 (Batatia et al. 2024)
The universal-MLIP era
OMat24 (Barroso-Luque et al. 2024)
Why this matters
GNoME (Merchant et al. 2023)
The cautious read
The workflow
Why this works at small N
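The workflow reduces to: embed once with the frozen encoder, then fit a tiny linear head. A closed-form ridge probe makes the small-N argument concrete, since only `d` weights are fitted. Sketch with synthetic stand-ins for the frozen embeddings (in practice `X` would come from, e.g., M3GNet or MACE-MP-0):

```python
import numpy as np

def ridge_probe(X_train, y_train, X_test, alpha=1e-2):
    """Linear probe on frozen embeddings: closed-form ridge regression.
    Only X.shape[1] weights are fitted, which is why this works at small N."""
    d = X_train.shape[1]
    w = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(d),
                        X_train.T @ y_train)
    return X_test @ w

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))          # stand-in for frozen embeddings
w_true = rng.normal(size=8)
y = X @ w_true + 0.1 * rng.normal(size=60)
pred = ridge_probe(X[:40], y[:40], X[40:])
```

With only 40 training points the probe already recovers the signal, something a fine-tuned deep head could not do reliably at this N.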
| Downstream task | First choice | Backup |
|---|---|---|
| Crystal structure → property (in-dist.) | M3GNet / MACE-MP frozen + linear probe | SOAP + GP |
| Crystal structure → property (OOD chem.) | MACE-MP fine-tuned | SOAP + GP |
| Molecular property (organic) | MoLFormer / ChemFormer | Morgan fingerprint + RF |
| Literature mining / abstract classification | MatBERT | SciBERT |
| Discovery candidate ranking | OMat24-class encoder + active learning | GNoME-style pipeline |
No model dominates all rows. The decision is task-driven, not hype-driven (Sandfeld et al. 2024).
The question that matters
Does this embedding contain information about the property I care about?
The question that does NOT matter (alone)
Does this t-SNE plot look pretty?
Protocol
The four comparisons that matter
| Probe input | What it tests |
|---|---|
| Pretrained \(\mathcal{E}\) | The embedding |
| Random-init \(\mathcal{E}\) | Did pretraining help? |
| Magpie / matminer | Engineered baseline |
| SOAP | Engineered structural baseline |
Without the random-init comparison, you cannot tell what the pretraining contributed. This is the most-omitted comparison in published work (Sandfeld et al. 2024).
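The four-way comparison is one loop over feature sets with an identical probe. The feature matrices below are synthetic stand-ins (in practice: frozen pretrained embeddings, the same architecture with random weights, matminer Magpie features, and SOAP vectors); the noise scales are chosen only to illustrate the pattern, not taken from any benchmark.

```python
import numpy as np

def probe_r2(X, y, alpha=1e-2, split=100):
    """Fit a ridge probe on the first `split` rows, report R^2 on the rest."""
    Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]
    w = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(X.shape[1]), Xtr.T @ ytr)
    pred = Xte @ w
    return 1.0 - np.sum((yte - pred) ** 2) / np.sum((yte - yte.mean()) ** 2)

rng = np.random.default_rng(0)
n, d = 200, 16
y = rng.normal(size=n)                       # synthetic target property
noise = lambda s: s * rng.normal(size=(n, d))
feature_sets = {                             # stand-ins for the four probe inputs
    "pretrained E":  y[:, None] + noise(0.2),   # embedding carries the signal
    "random-init E": rng.normal(size=(n, d)),   # control: architecture alone
    "Magpie":        y[:, None] + noise(1.0),
    "SOAP":          y[:, None] + noise(0.5),
}
scores = {name: probe_r2(X, y) for name, X in feature_sets.items()}
for name, s in scores.items():
    print(f"{name:14s} R^2 = {s:+.2f}")
```

The gap between the pretrained and random-init rows is the quantity that isolates what pretraining contributed; reporting the pretrained score alone conflates it with what the architecture encodes for free.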
Protocol
Why retrieval beats t-SNE
Protocol
What “good” looks like
The anti-pattern
Diagnosis
The symmetric pattern
Diagnosis
Foundation embedding wins when
Engineered baseline wins when
2026 honesty: “always use the foundation model” is wrong. The right answer is task-driven (Sandfeld et al. 2024; Neuer et al. 2024).
Beyond the encoder: what do the axes mean?
The interpretation question, in one example
If the embedding’s first principal axis correlates with mean atomic mass, what does the second axis correlate with?
Note
After the 2026-05-13 realignment, this is the natural home for latent-space interpretation: the standalone U11 latent-spaces deck is now optional supplementary reading. The diagnostic discipline of §F applies here — interpretation needs a known-good embedding to begin.
U10 → U12
The discovery loop, in one diagram
You should be able to
You should also be able to

© Philipp Pelz - Materials Genomics