FAU Erlangen-Nürnberg
What do you actually do with a materials database?
Today’s answer in one line.
Recap — what you already have
Today — Unit 13 in one line
By the end of 90 minutes, you can:
Materials Project (MP) (Jain et al., 2013)
mp-api, pymatgen. De facto starting point.
OQMD (Saal et al., 2013)
AFLOW (Curtarolo et al., 2012)
NOMAD (Draxl & Scheffler, 2018)
Pedagogical message: no single database is canonical. They disagree because they use different functionals, different convergence criteria, different relaxation protocols. Cross-database disagreement is itself useful information.
What every entry carries
What is not there
First reflex on every “predicted-stable” claim: predicted stable at 0 K, in vacuum, in an idealised periodic crystal, with one functional.
Definition
\[E_f(C) = E(C) - \sum_i n_i \, \mu_i^{\text{ref}}\]
Reads as
Reference-state choice is not innocent. Allotropes (C, P, S) and magnetic ground states (Mn, Fe) shift \(E_f\) by tens of meV/atom across databases.
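The definition above as arithmetic. A minimal sketch with hypothetical reference potentials; shifting any \(\mu_i^{\text{ref}}\) by tens of meV shifts \(E_f\) directly, which is exactly the reference-state sensitivity just noted.

```python
# E_f per atom = (E(C) - sum_i n_i * mu_i_ref) / N.
# All numbers below are illustrative, not taken from any database.

def formation_energy_per_atom(total_energy_eV, composition, mu_ref):
    """composition: {element: n_atoms}; mu_ref: {element: reference mu in eV/atom}."""
    n_atoms = sum(composition.values())
    e_ref = sum(n * mu_ref[el] for el, n in composition.items())
    return (total_energy_eV - e_ref) / n_atoms

mu_ref = {"Li": -1.90, "Co": -7.10, "O": -4.95}   # hypothetical reference potentials
e_f = formation_energy_per_atom(-25.30, {"Li": 1, "Co": 1, "O": 2}, mu_ref)
print(round(e_f, 3))
```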
Construction
In Li–Co–O (ternary)
The convex-hull construction is the composition-space generalisation of “is this lower than the line connecting its neighbours?” In \(n\)-component systems, the hull is an \((n-1)\)-dimensional polytope. The geometric core is unchanged.
Definition
\[E_{\text{hull}}(C) = E_f(C) - E_{\text{hull-line}}(x)\]
The 25–50 meV/atom rule of thumb
Why it is not zero. Kinetic stabilisation, finite-temperature entropy, and DFT error all contribute. A 25 meV/atom phase at 0 K may be the global free-energy minimum at 1500 K.
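The construction can be sketched for a binary system with a monotone-chain lower hull; the phases and energies below are invented, and in practice pymatgen's PhaseDiagram does this for arbitrary arity.

```python
import numpy as np

def lower_hull(points):
    """Lower convex hull of (x, E_f) points for a binary system (monotone chain)."""
    pts = sorted(points)
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop hull[-1] if it lies on or above the segment hull[-2] -> p
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def e_above_hull(x, e_f, hull):
    """E_hull(C) = E_f(C) - E_hull-line(x), by interpolation along hull facets."""
    xs, ys = zip(*hull)
    return e_f - np.interp(x, xs, ys)

# Hypothetical binary A-B formation energies (eV/atom); endpoints are the references.
phases = [(0.0, 0.0), (0.25, -0.20), (0.5, -0.55), (0.75, -0.30), (1.0, 0.0)]
hull = lower_hull(phases)          # (0.25, -0.20) sits above the 0 -> 0.5 tie-line
print(round(e_above_hull(0.6, -0.35, hull), 4))
```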
Three failure modes
Standard hygiene checklist
nelements, nsites, nelements_max
e_above_hull < threshold for the stable subset
is_stable for hull entries only
The loop the rest of the unit serves
┌─ database ─→ predict ─→ screen ─→ synthesise ─→ measure ─┐
│                                                          │
└──────────────────────── refine ◄─────────────────────────┘
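The screen step of this loop can be sketched as a filter. Field names mirror Materials Project conventions (nelements, nsites, energy_above_hull, is_stable), but the entries are invented in-memory stand-ins for a real mp-api query.

```python
# Mock database entries; the data are hypothetical.
entries = [
    {"id": "A", "nelements": 3, "nsites": 12, "energy_above_hull": 0.000, "is_stable": True},
    {"id": "B", "nelements": 3, "nsites": 48, "energy_above_hull": 0.031, "is_stable": False},
    {"id": "C", "nelements": 2, "nsites": 8,  "energy_above_hull": 0.120, "is_stable": False},
    {"id": "D", "nelements": 4, "nsites": 20, "energy_above_hull": 0.004, "is_stable": False},
]

THRESHOLD = 0.050  # eV/atom: upper end of the 25-50 meV/atom rule of thumb

candidates = [
    e for e in entries
    if e["nelements"] == 3                  # fix the chemistry arity
    and e["nsites"] <= 50                   # keep cells tractable
    and e["energy_above_hull"] < THRESHOLD  # metastable window
]
print([e["id"] for e in candidates])
```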
Why the loop motivates uncertainty
What §A gave us
What §B asks
Two candidates from the surrogate
Same mean. Not the same candidate.
Read the difference
A point predictor returns only the mean. The information that decides the prioritisation is thrown away.
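A two-line calculation makes the slogan concrete: same predictive mean, very different probability of clearing a stability threshold. Numbers are hypothetical.

```python
from math import erf, sqrt

def p_below(mu, sigma, threshold):
    """P(f < threshold) for a Gaussian predictive distribution N(mu, sigma^2)."""
    z = (threshold - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Two candidates with identical predicted E_hull mean (40 meV/atom) but
# different predictive spread; stability threshold at 25 meV/atom.
p1 = p_below(mu=0.040, sigma=0.005, threshold=0.025)  # confident: almost surely above
p2 = p_below(mu=0.040, sigma=0.030, threshold=0.025)  # uncertain: real chance of a hit
print(round(p1, 3), round(p2, 3))
```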
The wrong objective
\[\text{minimise} \quad \mathbb{E}[(\hat{y} - y)^2]\]
The right objective
\[\text{maximise} \quad \mathbb{E}\bigl[\text{payoff}(\text{top-}k)\bigr]\]
Costs
Decision
Synthesise iff \(\Pr(\text{success} \mid \mathbf{x}) \cdot c_{\text{miss}} > c_{\text{syn}}\).
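The decision rule as arithmetic, with invented costs (here a missed discovery is taken to be worth 50 synthesis runs):

```python
def synthesise(p_success, c_miss, c_syn):
    """Synthesise iff P(success|x) * c_miss > c_syn."""
    return p_success * c_miss > c_syn

print(synthesise(p_success=0.05, c_miss=50.0, c_syn=1.0))  # 0.05 * 50 = 2.5 > 1
print(synthesise(p_success=0.01, c_miss=50.0, c_syn=1.0))  # 0.01 * 50 = 0.5 < 1
```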
Why uncertainty is required
Aleatoric (data noise)
Epistemic (model ignorance)
Discovery acquisition targets epistemic uncertainty. It picks the next candidate where the model is uncertain AND the expected payoff is high.
Recap of §B
§C question
A distribution over functions
\[f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))\]
Two slogans
At a test point \(\mathbf{x}_*\)
Shape of \(\sigma\) across the candidate set
What the acquisition function consumes: the shape of \(\sigma\), not just its values. The contrast between low-\(\sigma\) regions (exploitable) and high-\(\sigma\) regions (explorable) drives the next-candidate decision.
Workhorse kernels
Hyperparameters
Length scale = materials-similarity assumption. Short \(\ell\): nearby compositions can have very different properties (rough surface). Long \(\ell\): smooth landscape, neighbours are informative.
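A from-scratch sketch of the exact GP posterior, assuming an RBF kernel, 1-D inputs, and toy data. It shows how the length scale controls how fast \(\sigma\) grows away from the data: short \(\ell\) means the far test point reverts to the prior, long \(\ell\) means the neighbours stay informative.

```python
import numpy as np

def rbf(X1, X2, ell, var=1.0):
    """Squared-exponential kernel for 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / ell**2)

def gp_posterior(X, y, Xs, ell, noise=1e-6):
    """Exact GP posterior mean and std at test points Xs."""
    K = rbf(X, X, ell) + noise * np.eye(len(X))
    Ks = rbf(X, Xs, ell)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(Xs, Xs, ell)) - np.sum(v**2, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

X = np.array([0.1, 0.4, 0.8])     # observed compositions (toy 1-D coordinate)
y = np.array([-0.2, -0.5, -0.1])  # hypothetical formation energies
Xs = np.array([0.4, 0.95])        # one test point on the data, one far away

for ell in (0.05, 0.5):
    mu, sd = gp_posterior(X, y, Xs, ell)
    print(f"ell={ell}: sd on-data={sd[0]:.3f}, sd far={sd[1]:.3f}")
```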
Why GPs shine at small \(n\)
Why this matches materials reality
Setting
Posterior reads
Exact GP cost
Conceptual escape routes
For typical campaigns of \(10^2\)–\(10^3\) measurements, exact GPs are fine. Scaling matters when bolting GPs onto massive precomputed databases as a screening surrogate.
§C summary
§D question
Exploitation
Exploration
Acquisition functions trade off the two: each assigns a scalar score \(\alpha(\mathbf{x})\) over the candidate set, and we maximise it to pick the next point.
Definition
\[\alpha_{\text{EI}}(\mathbf{x}) = \mathbb{E}\bigl[\max(f^* - f(\mathbf{x}), 0)\bigr]\]
Reads as
Definition
\[\alpha_{\text{UCB}}(\mathbf{x}) = \mu(\mathbf{x}) + \beta \, \sigma(\mathbf{x})\]
Reads as
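Both definitions have closed forms under a Gaussian posterior; a standard-library sketch follows. Note the sign conventions: the EI above is written for minimisation (improvement means dropping below the incumbent \(f^*\)), while the UCB formula above is the maximisation form.

```python
from math import erf, exp, pi, sqrt

def phi(z):   # standard normal pdf
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def Phi(z):   # standard normal cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def expected_improvement(mu, sigma, f_best):
    """Closed-form alpha_EI(x) = E[max(f* - f(x), 0)], f(x) ~ N(mu, sigma^2),
    minimisation convention as in the definition above."""
    if sigma == 0.0:
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    return (f_best - mu) * Phi(z) + sigma * phi(z)

def ucb(mu, sigma, beta=2.0):
    """alpha_UCB(x) = mu + beta * sigma (maximisation; negate mu for minimisation)."""
    return mu + beta * sigma

# Same-mean candidates again: a wider posterior earns a larger EI.
ei_tight = expected_improvement(0.040, 0.005, f_best=0.030)
ei_wide  = expected_improvement(0.040, 0.030, f_best=0.030)
print(round(ei_tight, 4), round(ei_wide, 4))
```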
Procedure
That’s it.
Why it is useful
Composition space
Structure space
Default choice: start in composition space. Only graduate to structure space when the chemistry is fixed and polymorph selection is the bottleneck.
The naïve trap
The fix: hull-aware EI
\[\alpha_{\text{hull-EI}}(\mathbf{x}) = \mathbb{E}\bigl[\max(0, E_{\text{hull, current}} - E_{\text{hull}}(\mathbf{x}))\bigr]\]
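The definition above can be Monte Carloed directly, treating \(E_{\text{hull}}(\mathbf{x})\) as Gaussian under the surrogate. All numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def hull_aware_ei(mu, sigma, e_hull_current, n_samples=100_000):
    """Monte Carlo estimate of E[max(0, E_hull,current - E_hull(x))]
    with E_hull(x) ~ N(mu, sigma^2) under the surrogate posterior."""
    samples = rng.normal(mu, sigma, n_samples)
    return float(np.mean(np.maximum(e_hull_current - samples, 0.0)))

# Best hull distance found so far: 30 meV/atom; candidate predicted at 20 +/- 15.
alpha = hull_aware_ei(mu=0.020, sigma=0.015, e_hull_current=0.030)
print(round(alpha, 4))
```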
Definition
\[\alpha_{\text{cost}}(\mathbf{x}) = \frac{\alpha(\mathbf{x})}{c(\mathbf{x})}\]
When it matters
The setup
The acquisition picks twice
The problem
Diversity strategies
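One diversity strategy, sketched: greedy selection that down-weights candidates close to points already in the batch. The toy coordinates and the score-times-min-distance utility are illustrative choices, not the only reasonable ones.

```python
import numpy as np

def diverse_batch(X, scores, k):
    """Greedy batch: take the top-scoring point, then repeatedly add the
    candidate maximising score * (min distance to the points already chosen)."""
    chosen = [int(np.argmax(scores))]
    while len(chosen) < k:
        d_min = np.min(
            np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=-1), axis=1
        )
        util = scores * d_min          # zero for already-chosen points (d_min = 0)
        chosen.append(int(np.argmax(util)))
    return chosen

# Toy 2-D candidate coordinates; points 0 and 1 are near-duplicates, both high-score.
X = np.array([[0.0, 0.0], [0.01, 0.0], [1.0, 1.0], [2.0, 0.5]])
scores = np.array([1.00, 0.99, 0.60, 0.55])
print(diverse_batch(X, scores, k=2))   # picks 0, then a distant point, not the twin 1
```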
§D summary
§E question
Procedure
Strengths and weaknesses
Procedure
Strengths and weaknesses
Procedure
Properties
Decision table
Caveats
Construction
Reads
Recalibrate after every batch. Discovery campaigns shift the input distribution; calibration drifts; bad decisions follow.
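A coverage check reads a calibration curve at a single nominal level and is cheap enough to rerun after every batch. Synthetic data here, so the well-calibrated model recovers roughly 95% coverage while the overconfident one falls well short.

```python
import numpy as np

def coverage(y_true, mu, sigma, z=1.96):
    """Empirical fraction of targets inside the nominal z-sigma interval;
    for z = 1.96 a calibrated model should give ~0.95."""
    inside = np.abs(y_true - mu) <= z * sigma
    return float(np.mean(inside))

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, 10_000)   # synthetic targets with true sigma = 1

well = coverage(y, mu=np.zeros_like(y), sigma=np.ones_like(y))        # calibrated
over = coverage(y, mu=np.zeros_like(y), sigma=0.5 * np.ones_like(y))  # overconfident
print(round(well, 2), round(over, 2))
```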
§E summary
§F question
Setup
Outcome and lesson
Setup
Outcome and lesson
Setup
Lessons learned
Three honest “don’t”s
Diagnostics that flag these regimes
What Unit 13 leaves you with
What Unit 14 adds
Exercise (90 min, this afternoon)
Reading for next week (Unit 14)
Next week (Unit 14): physics-informed constraints, trust, and discovery governance.
The discovery loop
database → predict → screen → synthesise → measure → refine
The disciplines

© Philipp Pelz - Materials Genomics