Materials Genomics
Unit 8: Neural Networks for Materials Properties
FAU Erlangen-Nürnberg
By the end of this unit, students should be able to:
So this is not a “deep learning replaces classical ML” unit. It is a “when does extra flexibility pay off?” unit.
Unit 8 assumes the representation is already fixed:
We are only changing the predictor, not the representation itself. Learned representations come in Unit 9.
For fixed materials features, the relevant benchmark stack is:
The scientific question is never whether an MLP can fit a dataset. The question is whether it improves the benchmark in a meaningful and defensible way.
The same network class behaves differently depending on the input:
This is why model choice cannot be separated from representation quality.
An MLP is therefore not the default. It is a hypothesis to test.
An MLP is plausible when:
The decision is empirical, but it should be grounded in the data regime and the deployment goal.
In Materials Genomics, an MLP often acts as a surrogate for:
This changes the evaluation lens. We care not only about low error, but about whether wrong predictions are likely in the parts of space where we would use the surrogate.
For fixed descriptor inputs, architecture choice is often modest:
Very large networks are rarely justified here. Limited data and high feature correlation usually favor smaller MLPs over deep architectures.
Raw row count can be misleading. A dataset with many close chemical relatives may contain far less independent information than it appears.
If capacity is chosen according to nominal dataset size rather than effective sample size, the MLP is likely to overfit family-specific patterns that do not transfer.
Materials datasets often contain:
This means that ten thousand rows may behave statistically more like a much smaller dataset once correlations are respected.
This is why grouped evaluation matters even more for neural surrogates than for simpler baselines.
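The effect of correlated chemical families can be made concrete with a small synthetic sketch (assuming scikit-learn is available; the data and group labels below are synthetic stand-ins for chemistry families, not a real materials dataset). When the target is effectively a family-level quantity and rows within a family are near-duplicates, a random split leaks family information into the test set, while a grouped split reveals the true generalization behavior:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
n_families, per_family = 40, 10
groups = np.repeat(np.arange(n_families), per_family)

# Rows within a family are near-duplicates: same center, small jitter.
centers = rng.normal(size=(n_families, 5))
X = centers[groups] + 0.05 * rng.normal(size=(len(groups), 5))
# The target is a family-level quantity, not predictable across families.
y = rng.normal(size=n_families)[groups] + 0.05 * rng.normal(size=len(groups))

model = KNeighborsRegressor(n_neighbors=3)
random_r2 = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0)).mean()
grouped_r2 = cross_val_score(model, X, y, cv=GroupKFold(n_splits=5), groups=groups).mean()
print(f"random-split R^2:  {random_r2:.2f}")
print(f"grouped-split R^2: {grouped_r2:.2f}")
```

The random-split score looks excellent because each test row has same-family siblings in the training set; the grouped score collapses because held-out families carry no transferable signal. The same leakage mechanism inflates random-split scores for flexible models like MLPs.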
The simplest setup is one network for one property:
\[\mathbf{x} \to \text{MLP} \to \mathbf{y}\]
This is appropriate when:
Many materials benchmarks should start here.
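A minimal single-target sketch, assuming scikit-learn and a synthetic stand-in for a descriptor matrix (the hidden-layer sizes are illustrative choices, not a recommendation for any specific dataset): inputs are scaled, the network is deliberately small, and early stopping guards against overfitting.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))  # stand-in for descriptor vectors
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=600)

# One property, one network: scaled inputs feeding a small MLP.
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32),
                 early_stopping=True,  # holds out part of the data internally
                 max_iter=2000,
                 random_state=0),
)
mlp.fit(X, y)
pred = mlp.predict(X)
print(pred.shape)
```

Wrapping the scaler and the network in one pipeline matters: it keeps the preprocessing inside the cross-validation loop later, so the MLP sees exactly the same protocol as the baselines.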
A multi-target network predicts a vector \(\mathbf{y} = [y_1, y_2, \dots, y_K]^\top\) from one shared hidden representation:
This can help when the targets share physical drivers, but it can also hurt if unrelated targets force the representation to compromise.
But this benefit is not automatic. It depends on the degree of shared signal in the data.
So multitask learning is not a generic upgrade. It is a materials hypothesis that needs evidence.
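A multi-target sketch under the same assumptions (scikit-learn, synthetic data): `MLPRegressor` trains a single network whose hidden layers are shared across all output columns, which is exactly the shared-representation setup described above. Here both synthetic targets are driven by the same underlying quantity, the favorable case for multitask learning.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
shared = np.tanh(X[:, 0] + X[:, 1])  # common "physical driver" (synthetic)
Y = np.column_stack([
    shared + 0.1 * rng.normal(size=500),        # target 1
    2.0 * shared + 0.1 * rng.normal(size=500),  # target 2, same driver
])

# One shared hidden representation predicts both targets jointly.
multi = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32,), early_stopping=True,
                 max_iter=2000, random_state=0),
)
multi.fit(X, Y)
Y_pred = multi.predict(X)
print(Y_pred.shape)  # one column per target
```

The evidence question from the text translates directly into practice: compare this joint model against K single-target networks under the same grouped split, and keep the multitask variant only if it actually wins.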
A neural surrogate fits the target formulation we give it, not the idealized property in our heads.
The MLP must be compared under:

- the same grouped split
- the same feature preprocessing logic
- the same target transformation
- the same evaluation metrics
Anything less turns model comparison into an artifact of protocol differences.
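A sketch of protocol parity, assuming scikit-learn and synthetic data (the group labels stand in for chemistry families): every model is wrapped in the same pipeline logic and scored with the same grouped splitter and metric, so any difference in scores reflects the models, not the protocol.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
groups = rng.integers(0, 20, size=400)  # hypothetical family labels
y = X[:, 0] ** 2 + X[:, 1] + 0.1 * rng.normal(size=400)

cv = GroupKFold(n_splits=5)  # one split protocol for every model
models = {
    "ridge": make_pipeline(StandardScaler(), Ridge()),
    "mlp": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32,),
                                      max_iter=2000, random_state=0)),
}
scores = {name: cross_val_score(m, X, y, cv=cv, groups=groups,
                                scoring="neg_mean_absolute_error").mean()
          for name, m in models.items()}
print(scores)
```

Because the scaler sits inside each pipeline, it is refit on every training fold; fitting it once on the full dataset would leak information and is one of the protocol differences this setup rules out.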
A comparison run under this shared protocol shows where the MLP genuinely wins or loses, and that is exactly the evidence we need to judge whether the model is useful.
Consider a band-gap benchmark using descriptor vectors:

- ridge gives a stable baseline
- random forest captures some nonlinear effects
- an MLP may beat both under a random split
The key question is whether the gain survives a grouped chemistry-aware split. If not, the neural advantage is mostly in-domain interpolation.
Neural surrogates trained on one dataset can fail on another because:

- the DFT functional changes
- relaxation settings change
- curation rules differ
- chemistry coverage shifts
This matters because a model may partly learn the conventions of a database rather than a transportable structure-property rule.
MLPs are powerful interpolators, but they are usually unreliable extrapolators.
In materials discovery, the practical question is often whether the model can say something useful about a chemistry family not represented in training. That question is much harder than random-split evaluation suggests.
The core risk is not only error; it is error without warning.
Without explicit uncertainty modeling, an MLP gives point predictions, not trustworthy confidence estimates.
That means:

- low average error does not imply calibrated trust
- some domains may be much less reliable than others
- uncertainty must be handled later with dedicated methods, not assumed from the neural architecture itself
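One such dedicated method is a small deep ensemble: several identical networks trained from different random seeds, with the spread of their predictions used as a rough uncertainty proxy. A hedged sketch, assuming scikit-learn and a synthetic one-dimensional problem; the ensemble size and probe points are illustrative choices.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=300)

# Same architecture, different seeds: a minimal deep ensemble.
ensemble = [
    make_pipeline(StandardScaler(),
                  MLPRegressor(hidden_layer_sizes=(64,), max_iter=3000,
                               random_state=seed)).fit(X, y)
    for seed in range(5)
]

X_in = np.array([[0.0]])   # inside the training range
X_out = np.array([[6.0]])  # far outside it

def spread(x):
    """Std of ensemble predictions: a crude disagreement-based uncertainty."""
    return np.std([m.predict(x)[0] for m in ensemble])

print(spread(X_in), spread(X_out))
```

Inside the training support the members agree closely; in extrapolation they typically diverge, turning silent failure into at least a visible warning. The spread is not a calibrated confidence interval, only a disagreement signal.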
For small or irregular tabular materials datasets, a random forest (RF) may remain preferable:

- strong low-data performance
- reduced sensitivity to scaling
- less fragile tuning
- easier interpretation
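The low-data comparison is cheap to run. A sketch under the usual assumptions (scikit-learn, synthetic stand-in data with a deliberately small sample count); the point is the side-by-side protocol, not the particular scores.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10))  # deliberately small tabular set
y = X[:, 0] + np.abs(X[:, 1]) + 0.1 * rng.normal(size=80)

candidates = {
    "rf": RandomForestRegressor(n_estimators=200, random_state=0),
    "mlp": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(64, 64),
                                      max_iter=3000, random_state=0)),
}
results = {name: cross_val_score(model, X, y, cv=5).mean()
           for name, model in candidates.items()}
print(results)
```

Note that the RF needs no scaling step while the MLP does; keeping each model's preprocessing inside its own pipeline makes the comparison fair without forcing identical preprocessing where it is not needed.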
Unit 8 should make this explicit so the lecture does not collapse into “NNs are the future.”
A good MLP use case looks like:

- structure-enriched or descriptor-rich features
- evidence of nonlinear interactions between descriptors
- enough diversity to support the fit
- a grouped validation protocol aligned with deployment
- a need for fast repeated prediction
Under these conditions, the MLP becomes a justified surrogate rather than a fashionable choice.
One strength of Materials Genomics is that physically informed features can be combined with flexible nonlinear predictors.
This hybrid strategy often works well because:

- domain knowledge enters through the representation
- nonlinear interactions are still modeled
- data demands remain lower than for end-to-end representation learning
It is often the right intermediate step before moving to learned representations.
This is the exact course-level transition the lecture should make visible.
Do not choose the MLP when:

- simpler baselines already match its performance under the right split
- the data regime is too small or too correlated
- deployment requires extrapolation beyond training support
- interpretability is central and the accuracy gain is negligible
Complexity needs a scientific return.
Before using a descriptor-based MLP in practice, ask:
If these questions cannot be answered, the model is not ready.
Week 8: Regression & Generalization — NanoindentationDataset

© Philipp Pelz - Materials Genomics