Materials Genomics
Unit 8: Neural Networks for Materials Properties
FAU Erlangen-Nürnberg
By the end of this unit, students should be able to:
- explain when a descriptor-based MLP is a sensible next baseline after ridge and random forest
- distinguish raw dataset size from effective sample size in materials problems
- discuss single-target and multi-target neural surrogates for materials properties
- identify domain shift, extrapolation, and false-confidence failure modes
- argue for or against the use of an MLP under a concrete materials benchmark design
This is not a “deep learning replaces classical ML” unit. It is a “when does extra flexibility pay off?” unit.
Unit 8 assumes the representation is already fixed:
- composition vectors
- engineered structural descriptors
- pooled local-environment features
- other tabular materials features
We are only changing the predictor, not the representation itself. Learned representations come in Unit 9.
For fixed materials features, the relevant benchmark stack is:
- ridge as the strong linear baseline
- random forest as a robust nonlinear non-neural baseline
- MLP as the flexible neural baseline
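A minimal sketch of this three-model stack with scikit-learn. The descriptor matrix and target below are synthetic stand-ins, and the hyperparameters are illustrative defaults, not a recommended configuration:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))  # stand-in for fixed descriptor vectors
# linear signal plus a small descriptor interaction and noise
y = X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=300)

models = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
    ),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```

Scaling sits inside each pipeline so that cross-validation never leaks test statistics into the fit; the random forest needs no scaling and is left bare.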
The scientific question is never whether an MLP can fit a dataset. The question is whether it improves the benchmark in a meaningful and defensible way.
The same network class behaves differently depending on the input:
- with composition-only descriptors, it often interpolates within chemistry families
- with structure-enriched descriptors, it can capture richer local and geometric effects
- with weak descriptors, extra nonlinearity may mainly amplify noise
This is why model choice cannot be separated from representation quality.
An MLP is therefore not the default. It is a hypothesis to test.
An MLP is plausible when:
- simple baselines show systematic underfitting
- descriptor interactions appear important
- the dataset is diverse enough to support additional flexibility
- the use case benefits from a fast nonlinear surrogate
The decision is empirical, but it should be grounded in the data regime and the deployment goal.
In Materials Genomics, an MLP often acts as a surrogate for:
- expensive DFT calculations
- repeated property evaluation in screening loops
- process-property mappings where simulation or experiment is costly
This changes the evaluation lens. We care not only about low error, but about whether wrong predictions are likely in the parts of space where we would use the surrogate.
For fixed descriptor inputs, architecture choice is often modest:
- one to a few hidden layers
- moderate width
- output head chosen for one or multiple properties
Very large networks are rarely justified here. Limited data and high feature correlation usually favor smaller MLPs over deep architectures.
Raw row count can be misleading. A dataset with many close chemical relatives may contain far less independent information than it appears.
If capacity is chosen according to nominal dataset size rather than effective sample size, the MLP is likely to overfit family-specific patterns that do not transfer.
Materials datasets often contain:
- near-duplicate structures
- many compounds within one chemistry family
- polymorph variations of the same system
- entries produced by the same workflow and reference data
This means that ten thousand rows may behave statistically more like a much smaller dataset once correlations are respected.
This is why grouped evaluation matters even more for neural surrogates than for simpler baselines.
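One way to respect these correlations is a grouped split, sketched here with scikit-learn's GroupKFold. The data are synthetic, and the integer group labels stand in for chemistry-family IDs you would assign from domain knowledge:

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10)
# illustrative: one group ID per chemistry family
groups = rng.integers(0, 20, size=200)

gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(X, y, groups):
    # entire families are held out together: no family leaks across the split
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
    model = Ridge().fit(X[train_idx], y[train_idx])
    print(f"held-out R^2: {model.score(X[test_idx], y[test_idx]):.3f}")
```

The same `groups` array can be passed to `cross_val_score` via its `groups` argument, so the grouped protocol applies identically to every model in the benchmark.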
The simplest setup is one network for one property:
x -> MLP -> y
This is appropriate when:
- the target is scientifically central
- coupled outputs are weak or irrelevant
- interpretability of task definition matters more than shared output structure
Many materials benchmarks should start here.
A multi-target network predicts several properties from one shared hidden representation:
x -> shared trunk -> (y_1, y_2, ..., y_k)
This can help when the targets share physical drivers, but it can also hurt if unrelated targets force the representation to compromise. The benefit depends on the degree of shared signal in the data.
Multitask learning is therefore not a generic upgrade. It is a materials hypothesis that needs evidence.
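As a sketch of the shared-trunk idea: scikit-learn's MLPRegressor accepts a two-dimensional target, so its hidden layers act as a shared trunk with one linear output per property. Everything below is synthetic, with two targets deliberately driven by one latent factor:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 15))
latent = X[:, 0] + X[:, 1] ** 2        # shared physical driver (illustrative)
Y = np.column_stack([
    latent + 0.1 * rng.normal(size=400),        # property 1
    2.0 * latent + 0.1 * rng.normal(size=400),  # property 2
])

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000, random_state=0),
)
mlp.fit(X_tr, Y_tr)                     # 2-D target -> shared trunk, 2 outputs
print("prediction shape:", mlp.predict(X_te).shape)
```

Comparing this multi-target fit against two independent single-target fits on the same grouped split is exactly the evidence the multitask hypothesis needs.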
A neural surrogate fits the target formulation we give it, not the idealized property in our heads.
The MLP must be compared under:
- the same grouped split
- the same feature preprocessing logic
- the same target transformation
- the same evaluation metrics
Anything less turns model comparison into an artifact of protocol differences.
A comparison under an identical protocol is exactly the evidence we need to judge whether the model is useful.
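One way to enforce this parity is to fold the preprocessing and the target transform into each estimator itself, so every candidate sees exactly the same protocol under the same grouped split. A sketch on synthetic data, with a log transform standing in for whatever target transformation the benchmark specifies:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = np.exp(0.3 * X[:, 0] + 0.1 * rng.normal(size=200))  # skewed positive target
groups = rng.integers(0, 10, size=200)                   # illustrative family IDs

def wrap(estimator):
    # identical scaler and identical log-target transform for every candidate
    return TransformedTargetRegressor(
        regressor=make_pipeline(StandardScaler(), estimator),
        func=np.log, inverse_func=np.exp,
    )

cv = GroupKFold(n_splits=5)
candidates = [
    ("ridge", Ridge()),
    ("mlp", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)),
]
for name, est in candidates:
    scores = cross_val_score(wrap(est), X, y, groups=groups, cv=cv,
                             scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {-scores.mean():.3f}")
```

Because the split, scaling, target transform, and metric are fixed once, any remaining score difference reflects the predictor, not the protocol.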
Consider a band-gap benchmark using descriptor vectors:
- ridge gives a stable baseline
- random forest captures some nonlinear effects
- an MLP may beat both under a random split
The key question is whether the gain survives a grouped chemistry-aware split. If not, the neural advantage is mostly in-domain interpolation.
Neural surrogates trained on one dataset can fail on another because:
- the DFT functional changes
- relaxation settings change
- curation rules differ
- chemistry coverage shifts
This matters because a model may partly learn the conventions of a database rather than a transportable structure-property rule.
MLPs are powerful interpolators, but they are usually unreliable extrapolators.
In materials discovery, the practical question is often whether the model can say something useful about a chemistry family not represented in training. That question is much harder than random-split evaluation suggests.
The core risk is not only error; it is error without warning.
Without explicit uncertainty modeling, an MLP gives point predictions, not trustworthy confidence estimates.
That means:
- low average error does not imply calibrated trust
- some domains may be much less reliable than others
- uncertainty must be handled later with dedicated methods, not assumed from the neural architecture itself
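A crude way to surface this unreliability is an ensemble of identically configured MLPs with different seeds: prediction spread across members gives an uncalibrated disagreement signal, not a substitute for dedicated uncertainty methods. A synthetic sketch:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 5))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=300)

# five MLPs differing only in their random seed
ensemble = [
    make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000, random_state=seed),
    ).fit(X, y)
    for seed in range(5)
]

def spread(X_query):
    # mean standard deviation of predictions across ensemble members
    return np.std([m.predict(X_query) for m in ensemble], axis=0).mean()

X_in = rng.uniform(-1, 1, size=(50, 5))   # inside training support
X_out = rng.uniform(2, 3, size=(50, 5))   # outside training support
print(f"in-domain spread:     {spread(X_in):.3f}")
print(f"out-of-domain spread: {spread(X_out):.3f}")
```

Members typically agree where the data are, and disagree outside training support; but this spread is not calibrated, so it should be read as a warning flag rather than a confidence interval.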
For small or irregular tabular materials datasets, random forest may remain preferable:
- strong low-data performance
- reduced sensitivity to scaling
- less fragile tuning
- easier interpretation of failure patterns
Unit 8 should make this explicit so the lecture does not collapse into “NNs are the future.”
A good MLP use case looks like:
- structure-enriched or descriptor-rich features
- evidence of nonlinear interactions between descriptors
- enough diversity to support the fit
- a grouped validation protocol aligned with deployment
- a need for fast repeated prediction
Under these conditions, the MLP becomes a justified surrogate rather than a fashionable choice.
One strength of Materials Genomics is that physically informed features can be combined with flexible nonlinear predictors.
This hybrid strategy often works well because:
- domain knowledge enters through the representation
- nonlinear interactions are still modeled
- data demands remain lower than for end-to-end representation learning
It is often the right intermediate step before moving to learned representations.
This is the exact course-level transition the lecture should make visible.
Do not choose the MLP when:
- simpler baselines already match its performance under the right split
- the data regime is too small or too correlated
- deployment requires extrapolation beyond training support
- interpretability is central and the accuracy gain is negligible
Complexity needs a scientific return.
Before using a descriptor-based MLP in practice, ask:
- does it beat strong baselines under the deployment-relevant split?
- do residuals remain acceptable in important chemistry families?
- is there evidence against leakage and dataset artifacts?
- does it degrade gracefully under shift?
- is the deployment domain close enough to training support?
If these questions cannot be answered, the model is not ready.

© Philipp Pelz - Materials Genomics