Materials Genomics
Unit 8: Neural Networks for Materials Properties

Prof. Dr. Philipp Pelz

FAU Erlangen-Nürnberg


01. Title: Neural Networks for Materials Properties

  • This unit does not introduce neural networks from scratch; that job belongs to MFML.
  • Here the question is narrower and more practical: when does a feed-forward neural surrogate deserve a place in a Materials Genomics workflow?
  • The focus is application, benchmark discipline, and scientific limits.

02. Learning outcomes

By the end of this unit, students should be able to:

  • explain when a descriptor-based MLP is a sensible next baseline after ridge and random forest
  • distinguish raw dataset size from effective sample size in materials problems
  • discuss single-target and multi-target neural surrogates for materials properties
  • identify domain shift, extrapolation, and false-confidence failure modes
  • argue for or against the use of an MLP under a concrete materials benchmark design

03. Recap from Unit 7

  • Unit 7 established that the split defines the scientific claim.
  • We already know how to compare linear and tree-based baselines under grouped evaluation.
  • Unit 8 adds one more model family, but the evaluation discipline remains unchanged.

04. Why add neural surrogates at all?

  • Some structure-property relations remain nonlinear even after careful descriptor design.
  • A neural surrogate can model interactions among features that linear baselines cannot.
  • But extra flexibility only matters if it survives fair comparison under the relevant validation protocol.

So this is not a “deep learning replaces classical ML” unit. It is a “when does extra flexibility pay off?” unit.

05. The boundary of this unit

Unit 8 assumes the representation is already fixed:

  • composition vectors
  • engineered structural descriptors
  • pooled local-environment features
  • other tabular materials features

We are only changing the predictor, not the representation itself. Learned representations come in Unit 9.

06. Neural surrogates live inside a benchmark stack

For fixed materials features, the relevant benchmark stack is:

  • ridge as the strong linear baseline
  • random forest as a robust nonlinear non-neural baseline
  • MLP as the flexible neural baseline

The scientific question is never whether an MLP can fit a dataset. The question is whether it improves the benchmark in a meaningful and defensible way.
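As a minimal sketch of this stack (scikit-learn on synthetic data; the features, targets, and group labels below are placeholders, not course data):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # fixed descriptor matrix
y = X[:, 0] * X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=200)
groups = rng.integers(0, 20, size=200)         # hypothetical chemistry-family labels

models = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
    ),
}

# all three candidates face the same grouped folds and the same metric
cv = GroupKFold(n_splits=5)
for name, model in models.items():
    scores = cross_val_score(model, X, y, groups=groups, cv=cv,
                             scoring="neg_mean_absolute_error")
    print(f"{name:14s} MAE = {-scores.mean():.3f}")
```

Because split, preprocessing, and metric are shared, any MLP advantage reflects the model, not a difference in protocol.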

07. Composition-only versus structure-enriched inputs

The same network class behaves differently depending on the input:

  • with composition-only descriptors, it often interpolates within chemistry families
  • with structure-enriched descriptors, it can capture richer local and geometric effects
  • with weak descriptors, extra nonlinearity may mainly amplify noise

This is why model choice cannot be separated from representation quality.

08. Why not start with the MLP by default?

  • Materials datasets are often small relative to model flexibility.
  • Many examples are chemically correlated, so the true amount of independent information is limited.
  • Descriptor pipelines may already encode much of the useful structure.
  • Weak validation makes neural models look stronger than they are.

An MLP is therefore not the default. It is a hypothesis to test.

09. When an MLP becomes plausible

An MLP is plausible when:

  • simple baselines show systematic underfitting
  • descriptor interactions appear important
  • the dataset is diverse enough to support additional flexibility
  • the use case benefits from a fast nonlinear surrogate

The decision is empirical, but it should be grounded in the data regime and the deployment goal.

10. Neural surrogates as stand-ins for expensive simulation

In Materials Genomics, an MLP often acts as a surrogate for:

  • expensive DFT calculations
  • repeated property evaluation in screening loops
  • process-property mappings where simulation or experiment is costly

This changes the evaluation lens. We care not only about low error, but about whether wrong predictions are likely in the parts of space where we would use the surrogate.

11. Architecture choice in limited-data materials tasks

For fixed descriptor inputs, architecture choice is often modest:

  • one to a few hidden layers
  • moderate width
  • output head chosen for one or multiple properties

Very large networks are rarely justified here. Limited data and high feature correlation usually favor smaller MLPs over deep architectures.

12. Capacity should match effective sample size

Raw row count can be misleading. A dataset with many close chemical relatives may contain far less independent information than it appears.

If capacity is chosen according to nominal dataset size rather than effective sample size, the MLP is likely to overfit family-specific patterns that do not transfer.
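One quick sanity check is to count trainable parameters and set them against the effective (not nominal) sample size. The helper below is a generic sketch, not course code:

```python
def mlp_param_count(n_features: int, hidden: tuple, n_outputs: int = 1) -> int:
    """Number of weights and biases in a fully connected MLP."""
    sizes = [n_features, *hidden, n_outputs]
    # each layer contributes (inputs x outputs) weights plus one bias per output
    return sum(sizes[i] * sizes[i + 1] + sizes[i + 1] for i in range(len(sizes) - 1))

# even a modest MLP on 64 descriptors already has thousands of parameters
print(mlp_param_count(64, (64,)))            # 4225
print(mlp_param_count(64, (256, 256, 256)))  # 148481
```

If a ten-thousand-row dataset behaves like a few hundred independent examples, even the smaller network above is generously parameterized.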

13. Effective sample size in materials datasets

Materials datasets often contain:

  • near-duplicate structures
  • many compounds within one chemistry family
  • polymorph variations of the same system
  • entries produced by the same workflow and reference data

This means that ten thousand rows may behave statistically more like a much smaller dataset once correlations are respected.

14. Example: why random splits flatter MLPs

  • Random splits place chemically similar materials in both train and test sets.
  • A smooth neural surrogate can then interpolate between near neighbors extremely well.
  • The resulting score may look like discovery performance even though the model has never faced a genuinely new family.

This is why grouped evaluation matters even more for neural surrogates than for simpler baselines.
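The effect can be reproduced on synthetic data in which each "family" consists of near-duplicate rows with a family-specific target offset (all numbers below are illustrative, chosen only to make the flattery visible):

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_groups, per_group = 30, 10
centers = rng.normal(size=(n_groups, 5))
offsets = rng.normal(size=n_groups)        # family-specific signal

# each family: near-duplicate feature rows, target = family offset + small noise
X = np.repeat(centers, per_group, axis=0) + 0.01 * rng.normal(size=(n_groups * per_group, 5))
y = np.repeat(offsets, per_group) + 0.05 * rng.normal(size=n_groups * per_group)
groups = np.repeat(np.arange(n_groups), per_group)

mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=0))

# random split: near neighbors of every test row sit in the training set
random_mae = -cross_val_score(mlp, X, y, cv=KFold(5, shuffle=True, random_state=0),
                              scoring="neg_mean_absolute_error").mean()
# grouped split: whole families are held out
grouped_mae = -cross_val_score(mlp, X, y, groups=groups, cv=GroupKFold(5),
                               scoring="neg_mean_absolute_error").mean()
print(f"random split MAE:  {random_mae:.3f}")
print(f"grouped split MAE: {grouped_mae:.3f}")   # much larger: families never seen
```

By construction the offsets carry no structure that transfers across families, so the random-split score is interpolation, not discovery performance.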

15. Single-target prediction

The simplest setup is one network for one property:

x -> MLP -> y

This is appropriate when:

  • the target is scientifically central
  • coupled outputs are weak or irrelevant
  • interpretability of task definition matters more than shared output structure

Many materials benchmarks should start here.

16. Multi-target prediction

A multi-target network predicts several properties from one shared hidden representation:

x -> shared trunk -> (y_1, y_2, ..., y_k)

This can help when the targets share physical drivers, but it can also hurt if unrelated targets force the representation to compromise.
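In scikit-learn, `MLPRegressor` supports this directly: fitting a 2-D target matrix gives one shared trunk with k output units. The coupled targets below are synthetic, loosely in the spirit of bulk and shear moduli:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))

# two targets driven by the same underlying features (shared physical drivers)
Y = np.column_stack([
    X[:, 0] + 0.5 * X[:, 1] ** 2,
    0.8 * X[:, 0] + 0.3 * X[:, 1] ** 2,
]) + 0.05 * rng.normal(size=(300, 2))

mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
mlp.fit(X, Y)                      # shared trunk, two output heads
print(mlp.predict(X[:3]).shape)    # (3, 2): one row per sample, one column per target
```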

17. Example: when multi-target learning helps

  • Elastic properties such as bulk and shear moduli share mechanical structure.
  • A shared hidden representation may improve data efficiency because the targets are not independent.

But this benefit is not automatic. It depends on the degree of shared signal in the data.

18. Example: when multi-target learning hurts

  • Band gap and ionic conductivity may depend on overlapping chemistry, but often through different mechanisms.
  • Forcing one shared representation can create negative transfer.

So multitask learning is not a generic upgrade. It is a materials hypothesis that needs evidence.

19. Target scaling and framing still matter

  • Some targets span a narrow range; others span orders of magnitude.
  • Log transforms, normalization, and family-aware framing can change what the network learns easily.
  • The transformed target must still preserve the scientific meaning of the prediction task.

A neural surrogate fits the target formulation we give it, not the idealized property in our heads.
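A common pattern for targets spanning orders of magnitude is to fit in log space but report on the physical scale; `TransformedTargetRegressor` keeps that round trip explicit. A sketch with a synthetic positive target:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = np.exp(2.0 * X[:, 0] + 0.5 * rng.normal(size=200))   # spans orders of magnitude

model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    func=np.log1p,          # fit on the log-scaled target
    inverse_func=np.expm1,  # report predictions on the original scale
)
model.fit(X, y)
pred = model.predict(X[:5])
print(pred)                 # back on the physical (linear) scale
```

The transform pair must be an exact inverse, and the reported metric should be computed on whichever scale matches the scientific question.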

20. Fair comparison to ridge and random forest

The MLP must be compared under:

  • the same grouped split
  • the same feature preprocessing logic
  • the same target transformation
  • the same evaluation metrics

Anything less turns model comparison into an artifact of protocol differences.

21. Why grouped splits matter more for MLPs

  • High-capacity models exploit local continuity in feature space very effectively.
  • If the test set contains near neighbors of the training set, the MLP can look much stronger than it is.
  • Under grouped chemistry-aware or prototype-aware splits, the apparent advantage may shrink or disappear.

That is exactly the evidence we need to judge whether the model is useful.

22. Worked example: band-gap prediction

Consider a band-gap benchmark using descriptor vectors:

  • ridge gives a stable baseline
  • random forest captures some nonlinear effects
  • an MLP may beat both under a random split

The key question is whether the gain survives a grouped chemistry-aware split. If not, the neural advantage is mostly in-domain interpolation.

23. Cross-database domain shift

Neural surrogates trained on one dataset can fail on another because:

  • the DFT functional changes
  • relaxation settings change
  • curation rules differ
  • chemistry coverage shifts

This matters because a model may partly learn the conventions of a database rather than a transportable structure-property rule.

24. Extrapolation to unseen chemistry

MLPs are powerful interpolators, but they are usually unreliable extrapolators.

In materials discovery, the practical question is often whether the model can say something useful about a chemistry family not represented in training. That question is much harder than random-split evaluation suggests.

25. False confidence and smooth wrong answers

  • A plain MLP usually returns a confident-looking number even when the input lies far outside training support.
  • The prediction may be smooth, stable, and completely misleading.
  • This is dangerous in screening because expensive follow-up work may be allocated to cases where the model has no real support.

The core risk is not only error; it is error without warning.
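A tiny demonstration of this point: an MLP fitted on inputs in [-2, 2] returns a finite, smooth-looking number at x = 10 with no flag that the query lies far outside training support (illustrative only):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X_train = rng.uniform(-2, 2, size=(200, 1))
y_train = np.sin(X_train[:, 0])             # smooth target, supported only on [-2, 2]

mlp = MLPRegressor(hidden_layer_sizes=(32, 32), solver="lbfgs",
                   max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)

inside = mlp.predict([[0.5]])[0]    # in support: should sit near sin(0.5) ~ 0.48
outside = mlp.predict([[10.0]])[0]  # far outside support: still just a number, no warning
print(f"x=0.5 -> {inside:.3f}, x=10.0 -> {outside:.3f}")
```

Nothing in the API distinguishes the two calls; detecting the unsupported query requires an explicit distance-to-training-data or uncertainty check, which the plain model does not provide.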

26. Calibration limits of plain neural surrogates

Without explicit uncertainty modeling, an MLP gives point predictions, not trustworthy confidence estimates.

That means:

  • low average error does not imply calibrated trust
  • some domains may be much less reliable than others
  • uncertainty must be handled later with dedicated methods, not assumed from the neural architecture itself

27. When random forest still wins

For small or irregular tabular materials datasets, random forest may remain preferable:

  • strong low-data performance
  • reduced sensitivity to scaling
  • less fragile tuning
  • easier interpretation of failure patterns

Unit 8 should make this explicit so the lecture does not collapse into “NNs are the future.”

28. A strong use case for MLPs

A good MLP use case looks like:

  • structure-enriched or descriptor-rich features
  • evidence of nonlinear interactions between descriptors
  • enough diversity to support the fit
  • a grouped validation protocol aligned with deployment
  • a need for fast repeated prediction

Under these conditions, the MLP becomes a justified surrogate rather than a fashionable choice.

29. Hybrid value of physically informed descriptors

One strength of Materials Genomics is that physically informed features can be combined with flexible nonlinear predictors.

This hybrid strategy often works well because: - domain knowledge enters through the representation - nonlinear interactions are still modeled - data demands remain lower than for end-to-end representation learning

It is often the right intermediate step before moving to learned representations.

30. Example: local-environment descriptors plus MLP

  • Unit 6 built local-environment descriptors.
  • Unit 7 established how to benchmark predictors honestly.
  • Unit 8 now asks whether a small MLP on pooled local-environment features captures interactions that ridge misses.

This is the exact course-level transition the lecture should make visible.

31. When not to use an MLP

Do not choose the MLP when:

  • simpler baselines already match its performance under the right split
  • the data regime is too small or too correlated
  • deployment requires extrapolation beyond training support
  • interpretability is central and the accuracy gain is negligible

Complexity needs a scientific return.

32. Trust checklist for neural surrogates

Before using a descriptor-based MLP in practice, ask:

  • does it beat strong baselines under the deployment-relevant split?
  • do residuals remain acceptable in important chemistry families?
  • is there evidence against leakage and dataset artifacts?
  • does it degrade gracefully under shift?
  • is the deployment domain close enough to training support?

If these questions cannot be answered, the model is not ready.

33. Summary

  • In Materials Genomics, an MLP is one candidate surrogate model for fixed features.
  • Its value depends on benchmark discipline, effective sample size, and deployment alignment.
  • The main risks are weak splits, domain shift, extrapolation, and false confidence.
  • The right baseline comparison is part of the science, not an appendix.

34. Bridge to Unit 9

  • Unit 8 still treats the representation as fixed.
  • Unit 9 changes the game by asking how representations themselves can be discovered or learned from data.
  • That is the natural next step once fixed-feature neural surrogates reach their limit.