Machine Learning in Materials Processing & Characterization

Course Curriculum and Materials

Author
Affiliation

Philipp Pelz

Materials Science and Engineering

Published

March 10, 2026

Abstract

This course teaches how machine learning can be applied to experimental data from materials processing and characterization. The focus is on images, spectra, time-series, and processing parameters, and on understanding how the physics of data formation interacts with learning algorithms. Students learn to build robust, uncertainty-aware ML pipelines for real experimental workflows, avoiding common pitfalls such as data leakage, overfitting, and spurious correlations.

Keywords

Machine Learning, Materials Science, Materials Processing, Materials Characterization, Deep Learning, Microstructure Analysis, Process Optimization

1 Machine Learning in Materials Processing & Characterization

5th Semester – 5 ECTS · 2h lecture + 2h exercises per week
Coordinated with “Mathematical Foundations of AI & ML” (MFML)
and “Materials Genomics” (MG)


1.1 Synergy Map

  • MFML provides the mathematical spine: loss functions, neural networks, generalization, uncertainty, Gaussian Processes.
  • This course (ML-PC) applies these concepts to experimental data: images, spectra, and processing signals.
  • Materials Genomics focuses on crystal structures, databases, and discovery.

ML-PC is therefore application-driven, not algorithm-driven.


1.2 Companion books

  • Sandfeld (2024): Materials Data Science

1.3 Week-by-Week Curriculum (14 weeks)

1.3.1 Unit I — Experimental Data as a Learning Problem (Weeks 1–3)

1.3.1.1 Week 1 – What makes materials data special?

Lecture: Tuesday, 14.04.2026, 14:15-15:45 | Exercise: Thursday, 16.04.2026, 16:15-17:45

  • Types of experimental data: micrographs, EBSD, EDS, EELS, XRD, process logs, thermal histories.
  • PSPP (Processing–Structure–Property–Performance) as a data dependency graph.
  • Why ML failure modes are common in experimental science.

Summary: This unit introduces the transition from classical physics-based modeling to data-driven discovery in materials science. We explore the unique challenges of experimental materials data, including its multi-modal nature, high acquisition cost, and the fundamental Processing-Structure-Property-Performance (PSPP) relationships. Key concepts include data scales, measurement uncertainty, and the CRISP-DM process adapted for scientific workflows.

Exercise:
Inspect real microscopy and process datasets; identify sources of bias and noise.


1.3.1.2 Week 2 – Physics of data formation

Lecture: Tuesday, 21.04.2026, 14:15-15:45 | Exercise: Thursday, 23.04.2026, 16:15-17:45

  • Image and signal formation in characterization: resolution, contrast, artifacts.
  • Sampling, aliasing, noise as physical priors (not preprocessing tricks).
  • Relation to MFML refresher on PCA and covariance.

Summary: This unit bridges the gap between the physical process of data acquisition and the mathematical tools used to describe it. We analyze how signals are formed in characterization tools and how physical constraints (resolution, noise, sampling) act as priors for learning. We then introduce Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) as fundamental techniques for discovering low-dimensional structure in high-dimensional experimental datasets.
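The PCA-via-SVD idea from this week can be sketched on synthetic spectra; the peak shapes, mixing weights, and noise level below are illustrative assumptions, not real measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "spectral" dataset: 200 noisy spectra that are mixtures of
# two underlying peak shapes (illustrative stand-ins for, e.g., EELS edges).
energy = np.linspace(0.0, 10.0, 256)
peak_a = np.exp(-(energy - 3.0) ** 2 / 0.5)
peak_b = np.exp(-(energy - 7.0) ** 2 / 0.8)
weights = rng.uniform(0.0, 1.0, size=(200, 2))
spectra = weights @ np.vstack([peak_a, peak_b])
spectra += 0.05 * rng.standard_normal(spectra.shape)  # measurement noise

# PCA via SVD of the mean-centered data matrix.
centered = spectra - spectra.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S**2 / np.sum(S**2)

# Two components should dominate; the remaining variance is noise.
print(f"variance explained by first 2 components: {explained[:2].sum():.3f}")
```

Because the data is generated from two components, the spectrum of singular values drops sharply after the second, which is exactly the low-dimensional structure PCA is meant to expose.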

Exercise:
Fourier inspection of micrographs; effects of sampling and filtering.


1.3.1.3 Week 3 – Data quality, labels, and leakage

Lecture: Tuesday, 28.04.2026, 14:15-15:45 | Exercise: Thursday, 30.04.2026, 16:15-17:45

  • Annotation uncertainty and inter-annotator variance.
  • Train/test leakage in materials workflows.
  • Why “good accuracy” often means a broken pipeline.

Summary: This unit focuses on the most critical and often overlooked part of the ML pipeline: data integrity. We discuss systematic data cleaning and normalization techniques while highlighting the unique challenges of labeling experimental materials data, such as inter-annotator variance. A major focus is on Data Leakage, specifically how spatial and physical correlations in materials samples can lead to deceptively high model performance. We introduce robust validation strategies to ensure models generalize to truly unseen data.

Exercise:
Construct a deliberately flawed ML pipeline and diagnose its failure.


1.3.2 Unit II — Representation Learning for Microstructures (Weeks 4–6)

(Aligned with early neural networks in MFML)

1.3.2.1 Week 4 – From classical microstructure metrics to learned representations

Lecture: Tuesday, 05.05.2026, 14:15-15:45 | Exercise: Thursday, 07.05.2026, 16:15-17:45

  • Grain size, phase fractions, orientation maps.
  • Limits of hand-crafted microstructure features.
  • Transition to learned representations.

Summary: This unit marks the transition from classical, hand-crafted microstructure quantification (like grain size and phase fractions) to the modern paradigm of learned representations. We first review traditional stereological metrics and their limitations in capturing complex structural nuances. We then introduce the foundational unit of modern ML: the artificial neuron. By understanding weights, biases, and non-linear activation functions, we build the framework for Multi-Layer Perceptrons (MLPs) that can automatically learn optimal features from materials data.
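The artificial neuron described above can be written out in a few lines; all weights and feature values below are illustrative, not fitted to any data.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# A single artificial neuron: weighted sum of inputs plus bias, then a
# non-linear activation. Feature values are illustrative stand-ins
# (e.g. grain size, phase fraction, porosity).
x = np.array([0.8, 0.2, 0.5])          # input features
w = np.array([1.5, -2.0, 0.7])         # weights
b = -0.3                               # bias
print("neuron output:", relu(w @ x + b))

# A tiny two-layer MLP is just this operation stacked, with a weight
# matrix per layer instead of a single weight vector.
W1 = np.array([[1.5, -2.0, 0.7],
               [0.3, 0.9, -1.1]])      # 3 inputs -> 2 hidden units
b1 = np.array([-0.3, 0.1])
w2 = np.array([0.5, -1.2])             # 2 hidden units -> 1 output
b2 = 0.05
hidden = relu(W1 @ x + b1)
print("MLP output:", w2 @ hidden + b2)
```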

Exercise:
Compare classical features vs simple NN-based features for microstructure tasks.


1.3.2.2 Week 5 – Neural networks for microstructure images

Lecture: Tuesday, 12.05.2026, 14:15-15:45 | Exercise: Thursday, 14.05.2026, 16:15-17:45 (cancelled - Ascension Day)

  • CNN intuition: filters as structure detectors.
  • Example tasks: phase segmentation, defect detection, porosity identification.
  • Overfitting risks with small datasets.

Summary: This unit introduces Convolutional Neural Networks (CNNs), the workhorse of modern computer vision, and applies them to materials characterization. We explore how convolutions allow networks to automatically learn hierarchical structure detectors—from simple edges to complex phase morphologies—while drastically reducing the number of parameters compared to standard MLPs. Through case studies in phase segmentation and defect detection, students learn the intuition behind filters, pooling, and the unique challenges of applying deep learning to high-resolution, noisy experimental micrographs.
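The "filters as structure detectors" intuition can be sketched with a hand-set edge kernel on a toy image; in a trained CNN such kernels are learned, and the function below is a plain reference convolution, not an optimized library routine.

```python
import numpy as np

# Toy "micrograph": a bright phase on the left, dark on the right,
# so the only structure is a vertical interface down the middle.
img = np.zeros((8, 8))
img[:, :4] = 1.0

# A hand-set 3x3 vertical-edge filter -- the kind of kernel a CNN's
# first layer typically learns on its own during training.
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

def conv2d_valid(image, k):
    """Plain 'valid' 2D cross-correlation, as used inside CNN layers."""
    kh, kw = k.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

response = conv2d_valid(img, kernel)
# The response is strong only along the phase interface.
print(response[0])
```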

Exercise:
Train a small CNN on microstructure images; analyze failure cases.


1.3.2.3 Week 6 – Data scarcity & transfer learning

Lecture: Tuesday, 19.05.2026, 14:15-15:45 | Exercise: Thursday, 21.05.2026, 16:15-17:45

  • Why materials datasets are small.
  • Transfer learning from natural images vs self-supervised pretraining.
  • When transfer learning helps—and when it does not.

Summary: This unit addresses the fundamental bottleneck of materials informatics: Data Scarcity. We explore how to build powerful deep learning models when only a few hundred labeled images or signals are available. The core focus is on Transfer Learning, where we leverage knowledge from models pretrained on millions of natural images to accelerate learning and improve generalization on materials tasks. We also cover Data Augmentation strategies tailored for scientific data and discuss when and why transferring knowledge across different physical domains succeeds or fails.

Exercise:
Fine-tune a pretrained model; compare against training from scratch.
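A minimal stand-in for the fine-tune-vs-scratch comparison, using a PCA representation "pretrained" on abundant unlabeled source data instead of a deep network (the real exercise would use a pretrained CNN; all data here is synthetic and the setup is purely conceptual):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Abundant unlabeled "source" data and a tiny labeled "target" set, both
# living on the same low-dimensional structure (a toy analogue of
# pretraining on large image corpora, then adapting to micrographs).
basis = rng.standard_normal((5, 100))               # shared structure
source = rng.standard_normal((2000, 5)) @ basis
source += 0.1 * rng.standard_normal(source.shape)

coeffs = rng.standard_normal((40, 5))
target_X = coeffs @ basis + 0.1 * rng.standard_normal((40, 100))
target_y = (coeffs[:, 0] > 0).astype(int)           # label tied to structure

# "Pretraining": learn the representation from source data only.
pretrained = PCA(n_components=5).fit(source)

# "Fine-tuning": train a small head on pretrained features of the target set.
train, test = np.arange(0, 20), np.arange(20, 40)
head = LogisticRegression().fit(pretrained.transform(target_X[train]), target_y[train])
acc_transfer = head.score(pretrained.transform(target_X[test]), target_y[test])

# Baseline: train directly on the raw 100-dim features with 20 samples.
baseline = LogisticRegression(max_iter=2000).fit(target_X[train], target_y[train])
acc_scratch = baseline.score(target_X[test], target_y[test])

print(f"transfer: {acc_transfer:.2f}   from-scratch: {acc_scratch:.2f}")
```

The design choice to reuse a representation learned on cheap source data, then train only a small head on scarce labels, is the core of transfer learning regardless of the model family.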


1.3.3 Unit III — Learning from Processing Data (Weeks 7–9)

1.3.3.1 Week 7 – Time-series and process monitoring

Lecture: Tuesday, 26.05.2026, 14:15-15:45 | Exercise: Thursday, 28.05.2026, 16:15-17:45

  • Processing signals: temperature cycles, AM melt pool signals, SPS, rolling.
  • Regression and sequence models as surrogates.
  • Relation to MFML concepts of generalization.

Summary: This unit explores the application of machine learning to Time-Series Data, specifically for monitoring and predicting materials processing outcomes. We introduce Recurrent Neural Networks (RNNs) and their advanced variants like LSTMs, which are designed to handle sequential dependencies. We discuss the critical preprocessing steps of signal smoothing and triggering required to handle noisy experimental logs. Through case studies in additive manufacturing and process stability, students learn how to build models that “remember” the processing history to predict future states and detect anomalies in real-time.

Exercise:
Predict a process outcome from time-series data using regression or simple RNNs.
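One way to set up the regression variant of this exercise without an RNN is sliding-window features; the signal below is a synthetic stand-in for a process log, and the chronological split avoids leaking future samples into training.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)

# Synthetic process log: a noisy periodic signal whose future value
# depends on its recent history (an illustrative stand-in for, e.g.,
# a furnace temperature or melt pool trace).
t = np.arange(1000)
signal = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(t.size)

# Sliding-window features: predict the next sample from the last 20.
window = 20
X = np.stack([signal[i:i + window] for i in range(len(signal) - window)])
y = signal[window:]

# Chronological split -- shuffling here would leak future information.
split = 700
model = Ridge(alpha=1.0).fit(X[:split], y[:split])
rmse = np.sqrt(np.mean((model.predict(X[split:]) - y[split:]) ** 2))
print(f"one-step-ahead RMSE: {rmse:.3f}")
```

The RMSE approaches the injected noise level, which is the irreducible (aleatoric) floor for this signal.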


1.3.3.2 Week 8 – Generalization, robustness, and process windows

Lecture: Tuesday, 02.06.2026, 14:15-15:45 | Exercise: Thursday, 04.06.2026, 16:15-17:45 (cancelled - Corpus Christi)

  • Sensitivity to noise and parameter drift.
  • Overfitting in process–property models.
  • Robustness as a design criterion.

Summary: This unit shifts the focus from model performance to Model Reliability. We explore the Bias-Variance tradeoff and the fundamental challenge of generalization—ensuring that an ML model works on new, unseen data from the factory floor. We introduce robust validation techniques like K-Fold and Stratified Cross-Validation to stabilize performance estimates on small materials datasets. A key focus is on Process Robustness, where we use sensitivity analysis to identify “Process Windows”—regions in parameter space where material quality is maximized and insensitive to industrial noise.

Exercise:
Analyze model robustness under perturbed process conditions.


1.3.3.3 Week 9 – Inverse problems and process maps

Lecture: Tuesday, 09.06.2026, 14:15-15:45 | Exercise: Thursday, 11.06.2026, 16:15-17:45

  • Process → structure inverse problems.
  • ML-guided process maps (e.g. AM laser power vs scan speed).
  • Physics-informed vs unconstrained regression.

Summary: This unit explores Inverse Problems—the cornerstone of materials design where we seek the processing parameters required to achieve a target microstructure or performance. We contrast these with causal forward problems and discuss why they are often ill-posed and multi-valued. We introduce Physics-Informed Learning as a way to solve these challenges by enriching models with physical transformations and constraints. Students learn how to build and interpret Process Maps and “Process Corridors,” using machine learning to visualize safe operating regions in complex experimental spaces.

Exercise:
Construct a simple ML-based process map; compare constrained vs unconstrained models.
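A sketch of the exercise under the (illustrative) assumption that build quality peaks at a particular energy density, power / speed; the thresholded prediction grid is the process window.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)

# Synthetic AM-style experiments: quality depends on laser power and scan
# speed only through the (assumed) energy density power / speed.
power = rng.uniform(100, 400, 200)      # W
speed = rng.uniform(200, 1200, 200)     # mm/s
energy = power / speed
quality = np.exp(-((energy - 0.4) ** 2) / 0.02) + 0.05 * rng.standard_normal(200)

X = np.column_stack([power, speed])
model = GradientBoostingRegressor(random_state=0).fit(X, quality)

# Dense grid over parameter space -> predicted process map.
P, S = np.meshgrid(np.linspace(100, 400, 50), np.linspace(200, 1200, 50))
grid = np.column_stack([P.ravel(), S.ravel()])
q_map = model.predict(grid).reshape(P.shape)

# "Process window": region where predicted quality exceeds a threshold.
window = q_map > 0.7
print(f"fraction of parameter space inside window: {window.mean():.2f}")
```

An unconstrained model like this one can still predict high quality in regions with no training data, which is one motivation for the physics-informed variants discussed in the lecture.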


1.3.4 Unit IV — Uncertainty, Surrogates, and Automation (Weeks 10–12)

1.3.4.1 Week 10 – ML for characterization signals

Lecture: Tuesday, 16.06.2026, 14:15-15:45 | Exercise: Thursday, 18.06.2026, 16:15-17:45

  • Spectral data: XRD, EELS, EDS.
  • Denoising, peak finding, dimensionality reduction.
  • Using ML without destroying physical meaning.

Summary: This unit focuses on the processing of high-dimensional Characterization Signals (like XRD, EDS, and EELS) using unsupervised learning. We introduce K-Means Clustering and t-SNE for the automatic identification and visualization of phases in large experimental libraries. We then explore Autoencoders—neural networks that learn to compress complex spectra into a low-dimensional “latent space.” This allows for advanced denoising and feature extraction, enabling scientists to handle the massive data volumes produced by modern high-throughput characterization tools without losing physical insight.

Exercise:
Apply PCA/NMF to spectral datasets; interpret components physically.
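A minimal NMF sketch on synthetic mixtures of two invented "phase" spectra: unlike PCA, the recovered components stay non-negative and can be read directly as spectra.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(6)

# Synthetic spectrum library: non-negative mixtures of two "phase"
# spectra (peak positions are illustrative, not real reference patterns).
channels = np.linspace(0, 10, 300)
phase_a = np.exp(-(channels - 2.5) ** 2 / 0.1)
phase_b = np.exp(-(channels - 6.0) ** 2 / 0.2) + 0.5 * np.exp(-(channels - 8.0) ** 2 / 0.1)
mix = rng.uniform(0, 1, size=(100, 2))
data = mix @ np.vstack([phase_a, phase_b])
data += np.abs(0.01 * rng.standard_normal(data.shape))  # keep non-negative

# NMF factorizes data ≈ abundances @ components with everything >= 0,
# which is why it is often preferred over PCA for physical interpretation.
nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
abundances = nmf.fit_transform(data)
components = nmf.components_

# Each recovered component should peak where one of the true phases peaks.
print("component peak channels:", channels[components.argmax(axis=1)])
```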


1.3.4.2 Week 11 – Automation in microscopy and characterization

Lecture: Tuesday, 23.06.2026, 14:15-15:45 | Exercise: Thursday, 25.06.2026, 16:15-17:45

  • Autofocus, drift correction, parameter selection.
  • ML as a control component, not just a predictor.

Exercise:
Implement a simple ML-assisted autofocus or defect detector.
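One minimal version of the autofocus exercise, assuming a variance-of-Laplacian sharpness metric and a crude smoothing-based defocus model (both are deliberate simplifications of a real microscope):

```python
import numpy as np

def sharpness(image):
    """Variance-of-Laplacian focus metric: high for sharp images."""
    lap = (-4 * image
           + np.roll(image, 1, axis=0) + np.roll(image, -1, axis=0)
           + np.roll(image, 1, axis=1) + np.roll(image, -1, axis=1))
    return lap.var()

def blur(image, strength):
    """Crude defocus model: repeated 4-neighbour averaging."""
    out = image.copy()
    for _ in range(strength):
        out = 0.2 * (out
                     + np.roll(out, 1, axis=0) + np.roll(out, -1, axis=0)
                     + np.roll(out, 1, axis=1) + np.roll(out, -1, axis=1))
    return out

rng = np.random.default_rng(7)
scene = (rng.uniform(0, 1, (64, 64)) > 0.5).astype(float)  # high-contrast "specimen"

# Sweep a hypothetical focus axis: defocus grows with distance from z = 3.
focus_positions = range(7)
scores = [sharpness(blur(scene, abs(z - 3))) for z in focus_positions]
best = int(np.argmax(scores))
print("best focus position:", best)
```

Closing the loop on a real instrument means moving the stage, scoring each frame with such a metric, and stepping toward the maximum: ML (or here, a simple metric) acting as a control component rather than a predictor.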


1.3.4.3 Week 12 – Uncertainty-aware regression & Gaussian Processes

Lecture: Tuesday, 30.06.2026, 14:15-15:45 | Exercise: Thursday, 02.07.2026, 16:15-17:45

  • Aleatoric vs epistemic uncertainty in experiments.
  • Gaussian Processes as uncertainty-aware surrogates.
  • Exploration vs exploitation in experimental design.
  • Connection to materials acceleration platforms.

Exercise:
Compare GP regression and NN ensembles for a process-parameter problem.
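The GP half of the comparison can be sketched with scikit-learn; the sine-shaped process-property relationship is an illustrative assumption, and the argmax-of-uncertainty step is the simplest possible exploration rule.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(8)

# Sparse, noisy "experiments": property vs a single process parameter.
X_train = rng.uniform(0, 10, 15)[:, None]
y_train = np.sin(X_train).ravel() + 0.1 * rng.standard_normal(15)

# RBF kernel for the smooth trend + WhiteKernel for measurement noise
# (aleatoric); the posterior std captures epistemic uncertainty.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

X_query = np.linspace(0, 10, 200)[:, None]
mean, std = gp.predict(X_query, return_std=True)

# Uncertainty grows away from the training points -- exactly the signal
# an active-learning loop would use to pick the next experiment.
next_experiment = X_query[np.argmax(std)]
print("most uncertain parameter value:", next_experiment)
```

An NN ensemble gives a comparable mean-plus-spread estimate; contrasting the two on the same data is the point of the exercise.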


1.3.5 Unit V — Physics, Trust, and Synthesis (Weeks 13–14)

1.3.5.1 Week 13 – Physics-informed and constrained ML

Lecture: Tuesday, 07.07.2026, 14:15-15:45 | Exercise: Thursday, 09.07.2026, 16:15-17:45

  • Embedding physical constraints into ML models.
  • Penalty terms, soft constraints, hybrid approaches.
  • Failure modes of unconstrained models.

Exercise:
Train a constrained model for a processing or characterization task.
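A minimal soft-constraint sketch: a cubic fitted to four invented measurements of a physically monotone property, with a squared-hinge penalty that punishes negative slope on a dense grid. The unconstrained fit chases the noisy dip; the penalized fit does not.

```python
import numpy as np
from scipy.optimize import minimize

# Four measurements of a property that should physically increase with x
# (values illustrative); noise has produced a dip between the middle points.
x = np.array([0.0, 1 / 3, 2 / 3, 1.0])
y = np.array([0.0, 0.6, 0.5, 1.0])

# Cubic model; the slope is evaluated on a dense grid so a soft penalty
# can punish any region where the fitted curve decreases.
grid = np.linspace(0.0, 1.0, 200)
X = np.vander(x, 4)                                          # columns: x^3, x^2, x, 1
dG = np.vander(grid, 4)[:, 1:] * np.array([3.0, 2.0, 1.0])   # derivative basis

def loss(coeffs, penalty_weight):
    mse = np.mean((X @ coeffs - y) ** 2)
    slope = dG @ coeffs[:3]
    # Squared-hinge penalty on negative slopes (soft monotonicity constraint).
    return mse + penalty_weight * np.mean(np.minimum(slope, 0.0) ** 2)

free = minimize(loss, np.zeros(4), args=(0.0,)).x
constrained = minimize(loss, np.zeros(4), args=(100.0,)).x

min_slope = lambda c: (dG @ c[:3]).min()
print(f"min slope, unconstrained: {min_slope(free):.2f}")
print(f"min slope, constrained:   {min_slope(constrained):.2f}")
```

The same pattern scales up directly: any differentiable physical constraint can be added to a neural network's loss as a penalty term.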


1.3.5.2 Week 14 – Integration, limits, and reflection

Lecture: Tuesday, 14.07.2026, 14:15-15:45 | Exercise: Thursday, 16.07.2026, 16:15-17:45

  • Explainability for experimental ML (CAMs, SHAP).
  • Why ML fails in real labs.
  • Where ML genuinely changes materials processing.
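SHAP and CAMs require their own tooling; as a lightweight, model-agnostic stand-in, permutation importance illustrates the same explainability question (the process-parameter names below are invented):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(10)

# Synthetic process-property data: only the first two of five logged
# parameters actually influence the property.
X = rng.uniform(0, 1, size=(300, 5))
y = 3 * X[:, 0] + np.sin(4 * X[:, 1]) + 0.1 * rng.standard_normal(300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Permutation importance: shuffle one column at a time and measure how
# much the score drops -- a model-agnostic explainability baseline.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, imp in zip(["power", "speed", "gas_flow", "hatch", "layer"],
                     result.importances_mean):
    print(f"{name:10s} importance: {imp:.3f}")
```

Checking that the model's attributions match physical expectations, rather than trusting the accuracy number alone, is the habit this final week aims to build.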

Exercise:
Mini-project presentations and critical discussion.


1.4 Learning Outcomes

Students completing this course will be able to:

  • Interpret materials processing and characterization data as learning problems.
  • Build ML pipelines for microstructure analysis, process prediction, and spectral data.
  • Understand the physics of data formation to avoid common ML pitfalls.
  • Evaluate generalization, robustness, and uncertainty in experimental ML models.
  • Apply Gaussian Processes and neural networks as surrogate models.
  • Integrate physical constraints into ML workflows.
  • Critically assess claims about ML in materials processing and characterization.

1.5 Lab Possibilities

  • Microscopy datasets: noise, metadata, units, and artifacts.
  • Fourier inspection of SEM/TEM images.
  • Broken vs correct ML pipelines (data leakage case studies).
  • Feature extraction vs learned representations.
  • Fine-tuning pretrained CNNs on microstructures.
  • Process–property regression with uncertainty.
  • GP-based process maps.
  • Spectral decomposition (NMF) of EELS/XRD data.
  • ML-assisted autofocus or EBSD pattern classification.
  • Multi-modal fusion of images, spectra, and process parameters.

Outlook: Several of these labs point toward autonomous characterization, where machine learning moves from passive data analysis to active instrument control. Multi-modal data fusion combines information from diverse sensors such as SEM images, EDS spectra, and process logs within Bayesian frameworks, while Reinforcement Learning (RL) offers a route to automating complex laboratory tasks such as instrument tuning and process optimization. Through case studies in microscopy and industrial processing, students build integrated pipelines that can autonomously acquire data, characterize it, and decide the next steps of an experiment.

References

Sandfeld, Stefan. 2024. Materials Data Science: Introduction to Data Mining, Machine Learning, and Data-Driven Predictions for Materials Science and Engineering. Springer Nature.