Machine Learning in Materials Processing & Characterization
Unit 4: From Classical Metrics to Learned Representations

Prof. Dr. Philipp Pelz

FAU Erlangen-Nürnberg


0. Transitioning the Paradigm

From Stereology to Neural Networks

Today’s Learning Journey:

  • Classical Metrics: The limits of “hand-crafted” descriptors.
  • The Artificial Neuron: From biological inspiration to the Perceptron.
  • ADALINE: Learning via gradient descent.
  • The XOR Problem: Why we need layers.
  • Activation Functions: Breaking the linearity.

1. Classical Microstructure Metrics

Stereology: The Bedrock

  • Standard metrics: Volume fractions (\(\phi\)), grain size (\(d\)), surface-to-volume ratio (\(S_V\)).
  • These are scalar condensations of complex 3D structures.
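A minimal sketch of how such scalar metrics can be computed from a segmented micrograph; the array `phase`, the voxel size, and the random "microstructure" are illustrative placeholders, and the surface estimate is a crude voxel-face count:

```python
# Minimal sketch: scalar stereology metrics from a segmented 3D image.
# `phase` is a hypothetical binary array (1 = phase of interest).
import numpy as np

rng = np.random.default_rng(0)
phase = (rng.random((64, 64, 64)) > 0.7).astype(int)  # placeholder segmentation
voxel = 0.1  # assumed isotropic voxel edge length in µm

# Volume fraction: fraction of voxels belonging to the phase.
phi = phase.mean()

# Crude surface-to-volume ratio: each 0/1 transition along an axis
# contributes one voxel face of area voxel**2.
faces = sum(np.abs(np.diff(phase, axis=ax)).sum() for ax in range(3))
S_V = faces * voxel**2 / (phase.size * voxel**3)

print(f"phi = {phi:.3f}, S_V = {S_V:.2f} per µm")
```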

The Limits of Hand-Crafting

  • Information Loss: Scalar metrics discard topological connectivity and local clustering.
  • The Bottleneck: We can only predict based on descriptors we think are important.

Goal: Move from predicting with descriptors to Learning the Representation.

2. The Foundational Neuron

Biological Inspiration

  • Dendrites: Receivers. Soma: Processor. Axon: Transmitter.
  • Synaptic Plasticity: Learning by adjusting connection strengths.

The McCulloch-Pitts (MCP) Neuron (1943)

  • First computational model: A Threshold Logic Unit (TLU).
  • \(y = 1\) if \(\sum x_i \ge \theta\), else \(0\).
  • Simple but powerful: Can solve Boolean AND/OR problems.
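A minimal sketch of such a threshold logic unit, with the threshold chosen by hand so that the same unit implements AND or OR; the function name `tlu` is illustrative:

```python
# Minimal sketch: a McCulloch-Pitts threshold logic unit.
import numpy as np

def tlu(x, theta):
    """Fire (1) if the sum of inputs reaches the threshold theta, else 0."""
    return int(np.sum(x) >= theta)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print("AND:", [tlu(x, theta=2) for x in inputs])  # [0, 0, 0, 1]
print("OR: ", [tlu(x, theta=1) for x in inputs])  # [0, 1, 1, 1]
```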

3. The Rosenblatt Perceptron (1958)

Weights and Bias

  • Introducing individual weights \(w_i\) and a bias term \(b\).
  • Net input: \(z = \sum w_i x_i + b = \mathbf{w}^T \mathbf{x} + b\).

The Learning Rule

  • \(w_i \leftarrow w_i + \eta\,(d - y)\,x_i\), \(\quad b \leftarrow b + \eta\,(d - y)\), where \(\eta\) is the learning rate, \(d\) the desired label, and \(y\) the thresholded output.
  • This rule iteratively shifts the Decision Boundary (a hyperplane) to separate classes.
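A minimal sketch of this update rule on a toy, linearly separable dataset; the data, learning rate, and epoch count are assumed for illustration:

```python
# Minimal sketch: Rosenblatt perceptron trained with the rule above.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
d = (X @ np.array([1.5, -2.0]) + 0.5 > 0).astype(int)  # toy separable labels

w, b, eta = np.zeros(2), 0.0, 0.1
for _ in range(20):                       # epochs
    for x_i, d_i in zip(X, d):
        y_i = int(w @ x_i + b >= 0)       # thresholded prediction
        w += eta * (d_i - y_i) * x_i      # shift the hyperplane towards x_i
        b += eta * (d_i - y_i)            # bias update

pred = (X @ w + b >= 0).astype(int)
print("training accuracy:", (pred == d).mean())
```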

4. ADALINE & Gradient Descent

Continuous Learning

  • ADALINE (Adaptive Linear Neuron) trains on the continuous net input \(z\) (before thresholding), rather than on the thresholded output.
  • Delta Rule: Minimize the squared-error cost \(J = \frac{1}{2} (d - z)^2\) (averaged over samples, this is the Mean Squared Error, MSE).

Gradient Descent: The Optimization Engine

  • We “walk downhill” on the cost surface \(J\) by moving against the gradient: \[\mathbf{w}_{\text{new}} = \mathbf{w}_{\text{old}} - \eta\,\nabla J\]
  • This is the foundation of all modern deep learning optimization.
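A minimal sketch of the delta rule as batch gradient descent on the squared-error cost above; the toy data and learning rate are assumptions:

```python
# Minimal sketch: ADALINE trained by batch gradient descent on the MSE cost.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
d = (X @ np.array([1.0, -1.0]) > 0).astype(float)   # toy targets in {0, 1}

w, b, eta = np.zeros(2), 0.0, 0.05
for _ in range(200):
    z = X @ w + b                      # continuous net input, no thresholding
    grad_w = -(d - z) @ X / len(X)     # dJ/dw for J = mean of (1/2)(d - z)^2
    grad_b = -(d - z).mean()           # dJ/db
    w -= eta * grad_w                  # step against the gradient
    b -= eta * grad_b

print("final MSE:", np.mean(0.5 * (d - (X @ w + b)) ** 2))
```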

5. The XOR Problem: Why Layers Matter

The Linear Limit

  • Single neurons are Linear Classifiers. They can only solve problems where a single straight line (in higher dimensions, a hyperplane) separates the classes.
  • XOR (Exclusive OR): Cannot be separated by a single line.

Hidden Layers

  • By stacking neurons in layers, we can create nonlinear decision boundaries.
  • Each hidden unit learns a new “feature” that simplifies the task for the next layer.
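A minimal sketch showing that a single hidden ReLU layer is enough for XOR; the hand-picked weights are one possible construction for illustration, not a trained solution:

```python
# Minimal sketch: XOR solved by one hidden ReLU layer with hand-picked weights.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

W1 = np.array([[1.0, 1.0],    # hidden unit 1: x1 + x2
               [1.0, 1.0]])   # hidden unit 2: x1 + x2 - 1 (via its bias)
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])    # output: h1 - 2*h2
b2 = 0.0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = relu(np.array(x) @ W1 + b1)   # hidden features make XOR linearly separable
    y = h @ W2 + b2
    print(x, "->", int(y > 0.5))
```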

6. Feed-Forward Architecture

Multilayer Perceptrons (MLP)

  • Input Layer: Raw data.
  • Hidden Layer(s): Intermediate feature extraction.
  • Output Layer: Final prediction.
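A minimal sketch of the forward pass through such a stack of layers; the layer sizes, random weights, and the ReLU activation are illustrative choices:

```python
# Minimal sketch: forward pass of a fully connected feed-forward network.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    """Propagate x through a list of (W, b) pairs; nonlinearity on all but the last."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:        # hidden layers get the nonlinear activation
            x = relu(x)
    return x

rng = np.random.default_rng(3)
sizes = [4, 8, 8, 1]                   # input -> two hidden layers -> output
layers = [(rng.normal(size=(m, n)), np.zeros(n)) for m, n in zip(sizes, sizes[1:])]
print(forward(rng.normal(size=(5, 4)), layers).shape)  # (5, 1)
```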

Deep Learning

  • Deep simply means having multiple hidden layers.
  • “Each layer represents a higher level of abstraction.” (McClarren, 2021)

7. Activation Functions

Breaking Linearity

  • Without nonlinear activations, stacking layers collapses into a single linear transformation:
  • \(\mathbf{W}_2(\mathbf{W}_1 \mathbf{x}) = \mathbf{W}_{\text{eff}}\,\mathbf{x}\) with \(\mathbf{W}_{\text{eff}} = \mathbf{W}_2 \mathbf{W}_1\).
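A quick numerical check of this collapse, with randomly chosen matrices standing in for the layer weights:

```python
# Minimal sketch: two linear layers without activation collapse into one matrix.
import numpy as np

rng = np.random.default_rng(4)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))
x = rng.normal(size=4)

two_layers = W2 @ (W1 @ x)         # "deep" network with no nonlinearity
one_layer = (W2 @ W1) @ x          # equivalent single linear map W_eff = W2 @ W1
print(np.allclose(two_layers, one_layer))  # True
```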

Common Functions

  • Sigmoid: Maps to \((0, 1)\). Good for probabilities.
  • Tanh: Zero-centered, maps to \((-1, 1)\).
  • ReLU (Rectified Linear Unit): \(f(x) = \max(0, x)\).
    • Cheap to compute, mitigates the vanishing-gradient problem, and is the default choice in modern deep networks.
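A minimal sketch of the three functions side by side, assuming matplotlib is available for the plot:

```python
# Minimal sketch: common activation functions evaluated on a grid.
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-5, 5, 200)
sigmoid = 1.0 / (1.0 + np.exp(-z))   # maps to (0, 1)
tanh = np.tanh(z)                    # zero-centered, maps to (-1, 1)
relu = np.maximum(0.0, z)            # max(0, z)

for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("ReLU", relu)]:
    plt.plot(z, f, label=name)
plt.legend()
plt.xlabel("z")
plt.ylabel("f(z)")
plt.show()
```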

8. Summary & Takeaways

Key Messages:

  1. Classical metrics are limited by manual feature selection.
  2. The Perceptron introduced weights and an iterative learning rule.
  3. Gradient Descent enables automatic optimization of these weights.
  4. Hidden Layers and Activation Functions are required to solve non-linear problems (XOR).
  5. We are now ready for Microstructure Vision (CNNs).

References

McClarren, Ryan G. (2021). Machine Learning for Engineers: Using Data to Solve Problems for Physical Systems. Springer.

Example Notebook

Week 4: Baseline Before CNNs — DigitsDataset