Materials Genomics
Unit 9: Representation Learning and Feature Discovery
FAU Erlangen-Nürnberg
By the end of this unit, students can:

- explain the bottleneck principle and the role of the latent space in autoencoders, {.fragment}
- distinguish between linear (PCA) and nonlinear (Autoencoder) dimensionality reduction, {.fragment}
- evaluate embedding quality using separability, transferability, and probe tests, {.fragment}
- identify failure modes such as shortcut learning and over-compression in materials tasks, {.fragment}
- implement a representation-learning pipeline for spectral or structural data. {.fragment}
Principal Component Analysis (PCA)

- A linear projection onto the span of the leading eigenvectors of the covariance matrix. {.fragment}

\[ \mathbf{z}_i = \mathbf{W}^\top (\mathbf{x}_i - \bar{\mathbf{x}}), \qquad \hat{\mathbf{x}}_i = \mathbf{W}\mathbf{z}_i + \bar{\mathbf{x}} \]

where the columns of \(\mathbf{W}\) are the top-\(k\) eigenvectors of the sample covariance matrix.

- Focuses on preserving variance. {.fragment}
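The projection above can be computed directly from the eigendecomposition of the covariance matrix. A minimal sketch on synthetic "spectra" (the data and dimensions are illustrative, not from the course materials):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy spectra: 200 samples, 50 channels, dominated by 2 latent directions.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 50)) \
    + 0.05 * rng.normal(size=(200, 50))

Xc = X - X.mean(axis=0)                # center the data
C = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
W = eigvecs[:, ::-1][:, :2]            # top-2 principal directions
Z = Xc @ W                             # latent coordinates (scores)
X_hat = Z @ W.T + X.mean(axis=0)       # rank-2 reconstruction

print(Z.shape)  # (200, 2)
```

Because the data is essentially two-dimensional, the rank-2 reconstruction recovers `X` up to the small noise term.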
Autoencoder (AE)

- Uses nonlinear activation functions (ReLU, Sigmoid) to "unwrap" complex manifolds. {.fragment}
- Focuses on minimizing reconstruction error. {.fragment}
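A quick way to prototype this idea without a deep-learning framework is to train an MLP to reproduce its own input, with a narrow middle layer as the bottleneck. This is a sketch, not the course's reference implementation; it uses scikit-learn's `MLPRegressor` as a stand-in, and the data is a synthetic nonlinear manifold:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy data: a 1-d nonlinear curve embedded in 50 dimensions.
t = rng.uniform(-2, 2, size=(300, 1))
X = np.hstack([np.sin(3 * t), np.cos(3 * t), t**2]) @ rng.normal(size=(3, 50))
X = StandardScaler().fit_transform(X)

# Autoencoder = MLP trained to map X -> X; the 2-unit middle
# layer is the bottleneck that defines the latent space.
ae = MLPRegressor(hidden_layer_sizes=(16, 2, 16), activation="tanh",
                  max_iter=2000, random_state=0)
ae.fit(X, X)

def encode(model, X):
    """Forward pass through the first two weight matrices (up to the bottleneck)."""
    H = X
    for W, b in zip(model.coefs_[:2], model.intercepts_[:2]):
        H = np.tanh(H @ W + b)
    return H

Z = encode(ae, X)
print(Z.shape)  # (300, 2): one 2-d latent code per sample
```

The `tanh` layers let the bottleneck follow the curved manifold, which a single linear projection cannot do without losing information.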
| Feature | PCA | Autoencoder |
|---|---|---|
| Map | Linear | Nonlinear (usually) |
| Optimizer | Eigendecomposition | Backpropagation |
| Manifold | Hyperplane | Curved/Arbitrary |
| Relationship | Special case: linear AE with MSE loss recovers the PCA subspace | Generalizes PCA |
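The PCA/linear-AE relationship in the last table row can be checked numerically: a rank-constrained linear reconstruction can never beat PCA (Eckart-Young), and a linear autoencoder trained by gradient descent approaches the same error. A minimal numpy sketch with illustrative hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 50)) \
    + 0.05 * rng.normal(size=(200, 50))
Xc = X - X.mean(axis=0)

# PCA baseline: the optimal rank-2 linear reconstruction.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
err_pca = np.linalg.norm(Xc - U[:, :2] * s[:2] @ Vt[:2])

# Linear autoencoder (no activation) trained by plain gradient descent.
We = 0.01 * rng.normal(size=(50, 2))   # encoder weights
Wd = 0.01 * rng.normal(size=(2, 50))   # decoder weights
lr = 1e-5                              # small step size for stability
err0 = np.linalg.norm(Xc - Xc @ We @ Wd)
for _ in range(5000):
    E = Xc - Xc @ We @ Wd              # reconstruction residual
    We += lr * Xc.T @ E @ Wd.T         # gradient descent on squared error
    Wd += lr * (Xc @ We).T @ E
err_ae = np.linalg.norm(Xc - Xc @ We @ Wd)
# err_pca <= err_ae always; after training, err_ae is close to err_pca.
```

Note the linear AE finds the same *subspace* as PCA but not the same basis: any invertible remixing of `We`/`Wd` gives an equally good reconstruction.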
Discussion: hand-crafted matminer descriptors vs. learned embeddings.
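One of the evaluation tools named in the learning objectives, the linear probe, applies to both kinds of features: freeze the representation, fit a simple classifier on top, and compare accuracies. A hedged sketch on synthetic data (PCA stands in for a learned encoder; all sizes and labels are made up for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic "descriptors": 400 samples, 64 features, binary class label
# encoded in 3 of the 64 dimensions.
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 64))
X[:, :3] += 3.0 * y[:, None]

# Stand-in for a learned embedding: reduce to an 8-d latent space.
Z = PCA(n_components=8, random_state=0).fit_transform(X)

# Linear probe: a logistic regression on the frozen features measures
# how linearly accessible the label information is in the embedding.
Ztr, Zte, ytr, yte = train_test_split(Z, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(Ztr, ytr)
acc = probe.score(Zte, yte)
print(f"linear-probe accuracy: {acc:.2f}")
```

Running the same probe on raw matminer descriptors and on learned embeddings gives a like-for-like comparison of how much task-relevant information each representation exposes.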
© Philipp Pelz - Materials Genomics