Mathematical Foundations of AI & ML
Foundations course for the ECLIPSE teaching track in machine learning, computational imaging, and materials data science.
1 Instructors
- Philipp Pelz
- Stefan Hiemer
2 Recommended readings
We base much of the lecture on the following books:
- Neuer (2024), Machine Learning for Engineers: Introduction to Physics-Informed, Explainable Learning Methods for AI in Engineering Applications. Springer Nature.
- McClarren (2021), Machine Learning for Engineers: Using Data to Solve Problems for Physical Systems. Springer.
As complementary background, we also recommend the following books:
- Murphy (2012), Machine Learning: A Probabilistic Perspective. MIT Press.
- Bishop (2006), Pattern Recognition and Machine Learning. Springer, New York.
| MFML Week | MFML Lecture Focus (Revised) | Neuer – Required Reading | Neuer – Optional / Skim | McClarren – Contextual / Optional | Bishop – Targeted Depth (Optional) |
|---|---|---|---|---|---|
| 1 | Learning vs data analysis; models, loss functions | Ch. 1.1 Data-Based Modeling; 1.1.1 Concept of Model | 1.1.3 Criticism of Data-Based Modeling | Ch. 1 Introduction (ML in physical systems) | Ch. 1 §1.1–1.2 (what is a model, pattern recognition view) |
| 2 | Linear algebra refresher; covariance, PCA/SVD (R) | Ch. 5.2 PCA (skim, notation & geometry only) | PCA implementation details | Ch. 5 Dimension Reduction (ROM intuition) | Ch. 12 §12.1–12.2 PCA derivation (selective) |
| 3 | Regression as loss minimization | Ch. 4.2.2 Regression; Ch. 4.4.1 LMS theory | LMS algorithm variants | Ch. 4 Regression (physical meaning of regression) | Ch. 3 §3.1–3.3 Linear regression, least squares geometry |
| 4 | Neural networks early: neuron, activations, backprop | Ch. 4.5.1 Neuron; 4.5.3 Activations; 4.5.4 Training | Framework-specific NN sections | Ch. 8 Neural Networks (surrogate perspective) | Ch. 5 §5.1–5.3 NN basics & Backpropagation |
| 5 | Clustering & Autoencoders | Ch. 5.3 K-Means; Ch. 5.5 Autoencoder | Advanced clustering | Ch. 5 Dimension Reduction | Ch. 9 Mixture Models; Ch. 12.3 Nonlinear PCA |
| 6 | Loss landscapes & optimization behavior | Ch. 4.4.6 Hyperparameters; Ch. 4.5.5 Optimization | Detailed optimizer variants | Ch. 7 Optimization | Ch. 3 §3.4 Regularization; §3.5 Bayesian view (skim) |
| 7 | Generalization, bias-variance, regularization, tree ensembles (RF & gradient boosting) | Ch. 4.5.9 Overfitting & Cross-Validation | — | Ch. 3 Decision Trees & Random Forests; Ch. 6 Model Selection & Validation | Ch. 3 §3.2 Bias-variance decomposition; Ch. 14 §14.3–14.4 (combining models, bagging, boosting) |
| 8 | Probabilistic view of learning; noise | Ch. 2.2 Distinguishing Uncertainties; Ch. 6.4 Uncertainty | Bayesian details | Ch. 3 Error and Uncertainty | Ch. 2 §2.1–2.3 Gaussian distributions & moments |
| 9 | Latent Spaces & Advanced Representation Learning (t-SNE, UMAP, contrastive, foundation embeddings) | Ch. 5.5 Autoencoder (recap) | — | Ch. 5 Dimension Reduction | Ch. 9 §9.1–9.4 mixture models / latent variables; Ch. 12 §12.3 nonlinear latent models |
| 10 | Attention & Transformers (self-attention, multi-head, ViT, foundation models) | — | — | Ch. 8 Neural Networks (context only) | — (covered via Vaswani et al., 2017; Dosovitskiy et al., 2021) |
| 11 | Generative Models: VAE & Diffusion (ELBO, reparameterization, forward/reverse process, classifier-free guidance) | Ch. 5.5 AE (foundation for VAE) | — | Ch. 8 Neural Networks (autoencoder context) | Ch. 9 §9.4 EM as ELBO precursor; Ch. 13 §13.3 deep generative perspectives |
| 12 | Uncertainty in predictions | Ch. 6.4 Stochastic Methods for Uncertainty | Advanced stochastic methods | Ch. 3 Error and Uncertainty | Ch. 3 §3.5 Bayesian regularization (skim) |
| 13 | Physics-informed & constrained learning | Ch. 6.1–6.3 Physics-Informed Learning | Semantic technologies | Ch. 11 Physics-Informed & Hybrid Models | Ch. 1 §1.6 Model complexity & Occam's razor |
| 14 | Explainability, limits, scientific trust | Ch. 7 Explainability (discussion & outlook) | — | Ch. 12 Limitations and Outlook | Ch. 1 §1.1–1.2 Reflection on model limits |
3 Week 1 Summary: Learning vs Data Analysis; Models, Loss Functions
Lecture: Monday, 13.04.2026, 10:15-11:45
Slides: Open
- Models: simplified representations for prediction / explanation (white / grey / black-box)
- Learning types: supervised (regression, classification), unsupervised, reinforcement
- Empirical risk minimization: learning as optimization, not statistics
- Loss zoo:
- Regression: MSE, MAE
- Classification: 0-1, softmax + cross-entropy
- Train / val / test splits, cross-validation, data-leakage taxonomy
- Bias–variance intuition; Occam’s razor and regularization
- Uncertainty preview: aleatoric vs epistemic
- Limits: no-free-lunch, curse of dimensionality
- Frequentist vs Bayesian lenses (set-up for Unit 8)
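The "loss zoo" above maps directly to a few lines of NumPy. The following is an illustrative sketch (not the course exercise code) with made-up toy predictions and targets, showing MSE, MAE, and softmax cross-entropy side by side:

```python
import numpy as np

# Regression losses on toy predictions vs targets
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.8, 3.5])

mse = np.mean((y_pred - y_true) ** 2)           # mean squared error
mae = np.mean(np.abs(y_pred - y_true))          # mean absolute error

# Softmax + cross-entropy for a single 3-class example
logits = np.array([[2.0, 0.5, -1.0]])           # raw scores for one sample
labels = np.array([0])                          # true class index

z = logits - logits.max(axis=1, keepdims=True)  # stabilize the exponentials
probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
cross_entropy = -np.log(probs[np.arange(len(labels)), labels]).mean()

print(mse, mae, cross_entropy)
```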
4 Week 2 Summary: Linear Algebra Refresher; Covariance, PCA/SVD
Lecture: Monday, 20.04.2026, 10:15-11:45
Slides: Open
- LA refresher: vector spaces, basis, rank; column / row / null spaces and identifiability
- Projection geometry; least squares as projection onto column space
- Condition number and numerical stability
- Spectral decomposition of symmetric / PSD matrices; covariance-matrix geometry
- PCA: linear dimensionality reduction by variance maximization
- Scree plots for intrinsic dimensionality (“elbow”)
- SVD: factorization for any matrix; low-rank approximation (Eckart–Young)
- NMF: parts-based decomposition for non-negative spectra / images
- Pseudo-inverse and least-squares solvability
- L1 vs L2 regularization (geometric intuition); whitening and multicollinearity
- Kernel hint from inner products (sets up later units)
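A minimal sketch of PCA via the SVD on random toy data (illustrative only; the numbers and dimensions are arbitrary). It connects the bullets above: centering, the scree quantities, and the Eckart–Young low-rank reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # toy data: 200 samples, 5 features

Xc = X - X.mean(axis=0)                        # center each column
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

explained_var = S**2 / (len(X) - 1)            # eigenvalues of the covariance matrix
ratio = explained_var / explained_var.sum()    # scree-plot quantities ("elbow")

k = 2
scores = Xc @ Vt[:k].T                         # projection onto the first k principal axes
X_lowrank = scores @ Vt[:k] + X.mean(axis=0)   # rank-k reconstruction (Eckart–Young optimal)
```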
5 Week 3 Summary: Regression as Loss Minimization
Lecture: Monday, 27.04.2026, 10:15-11:45
Slides: Open
- Supervised Framework: Minimizing a cost function (MSE) to find optimal parameters.
- Optimization: Analytical (Ordinary Least Squares) vs. Iterative (Gradient Descent).
- Basis Functions: Expanding linear models to fit non-linear data using transformations (polynomials, splines).
- Runge’s Phenomenon: Overfitting risk with high-order global polynomials.
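To make "regression as loss minimization" concrete, here is a hedged sketch (toy data, not the lecture notebook) that solves the same linear model analytically via the normal equations and iteratively via gradient descent on the MSE:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 + 0.1 * rng.normal(size=100)   # toy ground truth: slope 2, intercept 0.5

X = np.column_stack([np.ones_like(x), x])        # design matrix with a bias column

# Analytical solution: ordinary least squares via the normal equations
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Iterative solution: batch gradient descent on the mean squared error
w = np.zeros(2)
lr = 0.1
for _ in range(2000):
    grad = 2 / len(y) * X.T @ (X @ w - y)        # gradient of the MSE
    w -= lr * grad

print(w_ols, w)                                  # both should approach [0.5, 2.0]
```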
6 Week 4 Summary: Neural Networks — From Neurons to CNNs
Lecture: Monday, 04.05.2026, 10:15-11:45
Slides: Open · Backprop self-study
- Fixed bases (Fourier / wavelet / polynomial) → motivation for learned representations
- The modern neuron and dense layer; why non-linear activations are non-negotiable
- Universal approximation vs the parameter explosion of dense layers on images
- Invariance vs equivariance: what we want from image models
- Convolution from weight sharing; cross-correlation, feature maps, receptive fields
- Padding, stride, pooling, \(1\times1\) channel mixing
- Architectures: LeNet → VGG → NiN → DenseNet → U-Net for dense prediction
- Why CNNs fit microscopy / materials data — and where they fall short
Note: Backpropagation is covered in the self-study supplement 02_backprop_self_study.qmd appended to this unit.
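As a small illustration of "convolution from weight sharing" (a sketch only, independent of the slides), the loop below applies one shared kernel across an image and passes the result through a ReLU, producing a valid-mode feature map:

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide one shared kernel over the image."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.random.default_rng(0).normal(size=(8, 8))     # toy "micrograph"
edge_kernel = np.array([[1.0, -1.0]])                     # crude horizontal-gradient filter
feature_map = np.maximum(cross_correlate2d(image, edge_kernel), 0)  # ReLU activation
print(feature_map.shape)                                   # (8, 7): a valid feature map
```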
7 Week 5 Summary: Clustering & Autoencoders
Lecture: Monday, 11.05.2026, 10:15-11:45
Slides: Open
- K-Means / K-Medoids: hard clustering by minimizing within-cluster distance
- Sensitive to initialization (use k-means++); assumes spherical clusters
- GMM + EM: probabilistic clustering with soft assignments
- E-step: responsibilities; M-step: parameter update
- Each EM step never decreases the log-likelihood
- Autoencoders: encoder–bottleneck–decoder, trained on reconstruction loss
- Linear AE recovers PCA
- Non-linear AE captures curved manifolds
- Applications: compression, anomaly detection (reconstruction error), feature extraction
Note: Backpropagation has moved to a self-study supplement appended to Unit 4 (02_backprop_self_study.qmd); this freed the Unit 5 slot for unsupervised learning.
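A compact K-Means sketch on synthetic 2-D blobs (illustrative; it uses naive random initialization, whereas the lecture recommends k-means++):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # naive init (k-means++ is safer)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])  # two toy blobs
labels, centers = kmeans(X, k=2)
```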
8 Week 6 Summary: Loss Landscapes & Optimization Behavior
Lecture: Monday, 18.05.2026, 10:15-11:45
Slides: Open
- Loss Landscape: High-dimensional topography determining optimization success.
- Curvature: the Hessian describes local curvature; ill-conditioned landscapes (large condition numbers) slow gradient descent.
- Saddle Points: Common traps in high dimensions that hinder optimizers.
- Advanced Optimizers: Momentum and adaptive learning rates (Adam) navigate complex landscapes more robustly (see the sketch below).
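A minimal sketch of why ill-conditioning hurts plain gradient descent and why momentum helps, on a toy quadratic bowl with condition number 100 (all numbers are illustrative):

```python
import numpy as np

# Ill-conditioned quadratic bowl: f(w) = 0.5 * w^T H w, condition number 100
H = np.diag([1.0, 100.0])

def run(momentum, lr=0.009, steps=200):
    w = np.array([1.0, 1.0])
    v = np.zeros_like(w)
    for _ in range(steps):
        grad = H @ w                      # gradient of the quadratic
        v = momentum * v - lr * grad      # heavy-ball / momentum update
        w = w + v
    return np.linalg.norm(w)              # distance to the optimum at the origin

print(run(momentum=0.0))   # plain gradient descent: slow along the flat direction
print(run(momentum=0.9))   # momentum: far closer to the minimum after the same budget
```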
9 Week 7 Summary: Generalization, Bias-Variance, Regularization, Tree Ensembles
Lecture: Monday, 25.05.2026, 10:15-11:45
Slides: Open
- Generalization: performance on unseen data
- Bias–variance tradeoff: \(\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Variance} + \mathrm{noise}\)
- Regularization: L2 (Ridge), L1 (Lasso), Dropout
- Validation: cross-validation as unbiased generalization estimate
- Never tune on the test set
- Random Forests: bagging + random feature subsets per split
- → variance reduction via decorrelated trees
- Gradient Boosting: sequential weak learners on residuals
- → bias reduction
- XGBoost / LightGBM / CatBoost as the practical workhorses
- Trees vs NNs: gradient boosting usually wins on tabular data (\(N < 10^5\))
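A short sketch contrasting the two tree ensembles on a synthetic tabular regression task, assuming scikit-learn is available (this snippet is illustrative, not the course exercise code; dataset and hyperparameters are arbitrary):

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_friedman1(n_samples=500, noise=0.5, random_state=0)   # synthetic tabular data

rf = RandomForestRegressor(n_estimators=200, random_state=0)      # bagging: variance reduction
gb = GradientBoostingRegressor(n_estimators=200, random_state=0)  # boosting: bias reduction

for name, model in [("random forest", rf), ("gradient boosting", gb)]:
    scores = cross_val_score(model, X, y, cv=5)                   # 5-fold cross-validated R^2
    print(name, f"{scores.mean():.3f}")
```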
10 Week 8 Summary: Probabilistic View of Learning; Noise
Lecture: Monday, 01.06.2026, 10:15-11:45
Slides: Open
- Aleatoric (irreducible noise) vs epistemic (lack of data) uncertainty
- Gaussian as the maximum-entropy distribution; multivariate Gaussian and covariance geometry; CLT
- Entropy and KL divergence, including KL between Gaussians (used later for VAEs)
- MLE: log-likelihood maximization; for a Gaussian, MLE recovers MSE
- Bayes’ theorem workflow: prior · likelihood ∝ posterior; predictive distribution
- MAP estimation; MAP = regularized MLE
- Frequentist vs Bayesian comparison; credible vs confidence intervals
- Robustness: Student’s t-distribution mitigates outliers better than Gaussian
- Stochastic enrichment and mixture-density networks (preview of Unit 12)
- Practical diagnostic: calibration plots
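The two key identities above (Gaussian MLE reduces to least squares; MAP with a Gaussian prior reduces to ridge regression) fit in a few lines. A hedged NumPy sketch with toy data and arbitrarily chosen noise and prior scales:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 1.5 * x + rng.normal(scale=0.2, size=200)
X = np.column_stack([np.ones_like(x), x])

# MLE under Gaussian noise: minimizing the negative log-likelihood in w
# is exactly least squares (the sigma terms do not depend on w)
w_mle = np.linalg.solve(X.T @ X, X.T @ y)

# MAP with a zero-mean Gaussian prior on w: the log-prior adds an L2 penalty,
# so MAP estimation is ridge regression with lambda = sigma^2 / tau^2
sigma2, tau2 = 0.2**2, 1.0**2
lam = sigma2 / tau2
w_map = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print(w_mle, w_map)   # the MAP estimate is slightly shrunk toward zero
```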
11 Week 9 Summary: Latent Spaces & Advanced Representation Learning
Lecture: Monday, 08.06.2026, 10:15-11:45
Slides: Open
- What makes a latent space “good”
- Compactness within concept, separation across concepts
- Smooth interpolation, transferability
- None guaranteed by reconstruction alone
- t-SNE: KL between high-dim Gaussian and low-dim Student-t similarities
- Heavy tail solves the crowding problem
- Cluster sizes and between-cluster distances are not quantitatively meaningful
- UMAP: preserves more global structure than t-SNE; scales to millions of points
- Contrastive learning (SimCLR, InfoNCE): label-free latent shaping
- Pull augmentations of the same sample together, push others apart
- Foundation embeddings (DINOv2, CLIP): pretrained encoders as feature extractors
- Linear probe is often the strongest baseline for label-scarce tasks
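The contrastive objective is easier to grasp in code than in words. Below is a minimal InfoNCE / NT-Xent-style sketch in NumPy (one direction only, toy embeddings, illustrative temperature), showing that aligned augmentation pairs yield a much lower loss than unrelated pairs:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE-style loss: z1[i], z2[i] are embeddings of two augmentations of sample i;
    all other rows act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # cosine similarity via unit vectors
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                       # N x N similarity matrix
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                    # positives sit on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))  # near-identical pairs: low loss
loss_random = info_nce(z, rng.normal(size=z.shape))              # unrelated pairs: high loss
print(loss_aligned, loss_random)
```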
12 Week 10 Summary: Attention & Transformers
Lecture: Monday, 15.06.2026, 10:15-11:45
Slides: Open
- Why attention: limits of RNNs / LSTMs for long sequences
- Self-attention: \(\mathrm{softmax}(QK^T/\sqrt{d_k})V\)
- Similarity-weighted average of value vectors with content-based weights
- Positions choose whom to listen to (no fixed locality prior)
- Multi-head attention: parallel heads on learned subspaces, then concat + project
- Different heads specialize on different relationships
- Positional encoding: needed because attention is permutation-equivariant
- Sinusoidal, learned, or RoPE
- Transformer block: multi-head attention + MLP with residual + LayerNorm; stack many
- ViT: image as a sequence of patch tokens
- Beats CNNs at scale; loses with little data (no locality prior)
- Foundation models: GPT, BERT, ViT, DINO, CLIP
- Pretrain at scale, freeze, reuse via embeddings + small heads
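Single-head scaled dot-product attention, \(\mathrm{softmax}(QK^T/\sqrt{d_k})V\), as a self-contained NumPy sketch (toy token count and dimensions, random projection matrices):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))   # each row: whom this token listens to
    return weights @ V                          # similarity-weighted average of the values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                     # 5 tokens, embedding dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)             # shape (5, 4); permutation-equivariant in the tokens
```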
13 Week 11 Summary: Generative Models — VAE & Diffusion
Lecture: Monday, 22.06.2026, 10:15-11:45
Slides: Open
- Why generative: vanilla autoencoders cannot sample new data
- VAE: stochastic encoder produces \(\mathcal{N}(\mu, \sigma^2)\)
- Trained on the ELBO = reconstruction − KL to a Gaussian prior
- Reparameterization trick \(z = \mu + \sigma \odot \epsilon\) for differentiable sampling
- Diffusion: predict the noise \(\epsilon_\theta(x_t, t)\) added at a random timestep
- Loss is plain MSE on the noise
- Sample from \(\mathcal{N}(0, I)\) and iterate the learned reverse process
- Classifier-free guidance: train conditional + unconditional jointly; mix at sampling
- Trade-offs:
- VAE: fast sampling, lower-bound likelihood, blurry samples
- Diffusion: slow sampling, state-of-the-art quality
- GANs: fast, no likelihood
- Normalizing flows: exact likelihood, restricted architectures
- Materials applications:
- Inverse design (compositions matching a target property)
- Microstructure generation
- Physics-constrained spectral synthesis
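The two VAE ingredients named above, the reparameterization trick and the analytic KL term of the ELBO, in a hedged NumPy sketch (pretend encoder outputs; the reconstruction term would come from a decoder and is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend encoder outputs for one small batch: mean and log-variance of q(z|x)
mu = rng.normal(size=(4, 2))
log_var = rng.normal(scale=0.1, size=(4, 2))

# Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable in (mu, sigma)
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# Analytic KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims, averaged over the batch
kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1).mean()

# ELBO = reconstruction term - kl  (decoder-side reconstruction omitted here)
print(z.shape, kl)
```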
14 Week 12 Summary: Uncertainty in Predictions
Lecture: Monday, 29.06.2026, 10:15-11:45
Slides: Open
- Why point predictions aren’t enough; aleatoric vs epistemic recap
- Bayesian predictive distribution and variance decomposition
- Evidence framework: marginal likelihood as automatic Occam’s razor; effective number of parameters; empirical Bayes
- Gaussian Processes as the unit’s main tool
- Mean and kernel function (RBF and others)
- Closed-form prior and posterior; hyperparameter learning
- Strengths and limits (\(O(n^3)\) cost, kernel choice)
- MC Dropout and deep ensembles as cheaper UQ
- Mixture-density networks for multi-modal predictive distributions
- Calibration plots and recalibration
- Choosing a UQ method: comparison table
- Active learning via GP uncertainty (materials acceleration platforms)
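A compact GP regression sketch with an RBF kernel (toy 1-D data, arbitrary length scale and noise level) showing the closed-form posterior mean and the epistemic error bars used for active learning:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.3, variance=1.0):
    d2 = (A[:, None] - B[None, :]) ** 2               # squared distances for 1-D inputs
    return variance * np.exp(-0.5 * d2 / length_scale**2)

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train) + 0.05 * rng.normal(size=8)
x_test = np.linspace(0, 1, 100)

noise = 0.05**2
K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
K_s = rbf_kernel(x_test, x_train)

alpha = np.linalg.solve(K, y_train)
mean = K_s @ alpha                                              # posterior mean
cov = rbf_kernel(x_test, x_test) - K_s @ np.linalg.solve(K, K_s.T)
std = np.sqrt(np.clip(np.diag(cov), 0, None))                   # epistemic uncertainty band
```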
15 Week 13 Summary: Physics-informed & Constrained Learning
Lecture: Monday, 06.07.2026, 10:15-11:45
Slides: Open
- PINNs: Integrating physical laws directly into the loss function to reduce data needs.
- Data Enrichment: Applying known mathematical transformations (FFT, derivatives).
- Neural Integrators: Using NNs with automatic differentiation as flexible differential equation solvers.
- Scientific Trust: Physics constraints act as powerful regularizers promoting Occam’s Razor.
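A toy illustration of the penalty idea behind PINNs (a sketch under simplifying assumptions, not a full PINN): the model is linear in its coefficients, so adding the ODE residual \(du/dt + ku = 0\) as a soft penalty at label-free collocation points still has a closed-form solution.

```python
import numpy as np

# Fit u(t) ~ exp(-k t) from 5 noisy points, with the ODE du/dt + k u = 0
# enforced as a soft penalty on 50 collocation points.
k, lam, degree = 2.0, 10.0, 4                          # illustrative physics and penalty weights
rng = np.random.default_rng(0)

t_data = np.array([0.0, 0.2, 0.5, 0.8, 1.0])
u_data = np.exp(-k * t_data) + 0.02 * rng.normal(size=t_data.shape)
t_col = np.linspace(0, 1, 50)                          # collocation points (no labels needed)

powers = np.arange(degree + 1)
Phi = t_data[:, None] ** powers                        # model: u(t) = sum_j c_j t^j
dPhi = powers * t_col[:, None] ** np.clip(powers - 1, 0, None)   # d/dt of each basis function
Phi_res = dPhi + k * t_col[:, None] ** powers          # ODE residual operator applied to the basis

# Minimize ||Phi c - u||^2 + lam * ||Phi_res c||^2  (data loss + physics penalty)
A = Phi.T @ Phi + lam * Phi_res.T @ Phi_res
c = np.linalg.solve(A, Phi.T @ u_data)

u_pred = (np.linspace(0, 1, 20)[:, None] ** powers) @ c   # physics-regularized fit
```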
16 Week 14 Summary: Explainability, Limits, Scientific Trust
Lecture: Monday, 13.07.2026, 10:15-11:45
Slides: Open
- Why XAI: from black-box to transparent, justifiable decisions; interpretability vs explainability
- Six levels of explainability (E1–E6): data → process → feature → model → prediction → decision
- Semantic structures: controlled vocabularies, taxonomies, ontologies for materials data
- Sensitivity analysis: perturbation-based, global vs local; feature importance from sensitivity
- Attribution methods:
- SHAP (waterfall, beeswarm)
- LIME (local linear approximation)
- Integrated Gradients for deep networks
- Causality vs correlation: causal process chains; detection vs prediction
- Counterfactuals: “what-if” explanations
- Limits: data bias, extrapolation, OOD detection, fairness — when models should NOT be trusted
- Course retrospective: how the 14-unit arc fits together
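As a final worked example of perturbation-based sensitivity, here is a hedged sketch of permutation feature importance on toy data (the `model_fn` stands in for any trained model; everything here is hypothetical):

```python
import numpy as np

def permutation_importance(model_fn, X, y, n_repeats=10, seed=0):
    """Perturbation-based global sensitivity: how much does shuffling one feature hurt?"""
    rng = np.random.default_rng(seed)
    base_error = np.mean((model_fn(X) - y) ** 2)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])          # break the feature-target link
            drops.append(np.mean((model_fn(Xp) - y) ** 2) - base_error)
        importance[j] = np.mean(drops)                     # error increase = importance
    return importance

# Toy check: y depends strongly on feature 0, weakly on feature 1, not at all on feature 2
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 3 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=500)
model_fn = lambda Z: 3 * Z[:, 0] + 0.3 * Z[:, 1]           # stand-in for a trained model
print(permutation_importance(model_fn, X, y).round(2))
```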
17 Mathematical Foundations of AI & ML – Unified Syllabus Overview (with ML-PC & MG)
Legend
- ? First serious use – concept must be introduced in MFML before being used in ML-PC or MG
- ? Reinforcement / application – concept is applied or deepened, but not introduced
- (R) Refresher – topic was covered in a prior course and is only briefly revisited
- MFML Mathematical Foundations of AI & ML
- ML-PC Machine Learning in Materials Processing & Characterization
- MG Materials Genomics
| Week | MFML – Mathematical Foundations (revised) | ML-PC – ML in Materials Processing & Characterization (revised) | MG – Materials Genomics (revised) | Exercise (90 min, Python-based) | Dependency Logic |
|---|---|---|---|---|---|
| 1 | Learning vs data analysis; models, loss functions, prediction vs explanation | Role of ML in processing & characterization; ML vs physics models | Role of ML in materials discovery; databases & targets | NumPy refresher; vectors, dot products, simple loss (MSE) | MFML defines “learning” as optimization, not statistics |
| 2 | Linear algebra refresher for learning: covariance, PCA/SVD (R) | PCA as a tool for spectra & images (?) | PCA & low-D structure in materials spaces (?) | PCA refresher on known dataset; visualize variance directions | PCA assumed known; MFML aligns notation & geometry |
| 3 | Regression as loss minimization; linear models revisited | Regression as surrogate modeling for processes & properties (?) | Regression & correlation in materials datasets (?) | Linear regression from scratch via loss minimization | Regression reframed explicitly as learning problem |
| 4 | Neural networks early: neuron, activations, backprop | NN regression for materials properties (?) | NN models for structure-property relations (?) | Single-neuron + backprop (manual forward/backward pass) | MFML must precede any NN usage |
| 5 | Clustering & Autoencoders | Clustering & process drift detection (?) | Clustering vs discovery in materials space (?) | K-Means & simple Autoencoder implementation | MFML supplies unsupervised models |
| 6 | Loss landscapes, conditioning, optimization behavior | Hyperparameters, robustness, convergence issues (?) | Model robustness & sensitivity (?) | Gradient descent experiments: learning rate & conditioning | Optimization treated as learning dynamics |
| 7 | Generalization, bias-variance, regularization; tree ensembles (RF, gradient boosting) | Overfitting control in models (?); RF / XGBoost as workhorses for tabular characterization data (?) | Limits of high-D regression (?); tree ensembles for property prediction over tabular materials descriptors (?) | Overfitting demo: polynomial vs NN models; tree-ensemble baseline (RF & XGBoost) on alloy regression | Critical conceptual gate for both applied courses; introduces the tabular workhorse |
| 8 | Probabilistic view of learning: noise & likelihood | Noise-aware modeling & error propagation (?) | Noise & uncertainty in materials datasets (?) | Noise injection; likelihood vs MSE comparison | MFML reframes probability for ML |
| 9 | Latent spaces & advanced representation learning: t-SNE, UMAP, contrastive learning, foundation embeddings | Visualization & quality assessment of learned features (?) | Foundation embeddings for materials descriptors (?) | t-SNE/UMAP comparison; SimCLR + linear probe on a foundation embedding | Critical for both applied courses; sets up modern self-supervised paradigm |
| 10 | Attention & Transformers: self-attention, multi-head, positional encoding, ViT, foundation models | Transformers for sequences and images in characterization (?) | Transformers for compositions & sequences (SMILES, crystal tokens) (?) | Single-head attention from scratch; tiny ViT vs CNN on a small dataset | Architecture behind all modern foundation models |
| 11 | Generative Models: VAE & Diffusion (ELBO, reparameterization, forward/reverse process, classifier-free guidance) | Generative models for data augmentation & process simulation (?) | Inverse design via conditional generation (?) | VAE on Fashion-MNIST + toy diffusion model (200-line DDPM) | Enables inverse design and modern generative applications |
| 12 | Uncertainty in predictions (aleatoric vs epistemic); Gaussian Processes (conceptual) | Trust & confidence in ML-assisted decisions; surrogate models (?) | Discovery & screening with uncertainty; exploration vs exploitation (?) | Predictive uncertainty: GP regression vs NN ensembles | Enables responsible ML & accelerator concepts |
| 13 | Physics-informed & constrained learning | Physics-informed ML for processes & characterization (?) | Physical constraints in materials ML (?) | Constrained NN / penalty-based PINN demo | MFML leads constraints & PINN concepts |
| 14 | Explainability, limits, scientific trust | Integrated case studies & failure modes | Limits & ethics of data-driven discovery | Mini end-to-end synthesis project | All courses converge conceptually |