FAU Erlangen-Nürnberg
Institute of Micro- and Nanostructure Research
notebooks/week10_bayesian_optimization.ipynb: implement a 1-D BO loop from scratch (GP surrogate + UCB acquisition, sklearn, CPU, < 60 s); show BO finds the optimal 4D-STEM parameter on a multi-modal objective in 12 iterations, escaping a deceptive local optimum (best found: 0.9323 vs random 0.7229); exercise: vary the UCB \(\kappa\) (exploit vs explore) and the acquisition function (EI).
Week 9 GP posterior on 8 EELS measurements of Fe³⁺ fraction. The ±2σ band (blue) balloons beyond x≈0.85, where the GP is maximally uncertain. The dashed red line marks the position of maximum uncertainty: the GP is actively recommending this as the next measurement site. Week 10 turns this observation into a principled algorithm.
Left: a 64-measurement grid search uniformly samples the (beam voltage × convergence angle) parameter space — most measurements fall in the low-SNR blue region. Right: Bayesian optimisation with 11 targeted measurements (red-to-yellow circles) converges toward the true optimum (gold star, ≈180 kV, 25 mrad) by actively choosing where to sample next. Grid search wastes budget on uninformative regions; BO concentrates effort where it matters.
Four snapshots of the BO loop optimising a multi-modal 4D-STEM acquisition parameter (SEED=42, κ=3.0). Blue region: GP ±2σ. Blue line: GP mean. Dashed red vertical: next query chosen by the UCB acquisition function (red dotted, scaled). Green dotted: true global optimum (x≈0.78). Orange dotted: deceptive local optimum (x≈0.25). Gold star: current best observed point. By iteration 10 the loop has escaped the local optimum and concentrated measurements near the true global peak. Shahriari, Bobak et al., (2016)
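The loop shown in these snapshots can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the notebook's code: the `objective` function, its peak widths, the kernel settings, and the grid resolution are all assumptions chosen to mimic a broad local peak near x≈0.25 and a narrow global peak near x≈0.78, so the numbers it prints will not match the figures quoted in the captions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def objective(x):
    """Hypothetical multi-modal objective (assumption, not the notebook's):
    broad local peak near x=0.25, narrow global peak near x=0.78."""
    x = np.asarray(x, dtype=float)
    local_peak = 0.73 * np.exp(-((x - 0.25) / 0.12) ** 2)
    global_peak = 0.92 * np.exp(-((x - 0.78) / 0.05) ** 2)
    return local_peak + global_peak


def bo_loop(n_init=3, n_iter=12, kappa=3.0, seed=42):
    """1-D Bayesian optimisation with a GP surrogate and UCB acquisition."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 501).reshape(-1, 1)  # dense candidate grid

    # Initial design: a few random measurements.
    X = rng.uniform(0.0, 1.0, size=(n_init, 1))
    y = objective(X).ravel()

    # Kernel choice is an assumption: RBF plus a small learned noise term.
    kernel = RBF(length_scale=0.1, length_scale_bounds=(0.02, 1.0)) + WhiteKernel(
        noise_level=1e-4, noise_level_bounds=(1e-8, 1e-1)
    )

    for _ in range(n_iter):
        # 1. Refit the surrogate to all measurements so far.
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=seed)
        gp.fit(X, y)

        # 2. Evaluate UCB = mean + kappa * std on the grid.
        mu, sigma = gp.predict(grid, return_std=True)
        ucb = mu + kappa * sigma

        # 3. Query the grid point that maximises the acquisition function.
        x_next = grid[np.argmax(ucb)]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))

    return X.ravel(), y


X, y = bo_loop()
print(f"best observed: y={y.max():.4f} at x={X[np.argmax(y)]:.3f}")
```

Raising `kappa` weights the uncertainty term more heavily (more exploration); lowering it makes the loop stay closer to the current best (more exploitation).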
GP surrogate (top) and three acquisition functions (bottom) for the same 3-measurement dataset. UCB (red, κ=2) selects x≈0.66, balancing the high mean near x=0.50 and the large uncertainty beyond x=0.70. EI (purple) and PI (teal dashed) agree on direction but differ in how sharply they peak. All three acquisition functions are evaluated by maximising over a dense grid — no gradient needed. Shahriari, Bobak et al., (2016)
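For a maximisation problem, the three acquisition functions in this figure can be written directly in terms of the GP posterior mean and standard deviation on the grid. The sketch below uses the standard closed-form expressions; the function names, the `xi` margin parameter, and the small floor on `sigma` are illustrative choices rather than the notebook's exact code.

```python
import numpy as np
from scipy.stats import norm


def ucb(mu, sigma, kappa=2.0):
    """Upper confidence bound: posterior mean plus kappa standard deviations."""
    return mu + kappa * sigma


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over the best observed value `best`."""
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero at observed points
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)


def probability_of_improvement(mu, sigma, best, xi=0.0):
    """Probability that a new measurement exceeds `best` by at least xi."""
    sigma = np.maximum(sigma, 1e-12)
    return norm.cdf((mu - best - xi) / sigma)


# Toy posterior on a dense grid (hypothetical values, for illustration only).
grid = np.linspace(0.0, 1.0, 201)
mu = 0.6 * np.exp(-((grid - 0.5) / 0.15) ** 2)
sigma = 0.05 + 0.3 * grid
best = mu.max()

for name, acq in [
    ("UCB", ucb(mu, sigma)),
    ("EI", expected_improvement(mu, sigma, best)),
    ("PI", probability_of_improvement(mu, sigma, best)),
]:
    print(f"{name}: next x = {grid[np.argmax(acq)]:.2f}")
```

Each function returns one score per grid point, so choosing the next measurement is a single `argmax` with no gradients involved.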
Left: grid search with 64 measurements uniformly covers a 2-D (beam voltage × convergence angle) parameter space — most measurements are in low-SNR regions. Right: Bayesian optimisation with 11 measurements (3 initial + 8 BO steps) concentrates near the true optimum (gold star at ≈ 180 kV, 25 mrad), found because the GP acquisition function guided each new measurement toward the high-SNR ridge. Shahriari, Bobak et al., (2016)
Convergence plot: BO (UCB, κ=3, blue) vs random sampling (red dashes) over 12 iterations after 3 initial measurements. Both start from the same initial best of 0.6985 (near the deceptive local optimum). BO discovers the global peak region at iteration 2 (best jumps to 0.8947), then refines further to 0.9323. Random sampling stays stranded near the local optimum (best 0.7229) and never reaches the narrow global peak. BO advantage over random sampling: +0.2094 (29.0% relative to the random best) at equal budget. Green dotted: true global max (0.9205). Orange dotted: deceptive local max (0.73). (SEED=42, N_INIT=3, N_ITER=12, κ=3.0)
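A convergence curve of this kind is the running maximum of the observed values, and the quoted advantage follows from the two final bests. The short sketch below shows both steps; the helper name `best_so_far` is hypothetical, and the example trace mixes the three values quoted in the caption with made-up intermediate values purely for illustration.

```python
import numpy as np


def best_so_far(y):
    """Running maximum of the observations: the quantity plotted in a convergence curve."""
    return np.maximum.accumulate(np.asarray(y, dtype=float))


# Illustrative trace: 0.6985, 0.8947 and 0.9323 come from the caption,
# the values in between are invented placeholders.
trace = [0.6985, 0.5, 0.8947, 0.7, 0.9323]
print(best_so_far(trace))

# Advantage of BO over random sampling at equal budget (final bests from the caption).
bo_best, random_best = 0.9323, 0.7229
advantage = bo_best - random_best
relative = advantage / random_best
print(f"{advantage:.4f} ({relative:.1%})")
```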
Deep Kernel Learning architecture. Raw input (HAADF image patch) is transformed by a neural network feature extractor \(g(\mathbf{x}; w)\) into a low-dimensional embedding. A base kernel (RBF) then operates in the embedded space. All parameters — NN weights and GP hyperparameters — are trained jointly by maximising the GP marginal likelihood. This combines NN representation power with GP uncertainty quantification. Wilson, Andrew G. et al., (2016)
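The structure of a deep kernel, an RBF kernel applied to embeddings \(g(\mathbf{x}; w)\), and the marginal-likelihood objective it is trained on can be sketched with plain NumPy. Everything specific here is an assumption for illustration: the two-layer `tanh` network, the 8×8 patch size, the 2-D embedding, the random data, and the fixed hyperparameters. The joint optimisation of network weights and GP hyperparameters is omitted; in practice it would use an automatic-differentiation framework.

```python
import numpy as np


def feature_extractor(X, W1, b1, W2, b2):
    """g(x; w): a tiny MLP mapping flattened patches to a low-dimensional embedding."""
    return np.tanh(X @ W1 + b1) @ W2 + b2


def rbf_kernel(Z1, Z2, length_scale=1.0, variance=1.0):
    """Base RBF kernel, evaluated in the embedded space."""
    d2 = np.sum(Z1**2, axis=1)[:, None] + np.sum(Z2**2, axis=1)[None, :] - 2.0 * Z1 @ Z2.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


def neg_log_marginal_likelihood(X, y, weights, length_scale=1.0, variance=1.0, noise=1e-2):
    """Negative GP log marginal likelihood of y under the deep kernel.

    Minimising this with respect to both `weights` and the kernel
    hyperparameters is the joint training step (not performed here).
    """
    Z = feature_extractor(X, *weights)
    K = rbf_kernel(Z, Z, length_scale, variance) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(X) * np.log(2.0 * np.pi)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 64))  # 20 hypothetical 8x8 image patches, flattened
y = rng.normal(size=20)        # hypothetical scalar targets
weights = (
    0.1 * rng.normal(size=(64, 16)), np.zeros(16),  # hidden layer
    0.1 * rng.normal(size=(16, 2)), np.zeros(2),    # 2-D embedding
)
nll = neg_log_marginal_likelihood(X, y, weights)
print(f"negative log marginal likelihood: {nll:.3f}")
```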
scalariser = CoM_magnitude. The DKL + BO system then pursues that goal autonomously.
RL control loop for an electron microscope. The agent (policy network) observes the current state (image or beam measurement), selects an action (lens current change, stage move), and receives a scalar reward from the environment (the microscope). The reward signal encodes the experimental objective: sharp image (autofocus), symmetric diffraction (beam alignment), or maximum CoM magnitude (4D-STEM discovery). The policy is trained to maximise cumulative reward; no manual labels are required.
| Concept | Supervised learning | RL for EM control |
|---|---|---|
| Input | Image, spectrum | State \(s_t\) (current image) |
| Output | Label, regression value | Action \(a_t\) (lens Δ-current) |
| Training signal | Human labels \(y_i\) | Reward \(r_t\) (sharpness score) |
| Learning | Minimise loss | Maximise cumulative reward |
| Data | Fixed labelled dataset | Online interaction with microscope |
| Labels needed | Yes (expensive) | No (reward is computed automatically) |
The key advantage of RL for microscopy control: the reward function is automatically computable from the microscope output — no human annotation loop is needed. Bishop, Christopher M., (2006)
Image sharpness reward (Laplacian variance, normalised) as a function of objective lens defocus (µm). The true sharpness (blue) peaks sharply at 0 µm — the in-focus position. The noisy observed reward (light blue) is what the RL agent sees at each step. The RL agent’s sequential defocus evaluations (red dots, numbered) converge to within ±0.1 µm of the optimum in fewer than 9 steps — faster than a traditional 20-point sweep and without exposing the sample to unnecessary dose.
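A Laplacian-variance sharpness score of the kind plotted here can be computed from an image alone, which is what makes it usable as an automatic reward. The sketch below uses a 5-point finite-difference Laplacian in NumPy; the function name and the synthetic test images are illustrative, and the normalisation used for the figure is not specified in the text, so the raw variance is returned.

```python
import numpy as np


def laplacian_variance(image):
    """Sharpness score: variance of the discrete (5-point) Laplacian of the image."""
    img = np.asarray(image, dtype=float)
    lap = (
        img[:-2, 1:-1] + img[2:, 1:-1]    # neighbours above and below
        + img[1:-1, :-2] + img[1:-1, 2:]  # neighbours left and right
        - 4.0 * img[1:-1, 1:-1]           # centre pixel
    )
    return lap.var()


# Synthetic check: a 3x3 box blur should lower the score.
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
blurred = sum(
    sharp[i:i + 62, j:j + 62] for i in range(3) for j in range(3)
) / 9.0

print(f"sharp:   {laplacian_variance(sharp):.4f}")
print(f"blurred: {laplacian_variance(blurred):.4f}")
```

One possible normalisation, stated here only as an assumption, is to divide each score by the maximum score seen over the defocus range so that the reward lies in [0, 1].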

©Philipp Pelz - FAU Erlangen-Nürnberg - Data Science for Electron Microscopy