FAU Erlangen-Nürnberg
Institute of Micro- and Nanostructure Research
notebooks/week07_transfer_finetune.ipynb: pretrain a tiny CNN on abundant synthetic “task A” data (Voronoi-like), then compare (i) training from scratch on a few task-B labels with (ii) transfer (freeze the backbone, train the head); plot loss and accuracy curves; vary the label count and observe the transfer gap shrink. Everything runs quickly on CPU with tiny data. Slide numbers in this deck match the notebook section headers.

Labelled image counts across domains. ImageNet: 14 million images, crowdsourced labels in seconds each. Medical imaging: tens of thousands of images, labelled by expert radiologists. Materials science / EM: 50–500 images, labelled by PhD microscopists spending hours per image (Holm, Elizabeth A. et al., 2020; Sandfeld, Stefan et al., 2024). Four to five orders of magnitude (14 million vs. 50–500 images) separate EM datasets from the regime standard deep learning was designed for.
Training and validation loss for a CNN trained from scratch on 50 EM images. Training loss falls monotonically; validation loss starts rising around epoch 40, a sign that the model is memorising the training images rather than learning to generalise. The widening gap between the two curves marks the overfitting region.
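A common guard against the overfitting region is early stopping: monitor the validation loss and keep the weights from the best epoch. A minimal sketch, assuming per-epoch validation losses are already available as a list; the function name, `patience` and `min_delta` are illustrative choices, and the curve below is synthetic, shaped only to mimic the plot.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    best_epoch: index of the lowest validation loss seen before stopping.
    stop_epoch: index at which training halts, i.e. after `patience` consecutive
    epochs without an improvement larger than `min_delta`; if patience is never
    exhausted, the last epoch index is returned.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Synthetic validation curve: falls until epoch 40, then rises (overfitting).
val = [1.0 / (1 + 0.1 * e) if e <= 40 else 0.2 + 0.004 * (e - 40) for e in range(100)]
best, stop = early_stopping_epoch(val, patience=10)
print(best, stop)  # → 40 50
```

Restoring the checkpoint saved at `best` rather than using the final weights is what actually avoids the memorised model.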
Six views of the SAME synthetic grain microstructure. All six panels show the same Voronoi grain layout (same polygonal grains, same topology), transformed in different ways. Top row: original; 90° rotation (valid for equiaxed grains); horizontal flip (valid: no polarity). Bottom row: brightness jitter (valid: the label is structural, not intensity-based); Poisson noise (simulates low dose); vertical flip (invalid if a surface gradient is present, because the flip reverses it). Each valid transform is a claim that the physics has the corresponding symmetry.
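The six views can be produced with plain NumPy. A minimal sketch, assuming a single-channel image scaled to [0, 1]; the jitter range and the `dose` parameter (mean counts per pixel at intensity 1.0) are illustrative assumptions, not the settings used for the figure, and a random array stands in for the grain image.

```python
import numpy as np

def augmented_views(img, rng, dose=50.0):
    """Return the six views from the figure for a 2D image with values in [0, 1].

    `dose` is a hypothetical parameter: mean counts per pixel at intensity 1.0,
    so smaller values give noisier Poisson samples.
    """
    gain = rng.uniform(0.8, 1.2)  # illustrative brightness-jitter range
    return {
        "original": img,
        "rot90": np.rot90(img),                       # valid for equiaxed grains
        "hflip": np.fliplr(img),                      # valid when there is no polarity
        "brightness": np.clip(img * gain, 0.0, 1.0),  # valid only for structural labels
        "poisson": rng.poisson(img * dose) / dose,    # simulated low-dose acquisition
        "vflip": np.flipud(img),                      # invalid if a vertical gradient is physical
    }

rng = np.random.default_rng(0)
img = rng.random((64, 64))  # stand-in for a grain image
views = augmented_views(img, rng)
for name, view in views.items():
    print(name, view.shape)
```

The comments restate the physical claim each transform makes; deleting a line from the dictionary is the code equivalent of deciding a symmetry does not hold for your sample.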
Four panels illustrating when augmentations are illegal. Panel 1 (EBSD map): rotation is illegal — the colour encodes crystallographic orientation; rotating the image without rotating the IPF colour key produces a physically impossible map. Panel 2 (directional solidification): vertical flip is illegal — the thermal gradient is physically real. Panel 3 (EELS map): intensity jitter is illegal — calibrated intensity encodes composition. Panel 4 (equiaxed polycrystal): all augmentations checked here are valid.
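One way to make these rules explicit in a pipeline is a per-modality list of forbidden transforms that is checked before training. A minimal sketch: the modality keys, transform names and `check_pipeline` helper are hypothetical, and the table only encodes the single illegal transform named in each panel, so any transform not listed is simply unvetted, not proven safe.

```python
# Hypothetical registry built from the four panels; names are illustrative, not a library API.
FORBIDDEN = {
    "ebsd": {"rotate"},                       # colour encodes crystallographic orientation
    "directional_solidification": {"vflip"},  # the thermal gradient is physically real
    "eels": {"intensity_jitter"},             # calibrated intensity encodes composition
    "equiaxed_polycrystal": set(),            # all transforms checked here are valid
}

def check_pipeline(modality, transforms):
    """Raise ValueError if any transform in `transforms` is forbidden for `modality`."""
    illegal = sorted(FORBIDDEN[modality] & set(transforms))
    if illegal:
        raise ValueError(f"{modality}: illegal augmentation(s) {illegal}")
    return list(transforms)

check_pipeline("equiaxed_polycrystal", ["rotate", "vflip", "intensity_jitter"])  # passes
try:
    check_pipeline("eels", ["hflip", "intensity_jitter"])
except ValueError as err:
    print(err)  # → eels: illegal augmentation(s) ['intensity_jitter']
```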
Label consistency: when a grain-boundary image is rotated 45°, the segmentation mask must be rotated by exactly the same 45°. Top row (left to right): original image, original mask, rotated image (45°). Bottom row: correct — rotated mask (same 45°, joint transform); wrong — un-rotated mask paired with the rotated image, producing misaligned ground truth.
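The joint-transform rule can be illustrated in NumPy: sample the random parameters once, then apply them to both arrays. A minimal sketch, assuming a 2D image and mask of the same shape; it restricts itself to 90° rotations and flips so no interpolation library is needed. For the 45° case in the figure the principle is the same: draw the angle once and resample the mask with nearest-neighbour interpolation so its labels stay integers.

```python
import numpy as np

def joint_dihedral(img, mask, rng):
    """Sample ONE random rotation/flip and apply it identically to image and mask."""
    k = int(rng.integers(4))       # number of 90-degree rotations, drawn once
    flip = bool(rng.integers(2))   # horizontal mirror or not, drawn once

    def apply(a):
        a = np.rot90(a, k)
        return np.fliplr(a) if flip else a

    return apply(img), apply(mask)

rng = np.random.default_rng(1)
img = rng.random((32, 32))
mask = img > 0.5   # mask derived from the image, so alignment is easy to check
img_t, mask_t = joint_dihedral(img, mask, rng)
print(np.array_equal(mask_t, img_t > 0.5))  # → True
```

Calling the function twice (once for the image, once for the mask) would draw two independent `k` and `flip` values, which reproduces the "wrong" panel.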
Correct: `transform(image=img, mask=mask)` samples one random configuration and applies it to both, so mask and image stay aligned.

Wrong: calling `transform(image=img)` and `transform(image=mask)` separately draws two independent random angles, and mask and image become desynchronised. Symptom: IoU plateaus with no obvious cause.

Every transform is a physical claim: `HorizontalFlip` claims mirror symmetry; `ElasticTransform` claims robustness to drift; `RandomBrightnessContrast` claims the label is structural, not intensity-calibrated.

Transferability as a function of CNN depth (Yosinski, Jason et al., 2014). Layer 1 (edges, gradients): ~95% transferable, universal low-level image features. Layer 2 (textures, corners): ~80%, mostly domain-general. Layer 3 (object parts): ~45%, becoming domain-specific. Layer 4+ (full objects, task-specific): ~10%; ImageNet dog features are not EM features.
| | Small label count (<100) | Medium label count (100–1000) |
|---|---|---|
| Small domain gap (optical vs optical) | Feature extraction | Fine-tuning (differential LRs) |
| Large domain gap (ImageNet vs HAADF) | Feature extraction + BN adapt | Fine-tuning (differential LRs + gradual unfreeze) |
| Zero real labels | Synthetic pretrain → head | Synthetic pretrain → fine-tune |
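The two domain-gap rows of the table reduce to a small lookup. A minimal sketch with a hypothetical `choose_strategy` helper; it uses the table's own thresholds (<100 vs. 100–1000 labels) and deliberately leaves out the zero-real-label row, which depends on the synthetic-data setup rather than on a label count.

```python
def choose_strategy(n_labels: int, large_domain_gap: bool) -> str:
    """Hypothetical helper encoding the two domain-gap rows of the decision table."""
    if not 1 <= n_labels <= 1000:
        raise ValueError("the domain-gap rows cover 1 to 1000 real labels")
    small = n_labels < 100
    if not large_domain_gap:  # e.g. optical -> optical
        return "feature extraction" if small else "fine-tuning (differential LRs)"
    # e.g. ImageNet -> HAADF
    if small:
        return "feature extraction + BN adapt"
    return "fine-tuning (differential LRs + gradual unfreeze)"

print(choose_strategy(50, large_domain_gap=True))    # → feature extraction + BN adapt
print(choose_strategy(300, large_domain_gap=False))  # → fine-tuning (differential LRs)
```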
BatchNorm trap: putting a frozen ImageNet backbone in eval() mode makes its BN layers use ImageNet's stored running statistics. Grayscale 16-bit micrographs have very different statistics, so activations are silently mis-normalised and the extracted features are weak. Fix: keep the BN layers in train() mode even when the backbone weights are frozen, so they normalise with batch statistics and their running statistics adapt to the EM data.

Three-stage transfer learning recipe. Stage 1: all backbone blocks frozen (grey); only the head (red) is trained, at lr=1e-3. Stage 2: the last backbone block is unfrozen (orange) with a low lr=1e-5; the head continues at 1e-3. Stage 3: gradual unfreezing with depth-graded learning rates: early layers receive the smallest lr, later layers more, and the head the most.
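The BatchNorm trap is easy to reproduce numerically. In PyTorch the fix corresponds to calling `model.eval()` and then `.train()` on each `nn.BatchNorm2d` module; the NumPy sketch below only reproduces the arithmetic. The stored statistics (mean 0, variance 1) and the EM count levels are hypothetical values chosen to illustrate the mismatch, not real ImageNet buffers or real detector data.

```python
import numpy as np

def batchnorm(x, mean, var, eps=1e-5):
    """BatchNorm without affine parameters: (x - mean) / sqrt(var + eps)."""
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# Hypothetical stored running statistics from the source domain ...
running_mean, running_var = 0.0, 1.0
# ... versus a batch of raw 16-bit detector counts (illustrative values only).
em_batch = rng.normal(loc=3000.0, scale=500.0, size=(16, 64, 64))

eval_out = batchnorm(em_batch, running_mean, running_var)          # eval(): stored stats
train_out = batchnorm(em_batch, em_batch.mean(), em_batch.var())   # train(): batch stats

print(f"eval  mode: mean={eval_out.mean():.0f}, std={eval_out.std():.0f}")
print(f"train mode: mean={train_out.mean():.2f}, std={train_out.std():.2f}")
```

With stored statistics the activations stay in the thousands; with batch statistics they are re-centred to roughly zero mean and unit variance, which is the range the frozen downstream weights expect.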
Validation accuracy during fine-tuning. Green (correct): differential LRs — backbone gets lr=1e-5, head gets lr=1e-3; accuracy climbs steadily. Red dashed (wrong): uniform large lr=1e-3 for the whole network — the first few epochs destroy pretrained ImageNet features (catastrophic forgetting spike); recovery is partial and slow.
Voronoi synthetic microstructure pipeline for grain segmentation. From left: (1) random seed points placed in 2D; (2) each pixel assigned to its nearest seed: the Voronoi geometry yields perfect grain-ID labels for free; (3) a random intensity per grain plus a dark boundary strip renders a simple grain image; (4) Poisson noise and Gaussian blur make it look like a low-magnification SEM acquisition.
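The four pipeline steps fit in a short NumPy function. A minimal sketch under stated assumptions: image size, grain count, intensity range, one-pixel boundary width, `dose` and `blur_sigma` are illustrative values, and the blur is applied before the Poisson noise (a design choice here, since shot noise arises at detection after the image is formed).

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding (output has the input's shape)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def synthetic_grains(size=128, n_grains=30, dose=30.0, blur_sigma=1.0, seed=0):
    """Return (image, grain_ids) following the four pipeline steps."""
    rng = np.random.default_rng(seed)
    # (1) Random seed points in the plane.
    seeds = rng.uniform(0, size, size=(n_grains, 2))
    # (2) Assign every pixel to its nearest seed: the grain-ID map is the free label.
    yy, xx = np.mgrid[0:size, 0:size]
    pixels = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
    dist2 = ((pixels[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=-1)
    grain_ids = dist2.argmin(axis=1).reshape(size, size)
    # (3) Random intensity per grain plus a dark one-pixel boundary strip.
    img = rng.uniform(0.4, 0.9, size=n_grains)[grain_ids]
    boundary = np.zeros((size, size), dtype=bool)
    boundary[:-1, :] |= grain_ids[:-1, :] != grain_ids[1:, :]
    boundary[:, :-1] |= grain_ids[:, :-1] != grain_ids[:, 1:]
    img[boundary] = 0.1
    # (4) Gaussian blur (finite resolution), then Poisson shot noise.
    img = gaussian_blur(img, blur_sigma)
    img = rng.poisson(img * dose) / dose
    return img, grain_ids

image, grain_ids = synthetic_grains()
print(image.shape, grain_ids.shape, len(np.unique(grain_ids)))
```

Changing `seed` gives a new microstructure with its exact label map, so the training set can be made as large as needed at no annotation cost.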
Three panels showing the sim-to-real challenge. Left: synthetic training image (clean, regular grains, no scan artefacts). Centre: real SEM image with scan distortion, vignette, and contrast drift relative to the synthetic distribution. Right: U-Net prediction — grain topology is correctly identified despite the gap, because topology is the task-relevant invariant.
Left: random labelling strategy — 50 labels scattered uniformly across feature space. Right: active learning — labels concentrated near the decision boundary, where uncertainty is highest. With the same 50 labels, the active strategy correctly identifies the decision boundary; random labelling leaves a large uncertain region.
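The slide does not name the uncertainty measure; predictive entropy is one common choice and is assumed here. A minimal sketch with a hypothetical `select_most_uncertain` helper that ranks unlabelled samples by the entropy of the model's predicted class probabilities and returns the top `k` for expert annotation.

```python
import numpy as np

def select_most_uncertain(probs, k):
    """Indices of the k samples with the highest predictive entropy.

    probs: array of shape (n_samples, n_classes) whose rows sum to 1.
    """
    p = np.clip(probs, 1e-12, 1.0)           # avoid log(0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return np.argsort(entropy)[::-1][:k]     # most uncertain first

probs = np.array([[0.99, 0.01],
                  [0.50, 0.50],   # on the decision boundary
                  [0.90, 0.10],
                  [0.55, 0.45]])  # close to the boundary
print(select_most_uncertain(probs, 2))  # → [1 3]
```

Samples near p = 0.5 are exactly the ones close to the decision boundary, which is why the labels in the right-hand panel concentrate there.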
Complete small-data EM workflow diagram. The labelled EM data (20–200 images) feeds augmentation and transfer learning in parallel; synthetic data feeds domain adaptation; all three converge on a fine-tuned model. The active learning loop (dotted arrow, bottom) queries the fine-tuned model for the most uncertain unlabelled images, sends them to expert annotation, and grows the labelled pool.
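Grouped validation: `GroupKFold(n_splits=5).split(X, y, groups=specimen_ids)` places entire specimens in either train or test, never both; otherwise images and augmented views from one specimen can land on both sides of the split and inflate the score. This is the Week 4 lesson applied to the augmented EM context.

| Strategy | Label budget | Typical gain | Key caveat |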
|---|---|---|---|
| Augmentation alone | 50 images | ~1.5–3× effective data | Invalid transforms hurt |
| Feature extraction | 20–100 images | ~10–30% accuracy improvement | BN running-stats trap |
| Full fine-tuning | 100–1000 images | ~20–50% over scratch | Catastrophic forgetting without differential LRs |
| Voronoi pre-train | 0 real labels | Strong baseline for grain tasks | Fails for twins, non-equiaxed |
| All combined | 20–50 real labels | Close to full-data performance | Honest grouped validation required |
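The grouped-validation requirement can be checked directly with scikit-learn's `GroupKFold`. A minimal sketch, assuming each image (or augmented view) carries the ID of the specimen it came from; the 10 specimens with 8 views each and the random features and labels are placeholders, not real data.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_specimens, views_per_specimen = 10, 8          # hypothetical dataset layout
specimen_ids = np.repeat(np.arange(n_specimens), views_per_specimen)
X = rng.random((specimen_ids.size, 16))          # placeholder features
y = rng.integers(0, 2, size=specimen_ids.size)   # placeholder labels

folds = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=specimen_ids):
    train_specimens = set(specimen_ids[train_idx].tolist())
    test_specimens = set(specimen_ids[test_idx].tolist())
    assert train_specimens.isdisjoint(test_specimens)  # no specimen on both sides
    folds.append(sorted(test_specimens))
print(folds)
```

Every specimen appears in exactly one test fold, so a reported score always measures performance on specimens the model has never seen in any form.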

©Philipp Pelz - FAU Erlangen-Nürnberg - Data Science for Electron Microscopy