FAU Erlangen-Nürnberg
By the end of this unit, students can:
Hand-crafted features \(\Rightarrow\) learned features by neighbour-aware message passing.
Crystal structure \(\to\) graph \(G = (V, E)\) over the primitive unit cell plus periodic images.
With lattice matrix \(\mathbf{T}=[\vec{a}_1,\vec{a}_2,\vec{a}_3]\in\mathbb{R}^{3\times 3}\), the lattice-aware displacement to image \(\vec{n}\) is
\[\vec{r}_{ij}(\vec{n}) \;=\; \vec{r}_j + \mathbf{T}\,\vec{n} - \vec{r}_i, \qquad d_{ij}(\vec{n}) = \|\vec{r}_{ij}(\vec{n})\|\]
Edge created iff \(d_{ij}(\vec{n}) \le r_c\) — a single ordered pair \((i,j)\) may yield several edges (one per image inside the cutoff sphere).
A finite-cell graph must reproduce the topology of the infinite lattice.
Minimum-image convention: replace the raw displacement by its closest periodic image,
\[\vec{r}_{ij} \;\longleftarrow\; \vec{r}_{ij} \;-\; \mathbf{T}\,\mathrm{round}\!\big(\mathbf{T}^{-1}\,\vec{r}_{ij}\big)\]
where \(\mathrm{round}\) acts component-wise on the fractional displacement.
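A minimal numpy sketch of this convention (assuming \(\mathbf{T}\) holds the lattice vectors as columns, as above, and \(r_c\) below half the smallest cell dimension):

import numpy as np

def minimum_image(r_ij, T):
    # fractional displacement f = T^{-1} r_ij, then subtract the
    # nearest lattice translation T round(f)
    frac = np.linalg.solve(T, r_ij)
    return r_ij - T @ np.round(frac)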
Why a naive cutoff fails on small cells: atoms near the cell boundary have their nearest neighbours in adjacent periodic images, and once \(r_c\) exceeds half the shortest cell dimension several images of the same neighbour lie inside the cutoff sphere — images must then be enumerated explicitly, as in the pipeline below.
Forgetting PBC \(\Rightarrow\) disconnected graph, missing nearest neighbours, wrong coordination — the most common silent bug in custom pipelines.
CIF / POSCAR
│
▼
parse lattice T and fractional coords {f_i}
│
▼
Cartesian coords r_i = T · f_i
│
▼
neighbour search: for each i, j, image n ∈ Z^3
compute d_ij(n); keep iff d_ij(n) ≤ r_c
│
▼
edge tensor [src, dst, image, d_ij, r_ij]
node tensor [Z_i, group, period, ...]
│
▼
batched DGL / PyG graph → GNN

Library implementations: pymatgen.StructureGraph, ase.neighborlist, jarvis.core.graphs, torch_geometric.transforms.RadiusGraph.
Hybrid: \(k\)-NN with maximum cutoff (max_neighbors=12, r_c=8 Å) — the practical default in CGCNN, MEGNet, ALIGNN.
import numpy as np
from itertools import product

def min_lattice_spacing(T):
    # perpendicular distance between opposite cell faces:
    # d_k = 1 / ||row k of T^{-1}||  (T has lattice vectors as columns)
    return 1.0 / np.linalg.norm(np.linalg.inv(T), axis=1).max()

def build_edges(frac_coords, T, r_c):
    cart = frac_coords @ T.T                     # (N, 3), r_i = T f_i
    N = len(cart)
    # search all shells of periodic images that intersect the r_c-sphere
    n_max = int(np.ceil(r_c / min_lattice_spacing(T)))
    edges = []
    for n in product(range(-n_max, n_max + 1), repeat=3):
        shift = T @ np.array(n)                  # (3,) lattice translation T n
        for i in range(N):
            for j in range(N):
                r_ij = cart[j] + shift - cart[i]
                d = np.linalg.norm(r_ij)
                if 1e-8 < d <= r_c:              # lower bound excludes the self-edge at n = 0
                    edges.append((i, j, n, d, r_ij))
    return edges

Raw \(d_{ij}\) is a poor input to an MLP — it is a single scalar on which the energy depends strongly non-linearly.
Expand into a Gaussian RBF basis of \(K\) centres \(\{\mu_k\}\):
\[e_{ij,k} \;=\; \exp\!\Big(-\,\frac{(d_{ij} - \mu_k)^2}{2\sigma^2}\Big), \qquad k = 1,\ldots,K\]
Typical grid (CGCNN / SchNet): centres \(\mu_k\) evenly spaced over \([0, r_c]\) — e.g. 0–8 Å in 0.2 Å steps (\(K = 41\)) — with \(\sigma\) set to the centre spacing.
Why a smooth basis: each distance activates several overlapping Gaussians, so the edge features — and any energy built on them — vary continuously and differentiably with the atomic positions; hard distance bins would introduce discontinuities.
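A minimal sketch of the expansion with the example grid above (the defaults here are illustrative, not universal):

import numpy as np

def gaussian_rbf(d, r_c=8.0, K=41):
    # K centres evenly spaced on [0, r_c]; sigma = centre spacing
    mu = np.linspace(0.0, r_c, K)
    sigma = mu[1] - mu[0]
    return np.exp(-((np.asarray(d)[..., None] - mu) ** 2) / (2 * sigma**2))

# distances of shape (E,) expand to smooth edge features of shape (E, K)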
Higher-order geometry \(\to\) stricter inductive bias, but more expensive graphs and longer training.
A property predictor on a crystal must respect three groups acting on positions:
The right requirement depends on the target tensor type.
Invariance (scalars):
\[f(\{R\vec{r}_i + \vec{t}\}) \;=\; f(\{\vec{r}_i\})\]
Equivariance (vectors / tensors):
\[\vec{F}\!\left(\{R\vec{r}_i + \vec{t}\}\right) \;=\; R\,\vec{F}(\{\vec{r}_i\})\]
Permutation invariance: \(f(\pi\cdot V) = f(V)\) for any \(\pi \in S_N\) — handled automatically by sum/mean readouts over nodes.
Forces are gradients of an invariant scalar:
\[\vec{F}_i = -\nabla_{\vec{r}_i} E\]
If \(E\) is rotation-invariant and differentiable, then automatically
\[\vec{F}_i(R\,\{\vec{r}\}) \;=\; -\nabla_{R\vec{r}_i} E(R\,\{\vec{r}\}) \;=\; R\,\big(-\nabla_{\vec{r}_i} E(\{\vec{r}\})\big) \;=\; R\,\vec{F}_i(\{\vec{r}\})\]
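In PyTorch this is a single autograd call — a minimal sketch, where `model` stands for any differentiable rotation-invariant energy predictor (hypothetical name, not a specific library API):

import torch

def energy_and_forces(model, atomic_numbers, positions):
    # `model`: assumed invariant scalar predictor E(Z, r)
    pos = positions.clone().requires_grad_(True)         # (N, 3)
    energy = model(atomic_numbers, pos)                  # scalar E
    forces = -torch.autograd.grad(energy, pos, create_graph=True)[0]
    return energy, forces   # create_graph=True lets force errors enter the loss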
A GNN layer updates each node from its neighbourhood. Generic form:
\[\boxed{\;h_i^{(\ell+1)} \;=\; U^{(\ell)}\!\Big(h_i^{(\ell)},\;\;\mathrm{AGG}_{j\in\mathcal{N}(i)}\, M^{(\ell)}\!\big(h_i^{(\ell)},\,h_j^{(\ell)},\,e_{ij}\big)\Big)\;}\]
After \(L\) layers, node \(i\) has aggregated information from atoms within graph distance \(L\) — its receptive field.
Take the simplest crystal-graph variant:
\[h_i^{(\ell+1)} \;=\; \sigma\!\Big(W^{(\ell)}\,\big(h_i^{(\ell)} + \tfrac{1}{|\mathcal{N}(i)|}\sum_{j\in\mathcal{N}(i)} \alpha_{ij}\,h_j^{(\ell)}\big) + b^{(\ell)}\Big)\]
A water-like fragment \(V = \{O, H_1, H_2\}\), edges \(\{(O,H_1),(O,H_2),(H_1,H_2)\}\), scalar features \(h_i^{(0)} = Z_i\).
Layer 1 (sum aggregation, \(\alpha_{ij}=1\), \(W=1\), \(\sigma=\mathrm{id}\)):
\[h_O^{(1)} = h_O^{(0)} + h_{H_1}^{(0)} + h_{H_2}^{(0)} = 8 + 1 + 1 = 10\]
\[h_{H_1}^{(1)} = 1 + 8 + 1 = 10, \qquad h_{H_2}^{(1)} = 1 + 8 + 1 = 10\]
Layer 2: now every node sees the second-shell sum,
\[h_O^{(2)} = 10 + 10 + 10 = 30\]
— the receptive field grows by one hop per layer. With realistic \(W\) and \(\sigma\), the same mechanism builds chemically meaningful local descriptors.
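The arithmetic above can be checked in a few lines (adjacency and features exactly as in the example):

import numpy as np

A = np.ones((3, 3)) - np.eye(3)                  # O, H1, H2 all pairwise bonded
h0 = np.array([8.0, 1.0, 1.0])                   # h_i^(0) = Z_i

h1 = h0 + A @ h0                                 # layer 1: self + neighbour sum
h2 = h1 + A @ h1                                 # layer 2
print(h1, h2)                                    # [10. 10. 10.] [30. 30. 30.]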
Xie and Grossman (2018) introduce the first widely-used crystal-graph model.
\[h_i^{(\ell+1)} \;=\; h_i^{(\ell)} + \sum_{j\in\mathcal{N}(i)} \sigma\!\big(W_z\,z_{ij}^{(\ell)} + b_z\big) \;\odot\; g\!\big(W_s\,z_{ij}^{(\ell)} + b_s\big)\]
with \(z_{ij}^{(\ell)} = h_i^{(\ell)} \Vert h_j^{(\ell)} \Vert e_{ij}\), sigmoid gate \(\sigma\) and softplus content \(g\).
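A minimal PyTorch sketch of this gated update (not the reference implementation; the dimensions F_h, F_e and the edge_index layout are assumptions):

import torch
import torch.nn as nn

class CGConv(nn.Module):
    def __init__(self, F_h, F_e):
        super().__init__()
        self.gate = nn.Linear(2 * F_h + F_e, F_h)    # W_z, b_z
        self.core = nn.Linear(2 * F_h + F_e, F_h)    # W_s, b_s

    def forward(self, h, edge_index, e):
        src, dst = edge_index                        # edges j -> i
        z = torch.cat([h[dst], h[src], e], dim=-1)   # z_ij = h_i || h_j || e_ij
        msg = torch.sigmoid(self.gate(z)) * nn.functional.softplus(self.core(z))
        return h.index_add(0, dst, msg)              # residual + gated sum over N(i)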
Chen et al. (2019) add a global node \(u\) that interacts with every atom and edge.
\[u^{(\ell+1)} \;=\; \phi_u\!\Big(u^{(\ell)},\;\tfrac{1}{N}\!\sum_i h_i^{(\ell+1)},\;\tfrac{1}{|E|}\!\sum_{ij} e_{ij}^{(\ell+1)}\Big)\]
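A sketch of the global-state update for a single (unbatched) graph, with \(\phi_u\) as a small MLP (layer sizes are assumptions, not MEGNet's published ones):

import torch
import torch.nn as nn

class GlobalUpdate(nn.Module):
    def __init__(self, F_u, F_h, F_e):
        super().__init__()
        self.phi_u = nn.Sequential(
            nn.Linear(F_u + F_h + F_e, F_u), nn.Softplus(), nn.Linear(F_u, F_u)
        )

    def forward(self, u, h, e):
        # concatenate u with mean-pooled node and edge features
        return self.phi_u(torch.cat([u, h.mean(0), e.mean(0)], dim=-1))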
Schütt et al. (2018) replace discrete edge weights by a continuous filter \(W: \mathbb{R}^+ \to \mathbb{R}^F\) evaluated on \(d_{ij}\).
\[h_i^{(\ell+1)} \;=\; h_i^{(\ell)} + \sum_{j\in\mathcal{N}(i)} h_j^{(\ell)} \,\odot\, W^{(\ell)}\!\big(d_{ij}\big)\]
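A continuous-filter convolution in the same style (a sketch, not SchNet's reference code; the filter network maps the K RBF features of \(d_{ij}\) to one weight per channel):

import torch
import torch.nn as nn

class CFConv(nn.Module):
    def __init__(self, K, F_h):
        super().__init__()
        self.filter_net = nn.Sequential(
            nn.Linear(K, F_h), nn.Softplus(), nn.Linear(F_h, F_h)
        )

    def forward(self, h, edge_index, rbf):
        src, dst = edge_index
        W = self.filter_net(rbf)                     # (E, F_h) = W(d_ij)
        return h.index_add(0, dst, h[src] * W)       # h_i + Σ_j h_j ⊙ W(d_ij)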
Distance-only models cannot represent vectorial features in their hidden layers. Equivariant GNNs carry \(O(3)\)-equivariant tensors:
Rule of thumb: predicting scalars only \(\to\) CGCNN / SchNet is enough; predicting vectors / tensors \(\to\) go equivariant.
After \(L\) message-passing layers, aggregate node features into a graph-level vector \(h_G\):
\[\text{sum:}\quad h_G = \sum_{i\in V} h_i^{(L)} \qquad \text{mean:}\quad h_G = \tfrac{1}{N}\sum_{i\in V} h_i^{(L)}\]
\[\text{attention:}\quad h_G = \sum_{i\in V} \alpha_i\,h_i^{(L)}, \qquad \alpha_i = \frac{\exp(w^\top h_i^{(L)})}{\sum_j \exp(w^\top h_j^{(L)})}\]
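Sum and mean readouts over a batched graph reduce to a single index_add — a minimal sketch using the batch_index vector introduced below (attention readout additionally needs a per-graph softmax over the \(\alpha_i\)):

import torch

def readout(h, batch, num_graphs, mode="mean"):
    # h: (N, F) node features; batch: (N,) graph index per node
    h_sum = torch.zeros(num_graphs, h.shape[1]).index_add(0, batch, h)
    if mode == "sum":
        return h_sum
    counts = torch.zeros(num_graphs).index_add(0, batch, torch.ones(len(h)))
    return h_sum / counts[:, None]                   # mean readout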
The mismatched-readout pitfall: training an intensive target like “energy / atom” with a sum readout silently learns to predict cell size, since the sum scales with \(N\) while the target does not — use mean readout for intensive properties, sum for extensive ones.
Each layer mixes neighbour features \(\Rightarrow\) after many layers, all node embeddings converge.
Empirically, \(L \ge 4\) on dense crystal graphs collapses node features:
\[\lim_{\ell \to \infty} \big\| h_i^{(\ell)} - h_j^{(\ell)} \big\| = 0 \quad \forall\, i, j \in V\]
— the GNN forgets which atom is which.
Mitigations: residual connections (as in the CGCNN update above) and shallow stacks — keep \(L\) small (3–4 layers).
Batching: stack many crystals into one large disconnected graph and track membership with a batch_index vector — torch_geometric.data.Batch and dgl.batch do this automatically.
Multi-modal extensions (atom + microstructure + literature graphs) exist but are niche — covered briefly in U14.
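In PyG, batching is one call — a toy illustration with random features:

import torch
from torch_geometric.data import Data, Batch

g1 = Data(x=torch.randn(3, 16), edge_index=torch.tensor([[0, 1, 2], [1, 2, 0]]))
g2 = Data(x=torch.randn(2, 16), edge_index=torch.tensor([[0, 1], [1, 0]]))

batch = Batch.from_data_list([g1, g2])
print(batch.batch)                                   # tensor([0, 0, 0, 1, 1])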
Always pair a graph model with a composition-only MLP baseline to expose whether the GNN actually exploits structural information.
Reproducibility checklist: (i) cutoff, (ii) max neighbours, (iii) RBF \(\{\mu_k, \sigma\}\), (iv) PBC convention, (v) random seed.
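One way to pin these down is a single config object logged with every run — a hypothetical sketch, not a library schema:

config = dict(
    r_cut=8.0,                                          # (i) cutoff in Å
    max_neighbors=12,                                   # (ii) k-NN cap
    rbf=dict(mu_min=0.0, mu_max=8.0, K=41, sigma=0.2),  # (iii) RBF grid in Å
    pbc="explicit-images",                              # (iv) vs. "minimum-image"
    seed=42,                                            # (v) random seed
)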
Transfer across chemistries
Out-of-distribution structures
When reporting a new GNN result, always compare against:
Any new architecture must beat all four baselines on both MAE and Spearman rank correlation.
Bandgap
Formation energy
Elasticity
pymatgen + torch_geometric. Verify edge count under PBC.
Week 5: Crystal graphs + CGCNN — ABX\(_3\) perovskites

© Philipp Pelz - Materials Genomics