Mathematical Foundations of AI & ML
Unit 10: Latent Spaces and Embeddings
FAU Erlangen-Nürnberg
By the end of this lecture, students can distinguish feature, eigen-, and latent spaces; explain how t-SNE constructs and matches high- and low-dimensional similarities; contrast t-SNE with UMAP; and compare PCA, kernel PCA, t-SNE, and UMAP on scalability and structure preservation.
| Space | Definition | Structure |
|---|---|---|
| Feature space | Raw measurements (\(\mathbb{R}^d\)) | High-dimensional, redundant |
| PCA eigenspace | Linear projection onto top eigenvectors | Flat subspace, orthogonal axes |
| Latent space | Learned embedding (possibly nonlinear) | Captures manifold structure |
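To make the "PCA eigenspace" row concrete, here is a minimal JavaScript sketch (toy data and helper code are illustrative, not part of the lecture materials): it finds the leading eigenvector of a 2D sample covariance by power iteration and projects each point onto that axis, giving a 1D embedding.

```js
// Toy data: five 2D points with a dominant diagonal direction (made up).
const data = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]];

// Center the data.
const mean = [0, 1].map(j => data.reduce((s, p) => s + p[j], 0) / data.length);
const centered = data.map(p => [p[0] - mean[0], p[1] - mean[1]]);

// 2x2 sample covariance matrix.
const cov = [[0, 0], [0, 0]];
for (const [x, y] of centered) {
  cov[0][0] += x * x; cov[0][1] += x * y;
  cov[1][0] += y * x; cov[1][1] += y * y;
}
cov.forEach(row => row.forEach((v, j) => { row[j] = v / (data.length - 1); }));

// Power iteration: repeated multiplication converges to the top eigenvector.
let v = [1, 0];
for (let it = 0; it < 50; it++) {
  const w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]];
  const n = Math.hypot(w[0], w[1]);
  v = [w[0] / n, w[1] / n];
}

// 1D embedding: signed coordinate of each point along the principal axis.
const embedding = centered.map(p => p[0] * v[0] + p[1] * v[1]);
console.log("PC1 direction:", v, "scores:", embedding);
```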
A toy decoder illustrates latent-space structure: let the latent code be \(\mathbf{z} = [\text{frequency}, \text{phase}]\), decoded as \(x(t) = \sin(z_1 t + z_2)\). Interpolating between two latent codes yields smooth structural changes in the generated waveform.

```{ojs}
//| echo: false
viewof alpha_interp = Inputs.range([0, 1], {value: 0.5, step: 0.01, label: "α (Interpolation)"})
z1_latent = [1, 0] // freq 1, phase 0
z2_latent = [5, Math.PI] // freq 5, phase pi
z_mix_latent = [ // linear interpolation: α = 1 recovers z1, α = 0 recovers z2
alpha_interp * z1_latent[0] + (1 - alpha_interp) * z2_latent[0],
alpha_interp * z1_latent[1] + (1 - alpha_interp) * z2_latent[1]
]
x_interp = d3.range(0, 4 * Math.PI, 0.05)
Plot.plot({
width: 550,
height: 350,
y: {domain: [-1.2, 1.2], label: "Decoded Output"},
x: {domain: [0, 4 * Math.PI], label: "x"},
marks: [
Plot.ruleY([0]),
Plot.line(x_interp, {x: d => d, y: d => Math.sin(z1_latent[0] * d + z1_latent[1]), stroke: "black", strokeOpacity: 0.2}),
Plot.line(x_interp, {x: d => d, y: d => Math.sin(z2_latent[0] * d + z2_latent[1]), stroke: "black", strokeOpacity: 0.2}),
Plot.line(x_interp, {x: d => d, y: d => Math.sin(z_mix_latent[0] * d + z_mix_latent[1]), stroke: "blue", strokeWidth: 3, title: "Interpolated"}),
Plot.text([[2 * Math.PI, 1.1]], {text: [`Freq: ${z_mix_latent[0].toFixed(2)}, Phase: ${(z_mix_latent[1]/Math.PI).toFixed(2)}π`], fill: "blue", fontSize: 16})
]
})
```

t-SNE first converts pairwise distances in the high-dimensional space into conditional probabilities, so that nearby points receive high similarity:

\[ p_{j|i} = \frac{\exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i}\exp(-\|\mathbf{x}_i - \mathbf{x}_k\|^2 / 2\sigma_i^2)} \]
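In the demo below, \(\sigma_i\) is a free slider; in t-SNE proper, \(\sigma_i\) is set per point by binary search so that the entropy of \(p_{\cdot|i}\) matches a user-chosen perplexity:

\[ \text{Perp}(P_i) = 2^{H(P_i)}, \qquad H(P_i) = -\sum_{j} p_{j|i} \log_2 p_{j|i} \]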
```{ojs}
//| echo: false
viewof sigma_tsne = Inputs.range([0.1, 4], {value: 1, step: 0.1, label: "Bandwidth (σ_i)"})
pts_tsne = [-3, -2, -0.5, 0, 0.8, 2.5, 4]
target_idx_tsne = 3
function p_ji_tsne(idx_i, idx_j, sigma) {
if (idx_i === idx_j) return 0;
let dist_sq = (pts_tsne[idx_i] - pts_tsne[idx_j])**2;
let num = Math.exp(-dist_sq / (2 * sigma * sigma));
let den = 0;
for (let k = 0; k < pts_tsne.length; k++) {
if (k !== idx_i) {
den += Math.exp(-((pts_tsne[idx_i] - pts_tsne[k])**2) / (2 * sigma * sigma));
}
}
return num / den;
}
tsne_data = pts_tsne.map((x, i) => ({
x: x,
p: p_ji_tsne(target_idx_tsne, i, sigma_tsne),
is_target: i === target_idx_tsne
}))
Plot.plot({
width: 550,
height: 350,
x: {domain: [-4.5, 4.5], label: "Data points (1D Space)"},
y: {domain: [-0.1, 1], label: "Probability p_{j|i}"},
marks: [
Plot.ruleY([0]),
Plot.line(d3.range(-5, 5, 0.1), {
x: d => d,
y: d => Math.exp(-Math.pow(d - pts_tsne[target_idx_tsne], 2) / (2 * sigma_tsne * sigma_tsne)),
stroke: "gray", strokeDasharray: "4"
}),
Plot.ruleX(tsne_data.filter(d => !d.is_target), {x: "x", y1: 0, y2: "p", stroke: "blue", strokeWidth: 4}),
Plot.dot(tsne_data.filter(d => !d.is_target), {x: "x", y: "p", fill: "blue", r: 6}),
Plot.dot(tsne_data, {x: "x", y: -0.02, fill: d => d.is_target ? "red" : "black", r: d => d.is_target ? 8 : 5})
]
})
```

In the low-dimensional map, similarities use a Student-t kernel with one degree of freedom; its heavier tails let moderately distant points spread out, counteracting crowding:

\[ q_{ij} = \frac{(1 + \|\mathbf{y}_i - \mathbf{y}_j\|^2)^{-1}}{\sum_{k \neq l}(1 + \|\mathbf{y}_k - \mathbf{y}_l\|^2)^{-1}} \]
```{ojs}
//| echo: false
viewof distScale = Inputs.range([1, 10], {value: 4, step: 0.1, label: "Viewing Distance"})
x_vals_t = d3.range(-distScale, distScale, distScale/200)
Plot.plot({
width: 550,
height: 350,
y: {domain: [0, 0.45], label: "Similarity / Density"},
x: {domain: [-distScale, distScale], label: "Distance ||y_i - y_j||"},
marks: [
Plot.ruleY([0]),
Plot.line(x_vals_t, {x: d => d, y: d => Math.exp(-d*d/2) / Math.sqrt(2*Math.PI), stroke: "blue", strokeWidth: 3}),
Plot.line(x_vals_t, {x: d => d, y: d => Math.PI**(-1) * (1 + d*d)**(-1), stroke: "red", strokeWidth: 3}),
Plot.text([[distScale*0.5, 0.4]], {text: ["Gaussian similarity"], fill: "blue", fontSize: 16}),
Plot.text([[distScale*0.5, 0.35]], {text: ["Student-t similarity"], fill: "red", fontSize: 16})
]
})
```

The conditional similarities are symmetrized, \(p_{ij} = (p_{j|i} + p_{i|j}) / 2N\), and the map positions \(\mathbf{y}_i\) are found by gradient descent on the Kullback–Leibler divergence between the two distributions:

\[ \text{KL}(\mathbf{P} \| \mathbf{Q}) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \]
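To ground the objective, a small sketch with hypothetical 3-point similarity matrices (the values are made up for illustration; real \(\mathbf{P}\) and \(\mathbf{Q}\) come from the formulas above):

```js
// Hypothetical symmetric joint-probability matrices for 3 points;
// off-diagonal entries of each sum to 1.
const P = [
  [0,    0.30, 0.05],
  [0.30, 0,    0.15],
  [0.05, 0.15, 0   ],
];
const Q = [
  [0,    0.20, 0.10],
  [0.20, 0,    0.20],
  [0.10, 0.20, 0   ],
];

// KL(P || Q) summed over all i != j, skipping zero-probability terms.
let kl = 0;
for (let i = 0; i < P.length; i++) {
  for (let j = 0; j < P.length; j++) {
    if (i !== j && P[i][j] > 0) kl += P[i][j] * Math.log(P[i][j] / Q[i][j]);
  }
}
console.log("KL(P || Q) =", kl.toFixed(4));
```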
| Strengths | Weaknesses |
|---|---|
| Excellent local structure | No global distance preservation |
| Handles complex manifolds | \(O(N^2)\) complexity (slow) |
| Reveals clusters and subgroups | Non-parametric (no new-point mapping) |
| Widely used and understood | Hyperparameter-sensitive |
| Aspect | t-SNE | UMAP |
|---|---|---|
| Speed | Slow (\(O(N^2)\)) | Fast (\(O(N \log N)\)) |
| Global structure | Poor | Better preserved |
| New points | Must rerun | Transform method available |
| Theory | KL divergence | Topological / cross-entropy |
| Typical use | Small–medium datasets | Any size |
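The "new points" row is the key practical difference: a fitted UMAP model can embed unseen data. A usage sketch assuming the umap-js package (parameter values and the `trainData` / `newData` arrays are illustrative placeholders):

```js
import { UMAP } from 'umap-js';

// trainData, newData: placeholder arrays of number[] feature vectors.
const umap = new UMAP({ nComponents: 2, nNeighbors: 15, minDist: 0.1 });
const embedding = umap.fit(trainData); // N x 2 low-dimensional coordinates

// Unlike t-SNE, new points can be mapped into the existing embedding
// without re-running the whole optimization.
const newCoords = umap.transform(newData);
```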
Kernel PCA extends PCA to nonlinear structure via the kernel trick: inner products under an implicit feature map \(\phi\) are replaced by a kernel evaluated directly on the inputs,

\[ k(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^\top \phi(\mathbf{x}_j) \]
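A minimal sketch of the first step of kernel PCA (toy data and bandwidth are assumptions): build the RBF kernel matrix and double-center it; the top eigenvectors of the centered matrix then play the role of principal components.

```js
// Toy 2D points (made up) and RBF bandwidth.
const X = [[0, 0], [1, 0], [0, 1], [3, 3]];
const sigma = 1.0;

// RBF kernel: k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
const sqDist = (a, b) => a.reduce((s, ai, k) => s + (ai - b[k]) ** 2, 0);
const N = X.length;
const K = X.map(a => X.map(b => Math.exp(-sqDist(a, b) / (2 * sigma * sigma))));

// Double-centering: Kc = K - 1N K - K 1N + 1N K 1N, with 1N = (1/N) ones.
// K is symmetric, so row means double as column means.
const rowMean = K.map(row => row.reduce((s, v) => s + v, 0) / N);
const totalMean = rowMean.reduce((s, v) => s + v, 0) / N;
const Kc = K.map((row, i) =>
  row.map((v, j) => v - rowMean[i] - rowMean[j] + totalMean)
);
console.log(Kc); // eigenvectors of Kc give the kernel principal components
```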
| Method | Linear? | Preserves | Scalability | New points |
|---|---|---|---|---|
| PCA | Yes | Global variance | \(O(d^2 N)\) | Yes |
| Kernel PCA | No | Kernel similarity | \(O(N^3)\) | Limited |
| t-SNE | No | Local neighborhoods | \(O(N^2)\) | No |
| UMAP | No | Local + some global | \(O(N \log N)\) | Yes |


Week 10: Autoencoder Latent Space — IsingDataset (64×64)
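A minimal starting-point sketch for this exercise in TensorFlow.js (layer sizes, activations, and training settings are assumptions, not a reference solution): flatten each 64×64 configuration to 4096 inputs, compress through a 2-unit bottleneck, and reconstruct.

```js
import * as tf from '@tensorflow/tfjs';

// Assumed architecture: 4096 -> 256 -> 2 (latent) -> 256 -> 4096.
const input = tf.input({shape: [4096]});
const h1 = tf.layers.dense({units: 256, activation: 'relu'}).apply(input);
const z  = tf.layers.dense({units: 2, name: 'latent'}).apply(h1);
const h2 = tf.layers.dense({units: 256, activation: 'relu'}).apply(z);
const out = tf.layers.dense({units: 4096, activation: 'tanh'}).apply(h2); // spins in [-1, 1]

const autoencoder = tf.model({inputs: input, outputs: out});
const encoder = tf.model({inputs: input, outputs: z}); // maps data -> latent space

autoencoder.compile({optimizer: 'adam', loss: 'meanSquaredError'});
// Train to reconstruct the input itself; xs is an N x 4096 tensor of spins:
// await autoencoder.fit(xs, xs, {epochs: 20, batchSize: 64});
// encoder.predict(xs) then gives the 2D latent coordinates to plot.
```

The 2-unit bottleneck sacrifices reconstruction fidelity, but it yields a latent space that can be scatter-plotted directly, making the structure of the learned representation visible.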