Data Science for Electron Microscopy
Lecture 5: Rotationally Invariant Variational Autoencoders & Generative Adversarial Networks
FAU Erlangen-Nürnberg
Authors: Sergei V. Kalinin, Ondrej Dyck, Stephen Jesse, Maxim Ziatdinov
Published in: Science Advances (2021)
DOI: 10.1126/sciadv.abd5084
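The decoder of the rotationally invariant (spatial) VAE is a coordinate-based network: it takes a batch of pixel coordinates together with a latent vector and predicts the image intensity at each coordinate. The lecture shows only the forward pass of this SpatialGenerator; the constructor in the sketch below is an assumed minimal reconstruction following the spatial-VAE decoder design.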
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialGenerator(nn.Module):
    # NOTE: the original notes show only forward(); this __init__ is a
    # minimal sketch (assumed) so the class is runnable.
    def __init__(self, latent_dim, hidden_dim, nout=1, num_layers=1,
                 softplus=False):
        super().__init__()
        self.softplus = softplus
        self.coord_linear = nn.Linear(2, hidden_dim)  # embeds (x, y) coordinates
        self.latent_linear = nn.Linear(latent_dim, hidden_dim, bias=False)  # embeds latent code
        layers = [nn.Tanh()]
        for _ in range(num_layers):
            layers += [nn.Linear(hidden_dim, hidden_dim), nn.Tanh()]
        layers.append(nn.Linear(hidden_dim, nout))
        self.layers = nn.Sequential(*layers)

    def forward(self, x, z):
        # x is (batch, num_coords, 2), z is (batch, latent_dim)
        if len(x.size()) < 3:
            x = x.unsqueeze(0)
        b = x.size(0)
        n = x.size(1)
        x = x.view(b * n, -1)
        h_x = self.coord_linear(x)   # embed every coordinate
        h_x = h_x.view(b, n, -1)

        if len(z.size()) < 2:
            z = z.unsqueeze(0)
        h_z = self.latent_linear(z)  # embed the latent code
        h_z = h_z.unsqueeze(1)       # broadcast over coordinates

        h = h_x + h_z                # (batch, num_coords, hidden_dim)
        h = h.view(b * n, -1)
        y = self.layers(h)           # (batch*num_coords, nout)
        y = y.view(b, n, -1)

        if self.softplus:  # only apply softplus to the first output channel
            y = torch.cat([F.softplus(y[:, :, :1]), y[:, :, 1:]], 2)
        return y
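A quick shape check, assuming the sketched constructor above (the grid size and latent dimensionality are arbitrary illustration values):

gen = SpatialGenerator(latent_dim=2, hidden_dim=128, nout=1)
x = torch.randn(8, 40 * 40, 2)  # batch of 40x40 coordinate grids
z = torch.randn(8, 2)           # batch of latent codes
print(gen(x, z).shape)          # torch.Size([8, 1600, 1])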
The GAN architecture is illustrated in Figure 1.
Let’s get started.
Since this is going to be the world’s lamest example, we simply generate data drawn from a Gaussian.
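The generation step itself is not reproduced in these notes; here is a sketch consistent with the covariance matrix printed below (it matches the d2l.ai GAN chapter):

import torch
from d2l import torch as d2l

X = torch.normal(0.0, 1, (1000, 2))      # standard normal samples
A = torch.tensor([[1, 2], [-0.1, 0.5]])
b = torch.tensor([1, 2])
data = torch.matmul(X, A) + b            # shifted Gaussian: mean b, covariance A^T A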
Let’s see what we got. This should be a Gaussian shifted in some rather arbitrary way with mean \(b\) and covariance matrix \(A^TA\).
d2l.set_figsize()
d2l.plt.scatter(d2l.numpy(data[:100, 0]), d2l.numpy(data[:100, 1]));
print(f'The covariance matrix is\n{d2l.matmul(A.T, A)}')
The covariance matrix is
tensor([[1.0100, 1.9500],
[1.9500, 4.2500]])
Key Components
Generator Block (G_block) Default Configuration: - Kernel size: \(4 \times 4\) - Stride: \(2 \times 2\) - Padding: \(1 \times 1\)
Effect: Doubles spatial dimensions
Example: \(16 \times 16 \rightarrow 32 \times 32\)
\[
\begin{aligned}
n_h' \times n_w' &= [k_h + s_h(n_h-1) - 2p_h] \times [k_w + s_w(n_w-1) - 2p_w] \\
&= [4 + 2\times(16-1) - 2\times1] \times [4 + 2\times(16-1) - 2\times1] \\
&= 32 \times 32
\end{aligned}
\]
First-Block Configuration: - Kernel: \(4 \times 4\) - Stride: \(1 \times 1\) - Padding: \(0 \times 0\)
Effect: Increases each spatial dimension by 3 (from \(1\) to \(4\))
Example: \(1 \times 1 \rightarrow 4 \times 4\)
Progressive Upsampling: - Start: \(1 \times 1\) (latent vector) - End: \(64 \times 64\) (final image) - Channel progression: \(100 \rightarrow 512 \rightarrow 256 \rightarrow 128 \rightarrow 64 \rightarrow 3\)
Final Layer: a transposed convolution maps the \(64\) feature channels to \(3\) RGB channels and doubles \(32 \times 32\) to \(64 \times 64\); a tanh activation constrains outputs to \((-1, 1)\).
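To sanity-check the shape arithmetic above, here is a small helper (illustrative only) implementing the transposed-convolution output formula:

def trans_conv_out(n, k=4, s=2, p=1):
    # transposed-convolution output size: (n - 1) * s + k - 2 * p
    return (n - 1) * s + k - 2 * p

print(trans_conv_out(16))           # 32: a default block doubles 16 -> 32
print(trans_conv_out(1, s=1, p=0))  # 4:  the first block maps 1 -> 4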
The dataset we will use is a collection of Pokemon sprites obtained from pokemondb. First download, extract and load this dataset.
import torchvision

d2l.DATA_HUB['pokemon'] = (d2l.DATA_URL + 'pokemon.zip',
                           'c065c0e2593b8b161a2d7873e42418bf6a21106c')
data_dir = d2l.download_extract('pokemon')
print(data_dir)
pokemon = torchvision.datasets.ImageFolder(data_dir)
# Alternative dataset for experimentation:
# pokemon = torchvision.datasets.EMNIST(root='./', split='digits', download=True)
../data/pokemon
ToTensor(): maps pixel values to \([0, 1]\)
Normalize(0.5, 0.5): maps \([0, 1]\) to \([-1, 1]\), matching the range of the generator's tanh activation

Why This Normalization?
The normalization ensures compatibility between the real images fed to the discriminator (scaled by ToTensor() and Normalize) and the generator's synthetic images (produced by a tanh activation with range \((-1, 1)\)).

batch_size = 256
transformer = torchvision.transforms.Compose([
    torchvision.transforms.Resize((64, 64)),    # Resize images
    torchvision.transforms.ToTensor(),          # Scale to [0, 1]
    torchvision.transforms.Normalize(0.5, 0.5)  # Map to [-1, 1]
])
pokemon.transform = transformer
data_iter = torch.utils.data.DataLoader(
    pokemon, batch_size=batch_size,
    shuffle=True, num_workers=1)
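To confirm the preprocessing, we can plot a few real images, rescaled from \([-1, 1]\) back to \([0, 1]\) for display (a sketch following the d2l.ai DCGAN chapter):

for X, y in data_iter:
    imgs = X[:20, :, :, :].permute(0, 2, 3, 1) / 2 + 0.5  # undo the normalization
    d2l.set_figsize((4, 4))
    d2l.show_images(imgs, num_rows=4, num_cols=5)
    break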
Key Points
The generator needs to map the noise variable \(\mathbf z\in\mathbb R^d\), a length-\(d\) vector, to an RGB image of width and height \(64\times 64\). Recall the fully convolutional network, which uses transposed convolution layers to enlarge the input size. The basic block of the generator contains a transposed convolution layer followed by batch normalization and ReLU activation.
class G_block(nn.Module):
    def __init__(self, out_channels, in_channels=3, kernel_size=4, strides=2,
                 padding=1, **kwargs):
        super(G_block, self).__init__(**kwargs)
        self.conv2d_trans = nn.ConvTranspose2d(in_channels, out_channels,
                                               kernel_size, strides, padding,
                                               bias=False)
        self.batch_norm = nn.BatchNorm2d(out_channels)
        self.activation = nn.ReLU()

    def forward(self, X):
        return self.activation(self.batch_norm(self.conv2d_trans(X)))
By default, the transposed convolution layer uses a \(k_h = k_w = 4\) kernel, a \(s_h = s_w = 2\) stride, and a \(p_h = p_w = 1\) padding. With an input shape of \(n_h \times n_w = 16 \times 16\), the generator block will double the input's width and height.
\[
\begin{aligned}
n_h' \times n_w' &= [n_h k_h - (n_h-1)(k_h-s_h) - 2p_h] \times [n_w k_w - (n_w-1)(k_w-s_w) - 2p_w] \\
&= [k_h + s_h(n_h-1) - 2p_h] \times [k_w + s_w(n_w-1) - 2p_w] \\
&= [4 + 2\times(16-1) - 2\times1] \times [4 + 2\times(16-1) - 2\times1] \\
&= 32 \times 32 .
\end{aligned}
\]
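We can verify the doubling with a dummy input (a sketch; the 20 output channels are an arbitrary choice):

x = torch.zeros((2, 3, 16, 16))
g_blk = G_block(20)
print(g_blk(x).shape)  # expected: torch.Size([2, 20, 32, 32])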
If we change the transposed convolution layer to a \(4\times 4\) kernel with \(1\times 1\) stride and zero padding, then for an input of size \(1 \times 1\) the output has its width and height increased by 3 each (from \(1\) to \(4\)).
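The printed shape below comes from a check along these lines (reconstructed; the 20 output channels match the printed output):

x = torch.zeros((2, 3, 1, 1))
g_blk = G_block(20, strides=1, padding=0)
g_blk(x).shape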
torch.Size([2, 20, 4, 4])
The generator consists of four basic blocks that increase the input's width and height from 1 to 32. At the same time, it first projects the latent variable into \(64\times 8\) channels and then halves the number of channels each time. Finally, a transposed convolution layer generates the output: it further doubles the width and height to match the desired \(64\times 64\) shape and reduces the number of channels to \(3\). The tanh activation function projects output values into the \((-1, 1)\) range.
n_G = 64
net_G = nn.Sequential(
    G_block(in_channels=100, out_channels=n_G*8,
            strides=1, padding=0),                  # Output: (64 * 8, 4, 4)
    G_block(in_channels=n_G*8, out_channels=n_G*4), # Output: (64 * 4, 8, 8)
    G_block(in_channels=n_G*4, out_channels=n_G*2), # Output: (64 * 2, 16, 16)
    G_block(in_channels=n_G*2, out_channels=n_G),   # Output: (64, 32, 32)
    nn.ConvTranspose2d(in_channels=n_G, out_channels=3,
                       kernel_size=4, stride=2, padding=1, bias=False),
    nn.Tanh())                                      # Output: (3, 64, 64)
Generate a 100-dimensional latent variable to verify the generator's output shape.
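The check itself is omitted in the notes; a sketch:

x = torch.zeros((1, 100, 1, 1))
print(net_G(x).shape)  # expected: torch.Size([1, 3, 64, 64])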
The discriminator is a normal convolutional network except that it uses a leaky ReLU as its activation function. Given \(\alpha \in[0, 1]\), its definition is
\[\textrm{leaky ReLU}(x) = \begin{cases}x & \text{if}\ x > 0\\ \alpha x &\text{otherwise}\end{cases}.\]
As can be seen, it is the normal ReLU if \(\alpha=0\), and the identity function if \(\alpha=1\). For \(\alpha \in (0, 1)\), leaky ReLU is a nonlinear function that gives a non-zero output for a negative input. It aims to fix the “dying ReLU” problem, where a neuron might always output a negative value and therefore cannot make any progress since the gradient of ReLU is 0; the leaky variant keeps gradients flowing through the architecture.
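The effect of \(\alpha\) is easy to visualize (a sketch following the d2l.ai chapter):

alphas = [0, .2, .4, .6, .8, 1]
x = torch.arange(-2, 1, 0.1)
Y = [nn.LeakyReLU(alpha)(x).detach().numpy() for alpha in alphas]
d2l.plot(x.detach().numpy(), Y, 'x', 'y', alphas)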
The basic block of the discriminator is a convolution layer followed by a batch normalization layer and a leaky ReLU activation. The hyperparameters of the convolution layer are similar to those of the transposed convolution layer in the generator block.
class D_block(nn.Module):
    def __init__(self, out_channels, in_channels=3, kernel_size=4, strides=2,
                 padding=1, alpha=0.2, **kwargs):
        super(D_block, self).__init__(**kwargs)
        self.conv2d = nn.Conv2d(in_channels, out_channels, kernel_size,
                                strides, padding, bias=False)
        self.batch_norm = nn.BatchNorm2d(out_channels)
        self.activation = nn.LeakyReLU(alpha, inplace=True)

    def forward(self, X):
        return self.activation(self.batch_norm(self.conv2d(X)))
A basic block with default settings will halve the width and height of the inputs, as we demonstrated in the section on padding. For example, given an input shape \(n_h = n_w = 16\), with a kernel shape \(k_h = k_w = 4\), a stride shape \(s_h = s_w = 2\), and a padding shape \(p_h = p_w = 1\), the output shape will be:
\[
\begin{aligned}
n_h' \times n_w' &= \lfloor(n_h-k_h+2p_h+s_h)/s_h\rfloor \times \lfloor(n_w-k_w+2p_w+s_w)/s_w\rfloor \\
&= \lfloor(16-4+2\times 1+2)/2\rfloor \times \lfloor(16-4+2\times 1+2)/2\rfloor \\
&= 8 \times 8 .
\end{aligned}
\]
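A quick check (sketch; the 20 output channels are an arbitrary choice):

x = torch.zeros((2, 3, 16, 16))
d_blk = D_block(20)
print(d_blk(x).shape)  # expected: torch.Size([2, 20, 8, 8])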
n_D = 64
net_D = nn.Sequential(
    D_block(n_D),                                   # Output: (64, 32, 32)
    D_block(in_channels=n_D, out_channels=n_D*2),   # Output: (64 * 2, 16, 16)
    D_block(in_channels=n_D*2, out_channels=n_D*4), # Output: (64 * 4, 8, 8)
    D_block(in_channels=n_D*4, out_channels=n_D*8), # Output: (64 * 8, 4, 4)
    nn.Conv2d(in_channels=n_D*8, out_channels=1,
              kernel_size=4, bias=False))           # Output: (1, 1, 1)
It uses a convolution layer with output channel \(1\) as the last layer to obtain a single prediction value.
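Applied to a \(64\times 64\) input, the network indeed collapses to a single value (a sketch):

x = torch.zeros((1, 3, 64, 64))
print(net_D(x).shape)  # expected: torch.Size([1, 1, 1, 1])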
Compared to the basic GAN, we use the same learning rate for both the generator and the discriminator since they are similar to each other. In addition, we change \(\beta_1\) in Adam from \(0.9\) to \(0.5\). This decreases the smoothness of the momentum (the exponentially weighted moving average of past gradients) to cope with the rapidly changing gradients that arise because the generator and the discriminator compete with each other. Finally, the randomly generated noise Z is a 4-D tensor, and we use a GPU to accelerate the computation.
def train(net_D, net_G, data_iter, num_epochs, lr, latent_dim,
          device=d2l.try_gpu()):
    loss = nn.BCEWithLogitsLoss(reduction='sum')
    for w in net_D.parameters():
        nn.init.normal_(w, 0, 0.02)
    for w in net_G.parameters():
        nn.init.normal_(w, 0, 0.02)
    net_D, net_G = net_D.to(device), net_G.to(device)
    trainer_hp = {'lr': lr, 'betas': [0.5, 0.999]}
    trainer_D = torch.optim.Adam(net_D.parameters(), **trainer_hp)
    trainer_G = torch.optim.Adam(net_G.parameters(), **trainer_hp)
    animator = d2l.Animator(xlabel='epoch', ylabel='loss',
                            xlim=[1, num_epochs], nrows=2, figsize=(5, 5),
                            legend=['discriminator', 'generator'])
    animator.fig.subplots_adjust(hspace=0.3)
    for epoch in range(1, num_epochs + 1):
        # Train one epoch
        timer = d2l.Timer()
        metric = d2l.Accumulator(3)  # loss_D, loss_G, num_examples
        for X, _ in data_iter:
            batch_size = X.shape[0]
            Z = torch.normal(0, 1, size=(batch_size, latent_dim, 1, 1))
            X, Z = X.to(device), Z.to(device)
            metric.add(d2l.update_D(X, Z, net_D, net_G, loss, trainer_D),
                       d2l.update_G(Z, net_D, net_G, loss, trainer_G),
                       batch_size)
        # Show generated examples
        Z = torch.normal(0, 1, size=(21, latent_dim, 1, 1), device=device)
        # Rescale the synthetic images from (-1, 1) to (0, 1) for display
        fake_x = net_G(Z).permute(0, 2, 3, 1) / 2 + 0.5
        imgs = torch.cat(
            [torch.cat([
                fake_x[i * 7 + j].cpu().detach() for j in range(7)], dim=1)
             for i in range(len(fake_x) // 7)], dim=0)
        animator.axes[1].cla()
        animator.axes[1].imshow(imgs)
        # Show the losses
        loss_D, loss_G = metric[0] / metric[2], metric[1] / metric[2]
        animator.add(epoch, (loss_D, loss_G))
    print(f'loss_D {loss_D:.3f}, loss_G {loss_G:.3f}, '
          f'{metric[2] / timer.stop():.1f} examples/sec on {str(device)}')
We train the model with a small number of epochs just for demonstration. For better performance, the variable num_epochs can be set to a larger number.
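A typical invocation (the hyperparameter values here follow the d2l.ai DCGAN chapter and are an assumption for these notes):

latent_dim, lr, num_epochs = 100, 0.005, 20
train(net_D, net_G, data_iter, num_epochs, lr, latent_dim)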
Authors: Abid Khan, Chia-Hao Lee, Pinshane Y. Huang, Bryan K. Clark
Published in: npj Computational Materials (2023)
DOI: 10.1038/s41524-023-01042-3
Authors: Guillaume Lambard, Kazuhiko Yamazaki, Masahiko Demura
Published in: Scientific Reports (2023)
DOI: 10.1038/s41598-023-27574-8
©Philipp Pelz - FAU Erlangen-Nürnberg - Data Science for Electron Microscopy