How to calculate the output size of a convolution layer to set the input size of the following linear layer


When using a Convolutional Neural Network (CNN), one step is to calculate the output size after the convolution and pooling steps, so we can pipe the outputs
to a fully connected linear layer.

Take a one-dimensional CNN in PyTorch as an example:

self.conv1 = nn.Conv1d(in_channels=1, out_channels=64, kernel_size=3, padding=1)

The important parameters include in_channels, which we set to 1 here. It can be more than 1, depending on how many
features you have. In the 2D image case, the input channels could be the RGB channels; in the 1D case, they could be multiple sensor
measurements in a time series problem. out_channels is determined by the number of filters we use;
in our example we set it to 64, meaning we use 64 filters. kernel_size is the size of the
convolutional filter. padding is used to adjust the spatial resolution (here, the length) of the output.

So the total output size of this 1D convolution module equals out_channels multiplied by the output length of
each filter. The output length of each filter is given by:

output_size = floor((input_size + 2 * padding - kernel_size) / stride) + 1

From the formula, we can see that it depends on the input_size of the sequence in each channel (1 channel in our example),
the kernel size, the padding size and the stride (usually set to 1). This form assumes the default dilation of 1.
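
As a minimal sketch, the standard output-length formula for a 1D convolution (assuming a dilation of 1) can be written as a small Python helper. The function name conv1d_output_size and the input length of 100 are made up for illustration; they are not part of PyTorch.

```python
def conv1d_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output length of one filter of a 1D convolution (dilation assumed to be 1)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Hypothetical input length of 100 samples per channel.
print(conv1d_output_size(100, kernel_size=3, padding=1))            # 100
print(conv1d_output_size(100, kernel_size=3, padding=0))            # 98
print(conv1d_output_size(100, kernel_size=3, padding=1, stride=2))  # 50
```

With kernel_size=3, padding=1 and stride=1 the length is unchanged, while dropping the padding shortens the sequence by kernel_size - 1 = 2.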

Padding in Convolutional Layers:

Padding refers to the addition of extra elements (typically zeros) to the input data before the convolution operation is performed.
This is done to control the size of the output of the convolutional layer.

Effect of Padding:

The primary purpose of padding is to allow control over the spatial dimensions (in this case, the length) of the output tensor.
With padding, you can preserve the size of the input, increase it, or control the amount by which it decreases.
Specifically, padding can be used to ensure that the output size of the layer is the same as the input size, which is common in many CNN architectures to maintain the spatial resolution of the input through the network layers.

Padding Value of 1:

A padding value of 1 means that one element of padding is added to each side of the input.
In the context of a 1D convolution, this would add one zero-value element to the beginning and one to the end of the input sequence.
For instance, if your input sequence is [a, b, c, d], with padding=1, it effectively becomes [0, a, b, c, d, 0] before the convolution operation is applied.

Impact on Output Size:
When the kernel size is 3, the padding is 1 and the stride is 1 (as in our example), the convolutional layer produces an output that has the same length as the input. This is because the padding compensates for the reduction in size that would otherwise occur due to the convolution operation.
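
To make the effect of padding concrete, here is a rough pure-Python sketch that zero-pads a short sequence and slides a kernel of size 3 over it with stride 1. The helper pad_and_convolve, the all-ones kernel and the sample values are illustrative assumptions; this is not how PyTorch implements convolution internally.

```python
def pad_and_convolve(sequence, kernel, padding):
    """Zero-pad a list on both sides, then slide the kernel over it with stride 1."""
    padded = [0] * padding + list(sequence) + [0] * padding
    k = len(kernel)
    output = [
        sum(w * x for w, x in zip(kernel, padded[i:i + k]))
        for i in range(len(padded) - k + 1)
    ]
    return padded, output


# With padding=1 the output keeps the input length of 4.
padded, output = pad_and_convolve([1, 2, 3, 4], kernel=[1, 1, 1], padding=1)
print(padded)  # [0, 1, 2, 3, 4, 0]
print(output)  # [3, 6, 9, 7]

# Without padding the output shrinks by kernel_size - 1 = 2.
_, unpadded_output = pad_and_convolve([1, 2, 3, 4], kernel=[1, 1, 1], padding=0)
print(unpadded_output)  # [6, 9]
```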

Using the above formula, we can get the output size for each input sequence. In our example, since kernel_size is 3, setting padding to 1
makes the output size for each input sequence the same as the input size. In PyTorch, we can also set padding='same', which automatically
chooses the padding so that the output size equals the input size for each input sequence (this option is only supported when the stride is 1).

The convolution step is then followed by a pooling step, which could be:

self.pool = nn.MaxPool1d(kernel_size=2)

In this example, pooling halves the length of the convolution output, because the stride of MaxPool1d defaults to its kernel_size. So in the end, the length of each output sequence is input_size // 2 (an odd length is rounded down).
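
The halving can be sketched in plain Python as non-overlapping max pooling, which mirrors the default behaviour of nn.MaxPool1d (stride equal to kernel_size, leftover elements dropped). The helper max_pool1d and the sample values are illustrative assumptions only.

```python
def max_pool1d(sequence, kernel_size=2):
    """Non-overlapping max pooling: stride equals kernel_size, leftovers are dropped."""
    n_windows = len(sequence) // kernel_size
    return [
        max(sequence[i * kernel_size:(i + 1) * kernel_size])
        for i in range(n_windows)
    ]


print(max_pool1d([3, 6, 9, 7]))     # [6, 9]
print(max_pool1d([3, 6, 9, 7, 5]))  # [6, 9]
```

An input of length 4 and an input of length 5 both give 2 outputs, which is why the final size uses integer division.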

Now, to get the total output size of the two steps above, we need to multiply by the number of filters we use. In our example, we use 64 filters,
so the final output size should be:

64 * (input_size // 2)

After flattening, this is the input size of the following fully connected layer.
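
Putting the pieces together, the following is a minimal sketch of how this size could be wired into a model. The class name SmallCNN, the input length of 100, the batch size of 8 and the 10 output units are assumptions chosen for illustration; only the conv1 and pool layers come from the example above.

```python
import torch
import torch.nn as nn


class SmallCNN(nn.Module):
    """Hypothetical model: conv -> pool -> flatten -> linear."""

    def __init__(self, input_size, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=2)
        # 64 filters, each producing a sequence of length input_size // 2.
        self.fc = nn.Linear(64 * (input_size // 2), num_classes)

    def forward(self, x):
        x = self.pool(self.conv1(x))  # (batch, 64, input_size // 2)
        x = x.flatten(start_dim=1)    # (batch, 64 * (input_size // 2))
        return self.fc(x)


model = SmallCNN(input_size=100)
batch = torch.randn(8, 1, 100)  # (batch, channels, length)
print(model(batch).shape)       # torch.Size([8, 10])
```

If the size passed to nn.Linear did not match the flattened tensor, the forward pass would raise a shape mismatch error, so running one dummy batch like this is a quick way to check the calculation.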


Author: robot learner
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If reproduced, please indicate the source: robot learner.