Recently, I attended a lecture on geometric deep learning. The lecture was really amazing. Came to learn about a completely different perspective about methods that I already knew, like CNNs, Transformers. In this lecture I came across the concept of circulant matrices.
Circulant Matrix is a special matrix where each row is a cyclic shift of the previous row. In mathematical terms, a circulant matrix C of size n×n is a square matrix where each row is obtained by shifting the previous row one position to the right (or left) in a circular manner. It can be written as:
We can represent a shift operator as a matrix multiplication with a circulant matrix. So if we want to shift the columns of a matrix to the right by one we can use a circulant matrix to do it. For example:
If we put the circulant matrix on the right during the matrix multiplication, we can produce a shift in terms of the rows.
Now, it is easy to see that we can actually create any permuation of the matrix rows or columns depending on how we place the 1 entries in the rows of the matrix.
Let us denote the above matrix by S (right shift):
S=⎣⎡0001100001000010⎦⎤
Applying the S matrix multiple k-times will shift the original matrix to the right by k:
S2=⎣⎡0010000110000100⎦⎤
Inverse of a shift operator is its transpose, i.e. S−1=S⊤.
Shift operator are orthogonal, i.e. S⊤S=I
Using the shift operation as above we can move an image for example to the left or right or up and down using matrix multiplications.
Circular convolution as multiplication by circulant matrices
Let us look at multiplication of a circulant matrix C with a vector x:
If we now see how the individual elements of y are computed:
y0=c0x0+c1x1+c2x2+⋯+cn−1xn−1
y1=cn−1x0+c0x1+c1x2+⋯+cn−2xn−1
and so on.
This can be rewritten as
yk=i=0∑nci−kxi
Here the subscripts can be considered to be circular so c−1=cn−1 and so on. Then the above sum represents a circular convolution. In this way we see that a convolution operation can be represented as multiplcation with a circulant matrix. This can be easily extended to two dimensions, then we have the concept of block circulant matrices.