The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster. ——Hinton

Let's begin with a quote from Geoffrey Hinton, one of the three giants of deep learning.

The CNN is a very popular model right now. As we all know, through its pooling layers a CNN can abstract high-order features; in face recognition, for example, units can be activated by a nose, a mouth, eyes, and so on.

Seeing how strong the CNN model is, the experts began to probe its essence: are there things it fails to understand?

Answer: there are.

Concretely, pooling cannot keep track of which lower-level features a higher-level feature was built from. **It discards the spatial relationships between high-order features, so the spatial hierarchy information is lost.** In other words, a chaotic jumble of eyes, nose, and mouth that could never pass for a real face can still be scored as one. This is the defect of the pooling layer.

OK, in short: face blindness.

Let's continue. A computer builds its picture of an image layer by layer, from internal (local) representations up to a representation of the whole image. Human understanding of images works the other way around!

Time for a bit of science popularization:

The key to the human brain's understanding of an image is understanding position and pose: even when an image is rotated or translated, the brain can still recognize it, but a computer cannot. This is why the **capsule network, CapsNet**, was proposed.

So let's take a serious look at what a capsule actually is (no, not the little pill you swallow).

**One: Definition of a capsule**

A capsule (Capsule) is a feature vector made up of multiple neurons. Each neuron can represent an attribute of a specific entity in the image, such as pose (position, size, orientation), texture, deformation, and so on.

A capsule encapsulates the attribute representations of a feature in **vector** form. The vector's length is the probability that the feature exists, and its orientation changes as the feature's spatial properties change; as long as the length stays the same, the high-level feature the capsule represents stays the same. This is the "activity equivariance" Hinton proposed, and it is more meaningful than the invariance offered by pooling.

Now that the concepts are in place, let's look at how a capsule computes.

**Two: The capsule's computation process**

1. Matrix multiplication of the input vectors

2. Scalar weighting of the input vectors

3. Summation of the weighted input vectors

4. A vector-to-vector nonlinear transformation

In short: first this, then that ~
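The four steps above can be sketched in NumPy. This is a toy illustration only: the dimensions, random values, and fixed coupling coefficients are made up for the example, not taken from the original CapsNet paper.

```python
import numpy as np

def squash(s, eps=1e-8):
    """Step 4: keep the vector's direction, squash its length into [0, 1)."""
    norm2 = np.sum(s ** 2)
    return (norm2 / (1.0 + norm2)) * s / (np.sqrt(norm2) + eps)

rng = np.random.default_rng(0)
u = [rng.normal(size=4) for _ in range(3)]       # outputs of 3 low-level capsules
W = [rng.normal(size=(8, 4)) for _ in range(3)]  # learned transformation matrices
c = np.array([0.2, 0.5, 0.3])                    # coupling coefficients (set by routing)

u_hat = [W[i] @ u[i] for i in range(3)]          # Step 1: matrix multiplication
s_j = sum(c[i] * u_hat[i] for i in range(3))     # Steps 2-3: scalar weighting and sum
v_j = squash(s_j)                                # Step 4: nonlinear transformation
```

The output `v_j` keeps the direction of `s_j` but has length below 1, so it can be read as a probability.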

Okay, back to the real content:

$u_1$, $u_2$, $u_3$ are three capsules from the layer below; the length of each vector encodes the probability of the corresponding low-level feature.

Then $w_{1j}$, $w_{2j}$, $w_{3j}$ encode the spatial relationships between the high-level feature and the low-level features.
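Written out explicitly (in the standard CapsNet notation, matching the $\hat{u}_{j|i}$ that appears in the routing algorithm later), each low-level output is transformed by its learned matrix to produce a prediction for high-level capsule $j$:

$$\hat{u}_{j|i} = W_{ij}\,u_i$$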

An ordinary neural network learns its parameters through backpropagation; in a capsule network, the coupling coefficients between capsules are instead updated by the "dynamic routing" algorithm (the transformation matrices are still learned by backpropagation).

Each low-level capsule has to decide which high-level capsule to send its output to. It does this through the learned coefficients $c_i$, which determine which high-level capsule gets activated.

The idea behind dynamic routing is that the predictions $u_j$ follow a certain distribution: within each layer the predictions cluster, and which high-level capsule a low-level capsule is routed to is judged by which cluster its own prediction lies closest to.

Next, the vector-to-vector nonlinear transformation uses a novel activation function: it takes a vector and, without changing its direction, compresses its length to below 1. This is $squash(\bullet)$:

$$\mathrm{squash}(s_j) = \frac{\|s_j\|^2}{1+\|s_j\|^2}\,\frac{s_j}{\|s_j\|}$$
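A minimal NumPy check of these two properties (direction unchanged, length pushed below 1); the specific input vectors are just illustrative:

```python
import numpy as np

def squash(s, eps=1e-8):
    """Squash nonlinearity: preserves direction, maps length into [0, 1)."""
    norm2 = np.sum(s ** 2)
    return (norm2 / (1.0 + norm2)) * s / (np.sqrt(norm2) + eps)

short = squash(np.array([0.1, 0.0]))    # short vectors shrink toward 0
long_ = squash(np.array([100.0, 0.0]))  # long vectors approach length 1
```

So a weak feature (short vector) is suppressed toward zero, while a strong feature (long vector) saturates at a probability just under 1.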

To make everything clear, here is the whole picture, describing the full learning process more vividly:

Well? Did you take it all in at a glance? ^ ^

**Three: The dynamic routing algorithm**

All right, let's move on.

From the algorithm above: the inputs are the prediction vectors $\hat{u}_{j|i}$ (the linearly transformed outputs of all capsules in the lower layer), the number of routing iterations $r$, and the layer index $l$. A temporary variable $b_{ij}$ is initialized to 0 and updated during the iterations, and $c_i \leftarrow softmax(b_i)$ gives the full set of weights for low-level capsule $i$.

A simple example

The weight-distribution process: $b_{ij}$ is initialized to 0. In the first iteration, suppose there are 3 low-level capsules and 2 high-level capsules; then every $c_{ij}$ equals 0.5, i.e. all the weights $c_{ij}$ are equal.
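That uniform start can be checked directly; a minimal snippet with the example's 2 high-level capsules:

```python
import numpy as np

b_i = np.zeros(2)                      # logits from one low-level capsule, initialized to 0
c_i = np.exp(b_i) / np.exp(b_i).sum()  # softmax over the high-level capsules
print(c_i)                             # -> [0.5 0.5]
```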

As the iterations proceed, each low-level capsule points to its corresponding high-level capsule according to these weights. $s_j \leftarrow \sum_i c_{ij} \hat{\mathbf{u}}_{j|i}$ takes a linear combination of the capsule predictions, and the $squash$ function then yields an output vector pointing in the same direction. Finally, the corresponding weight $b_{ij}$ is updated.

(Hm! Don't you experts find this simple? (o^^o))

High-level capsule $j$'s current output is dotted with the input received from low-level capsule $i$, and the result is added to the previous round's $b_{ij}$ to get the updated $b_{ij}$. The dot product characterizes the similarity between capsules; in effect, this is learning the features of the low-level capsules, consistent with what a CNN learns.

As the figure above makes obvious, $\hat{u}_{1|1}$ is not similar to the black vector above, while $\hat{u}_{2|1}$ is; therefore the routing weight $c_{11}$ decreases and $c_{12}$ increases. In this way each low-level capsule finds its best match.

To sum up: this simple popular-science tutorial is done!

So, with that:

Applause from everyone who got it!

In the next article, we'll briefly talk about the RNN (recurrent neural network), one of the "big three": DNN, RNN, and CNN!