ELI5: Max pooling (CNN) vs Capsule Network

I’ve read some articles on CNNs vs. Capsule Networks and am still trying to wrap my head around the difference between the two.

This is a quote from Geoffrey Hinton when he introduced CapsNet:

“The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster.”

I wanted to dive deeper into what this max pooling step is. From my understanding, a CNN works by having convolutional layers that use kernels, like the Sobel operator, to identify different features such as edges. We could have different kernels for more complex features like a nose, an ear, etc.
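To make the kernel idea concrete, here is a minimal sketch in plain Python of sliding a Sobel kernel over a tiny image. The 4x4 toy image and the helper name `convolve2d` are assumptions made up for illustration. In a real CNN the kernel weights are learned rather than hand-picked like Sobel.

```python
# Horizontal-gradient Sobel kernel: responds strongly to vertical edges.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

def convolve2d(image, kernel):
    """Valid-mode sliding-window correlation (what deep-learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            # Multiply the window element-wise with the kernel and sum.
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

# Toy image (hypothetical): dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Each output value is one scalar "how strongly is this feature present here".
feature_map = convolve2d(image, SOBEL_X)
print(feature_map)
```

Every 3x3 window in this toy image straddles the dark-to-bright edge, so every entry of the feature map is large and positive.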

Is max pooling, at a high level, just combining all the features found throughout the different convolutional layers (for example face, nose, and ear during facial recognition)?

Because of this, the CNN predicts both of the images at https://hackernoon.com/hn-images/1*R5sjf4XtV9FApC7N39mPew.png to be faces, despite the fact that in one of them the features are clearly not in the right positions.
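For reference, here is a rough sketch of what max pooling does mechanically. It works on one feature map at a time and keeps only the largest activation in each small window, so it downsamples a map rather than merging features from different layers. The helper `max_pool2d` and the two activation grids are hypothetical, chosen only to show how a feature's exact position inside a window gets thrown away.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping size x size max pooling (stride equal to size)."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))  # keep only the strongest response
        pooled.append(row)
    return pooled

# Hypothetical activations of one feature detector (say, for a nose).
# The same strong response (9) sits at two different positions
# inside the top-left 2x2 window.
feature_at_corner = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
feature_shifted = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

pooled_a = max_pool2d(feature_at_corner)
pooled_b = max_pool2d(feature_shifted)
print(pooled_a, pooled_b)
```

Both inputs pool to the same 2x2 output, so later layers cannot tell where inside the window the feature was. Stacking several pooling layers widens that blind spot, which is one way to read the scrambled-face example.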

I believe CapsNet was developed to solve this problem (position, translational/orientation equivariance). Can someone give a high-level ELI5 overview of how CapsNet solves this problem and how it differs from a ConvNet? From my reading, is it because CapsNet uses vectors (which maintain position information) while a ConvNet uses scalars (matrix multiplication)?
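As a small illustration of the vector idea, here is a sketch of the "squash" nonlinearity described in the CapsNet paper (Sabour, Frosst, and Hinton, "Dynamic Routing Between Capsules"). A capsule outputs a vector. Its length is squashed into the range [0, 1) and read as the probability that the feature is present, while its direction is left free to encode pose. The two-dimensional inputs and the `length` helper are assumptions for illustration, and routing-by-agreement between capsule layers is not shown.

```python
import math

def squash(s):
    """Squash nonlinearity: keeps the vector's direction, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0:
        return [0.0 for _ in s]
    norm = math.sqrt(norm_sq)
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * x for x in s]

def length(v):
    """Euclidean length, read as 'probability the feature is present'."""
    return math.sqrt(sum(x * x for x in v))

# Hypothetical raw capsule outputs pointing in the same direction.
strong = squash([3.0, 4.0])    # long input vector  -> length close to 1
weak = squash([0.03, 0.04])    # short input vector -> length close to 0

print(length(strong), length(weak))
```

The scalar in a ConvNet feature map only says "how strongly is the feature here", whereas a capsule's output also carries a direction that later layers can check for agreement.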

submitted by /u/Truetree9999 to r/artificial