From the way we communicate to the way we conduct business, advances in technology, especially in the field of Artificial Intelligence, are continuously changing how we interact with the world. While it may still be a long wait before we see a machine making human-like decisions, there have been remarkable developments in this field.
This ability of machines to perform complex or mundane tasks efficiently has been made possible by imparting human-like intelligence to them, and neural networks are at the core of this revolution. There are many variants of neural networks, each with its own characteristics, and in this blog we will look at the difference between Convolutional Neural Networks and Recurrent Neural Networks, which are probably the most widely used variants. But first, it is imperative that we understand what a Neural Network is.
Introduction to Neural Networks:
The human brain, with approximately 100 billion neurons, is the most complex but powerful computing machine known to mankind. Neural networks aim to impart similar knowledge and decision-making capabilities to machines by imitating the same complex structure in computer systems.
Neural networks are a subset of machine learning. They analyse a training data set and capture the patterns in the data by assigning weights along different paths. Hyperparameters such as the learning rate are tuned during training, and the network is ready for use once its cost function has been minimised.
Like in the human brain, the basic building block in a neural network is a neuron, which takes in some inputs and fires an output based on a predetermined function, called an activation function, on the inputs. Architecturally, a neural network is modelled using layers of artificial neurons, which apply the activation function on the received inputs and after comparing it with a threshold, determine if the message has to be passed to the next layer.
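To make the idea of a single artificial neuron concrete, here is a minimal sketch in Python. It assumes a sigmoid activation function and a threshold of 0.5; both are illustrative choices rather than anything prescribed above, and the input and weight values are made up.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

def fires(inputs, weights, bias, threshold=0.5):
    """True when the activation exceeds the (assumed) threshold."""
    return neuron_output(inputs, weights, bias) > threshold

# Illustrative values only
activation = neuron_output([1.0, 0.0], [2.0, -1.0], bias=-1.0)
print(activation, fires([1.0, 0.0], [2.0, -1.0], bias=-1.0))
```

Swapping `sigmoid` for another function such as ReLU or tanh changes how the neuron responds, but the weighted-sum-then-activate structure stays the same.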
The first layer is called the input layer, the last layer the output layer and all layers between the input and output layers are called hidden layers. Each layer can contain a single or a collection of neurons. Generally, a neural network with more than one hidden layer is called a deep neural network. Most of the neural networks used today are feed-forward systems.
This means that data flows in one direction only, from a node to several other nodes in the layer above it. The most basic feed-forward neural network has a single hidden layer, as shown in the following figure.
While neural networks are powerful enough to solve even very complex problems, they are considered black-box algorithms since their inner workings are hard to interpret, and the more complex the network, the more resources it needs to run.
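A feed-forward pass through a network with one hidden layer can be sketched as follows. The layer sizes (two inputs, three hidden neurons, one output), the sigmoid activation and all weight values are assumptions chosen purely for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

def feed_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Data flows one way: input layer -> hidden layer -> output layer."""
    hidden = dense_layer(inputs, hidden_w, hidden_b)
    return dense_layer(hidden, output_w, output_b)

# Hypothetical weights: 2 inputs, 3 hidden neurons, 1 output neuron
hidden_w = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]
hidden_b = [0.0, 0.0, 0.0]
output_w = [[1.0, -1.0, 0.5]]
output_b = [0.0]

prediction = feed_forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b)
print(prediction)
```

Note that no value computed in a later layer is ever passed back to an earlier one, which is what makes the network feed-forward.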
We initially set random weights and thresholds, and the nodes train themselves by adjusting the weights and thresholds according to the training data. To solve complex problems, we can keep adding hidden layers, neurons in each layer, paths in each layer, and the like, but care must be taken not to overfit the data.
Now that we understand the basics of neural networks, we can dive deep into the differences between the two most commonly used neural network variants: Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).
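As a rough illustration of starting from random weights and adjusting them against training data, the sketch below trains a single neuron on the logical AND function. It uses the classic perceptron learning rule, which is a deliberately simple stand-in for the backpropagation used in real multi-layer networks. The bias plays the role of a negative threshold, and the learning rate, seed and data set are assumptions.

```python
import random

def predict(inputs, weights, bias):
    """Step activation: output 1 when the weighted sum exceeds zero."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if z > 0 else 0

def train(samples, learning_rate=0.1, max_epochs=1000, seed=0):
    """Perceptron rule: nudge weights and bias after every wrong prediction."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]  # random start
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(inputs, weights, bias)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # every sample is classified correctly
            break
    return weights, bias

AND_GATE = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train(AND_GATE)
print([predict(x, weights, bias) for x, _ in AND_GATE])
```

Because AND is linearly separable, this rule is guaranteed to stop making mistakes after a finite number of updates; problems that are not linearly separable need hidden layers and a different training method.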
Convolutional Neural Network (CNN):
These are multi-layer neural networks which are widely used in the field of Computer Vision. CNNs reduce an image to its key features by applying the convolution operation with the help of filters, also called kernels. This work is done by the hidden layers, which are convolution layers, pooling layers, fully connected layers and normalisation layers. A simple CNN architecture is shown in the following figure.
The first layer is always the convolution layer. Its neurons are arranged in three dimensions (length, width and depth). The layers are not fully connected, meaning that the neurons from one layer might not connect to every neuron in the subsequent layer. Mathematically, convolution involves passing the input through filters to transform the data into the relevant output, which serves as the input for the pooling layer. Thus, convolution operates on two matrices, an image matrix and a kernel matrix, to give an output matrix.
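The image-matrix-and-kernel-matrix operation can be sketched in plain Python as below. This version assumes no padding and a stride of 1, and, like most deep learning libraries, it slides the kernel without flipping it (strictly speaking a cross-correlation). The image and kernel values are made up.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 1, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[-4, -4, 2], [-4, -4, 4], [7, 7, 6]]
```

A 4x4 image and a 2x2 kernel give a 3x3 output, which shows how convolution without padding shrinks the matrix slightly at each layer.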
A pooling layer reduces the dimensionality of a matrix by summarising the features in sub-regions of the image. The most common pooling functions are max pooling and average pooling; min pooling is also possible. Max pooling keeps the maximum value in a sub-region, average pooling keeps the mean, and min pooling keeps the minimum value.
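Pooling over sub-regions can be sketched as follows. The example assumes non-overlapping 2x2 windows, and the helper name `pool2d` and the matrix values are hypothetical.

```python
def pool2d(matrix, size=2, reducer=max):
    """Apply `reducer` to each non-overlapping size x size sub-region."""
    pooled = []
    for r in range(0, len(matrix) - size + 1, size):
        row = []
        for c in range(0, len(matrix[0]) - size + 1, size):
            window = [matrix[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reducer(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [3, 4, 6, 5],
]
print(pool2d(feature_map, reducer=max))  # [[6, 4], [7, 9]]
print(pool2d(feature_map, reducer=min))  # [[1, 0], [2, 5]]
```

The 4x4 matrix shrinks to 2x2 in both cases; only the value kept from each window differs.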
Thus, a CNN stacks multiple convolution layers, each typically followed by a non-linear activation function, with pooling layers, which makes it effective at handling complex spatial data (images). The condensed feature maps from the last pooling layer are then flattened and sent to the fully connected layer, which gives the output in the form of a single vector of probabilities, one per class. The class with the highest probability is taken as the prediction.
A CNN takes a fixed-size input and gives a fixed-size output, which reduces its flexibility but helps it compute results faster. CNNs require fewer hyperparameters and less supervision, but they are very resource-intensive and need huge amounts of training data to give the most accurate results. Common applications of CNNs are object detection, image classification, biometrics, medical analysis and image segmentation.
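The last stage, from pooled feature maps to a vector of class probabilities, might look roughly like this. It assumes a softmax function is used to turn the fully connected layer's scores into probabilities, which is a common choice but not stated above; the weights and class labels are invented for the example.

```python
import math

def flatten(feature_maps):
    """Turn a list of 2D feature maps into one flat vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]

def fully_connected(features, weights, biases):
    """One score per class: weighted sum of all flattened features."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

pooled_maps = [[[6, 4], [7, 9]]]          # one 2x2 map, illustrative
labels = ["cat", "dog", "bird"]            # hypothetical classes
weights = [[0.1, 0, 0, 0], [0, 0.1, 0, 0], [0, 0, 0, 0.1]]
biases = [0.0, 0.0, 0.0]

probabilities = softmax(fully_connected(flatten(pooled_maps), weights, biases))
best = max(range(len(labels)), key=lambda i: probabilities[i])
print(labels[best], probabilities)
```

In a trained network the weights would of course be learned rather than written by hand.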
Recurrent Neural Network (RNN):
These are multi-layer neural networks which are widely used to process temporal or sequential information such as natural-language text, stock prices and temperatures. They have a memory state which captures information about the calculations on previous inputs and helps perform the same task efficiently for every element in the sequence.
Thus, the output of a particular step is determined by the input of that step and all the previous outputs up to that step. The mapping from input to output can be one-to-one, one-to-many, many-to-one or many-to-many. RNNs can be explained with the help of the following figure.
RNNs are feedback neural networks, which means that the links between the layers allow for feedback to travel in a reverse direction. This helps the neural network to learn contextual information. For repeated patterns, more weight is applied to the previous patterns than the one being currently evaluated.
As can be seen from the figure above, RNNs share parameters across the subsequent steps. This phenomenon, known as parameter sharing, makes RNNs more efficient by reducing the computational cost, since fewer parameters have to be trained.
Theoretically, RNNs store information about all the inputs evaluated up to a particular time t. However, this makes them very difficult to train, as training becomes resource-intensive and inefficient. Therefore, in practice, RNNs are limited to remembering only a few steps before time t. They are also more flexible than CNNs with the dimensions of the input and output, since they can evaluate inputs and outputs of arbitrary length.
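Parameter sharing can be illustrated with a deliberately tiny recurrent network whose input and hidden state are single numbers. The tanh activation, the parameter names `w_input` and `w_hidden`, and their values are assumptions made for this sketch.

```python
import math

def rnn_forward(sequence, w_input, w_hidden, bias=0.0, h0=0.0):
    """Scalar RNN: h_t = tanh(w_input * x_t + w_hidden * h_(t-1) + bias).

    The same three parameters are reused at every step of the sequence.
    """
    hidden = h0
    states = []
    for x in sequence:
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
        states.append(hidden)
    return states

# The same parameters handle sequences of different lengths
short = rnn_forward([1.0, 0.0], w_input=0.5, w_hidden=0.8)
longer = rnn_forward([1.0, 0.0, 2.0, 1.0, 0.5], w_input=0.5, w_hidden=0.8)

# Order matters because each state depends on the previous one
swapped = rnn_forward([0.0, 1.0], w_input=0.5, w_hidden=0.8)
print(short[-1], swapped[-1])
```

However long the sequence is, the number of trainable parameters stays at three, and reordering the same inputs changes the final state because the hidden state carries memory forward.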
RNN CNN Hybrid Neural Networks:
Despite their dissimilarity, RNNs and CNNs are not mutually exclusive and can be used together to solve more complex problems. Combining the two helps with problems that require both temporal and spatial characterisation, for which neither a CNN nor an RNN alone gives the best results. This hybrid model, called a CRNN, has a unique architecture.
The input is first fed to the CNN layers, and the output of the CNN is fed to the RNN layers, which lets the model handle both the spatial and the temporal aspects of the problem. Some common examples of such complex problems are video labelling, gesture recognition, DNA sequence prediction, etc.
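The CNN-then-RNN pipeline can be sketched on a toy "video" made of 3x3 frames. Everything here is a simplifying assumption: the CNN stage is one square kernel followed by global max pooling, the RNN stage is a scalar tanh recurrence, and the kernel, frames and weights are invented.

```python
import math

def conv_feature(frame, kernel):
    """CNN stage: valid convolution, then global max pooling to one number."""
    k = len(kernel)  # assumes a square kernel
    responses = [
        sum(frame[r + i][c + j] * kernel[i][j]
            for i in range(k) for j in range(k))
        for r in range(len(frame) - k + 1)
        for c in range(len(frame[0]) - k + 1)
    ]
    return max(responses)

def crnn_forward(frames, kernel, w_input, w_hidden):
    """RNN stage: feed each frame's CNN feature into a recurrent state."""
    hidden = 0.0
    for frame in frames:
        feature = conv_feature(frame, kernel)   # spatial information
        hidden = math.tanh(w_input * feature + w_hidden * hidden)  # temporal
    return hidden

kernel = [[1, -1], [1, -1]]  # responds to a bright-to-dark vertical edge
blank_frame = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
edge_frame = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

video = [blank_frame, edge_frame, edge_frame]
summary = crnn_forward(video, kernel, w_input=0.5, w_hidden=0.8)
print(summary)
```

The final state summarises both what appeared in each frame and for how long, which is the kind of information a task such as gesture recognition relies on.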
To summarise, both CNNs and RNNs are very popular variants of Neural Networks, each with its own advantages and disadvantages. Choosing the right variant for a particular application depends on various factors like the type of input and the requirements of the application. While individually they might be able to solve a particular set of problems, more advanced problems can be solved with the help of a hybrid of the two networks.
We hope that this article was informative for you. Do check our website for more details.
By Saarthak Jain