A recurrent neural network (RNN) is a deep learning neural network trained to convert sequential inputs into specific sequential outputs. A traditional neural network isn't able to remember past data, which is what makes RNNs useful.
A neural network is a computational system inspired by the structure of the human brain, composed of artificial neurons. These networks are engineered to replicate human decision-making processes. Traditional neural networks, however, typically process each input independently, without the capacity to consider the sequence or context of data. For instance, in processing the words "red apple", a standard neural network would fail to recognise "red" as an attribute describing the apple—completely missing the contextual link between the two.
This is a major limitation, and one that could easily prevent machines from ever developing anything close to what could be considered intelligence. To address this shortcoming, researchers developed recurrent neural networks (RNNs). Unlike traditional models, RNNs incorporate mechanisms to retain information over time, allowing them to maintain a memory of previous inputs. This capability enables RNNs to understand sequences and contexts within data, making them particularly useful for tasks where order is crucial, such as language processing or time series analysis.
A recurrent neural network is designed as a form of AI decision-making capable of recognising and retaining the sequence in which data appears. This is a critical feature for processing sequential information such as text, numbers or time-series data. Unlike traditional neural networks, which treat each input independently, RNNs can connect previous information to present inputs, allowing for a more nuanced understanding of data sequences.
While the concept of the recurrent neural network was a major game changer when it was introduced, setting the foundation for creating deep learning models, it is largely being replaced by transformer-based artificial intelligence and large language models (LLMs). These newer developments are more efficient at processing sequential data.
RNNs are distinguished by their ability to process sequences of data by recognising relationships and dependencies between individual elements. While all RNNs share this foundational characteristic, each of the following categories is designed to address specific types of data-processing challenges. Here are the four main types:
One-to-one
The simplest form of an RNN, one-to-one describes networks where one input is processed to generate one output. This model serves as the basis for more complex RNN architectures.
One-to-many
In this configuration, a single input generates multiple outputs. This type of RNN is ideal for tasks where an input may trigger a series of related but distinct outputs, such as generating a sentence from a single descriptive word or producing a melody from a musical note.
Many-to-one
The many-to-one model processes multiple input data points to produce a single output. It is commonly used in applications like sentiment analysis, where various words (inputs) contribute to determining the overall sentiment (output) of a phrase or document. A short code sketch after this list contrasts this pattern with the many-to-many case.
Many-to-many
This last variation handles sequences both in the inputs and outputs. It is suitable for tasks such as language translation, where an input sequence of words in one language is converted into an output sequence in another language. This model can also handle situations where the input and output sequences differ in length.
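To make these input/output shapes concrete, here is a minimal Python sketch, assuming PyTorch (the article does not name a framework); the layer sizes and random tensors are illustrative placeholders. It shows the practical difference between the many-to-one and many-to-many patterns: the former uses only the final hidden state, while the latter attaches an output to every step.

```python
# A minimal sketch, assuming PyTorch, of many-to-one vs many-to-many outputs.
# All sizes and the random input are placeholders for illustration.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
per_step_head = nn.Linear(16, 10)    # one label per step (many-to-many)
sequence_head = nn.Linear(16, 2)     # one label per sequence (many-to-one)

x = torch.randn(3, 7, 8)             # 3 sequences, 7 steps, 8 features per step
outputs, last_hidden = rnn(x)        # outputs: (3, 7, 16), last_hidden: (1, 3, 16)

many_to_one = sequence_head(last_hidden[-1])   # (3, 2): a single output per sequence
many_to_many = per_step_head(outputs)          # (3, 7, 10): an output at every step
print(many_to_one.shape, many_to_many.shape)
```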
A neural network that does not have looping nodes is called a feedforward neural network. These kinds of networks are similar to RNNs in that both models attempt to process data in a human-like way with many interconnected nodes. However, a feedforward neural network only passes information forward, and the model cannot remember any past input information. Using the above example, this model wouldn't remember "red" by the time it has processed "apple".
Instead, the feedforward neural network works by moving information from the input layer, through any hidden layers, to the output layer. This type of model works well for tasks such as image classification, where each input is independent of the others. Still, this network differs from RNNs because it cannot remember sequences the way a recurrent network can.
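As a quick illustration of that statelessness, the sketch below (assuming PyTorch, with arbitrary sizes and random stand-in word vectors) shows that a feedforward network produces exactly the same output for "apple" whether or not it has just processed "red", because no state is carried between calls.

```python
# A minimal sketch, assuming PyTorch, of why a feedforward network forgets:
# information only moves input -> hidden -> output, with no state kept between calls.
import torch
import torch.nn as nn

feedforward = nn.Sequential(
    nn.Linear(8, 16),   # input layer -> hidden layer
    nn.Tanh(),
    nn.Linear(16, 4),   # hidden layer -> output layer
)

red, apple = torch.randn(8), torch.randn(8)   # stand-in word vectors
out_alone = feedforward(apple)                # "apple" processed on its own
_ = feedforward(red)                          # process "red" first...
out_after_red = feedforward(apple)            # ...then "apple" again
print(torch.allclose(out_alone, out_after_red))   # True: no memory of "red"
```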
The RNN architecture has three main variants, each adapted from the basic structure to enhance functionality and performance for specific tasks. This flexibility in design helps cater to the unique demands of various data sequence processing tasks. The following variants modify how data is processed and outputted, allowing for more specialised applications across a range of fields:
A bidirectional recurrent neural network (BRNN) processes data sequences both forwards and backwards. The forward layer works much like a standard RNN layer, while the backward layer processes the same sequence in reverse, so each prediction can draw on context from both directions. Combining the two layers increases prediction accuracy.
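In practice, frameworks typically expose this as a single option. The short sketch below, assuming PyTorch with arbitrary sizes, shows how the per-step output doubles in width when the forward and backward hidden states are combined.

```python
# A minimal sketch, assuming PyTorch: the same sequence read in one direction
# versus both directions. Sizes and the random input are placeholders.
import torch
import torch.nn as nn

seq = torch.randn(1, 5, 8)                                        # (batch, steps, features)
one_way = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
two_way = nn.RNN(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)

one_way_out, _ = one_way(seq)
two_way_out, _ = two_way(seq)
print(one_way_out.shape)   # torch.Size([1, 5, 16])  forward direction only
print(two_way_out.shape)   # torch.Size([1, 5, 32])  forward and backward states combined
```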
Long short-term memory (LSTM) is a variant designed to retain information for longer. A basic RNN effectively remembers only its most recent inputs, whereas an LSTM can draw on inputs from earlier sequences to improve its prediction accuracy. Consider this simplified example: The apple is red. Ann only loves red apples. An LSTM would remember that the apple is red when processing information about which apples matter in this situation. A basic RNN would not, because that detail was presented in a previous sequence.
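A hedged sketch of that idea, assuming PyTorch: the hidden and cell state an LSTM returns after one sequence can be passed back in when processing the next, so earlier context (the apple being red) remains available later. The random tensors below stand in for real word embeddings.

```python
# A minimal sketch, assuming PyTorch, of an LSTM carrying its state forward.
# The two "sentences" are random stand-ins, not real word embeddings.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

first_sentence = torch.randn(1, 4, 8)     # e.g. "The apple is red."
second_sentence = torch.randn(1, 5, 8)    # e.g. "Ann only loves red apples."

# Process the first sentence; (h, c) now summarise everything seen so far.
_, (h, c) = lstm(first_sentence)

# Pass that state in with the second sentence, so the earlier context
# (the apple being red) can still influence later predictions.
outputs, (h, c) = lstm(second_sentence, (h, c))
print(outputs.shape)   # torch.Size([1, 5, 16])
```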
A gated recurrent unit (GRU) is a sophisticated variant of the standard recurrent neural network designed to address some of the limitations related to memory retention. GRUs incorporate gates—mechanisms that regulate the flow of information. These include the update gate, which determines how much past information (from previous steps) should be retained, and the reset gate, which decides how much of the past information to forget. This allows GRUs to selectively retain or discard information, making them highly effective for tasks where understanding the context or sequence of events is crucial.
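To make the two gates concrete, here is a small NumPy sketch of a single GRU step using common gating equations; the weight matrices are random placeholders, and exact gate conventions vary slightly between references.

```python
# A small NumPy sketch of one GRU step, showing the update and reset gates.
# Weights are random placeholders; gate conventions vary between references.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

input_dim, hidden_dim = 4, 3
rng = np.random.default_rng(0)
W_z, U_z = rng.standard_normal((hidden_dim, input_dim)), rng.standard_normal((hidden_dim, hidden_dim))
W_r, U_r = rng.standard_normal((hidden_dim, input_dim)), rng.standard_normal((hidden_dim, hidden_dim))
W_h, U_h = rng.standard_normal((hidden_dim, input_dim)), rng.standard_normal((hidden_dim, hidden_dim))

def gru_step(x_t, h_prev):
    z = sigmoid(W_z @ x_t + U_z @ h_prev)        # update gate: how much past state to keep
    r = sigmoid(W_r @ x_t + U_r @ h_prev)        # reset gate: how much past state to forget
    h_candidate = np.tanh(W_h @ x_t + U_h @ (r * h_prev))
    return z * h_prev + (1 - z) * h_candidate    # blend retained past with the new candidate

h = np.zeros(hidden_dim)                          # hidden state starts empty
for x_t in rng.standard_normal((5, input_dim)):   # a five-step input sequence
    h = gru_step(x_t, h)
print(h)                                          # final hidden state
```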
RNNs are highly versatile in handling data that involves sequences, making them suitable for a wide range of applications. Here are some of the most common uses:
Language modelling and generating text
RNNs can predict the next word in a sentence based on previous words, which is crucial for tasks like auto-completion in search engines or generating readable text automatically.
Speech recognition
These networks can process audio data over time, making them ideal for recognising spoken words in real-time and converting them into text, as seen in virtual assistants and mobile voice-to-text applications.
Machine translation
RNNs can analyse sequences of words in one language and convert them into another, maintaining grammatical and contextual accuracy in the translation process.
Image recognition
Although not as common as other models like CNNs for this task, RNNs can be used for analysing sequences within images, such as reading handwritten text or processing video frames sequentially.
Time series forecasting
RNNs are well-suited for predicting future values in a series based on historical data, applicable in fields like stock market forecasting, weather prediction and demand forecasting in retail.
There are some challenges that come with using an RNN, which is part of the reason they are being replaced by newer neural networks and variations. These are four of the biggest obstacles to using a recurrent neural network:
Exploding gradient
The gradient refers to how sensitive the error rate is to the model's parameters. If the gradient grows exponentially as it is propagated back through the time steps, it becomes unstable; this is known as an exploding gradient. The resulting weight updates are so large that training becomes erratic, and the model fails to settle on parameters that work beyond the training data.
Vanishing gradient
This challenge arises when the gradient values shrink towards zero during training, which significantly slows down the learning process or stops it altogether. A vanishing gradient makes it difficult for the RNN to capture and learn from the training data effectively, often leading to underfitting, where the model cannot generalise well to new data. A brief numeric illustration of both gradient problems follows this list.
Difficulty in processing long sequences
RNNs can struggle with long data sequences. This limitation arises because the relevant information can get diluted over long sequences, hindering the model's ability to learn effectively from such data.
Slow training time
Since an RNN processes data sequentially, it can't process large amounts of information simultaneously. This sequential processing results in longer training times, making RNNs less efficient compared to other models that can process data in parallel, such as transformers.
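Both gradient problems stem from the same mechanism: backpropagation through time multiplies the gradient by roughly the same recurrent factor at every step, so a factor slightly above or below one compounds across a long sequence. The simplified scalar illustration below (not a full derivation) shows how quickly that happens.

```python
# A simplified scalar illustration of exploding and vanishing gradients:
# repeatedly multiplying by the same recurrent factor compounds over many steps.
steps = 50
for label, recurrent_factor in [("exploding (factor 1.2)", 1.2),
                                ("vanishing (factor 0.8)", 0.8)]:
    gradient = 1.0
    for _ in range(steps):
        gradient *= recurrent_factor
    print(f"{label}: gradient after {steps} steps is {gradient:.3g}")

# exploding (factor 1.2): gradient after 50 steps is 9.1e+03
# vanishing (factor 0.8): gradient after 50 steps is 1.43e-05
```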
Besides the ability to process information sequentially, there are a few other main advantages of relying on a recurrent neural network:
RNNs are equipped with structures like long short-term memory (LSTM) units that enable them to remember information over extended periods. This feature is crucial for tasks where understanding past context is necessary to make accurate predictions about future events.
RNNs can be combined with convolutional neural networks (CNNs) to enhance their capability in processing spatial data, such as images and videos. This combination allows RNNs to not only recognise patterns over time but also extend their 'field of view' in terms of pixel data, enhancing the analysis of sequences in visual inputs.
Unlike many other neural network architectures, RNNs can handle input sequences of varying lengths without needing input reshaping or resizing. This makes them highly versatile for applications such as speech recognition where the duration of input data can vary significantly.
RNNs are inherently designed to process sequences where the timing between events is crucial. This makes them exceptionally good for applications like stock price prediction, musical composition and other time-sensitive analyses where the sequence and timing of historical data points are critical for predicting the future.
As stated, RNNs are made up of artificial neurons designed to mimic human decision-making. These artificial neurons are data-processing nodes that work together to perform complex tasks. The neurons are organised into several main layers: input, output and hidden layers. The input layer receives the information to process, and the output layer provides the result. Data processing, analysis and prediction take place in the hidden layer.
An RNN works by passing the sequential data it receives through the hidden layers one step at a time. The hidden layer, however, has a recurrent workflow, or self-looping feature: it can remember and utilise previous inputs for future predictions in its short-term memory. The current input is then stored in that memory so it can help with predictions at the next step in the sequence.
For example, consider the sequence: Rain is wet. Users want an RNN to predict the idea of wet when it receives the input rain. The hidden layer processes the input rain and stores a copy in its memory. When it then receives wet, it can recall rain from that memory and relate the two as a full sequence, using that context to improve accuracy. This function is what makes an RNN useful in speech recognition, translation and other language modelling tasks.
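Here is a minimal NumPy sketch of that self-looping hidden layer, with random placeholder weights and three stand-in vectors for the words: each step's hidden state is computed from the current input and the previous hidden state, which is how rain can still influence the network by the time wet arrives.

```python
# A minimal NumPy sketch of the recurrent loop: each hidden state depends on
# the current input AND the previous hidden state, so earlier words still count.
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim = 4, 6
W_xh = rng.standard_normal((hidden_dim, input_dim))   # input -> hidden weights
W_hh = rng.standard_normal((hidden_dim, hidden_dim))  # hidden -> hidden (the self-loop)
b = np.zeros(hidden_dim)

sentence = rng.standard_normal((3, input_dim))        # stand-ins for "rain", "is", "wet"
h = np.zeros(hidden_dim)                              # short-term memory starts empty
for x_t in sentence:
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)            # previous memory feeds back in
print(h)                                              # final state summarises the sequence
```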
Machine learning engineers train neural networks like RNNs by feeding the model training data and then refining its performance. Neurons in neural models are given 'weights' that signal how influential the information learnt during training is when predicting the output. In an RNN, the recurrent layer applies the same set of weights at every step of the sequence.
Engineers then adjust the weights as the model learns to improve prediction accuracy. To do this, they rely on a technique called backpropagation through time (BPTT) to calculate the model's error and adjust its weights accordingly. Engineers can use this to identify which hidden state in the sequence is causing a significant error and readjust the weights to reduce the error margin.
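In practice, BPTT is usually handled by a framework's automatic differentiation. The hedged sketch below, assuming PyTorch and using placeholder data, runs a whole sequence forward, measures the error at the end, and lets loss.backward() carry that error back through every time step before the optimiser adjusts the shared weights.

```python
# A hedged sketch, assuming PyTorch, of backpropagation through time (BPTT).
# The model, sizes and random data are placeholders for illustration.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
readout = nn.Linear(16, 2)                           # illustrative two-class output
optimizer = torch.optim.Adam(list(rnn.parameters()) + list(readout.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

sequences = torch.randn(4, 10, 8)                    # 4 fake sequences of 10 steps
labels = torch.randint(0, 2, (4,))                   # 4 fake target classes

optimizer.zero_grad()
_, last_hidden = rnn(sequences)                      # forward pass through every step
loss = loss_fn(readout(last_hidden[-1]), labels)     # error measured at the end
loss.backward()                                      # error flows back through time
optimizer.step()                                     # shared weights adjusted
print(loss.item())
```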
Machine learning engineers build out a recurrent neural network using their coding languages of choice, like Python. Regardless of how they choose to do it, these are the general steps to implement an RNN:
Create the input layer
The first step is to create a layer that can gather input data. This layer is made up of artificial neurons.
Create hidden states
RNN models can have multiple hidden layers that do the actual processing for the neural network. These layers are also made up of artificial neurons that are interconnected. That helps mimic human predicting abilities and makes sequencing possible.
Create the output layer
This final layer predicts the outcomes. Some models could contain further layers downstream as well.
Train with weights
The final weights and error margins emerge only after engineers train the model with data. It can take time to tune the weights precisely and to avoid vanishing or exploding gradients. A minimal end-to-end sketch of these steps follows.
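Pulled together, the steps above might look like the following minimal Python sketch, assuming PyTorch; every name, size and the random training data are illustrative placeholders rather than a production recipe.

```python
# A minimal end-to-end sketch of the four steps above, assuming PyTorch.
# All names, sizes and data are placeholders for illustration.
import torch
import torch.nn as nn

class SimpleRNNModel(nn.Module):
    def __init__(self, vocab_size=500, embed_dim=16, hidden_dim=32, num_classes=2):
        super().__init__()
        self.input_layer = nn.Embedding(vocab_size, embed_dim)              # step 1: input layer
        self.hidden_layer = nn.RNN(embed_dim, hidden_dim, batch_first=True) # step 2: hidden states
        self.output_layer = nn.Linear(hidden_dim, num_classes)              # step 3: output layer

    def forward(self, token_ids):
        embedded = self.input_layer(token_ids)
        _, last_hidden = self.hidden_layer(embedded)
        return self.output_layer(last_hidden[-1])

# Step 4: train with weights, adjusted via backpropagation through time.
model = SimpleRNNModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
token_ids = torch.randint(0, 500, (8, 20))    # 8 fake sequences of 20 token ids
labels = torch.randint(0, 2, (8,))            # 8 fake labels

for epoch in range(3):                        # a toy training loop
    optimizer.zero_grad()
    loss = loss_fn(model(token_ids), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```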
Recurrent neural networks have laid a strong foundation for sequential data processing. However, they do have limitations that have led to many companies relying on newer and more advanced models and artificial intelligence for their needs. That's why the Now Platform® from ServiceNow uses advanced machine learning and generative AI. This includes machine learning frameworks, natural language understanding, search and automation, and analytics and process mining—innovative and cutting-edge AI technologies that work together to grow your business.
Demo ServiceNow today to see how new neural network solutions can be your next step on the path to success.