Convolutional neural networks excel at processing grid-like data (such as images) by detecting local patterns. However, CNNs are less effective at capturing global relationships within the data. Transformers overcome this limitation by using self-attention to weigh the importance of each part of the input relative to the sequence as a whole. While CNNs are used primarily for tasks like image recognition, transformers have been adapted to both text and image processing, making them a more versatile family of models.
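To make this concrete, here is a minimal sketch of scaled dot-product self-attention, the operation at the heart of a transformer. The function name, tensor shapes, and toy sizes are illustrative assumptions, not any particular library's API.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) input embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v      # project inputs to queries, keys, values
    scores = q @ k.T / k.shape[-1] ** 0.5    # similarity of every token to every other token
    weights = F.softmax(scores, dim=-1)      # each row of weights sums to 1
    return weights @ v                       # weighted mix over ALL positions: global context

# Toy usage: 5 tokens with 8-dimensional embeddings
d_model = d_k = 8
x = torch.randn(5, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)       # (5, 8): every output row sees the whole sequence
```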
A transformer is built from four core components:
- Input embeddings
- Positional encoding
- Transformer block
- Linear/softmax blocks
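The sketch below wires these four components together into a minimal encoder-only model. It leans on PyTorch's built-in `nn.TransformerEncoderLayer` for the transformer blocks; the class name and all sizes (`vocab_size`, `d_model`, and so on) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyTransformer(nn.Module):
    """Minimal encoder-only transformer: the four components wired in order."""

    def __init__(self, vocab_size=1000, d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)        # 1. input embeddings
        self.pos = nn.Embedding(max_len, d_model)             # 2. (learned) positional encoding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)  # 3. stacked transformer blocks
        self.out = nn.Linear(d_model, vocab_size)             # 4. linear head; softmax is applied later

    def forward(self, tokens):                                # tokens: (batch, seq_len) integer IDs
        positions = torch.arange(tokens.shape[1], device=tokens.device)
        h = self.embed(tokens) + self.pos(positions)          # embeddings + position signals
        h = self.blocks(h)                                    # self-attention and feedforward layers
        return self.out(h)                                    # logits over the vocabulary

logits = TinyTransformer()(torch.randint(0, 1000, (2, 16)))   # (2, 16, 1000)
```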
At a high level, an input sequence flows through these components as follows:
- The input sequence is transformed into numerical representations called embeddings, which capture the semantic meaning of the tokens.
- Positional encoding adds unique signals to each token's embedding to preserve the order of tokens in the sequence (a sinusoidal version is sketched after this list).
- The multi-head attention mechanism processes these embeddings to capture different relationships between tokens (combined with the next two steps in the block sketch after this list).
- Layer normalization and residual connections stabilize and speed up the training process.
- The output from the self-attention layer passes through a position-wise feedforward network that applies a non-linear transformation.
- Multiple transformer blocks are stacked, each refining the output of the previous layer.
- In sequence-to-sequence tasks like translation, a separate decoder stack attends to the encoder's output and generates the output sequence token by token.
- The model is trained with supervised learning, typically by minimizing a cross-entropy loss between its predictions and the ground truth (a minimal training step is sketched after this list).
- During inference, the trained model processes new input sequences to generate predictions or representations (see the same sketch).
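The original transformer uses fixed sinusoidal positional encodings. Below is a hedged sketch of that scheme; `max_len` and `d_model` are illustrative sizes.

```python
import math
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) table of fixed, order-preserving position signals."""
    positions = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    freqs = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                      * (-math.log(10000.0) / d_model))                   # one frequency per dim pair
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(positions * freqs)   # even dimensions: sine
    pe[:, 1::2] = torch.cos(positions * freqs)   # odd dimensions: cosine
    return pe

# Each position gets a unique signal that is simply added to the token embedding:
pe = sinusoidal_positional_encoding(max_len=128, d_model=64)
# embeddings = token_embeddings + pe[:seq_len]
```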
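Steps three through five, attention, normalization with residual connections, and the feedforward network, can be written as one hand-rolled block. This is a sketch of the standard post-norm design; the class name and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Multi-head self-attention followed by a feedforward network,
    each wrapped in a residual connection plus layer normalization."""

    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(                # position-wise feedforward network
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        attn_out, _ = self.attn(x, x, x)         # heads capture different token relationships
        x = self.norm1(x + attn_out)             # residual + layer norm stabilize training
        x = self.norm2(x + self.ffn(x))          # non-linear transformation per position
        return x

# Stacking blocks lets each layer refine the previous layer's output:
stack = nn.Sequential(*[TransformerBlock() for _ in range(4)])
h = stack(torch.randn(2, 10, 64))                # (batch=2, seq_len=10, d_model=64)
```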
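Finally, the last two steps, a supervised training step that minimizes cross-entropy and a simple inference loop, might look like the sketch below. It reuses the hypothetical `TinyTransformer` from earlier; note that a real autoregressive generator would use causal masking or a decoder stack, which this shape-level illustration omits.

```python
import torch
import torch.nn.functional as F

model = TinyTransformer()                        # hypothetical model from the earlier sketch
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

def train_step(tokens, targets):
    """One supervised step: minimize cross-entropy between predictions and ground truth."""
    logits = model(tokens)                       # (batch, seq_len, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                              # backpropagate the prediction error
    optimizer.step()                             # nudge the weights to reduce the loss
    return loss.item()

@torch.no_grad()
def greedy_generate(prompt, n_new):
    """Inference: repeatedly feed the sequence back in, taking the most likely next token."""
    tokens = prompt
    for _ in range(n_new):
        next_token = model(tokens)[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=1)
    return tokens

# Toy usage with random token IDs (real training uses a tokenized corpus):
x = torch.randint(0, 1000, (2, 16))
y = torch.randint(0, 1000, (2, 16))
print(train_step(x, y))
print(greedy_generate(x[:, :4], n_new=8).shape)  # (2, 12)
```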
Transformers have been applied across a wide range of domains:
- Natural language processing
- Machine translation
- Speech recognition
- Image generation
- DNA sequence analysis
- Protein structure analysis