
What do OpenAI and DeepMind have in common?

Give up? These innovative organizations both utilize technology known as transformer models.

What are transformer models?  

The transformer (represented by the T in ChatGPT, GPT-2, GPT-3, GPT-3.5, etc.) is the key element that makes generative AI so, well, transformational.

Transformer models are a type of neural network architecture designed to process sequential data, such as sentences or time series.

The concept of a transformer, an attention-layer-based, sequence-to-sequence (“Seq2Seq”) encoder-decoder architecture, was introduced in the 2017 paper “Attention Is All You Need” by Ashish Vaswani et al., pioneers in deep learning. Since then, in the realms of AI and machine learning, transformer models have emerged as a groundbreaking approach to a wide range of language-related tasks.

Compared with traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs), transformers differ in their ability to capture long-range dependencies and contextual information.

The transformer “requires less training time than previous recurrent neural architectures, such as long short-term memory (LSTM), and its later variation has been prevalently adopted for training large language models on large (language) datasets,” notes Wikipedia.

From machine translation to natural language processing (NLP) to computer vision, plus audio and multi-modal processing, transformers have revolutionized the field with their ability to capture long-range dependencies and efficiently process sequential data. They’re used widely in neural machine translation (NMT). They’re used to perform or improve AI and NLP business tasks, as well as streamline enterprise workflows. Transformer technology has also heralded generative pretrained transformers (GPTs) and Bidirectional Encoder Representations from Transformers (BERT).

Multi-head attention

A transformer measures relationships between pairs of input tokens (for example, if the content is text, the tokens are words), a process known as attention. The attention heads are a key feature of transformers. A transformer uses parallel multi-head attention, meaning the attention module repeats its computations several times in parallel, giving the model more capacity to encode nuances of word meaning. The outputs of these parallel heads are then combined to produce the final attention scores.

In addition to multi-head attention mechanisms, transformers rely on layer normalization, residual connections, feed-forward layers, and positional embeddings.
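As an illustration of these ideas, here is a minimal PyTorch sketch of scaled dot-product attention and a parallel multi-head wrapper. The dimensions, module names, and structure are assumptions chosen for readability, not a reference implementation:

```python
import torch
import torch.nn.functional as F
from torch import nn

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # pairwise similarity between tokens
    weights = F.softmax(scores, dim=-1)             # attention weights sum to 1 per query
    return weights @ v                              # weighted sum of the value vectors

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads, self.d_head = num_heads, d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        batch, seq_len, d_model = x.shape
        # Project inputs to queries, keys, values and split them across heads
        def split(t):
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        # Each head attends independently, in parallel
        attended = scaled_dot_product_attention(q, k, v)
        # Concatenate the heads and mix them with a final linear layer
        attended = attended.transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.out_proj(attended)
```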

How do transformer models work?

Here’s how the transformer architecture works: 

1. Input embedding 

The first step in transformer operations is understanding the input data. The model takes a sentence, or another sequence of data, and turns each word or element into a numerical representation known as a vector embedding. These embeddings capture the meanings of the words or elements. Various techniques can be employed for input embedding, such as word embeddings and character embeddings.

This allows the model to work with continuous representations rather than discrete symbols.  
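For illustration, here is a minimal PyTorch sketch of a token-embedding lookup. The vocabulary size, embedding dimension, and token IDs are all assumed values:

```python
import torch
from torch import nn

vocab_size, d_model = 10_000, 512         # assumed sizes for illustration
embedding = nn.Embedding(vocab_size, d_model)

# Token IDs for a toy sentence, e.g. "the cat is on the mat" after tokenization
token_ids = torch.tensor([[4, 8, 15, 16, 4, 23]])   # shape: (batch=1, seq_len=6)
embedded = embedding(token_ids)                      # shape: (1, 6, 512)
```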

2. Positional encoding 

Next, the transformer model gets to know the order. Transformers don’t naturally understand the order of words, so they use positional encoding to give the model information about word positions. This is done by combining the embeddings with sinusoidal functions (remember sine from trigonometry class?), which gives every position in the sequence a unique, order-aware signature. For example, in the sentence “The cat is on the mat,” positional encoding is what tells the model that “cat” comes before “mat,” so the sentence isn’t treated as an unordered bag of words.
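Here is a short sketch of the sinusoidal positional encoding described in the original paper, again in PyTorch with assumed dimensions:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    positions = torch.arange(seq_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(positions * div_term)
    pe[:, 1::2] = torch.cos(positions * div_term)
    return pe

embedded = torch.randn(1, 6, 512)                  # stand-in for the step 1 embeddings
pe = sinusoidal_positional_encoding(seq_len=6, d_model=512)
embedded_with_position = embedded + pe             # each position now carries order information
```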

3. Encoder layers

The embedded and encoded input sequence is passed through multiple encoder layers. Each layer consists of two sub-layers called the self-attention mechanism and the feed-forward neural network.  

  • The self-attention mechanism allows the model to focus on different parts of the input sequence and capture dependencies. It calculates attention scores for each element based on its relationships with other elements in the sequence.

For each word in a sentence, the self-attention layer computes three vectors: a query, a key, and a value. To determine which other words a given word is contextually related to, the model takes the dot product of that word’s query vector with the key vectors of the other words.

  •  The feed-forward neural network applies a non-linear transformation to the outputs of the self-attention mechanism, adding complexity and expressive power to the model. The feed-forward layers typically account for roughly two-thirds of the parameters in a transformer model.
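A compact PyTorch sketch of one encoder layer can make the two sub-layers concrete. The dimensions and the use of standard torch.nn building blocks are assumptions made for illustration:

```python
import torch
from torch import nn

class EncoderLayer(nn.Module):
    """One encoder layer: self-attention plus feed-forward, each wrapped with
    a residual connection and layer normalization (post-norm, as in the
    original paper)."""
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff),   # expand
            nn.ReLU(),                  # non-linearity adds expressive power
            nn.Linear(d_ff, d_model),   # project back down
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Sub-layer 1: every position attends to every other position
        attended, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attended)             # residual connection + layer norm
        # Sub-layer 2: position-wise feed-forward network
        x = self.norm2(x + self.feed_forward(x))
        return x

encoder = nn.Sequential(*[EncoderLayer() for _ in range(6)])   # a stack of encoder layers
```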

4. Decoder layers 

The encoder’s output is then fed into the decoder layers. Like the encoder layers, each decoder layer contains a self-attention mechanism and a feed-forward network, plus a third sub-layer: the encoder-decoder attention mechanism.

  • The self-attention mechanism in the decoder allows it to attend to different parts within the output sequence, capturing dependencies between elements. It calculates attention scores based on the relationships between positions in the output sequence.  
  • The encoder-decoder attention mechanism enables the decoder to focus on different parts of the input sequence, incorporating information from the encoder. This helps the decoder understand the context of the input sequence, aiding in generating the output sequence.
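A rough PyTorch sketch of one decoder layer shows how these pieces fit together; the causal mask, dimensions, and layer choices are illustrative assumptions rather than a description of any production system:

```python
import torch
from torch import nn

class DecoderLayer(nn.Module):
    """One decoder layer: masked self-attention over the output generated so far,
    encoder-decoder (cross) attention over the encoder output, then a
    feed-forward network."""
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, target, encoder_output):
        seq_len = target.size(1)
        # Causal mask: a position may only attend to earlier positions
        causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        attended, _ = self.self_attn(target, target, target, attn_mask=causal_mask)
        x = self.norm1(target + attended)
        # Cross-attention: queries come from the decoder, keys/values from the encoder
        attended, _ = self.cross_attn(x, encoder_output, encoder_output)
        x = self.norm2(x + attended)
        return self.norm3(x + self.feed_forward(x))
```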

5. Output projection 

The output of the decoder layers is passed through a linear projection layer that maps each position to a vector the size of the vocabulary. Because these raw scores can range from negative to positive infinity, a softmax activation function is applied to turn them into a probability distribution over the vocabulary for each position in the output sequence. The token with the highest probability is taken as the predicted output.
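A minimal sketch of the projection-plus-softmax step, with assumed vocabulary and model dimensions:

```python
import torch
from torch import nn

vocab_size, d_model = 10_000, 512
to_vocab = nn.Linear(d_model, vocab_size)      # linear projection layer

decoder_output = torch.randn(1, 6, d_model)    # stand-in for the decoder output
logits = to_vocab(decoder_output)              # shape: (1, 6, vocab_size), unbounded scores
probs = torch.softmax(logits, dim=-1)          # probability distribution per position
predicted_ids = probs.argmax(dim=-1)           # most probable token at each position
```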

6. Training and optimization 

Transformers are trained using supervised learning. The model’s predictions are compared with the correct target sequence, and optimization algorithms adjust the model’s parameters to minimize the difference between the predicted and correct outputs. This is done by iterating over the training data in batches, steadily improving the model’s performance.
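A simplified training step might look like the following sketch. The placeholder model, batch shapes, and the use of cross-entropy loss with the Adam optimizer are illustrative assumptions:

```python
import torch
from torch import nn

# Stand-ins for a full encoder-decoder model and a batch of data
model = nn.Linear(512, 10_000)                     # placeholder for the transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()                    # compares predictions with targets

decoder_states = torch.randn(32, 20, 512)          # (batch, seq_len, d_model)
target_ids = torch.randint(0, 10_000, (32, 20))    # correct next tokens

logits = model(decoder_states)                     # (batch, seq_len, vocab)
loss = loss_fn(logits.view(-1, 10_000), target_ids.view(-1))
loss.backward()                                    # compute gradients
optimizer.step()                                   # adjust parameters to reduce the loss
optimizer.zero_grad()
```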

7. Inference 

A pretrained model can then be used for inference to generate predictions for new input sequences. During inference, the trained model applies the same preprocessing steps as during training (such as input embedding and positional encoding) to an input sequence, then feeds it through the encoder and decoder layers.  

The model generates predictions for each position in the output sequence, producing the most probable output at each step. The predictions are then decoded into the desired format, such as a translated sentence or a generated sequence of words.
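A greedy decoding loop is one simple way to realize this step. In the sketch below, `encode` and `decode_step` are hypothetical callables standing in for the trained encoder and decoder stacks:

```python
import torch

def greedy_decode(encode, decode_step, source_ids, bos_id=1, eos_id=2, max_len=50):
    """Greedy inference sketch; `encode` and `decode_step` are hypothetical
    stand-ins for the trained encoder and decoder."""
    memory = encode(source_ids)                    # run the encoder once
    output_ids = [bos_id]                          # start-of-sequence token
    for _ in range(max_len):
        logits = decode_step(torch.tensor([output_ids]), memory)  # (1, len, vocab)
        next_id = int(logits[0, -1].argmax())      # most probable next token
        output_ids.append(next_id)
        if next_id == eos_id:                      # stop at end-of-sequence
            break
    return output_ids
```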

Applications of transformer models 

Just how much of a help are transformer models in deciphering real-world challenges?

As documented by Google, Vaswani et al’s paper shows that “the Transformer outperforms both recurrent and convolutional models on academic English to German and English to French translation benchmarks. On top of higher translation quality, the Transformer requires less computation to train and is a much better fit for modern machine learning hardware, speeding up training by up to an order of magnitude.”

Because of this high level of effectiveness, transformer neural networks are used for various types of applications, including: 

Machine translation 

Traditional machine translation approaches relied on statistical methods and phrase-based models, which often struggled to capture the semantic meaning and syntactic structure of sentences. With the introduction of transformer models, translation accuracy has improved significantly.

In the transformer, the self-attention mechanism allows the model to attend to different parts of the input sequence, capturing long-range dependencies and improving the overall translation quality. Because transformer models can effectively learn the patterns in source and target languages, they can generate more-fluent and accurate translations.  

Some of the most successful machine translation systems powered by transformers include Google Translate, Microsoft Translator, and DeepL. This application can improve global communication between organizations as well as fine-tune multilingual chatbot support and content localization.  
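For a sense of how accessible transformer-based translation has become, here is an illustrative snippet using the open-source Hugging Face transformers library (our choice for the example, not part of the systems named above):

```python
from transformers import pipeline

# Small English-to-French translation model, used purely for illustration
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Transformers have significantly improved translation quality."))
# e.g. [{'translation_text': '...'}]
```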

Natural language processing

Transformer models’ ability to handle long-range dependencies and capture contextual information makes them super effective in language understanding and humanlike text generation. Their functionality has been applied to tasks such as sentiment analysis, text classification, named entity recognition, and text summarization.  

In sentiment analysis, for example, models powered by transformers can accurately determine the sentiment expressed in text. This enables companies, for instance, to gain insight from customer feedback, identifying areas for improvement and ways to better manage their brand reputation. 
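As a small illustration (again using the Hugging Face transformers library, an assumption made for demonstration purposes), a transformer-based sentiment classifier can be called in a few lines:

```python
from transformers import pipeline

# Uses the library's default transformer-based sentiment model
classifier = pipeline("sentiment-analysis")
print(classifier("The checkout flow was fast, but shipping took far too long."))
# e.g. [{'label': 'NEGATIVE', 'score': 0.99}]
```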

Furthermore, transformer-powered NLP is used in industries such as finance and healthcare to understand and analyze legal and regulatory documents. This helps ensure compliance, identify potential risks, and detect fraud.

Speech recognition 

Their ability to capture dependencies and contextual information has enabled transformer models to transcribe spoken language very accurately. This has led to utilization in popular voice assistants such as Amazon’s Alexa, Apple’s Siri, and Google Assistant.  

These models process the audio input, segment it into smaller units, and generate the corresponding text representation. Transformers have improved the accuracy and fluency of the transcriptions.

One result: more-seamless interaction between humans and machines, especially when it comes to chatbots. The ecommerce, finance, and healthcare industries routinely employ chatbots in their customer service operations. By improving content quality, transformers help ensure that shoppers, clients, and patients can all chat with an AI entity to quickly get the support they need.

Image captioning 

Images contain rich visual information, while captions provide textual descriptions of the image content. Transformer models encode the visual features of an image and then decode them into corresponding captions.  

The transformer’s ability to capture dependencies and generate coherent text makes it effective in producing accurate and contextually relevant captions. Image captioning powered by transformers has found application in areas such as content understanding, visual search, and accessibility for visually impaired individuals. 

In ecommerce, image captioning is utilized to automatically generate captions for product images. Descriptive captions proactively provide shoppers with valuable information such as product features and dimensions and other specifications, thereby enhancing the shopping experience. 

Transform your outlook 

That’s it for this introduction to how transformers work their magic.

Want to use this technology to transform your ecommerce revenue? Here at Algolia, we’re incorporating transformer models and other amazing technology to improve our clients’ search results and recommendations. We use vector representation, along with machine-learning techniques such as spelling correction, language processing, and category matching, to make sense of language. Our smart search experiences have proven to enhance user engagement and increase conversion for a vast array of clients. 

Want to know more? Let’s chat, or take the next step and request a demo of how our AI-powered NeuralSearch can give your site surprisingly on-target search results.

About the author
Vincent Caruana

Senior Digital Marketing Manager, SEO
