
What are large language models?

It’s the era of Big Data, and super-sized language models are the latest stars.

When it comes to size, small language models are no slouches; they can be highly effective at specialized tasks. But it's the large-scale language models, such as those powering OpenAI's GPT (which stands for generative pre-trained transformer), trained on massive datasets, whose advancements have taken the world by storm with their humanlike responses to requests for information.

What’s a language model?

Language models may seem ultramodern, but they date back to 1966 and ELIZA, a then-cutting-edge computer program that could effectively use natural language processing (NLP) to "converse" in a human-sounding way, for example, in the role of a psychotherapist.

What’s a large language model?

In terms of a plain-English computer science definition, large language models (LLMs) are a type of generative AI that uses deep-learning algorithms and loosely simulates the way people might think.

What exactly does “large” entail as it applies to language models?

According to Wikipedia, "a language model…can generate probabilities of a series of words, based on text corpora in one or multiple languages it was trained on." LLMs are the most advanced kind of language model: "combinations of larger datasets (frequently using scraped words from the public internet), feedforward neural networks, and transformers."

An LLM could have a billion parameters (the internal values that shape its output) and still be considered average size.
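To make the "probabilities of a series of words" idea concrete, here is a toy bigram model in Python. It is emphatically not an LLM, just the simplest possible illustration of estimating next-word probabilities from a training corpus:

```python
from collections import Counter, defaultdict

# Toy illustration (not an LLM): a bigram model estimates the probability
# of the next word from co-occurrence counts in a tiny training corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word_probs(word):
    """Return P(next word | word) estimated from the corpus counts."""
    counts = bigrams[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

An LLM does the same kind of next-token prediction, but with billions of learned parameters instead of raw counts, and over subword tokens rather than whole words.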

With their giant sizes and wide-scale impact, some LLMs are "foundation models," says the Stanford Institute for Human-Centered Artificial Intelligence (HAI). These vast pretrained models can then be tailored for various use cases, with optimization for specific tasks.

Transformers: LLMs’ secret sauce

LLMs are a product of machine learning technology, utilizing neural networks whose operations are facilitated by transformers: attention-layer-based encoder-decoder architectures. Transformers were introduced in 2017 by Ashish Vaswani and fellow researchers at Google in a paper called "Attention Is All You Need."

A transformer model observes relationships between items in sequential data, such as words in a phrase, which allows it to determine meaning and context. With text, the focus is predicting the next word. A transformer architecture does this by processing data through different types of layers, including those providing self-attention, feed-forward, and normalization functionality.

With transformer-based technology, an abundance of parameters, and the ability to stack the processing layers for more powerful interpretation, an LLM can quickly make sense of voluminous input text and provide appropriate responses.
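As a rough illustration of the self-attention layer mentioned above, here is a minimal NumPy sketch (not a production implementation; the dimensions and random weights are arbitrary). Each token's output is a weighted blend of every token's value vector, with weights derived from query-key similarity:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # pairwise similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax per token
    return weights @ V                              # blend values by attention

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                 # 4 tokens, embedding dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one contextualized vector per token
```

Real transformers run many such attention "heads" in parallel and stack dozens of these layers, which is where the interpretive power comes from.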

Using statistical models to take notes on patterns and how words and phrases connect, LLMs can make sense of content, even translating it. Then, based on their constructed knowledge bases, they can go a step further and, remarkably, generate new text in seemingly human language.

For instance, many LLMs can instantaneously “write” blog posts and poetry in the same styles as those used by human poets on whose work they’ve been pre-trained (e.g., come up with unique poems that read like existing poetry by Maya Angelou). The multitalented ringleader app, ChatGPT, can answer questions of all sorts, improve poorly written text, change the tone of content from academic to conversational or vice versa, converse with people about whatever’s on their minds, do coding, and even help someone set up an Etsy business.

Examples of large language models

It’s safe to say that large language models are proliferating. In addition to GPT-3 (175 billion parameters) and GPT-4 (parameter count undisclosed by OpenAI; used with Microsoft Bing), the models behind ChatGPT, these large entities include:

  • BERT (Bidirectional Encoder Representations from Transformers, Google)
  • BLOOM (BigScience Large Open-science Open-access Multilingual Language Model; a research collaboration coordinated by Hugging Face)
  • Claude 2 (Anthropic)
  • Ernie Bot (Baidu)
  • PaLM 2 (Pathways Language Model, used with Google Bard)
  • LLaMA (Meta)
  • RoBERTa (A Robustly Optimized BERT Pretraining Approach, Meta)
  • T5 (Text-to-Text Transfer Transformer, Google)

How large language models work

Training LLMs using unsupervised learning

LLMs must be trained by feeding them tons of data — a “corpus” — which lets them establish expert awareness of how words work together. The input text data could take the form of everything from web content to marketing materials to entire books; the more information available to an LLM for training purposes, the better the output could be.

The training process for LLMs can involve several steps, typically beginning with unsupervised learning to identify patterns in unstructured data. When creating an AI model using supervised learning, the associated data labeling is a formidable obstacle. By contrast, with unsupervised learning, this intensive process is skipped, which means much more data is available for training.
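The reason no labeling is needed is that the raw text itself supplies the training targets. A minimal sketch of how self-supervised next-token training pairs are carved out of plain text:

```python
# In self-supervised language modeling, the "labels" come for free: the
# target for each position is simply the next token in the raw text.
text = "to be or not to be".split()
context_size = 3

pairs = [
    (text[i:i + context_size], text[i + context_size])
    for i in range(len(text) - context_size)
]
for context, target in pairs:
    print(context, "->", target)
# ['to', 'be', 'or'] -> not
# ['be', 'or', 'not'] -> to
# ['or', 'not', 'to'] -> be
```

Scaled up to trillions of tokens of web text, this is essentially the pretraining signal an LLM learns from.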

Transformer processing

In the transformer neural network process, relationships between pairs of input tokens (for example, words) are measured; this is known as attention. A transformer uses parallel multi-head attention, meaning the attention module repeats its computations in parallel, affording more ability to encode nuances of word meaning.

A self-attention mechanism helps the LLM learn the associations between concepts and words. Transformers also utilize layer normalization, residual and feedforward connections, and positional embeddings.
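For illustration, the sinusoidal positional embeddings from the original transformer paper can be sketched in a few lines. Each position gets a unique pattern of sine and cosine values, so the otherwise order-blind attention layers can make use of word order (the sequence length and dimension below are arbitrary):

```python
import numpy as np

def positional_encoding(num_positions, dim):
    """Sinusoidal positional embeddings, as in "Attention Is All You Need"."""
    pos = np.arange(num_positions)[:, None]        # (positions, 1)
    i = np.arange(0, dim, 2)[None, :]              # even dimension indices
    angles = pos / (10000 ** (i / dim))
    pe = np.zeros((num_positions, dim))
    pe[:, 0::2] = np.sin(angles)                   # even dims: sine
    pe[:, 1::2] = np.cos(angles)                   # odd dims: cosine
    return pe

pe = positional_encoding(50, 16)
print(pe.shape)  # (50, 16): one embedding vector per position
```

These vectors are simply added to the token embeddings before the first attention layer.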

Incorporating zero-shot learning

What happens when a brilliant but distracted student neglects to go to class or read the textbook? They may still be able to use their powers of reasoning to ace the final and get an A. 

That’s kind of the concept of zero-shot learning with large language models. A foundation model is asked to perform a task it was never explicitly shown how to do: it gets little or no demonstration of the task, yet it’s expected to get the basic output right on the strength of its broad pretraining.
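In practice, zero-shot use of an LLM just means describing the task in the prompt without providing worked examples. A minimal sketch (the prompt wording below is illustrative, not a prescribed format):

```python
def build_zero_shot_prompt(review):
    """Build a zero-shot classification prompt: task description, no examples."""
    return (
        "Classify the sentiment of this review as positive or negative.\n"
        f"Review: {review}\n"
        "Sentiment:"
    )

prompt = build_zero_shot_prompt("The battery died after two days.")
print(prompt)
```

Sending this to any LLM completion API would yield the classification; a "few-shot" variant would differ only by including a handful of labeled examples before the review.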

Fine-tuning with supervised learning

The flip side is that while zero-shot learning can translate to comprehensive knowledge, the LLM can end up with an outlook that’s broad but shallow, lacking depth in any specific domain.

This is where companies can start the process of refining a foundation model for their specific use cases. Models can be fine-tuned, prompt-tuned, and adapted as needed using supervised learning. One tool for fine-tuning LLMs to generate the right text is reinforcement learning from human feedback (RLHF).
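A hedged sketch of what supervised fine-tuning data often looks like: labeled prompt/response pairs that specialize a pretrained model for one task. The field names below are illustrative, not any specific vendor's schema, though many fine-tuning services ingest a similar one-JSON-object-per-line (JSONL) format:

```python
import json

# Illustrative supervised fine-tuning examples for a summarization task.
examples = [
    {"prompt": "Summarize: Q3 revenue grew 12% year over year...",
     "completion": "Revenue rose 12% in Q3."},
    {"prompt": "Summarize: The new feature reduced page load times...",
     "completion": "Load times dropped after the feature launch."},
]

# Serialize as JSONL: one training example per line.
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl.splitlines()[0])
```

The labeling effort that unsupervised pretraining avoids comes back here, but on a far smaller, task-specific dataset.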

Content generation

When an LLM is trained, it can then generate new content in response to users’ parameters. For instance, if someone wanted to write a report in the company’s editorial style, they could prompt the LLM for it.

Applications

From machine translation to natural language processing (NLP) to computer vision, plus audio and multi-modal processing, transformers capture long-range dependencies and efficiently process sequential data. They’re used widely in neural machine translation (NMT), as well as to power or improve AI systems, handle NLP business tasks, and simplify enterprise workflows.

Transformers’ skill sets include:

  • Chat (through chatbots) and conversational AI
  • Virtual assistants
  • Summarizing text
  • Creating content
  • Translating content
  • Classifying/categorizing content
  • Rewriting content
  • Annotating images
  • Synthesizing text to speech
  • Correcting spelling
  • Making recommendations (e.g., for products on ecommerce web pages)
  • Detecting fraud
  • Generating code
  • Doing sentiment analysis

Sentiment analysis is one of the more impressive applications. A combination of unsupervised and supervised learning allows LLMs to identify intent, attitudes, and emotions in text. Some algorithms can even pick up specific feelings such as sadness, while others can determine the difference between positive, negative, and neutral.
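For intuition about the task shape, here is a deliberately simple, non-LLM sentiment sketch that scores text by counting positive versus negative words. LLMs learn far subtler associations from data, but the input and output of the task are the same:

```python
# Tiny illustrative word lists; a real system would learn associations
# from data rather than rely on a hand-built lexicon.
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "sad"}

def sentiment(text):
    """Classify text as positive, negative, or neutral by word counts."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great product"))  # positive
print(sentiment("Terrible battery, I hate it"))  # negative
```

An LLM approaches the same problem by modeling context, so it can also handle negation, sarcasm, and specific emotions that a word-count heuristic misses.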

With so many content-related abilities, LLMs are a desirable asset and natural fit in a multitude of domain-specific industries. They’re especially popular in retail, technology, and healthcare (for example, with the startup Cohere).

Drawbacks

With such an inspiring track record, there’ve gotta be some downsides to LLMs, right? Like the fact that they could tell people how to do questionable things.

Nobody can argue that LLMs aren’t a highly and impressively creative bunch of artificially intelligent beings. They can produce everything from student assignments to gorgeous art that’s beautiful to behold, and their output sounds like it’s undoubtedly all based in truth.

Wouldn’t it be great if self-supervised large language models could also be trusted and relied on to generate information only for the greater good that’s also 100% accurate? 

They can’t. They may be prone to hallucination: producing inaccuracies that don’t reflect the training data.

The risk of their going rogue is undoubtedly their biggest liability, such as when they’re producing award-worthy imagery or reporting news content: arenas in which errors, or inklings that humans aren’t involved, could damage reputations or raise liability issues.

So at this point, LLMs still badly need some level of human fact-checking and sign-off.

Other drawbacks of LLMs include:

  • Biases in generated text
  • Significant development expenses, such as investment in graphics processing units (GPUs) 
  • High operating costs
  • A troubling inability for their results-generation processes to be explained
  • Difficulty troubleshooting due to complexity
  • Vulnerability to prompts that could maliciously break the system

Want NLP-enhanced search?

That summarizes what we know about large language models. Did you know that some of this groundbreaking technology’s best principles are applicable (and, thankfully, some of its biggest drawbacks aren’t) in enterprise-level search?

For example, NLP can substantially improve the accuracy of search for ecommerce platforms and apps, ultimately raising revenue without introducing inaccurate information.

Our natural language understanding (NLU) feature combines tunable relevance with AI-driven natural language and real-world understanding. Built partially on technology from OpenAI, it addresses the most difficult natural language questions, producing not just answers but the answers that best address questions.

Want the secret of how state-of-the-art search can boost your organization’s bottom line? Our API is utilized for success by more than 11,000 companies, including Lacoste, Zendesk, Stripe, and Slack. Meet up with us for a fascinating demo or shoot us a note.

About the author

Catherine Dee

Search and Discovery writer
