ARTICLE | July 26, 2023

Can we ever trust generative AI?

Generative AI has taken the world by storm. How will businesses harness its powers without drowning in its wake?

By Dave Wright, Workflow contributor

When Alan Turing, the father of modern computer science, used to visit Zurich, he frequented the Café Bar ODEON. There he met anti-apartheid activist Nelson Mandela, who struck up a conversation with him about the potential of technology to bridge gaps or deepen divisions. The two became fast friends.

A lovely story, except none of that actually happened. The entire colorful account was conjured up by ChatGPT in its answer to the prompt: “Describe in a scene how Alan Turing met Nelson Mandela.”

Generative AI bots like ChatGPT make up stories, or “hallucinate,” all the time. But it’s easy to believe they’re telling the truth. After all, this is what generative AI excels at: giving us logical, coherent, and personable answers to questions, even if the answers aren’t true. As a result, most people say they trust the content that generative AI produces, according to a poll by Capgemini.

But it might not be long before the public’s trust wears thin. Already, we’ve seen high-profile mistakes and bad behavior, sometimes with serious consequences. Earlier this year, ChatGPT accused a law professor of sexual harassment, citing an article in The Washington Post that didn’t exist. In Australia, a mayor is threatening to file the first defamation lawsuit against OpenAI, ChatGPT’s creator, unless it corrects the bot’s false claims that he was imprisoned for bribery. And Google’s AI Bard generates false and harmful narratives more often than not, the Center for Countering Digital Hate reported.

To use generative AI productively at work, we need to be able to trust the bots. But have the bots earned it?

For researchers who study large language models (LLMs), hallucinations aren’t surprising. Despite the name, generative AI is not actually generating anything. It’s instead repurposing the sequences of words and phrases that humans have already generated.

In an article for Scientific American, the cognitive scientist Gary Marcus explains that LLMs are models of how people use language, not models of how the world works. LLMs often get things right because we often successfully use language to capture what we mean to say. But LLMs sometimes get things wrong because we do, too.

Computer scientists Arvind Narayanan and Sayash Kapoor put it a different way, calling ChatGPT and other LLMs “bull**** generators.” They quoted the philosopher Harry Frankfurt, who defined “bull****” as “speech that is intended to persuade without regard for the truth.” That’s exactly what LLMs are designed to do: produce content that seems to answer our questions.

That doesn’t mean generative AI can’t be useful. But it does mean we have to reframe our notions of what it can—and should—be used for. For example, it’s probably not a good idea to replace our search engines with LLMs any time soon. In a recent paper, a pair of researchers argue that LLMs make poor search engines because they present false and misleading information and have the potential to degrade information literacy and human creativity.

Neither should we be using generative AI to answer questions that require precision. Last year, Meta introduced Galactica, an LLM that was trained on 48 million scientific sources, such as articles, textbooks, and lecture notes. Like all generative AI, Galactica cannot tell fact from fiction. While it answers some questions accurately, it also generates fake information that sounds convincing, like an in-depth wiki article about the history of bears in space.

So how can we use generative AI responsibly?

The first step is to choose our use cases with care. Much of the panic and excitement over generative AI misses, or misunderstands, what it can and cannot do. Generative AI is not the same as artificial general intelligence, an artificial intelligence that can accomplish any task a human can perform. It’s a model, albeit a clever one, that copies what we give it. To avoid making hasty investments that don’t pan out, executives need to understand exactly what they’re dealing with.

Narayanan and Kapoor identify three kinds of tasks for which LLMs are useful: tasks where it’s easy for users to double-check the bot’s work, where factual truthfulness of the material is irrelevant, and where a subset of the training data can act as a verifiable source of truth (such as translation jobs).

Businesses are already starting to put LLMs to work on these tasks. Some of the smarter and more popular use cases for generative AI include crafting personalized marketing messages for advertising and sales, performing code reviews, parsing complex legal documentation, and performing data analytics, according to research from McKinsey. These use cases play to LLMs’ strengths. None of these tasks requires a bot to understand what’s real and what’s not.

Even responsible use cases require guardrails, however.

This year, Singapore’s newly launched AI Verify Foundation produced a white paper on the risks posed by generative AI. The paper stresses the importance of good governance, arguing that humans must take a practical, risk-based approach to trust and safety when it comes to AI. Further, to mitigate bias and encourage responsible use, developers should be open about how they build and train their models and should regularly invite third parties to check their work. This is similar to the concept of privacy by design, in which privacy and security are incorporated into the very foundation of the technology’s development rather than tacked on at the end.

Companies will also need to understand the nature of the data they want to include in language models, including the extent to which AI-produced content should play a role. According to the white paper, you could fall into an iterative spiral, where AI generates content based on previously produced AI content, which in turn was originally hallucinated by AI. The researchers warn against pulling from multiple iterations of generative cycles that end up confirming bias and reinforcing untruths.

Ultimately, it comes down to putting humans at the center of the development process. Such human-centered design will enable businesses to create AI models that serve people, not each other.

Developers should consider taking a page from Wikipedia and creating ways for LLMs to show their work. Although this isn’t a perfect solution—as we know LLMs can make up sources—creating a paper trail makes it easier for humans to double-check the bots.

In the early days of the internet, responsible users emphasized the role of open-source development. They had little idea where the internet was going, so they needed to be transparent about how they were going to get there.

Today, nobody knows whether generative AI is going to upend the way vast numbers of people do work or simply make it easier for many of us to get our jobs done. But more AI is not going to solve our AI problems.

Recently, members of the AI research community signed an open letter calling for caution in our widespread adoption of AI, arguing that humans need to put rules in place and slow down before we proceed any further. If we want to make the most of our LLMs, and if we want to do so responsibly, then human-centric design is the only path forward. What Alan Turing wrote in his 1950 paper, Computing Machinery and Intelligence, still applies today: “We can only see a short distance ahead, but we can see plenty there that needs to be done.”