
AI is what those of us who study technology call a General Purpose Technology (ironically, also abbreviated GPT). These advances are once-in-a-generation technologies, like steam power or the internet, that touch every industry and every aspect of life. And, in some ways, generative AI might even be bigger.
General Purpose Technologies typically have slow adoption, as they require many other technologies to work well. The internet is a great example. While it was born as ARPANET in the late 1960s, it took nearly three decades to achieve general use in the 1990s, with the invention of the web browser, the development of affordable computers, and the growing infrastructure to support high-speed internet.
Understanding language proved very hard for earlier approaches: there are a vast number of words, and they can be combined in an enormous number of ways, making a simple formulaic statistical approach impossible. The attention mechanism helps solve this problem by allowing the AI model to weigh the importance of different words or phrases in a block of text. By focusing on the most relevant parts of the text, Transformers can produce more context-aware and coherent writing than earlier predictive AIs.
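To make the idea concrete, here is a minimal sketch of the core calculation behind attention (scaled dot-product attention) in Python. The vectors and sizes are invented for illustration; real models work with thousands of dimensions and many attention heads.

```python
# A minimal sketch of scaled dot-product attention, the core of the
# Transformer's attention mechanism. Sizes and numbers are illustrative.
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    # Each row is one token's vector. The scores measure how relevant
    # every other token is to the token doing the "looking".
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)    # importance weights, each row sums to 1
    return weights @ values, weights      # a weighted blend of the other tokens

# Three toy "tokens", each represented by a 4-dimensional vector.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))
blended, weights = attention(tokens, tokens, tokens)
print(np.round(weights, 2))  # how much each token attends to the others
```

The weights matrix is the "importance" the paragraph above describes: for each token, it says how much of every other token to blend into its new representation.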
These new types of AI, called Large Language Models (LLMs), are still doing prediction, but rather than predicting the demand for an Amazon order, they are analyzing a piece of text and predicting the next token, which is simply a word or part of a word. Ultimately, that is all ChatGPT does technically—act as a very elaborate autocomplete like you have on your phone. You give it some initial text, and it keeps writing text based on what it statistically calculates as the most likely next token in the sequence.
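A toy illustration of what "predicting the next token" means is below. A real LLM computes these probabilities from billions of learned weights; here the candidate tokens and their probabilities are simply invented.

```python
# A toy illustration of next-token prediction. The probabilities are invented;
# a real model assigns some probability to every token in its vocabulary,
# so these few candidates do not sum to 1.
candidate_next_tokens = {
    " mat": 0.62,          # "The cat sat on the" -> " mat" is statistically likely
    " sofa": 0.21,
    " moon": 0.04,
    " spreadsheet": 0.01,
}

prompt = "The cat sat on the"
# The model's only job: pick (or sample) a likely continuation, then repeat
# the whole process with the longer text to get the token after that.
next_token = max(candidate_next_tokens, key=candidate_next_tokens.get)
print(prompt + next_token)   # -> "The cat sat on the mat"
```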
Training an AI to do this is an iterative process that requires powerful computers to handle the immense calculations involved in learning from billions of words. This pretraining phase is one of the main reasons AIs are so expensive to build. The need for fast computers with very expensive chips, running for months of pretraining, is largely why more advanced LLMs cost over $100 million to train, consuming large amounts of energy in the process.
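The loop itself is simple in shape, even if the scale is not. The sketch below is a drastically scaled-down stand-in: a tiny model with one weight matrix learning, by repeated prediction and correction, which word tends to follow which in a six-word "corpus." Real pretraining runs the same predict-measure-adjust cycle over billions of words on thousands of specialized chips.

```python
# A drastically simplified sketch of the pretraining loop: predict the next
# token, measure how wrong the guess was, nudge the weights, repeat.
import numpy as np

text = "to be or not to be".split()
vocab = sorted(set(text))
ix = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# One weight matrix: row = current token, column = score for each possible next token.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, V))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(200):                      # the iterative part
    for cur, nxt in zip(text[:-1], text[1:]):
        probs = softmax(W[ix[cur]])          # predicted next-token distribution
        grad = probs.copy()
        grad[ix[nxt]] -= 1.0                 # cross-entropy gradient: predicted minus actual
        W[ix[cur]] -= 0.1 * grad             # nudge the weights toward the observed pattern

# After training, the model has absorbed the pattern that "to" is followed by "be".
print(vocab[int(np.argmax(W[ix["to"]]))])    # -> "be"
```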
Because of the variety of data sources used, learning is not always a good thing. AI can also learn biases, errors, and falsehoods from the data it sees. Just out of pretraining, the AI also doesn’t necessarily produce the sorts of outcomes that people would expect in response to a prompt. And, potentially worse, it has no ethical boundaries and would be happy to give advice on how to embezzle money, commit murder, or stalk someone online. LLMs in this pretrained mode just reflect back whatever they were trained on, like a mirror, without applying any judgment. So, after learning from all the text examples in pretraining, many LLMs undergo further improvement in a second stage, called fine-tuning.
After an AI has gone through this initial phase of reinforcement learning, it can continue to be fine-tuned and adjusted. This type of fine-tuning is usually done by providing more specific examples to create a new, tweaked model. That information can be provided by a specific customer trying to fit the model to its use case, for example, a company feeding it examples of customer support transcripts along with good responses. Or the information could come from watching which kinds of answers get a “thumbs-up” or “thumbs-down” from users. This additional fine-tuning can make the responses of the model more specific to a particular need.
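In practice, "providing more specific examples" often means assembling a small file of prompt-and-response pairs like the sketch below. The exact file format and field names vary by provider; the ones here are purely illustrative.

```python
# A sketch of a fine-tuning dataset: pairs of prompts and the kinds of
# responses the company wants the model to imitate. Field names and the
# JSONL format are illustrative; real providers each have their own schema.
import json

examples = [
    {
        "prompt": "Customer: My package arrived damaged. What can I do?",
        "response": "I'm sorry to hear that. I can send a replacement today "
                    "or issue a full refund. Which would you prefer?",
    },
    {
        "prompt": "Customer: How do I reset my password?",
        "response": "Go to Settings, then Account, then Reset Password, and "
                    "follow the emailed link. Let me know if it doesn't arrive.",
    },
]

# Each line of the file becomes one training example the base model is tuned on.
with open("support_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```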
A lot of research is going into considering how to design AI systems aligned with human values and goals, or at least that do not actively harm them. This is not an easy task, as humans themselves often have conflicting or unclear values and goals, and translating them into computer code is fraught with challenges. Moreover, there is no guarantee that an AI system will keep its original values and goals as it evolves and learns from its environment.
The complication is that AI does not really plagiarize, in the way that someone copying an image or a block of text and passing it off as their own is plagiarizing. The AI stores only the weights from its pretraining, not the underlying text it trained on, so it reproduces a work with similar characteristics but not a direct copy of the original pieces it trained on. It is, effectively, creating something new, even if it is a homage to the original. However, the more often a work appears in the training data, the more closely the underlying weights will allow the AI to reproduce the work. For books that are repeated often in the training data—like Alice’s Adventures in Wonderland—the AI can nearly reproduce it word for word. Similarly, art AIs are often trained on the most common images on the internet, so they produce good wedding photographs and pictures of celebrities as a result.
When you pass information to an AI, most current LLMs do not learn directly from that data, because it is not part of the pretraining for that model, which is usually long since completed. However, it is possible that the data you upload will be used in future training runs or to fine-tune the model you are working with. Thus, while it is unlikely that an AI trained on your data will reproduce the exact details of what you shared, it isn’t impossible.
These early chatbots basically had large, memorized scripts, but soon more advanced chatbots that incorporated elements of machine learning were being developed. One of the most notorious was Tay, a creation of Microsoft in 2016. Tay was designed to mimic the language patterns of a nineteen-year-old American girl, and to learn from interacting with human users of Twitter. She was presented as the “AI with zero chill.” Her creators hoped she would become a fun and engaging companion for young people online. It didn’t work out that way. Within hours of her debut on Twitter, Tay turned from a friendly chatbot into a racist, sexist, and hateful troll.
There is no definitive answer to why LLMs hallucinate, and the contributing factors may differ among models. Different LLMs may have different architectures, training data, and objectives. But in many ways, hallucinations are a deep part of how LLMs work. They don’t store text directly; rather, they store patterns about which tokens are more likely to follow others. That means the AI doesn’t actually “know” anything. It makes up its answers on the fly. Plus, if it sticks too closely to the patterns in its training data, the model is said to be overfitted to that training data. Overfitted LLMs may fail to generalize to new or unseen inputs and generate irrelevant or inconsistent text—in short, their results are always similar and uninspired.
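One way to see why answers are "made up on the fly": the model never looks anything up; it samples each next token from a probability distribution. The miniature sketch below uses invented probabilities to show how a plausible-sounding but wrong continuation can easily win the draw.

```python
# The model samples each next token from a probability distribution rather
# than retrieving a stored fact. Probabilities here are invented; run this a
# few times and the wrong-but-plausible "Sydney" comes out fairly often.
import random

prompt = "The capital of Australia is "
candidates = ["Canberra", "Sydney", "Melbourne"]
probabilities = [0.55, 0.35, 0.10]   # hypothetical next-token likelihoods

answer = random.choices(candidates, weights=probabilities, k=1)[0]
print(prompt + answer)
```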
And you can’t figure out why an AI is generating a hallucination by asking it. It is not conscious of its own processes. So if you ask it to explain itself, the AI will appear to give you the right answer, but it will have nothing to do with the process that generated the original result. The system has no way of explaining its decisions, or even knowing what those decisions were. Instead, it is (you guessed it) merely generating text that it thinks will make you happy in response to your query.
Hallucination does allow the AI to find novel connections outside the exact context of its training data. It also is part of how it can perform tasks that it was not explicitly trained for, such as creating a sentence about an elephant who eats stew on the moon, where every word should begin with a vowel. (The AI came up with: An elephant eats an oniony oxtail on outer orbit.) This is the paradox of AI creativity. The same feature that makes LLMs unreliable and dangerous for factual work also makes them useful.
After all, how can AI, a machine, generate something new and creative? The issue is that we often mistake novelty for originality. New ideas do not come from the ether; they are based on existing concepts. Innovation scholars have long pointed to the importance of recombination in generating ideas. Breakthroughs often happen when people connect distant, seemingly unrelated ideas. To take a canonical example, the Wright brothers combined their experience as bicycle mechanics and their observations of the flight of birds to develop their concept of a controllable plane that could be balanced and steered by warping its wings. They were not the inventors of the bicycle, the first to observe birds’ wings, or even the first people to try to build an airplane. Instead, they were the first to see the connections between these concepts. If you can link disparate ideas from multiple fields and add a little random creativity, you might be able to create something new.
When the AI is very good, humans have no reason to work hard and pay attention. They let the AI take over instead of using it as a tool, which can hurt human learning, skill development, and productivity. One researcher called this “falling asleep at the wheel.”
There are many reasons for companies not to turn efficiency gains into head-count or cost reductions. Companies that figure out how to use their newly productive workforce should be able to dominate any company that tries to keep its post-AI output the same as its pre-AI output, just with fewer people. And companies that commit to maintaining their workforce are more likely to have employees who act as partners, happy to teach others about the uses of AI at work, rather than scared workers who hide their AI use for fear of being replaced.
When we start to take into account the systems in which jobs operate, we see other reasons to suspect slower, rather than faster, change in the nature of jobs. Humans are built deep into the fabric of every aspect of our organizations. You cannot easily replace a human with a machine without tearing that fabric. Even if you could replace a doctor with an AI overnight, would patients be okay being seen by a machine? How would liability rules work? How would other health-care professionals adjust? Who would do the other tasks that the doctor was in charge of, like training interns or being part of professional organizations? Our systems will prove more resistant to change than our tasks.
In study after study, the people who get the biggest boost from AI are those with the lowest initial ability: it turns poor performers into good performers. In writing tasks, bad writers become solid. In creativity tests, it boosts the least creative the most. Among law students, the worst legal writers turn into good ones, and in a study of early generative AI at a call center, the lowest-performing workers became 35 percent more productive, while experienced workers gained very little. In our study at BCG, we found similar effects: those who had the weakest skills benefited the most from AI, but even the highest performers gained.
In the longer term, however, the lecture is in danger. Too many lectures involve passive learning, where students simply listen and take notes without engaging in active problem-solving or critical thinking. Moreover, the one-size-fits-all approach of lectures doesn’t account for individual differences and abilities, leading to some students falling behind while others become disengaged due to a lack of challenge.
One way to incorporate more active learning is to “flip” the classroom. Students learn new concepts at home, typically through videos or other digital resources, and then apply what they’ve learned in the classroom through collaborative activities, discussions, or problem-solving exercises. The main idea behind flipped classrooms is to maximize classroom time for active learning and critical thinking, while using at-home learning for content delivery. The evidence on the value of flipped classrooms is mixed, ultimately depending on whether they actually encourage active learning or not.
But AI allows for more fundamental changes to how we learn, beyond providing classroom activities. Imagine introducing high-quality AI tutors into the flipped classroom model. These AI-powered systems have the potential to significantly enhance the learning experience for students and make flipped classrooms even more effective. They provide personalized learning, where AI tutors can tailor instruction to each student’s unique needs while continually adjusting content based on performance. This means that students can engage with the content at home more effectively, ensuring they come to class better prepared and ready to dive into hands-on activities or discussions.
In field after field, we are finding that a human working with an AI co-intelligence outperforms all but the best humans working without an AI. In our study of Boston Consulting Group, the gap in average performance between top and bottom performers had previously been 22 percent; it shrank to a mere 4 percent once the consultants used GPT-4.
So will AI result in the death of expertise? I don’t think so. As we discussed, jobs don’t consist of just one automatable task, but rather a set of complex tasks that still require human judgment. Plus, because of the Jagged Frontier, AI is unlikely to do every task that a worker is responsible for. Improving performance in a few areas need not lead to replacement; instead, it will allow workers to focus on building and honing a narrower slice of expertise, becoming the human in the loop.
AI-powered robots and autonomous AI agents, monitored by humans, could potentially drastically reduce the need for human work while expanding the economy. The adjustment to this shift, if it were to occur, is hard to imagine. It will require a major rethinking of how we approach work and society. Shortened workweeks, universal basic income, and other policy changes might become a reality as the need for human work decreases over time. We will need to find meaningful new ways to occupy our free time, since so much of our current lives is centered on work.