As chatbots improve, humans' unique language abilities are becoming less special

UC Berkeley researchers say large language models have gained "metalinguistic ability," a hallmark of human language and cognition no other animal has displayed.
AI platforms like ChatGPT are widely understood to be sophisticated prediction machines. Trained on vast troves of content ranging from news articles and books to film scripts and Reddit posts, they anticipate the next most likely letters and words when prompted. While their responses can give the impression they're sentient thinkers, that sci-fi scenario hasn't yet panned out.
But new UC Berkeley research reveals for the first time that AI chatbots can now analyze sentences like a trained linguist. The study, published in the journal IEEE Transactions on Artificial Intelligence, provides a glimpse into how AI models are improving and also challenges the idea that humans are unique in our ability to think about language.
With roots in linguistics and philosophy, our ability to think deeply about words and sentence structure is a defining human cognitive feat, said Gašper Beguš, a Berkeley associate professor of linguistics and lead author of the research. But that ability to talk about and manipulate language—a process called metalinguistics—is becoming the domain of AI chatbots, too.
"Our new findings suggest that the most advanced large language models are beginning to bridge that gap," Beguš said. "Not only can they use language, they can reflect on how language is organized."
Beguš and his team fed 120 complex sentences into multiple versions of OpenAI's ChatGPT, as well as Meta's Llama 3.1. With each sentence, they instructed the system to analyze it, assess if it had a specific linguistic quality, and diagram it with what linguists call syntactic trees—visual representations of a sentence's structure and components.
In the sentence "Eliza wanted her cast out," for example, researchers wanted to know if AI could detect what's called ambiguous structure. Did Eliza want someone to be expelled? Or did she want a physical cast to be removed?
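The two readings correspond to two different syntactic trees over the same string of words. As a minimal sketch (the labels and bracketing here are illustrative, not the paper's exact trees), each parse can be encoded as nested tuples:

```python
# Two parses of the structurally ambiguous sentence "Eliza wanted her cast out",
# sketched as nested (label, children...) tuples. Labels are hypothetical.

# Reading 1: "her" is a person being expelled; "cast out" is a particle verb.
parse_expelled = (
    "S",
    ("NP", "Eliza"),
    ("VP", ("V", "wanted"),
           ("SC", ("NP", "her"),
                  ("VP", ("V", "cast"), ("Prt", "out")))),
)

# Reading 2: "her cast" is a noun phrase (a plaster cast) to be removed.
parse_removed = (
    "S",
    ("NP", "Eliza"),
    ("VP", ("V", "wanted"),
           ("SC", ("NP", ("Det", "her"), ("N", "cast")),
                  ("AdvP", "out"))),
)

def leaves(tree):
    """Collect the terminal words of a tuple-encoded tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    return [word for child in children for word in leaves(child)]

# Both trees yield the identical word string -- that shared surface form
# with two distinct structures is what "ambiguous structure" means.
print(leaves(parse_expelled))  # ['Eliza', 'wanted', 'her', 'cast', 'out']
print(leaves(parse_removed))   # ['Eliza', 'wanted', 'her', 'cast', 'out']
```

Detecting the ambiguity amounts to recognizing that both trees are valid parses of the same sentence, which is the task the researchers posed to the models.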
ChatGPT versions 3.5 and 4, as well as Llama, failed to detect the confusion. But OpenAI's o1 model, which is designed to "reason" through more complex questions, both spotted the ambiguity and accurately diagrammed it.
That was revealing and signaled the improvements the models were making, Beguš said. But he was especially interested in whether the systems could spot what linguists call recursion, sometimes referred to as "the infinity of language."
First theorized by Noam Chomsky, recursion is humans' ability to embed phrases within other phrases, as in the sentence "The dog that chased the cat that climbed the tree barked loudly." This can lead to an endless nesting effect of sentences. Chomsky called it a defining feature of human language and one that separates us from other animals.
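The unbounded nesting can be mimicked with a short recursive function. This toy sketch (the vocabulary and function name are hypothetical, chosen only to mirror the example sentence above) applies the same embedding rule to its own output, which is exactly what makes the construction recursive:

```python
# A toy illustration of syntactic recursion: a relative clause embedded
# inside a noun phrase, which itself contains a noun phrase built the same way.

def nested_subject(chain):
    """Build a phrase like 'the dog that chased the cat that ...' from an
    alternating list [noun, verb, noun, verb, noun, ...]."""
    noun, *rest = chain
    if not rest:
        return f"the {noun}"
    verb, *tail = rest
    # The embedded noun phrase is produced by the same rule -- the recursion.
    return f"the {noun} that {verb} " + nested_subject(tail)

subject = nested_subject(["dog", "chased", "cat", "climbed", "tree"])
print(subject + " barked loudly.")
# -> the dog that chased the cat that climbed the tree barked loudly.
```

Because the rule can always be applied one more time, the list can be extended indefinitely, which is why linguists describe recursion as "the infinity of language."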
To test the concept of recursion, Beguš and his team prompted the AI models to identify whether a sample sentence contained recursion and, if so, which specific linguistic type. They also instructed the models to add another similar recursive clause.
Using the sentence "Unidentified flying objects may have conflicting characteristics," OpenAI's o1 detected the recursion—"flying" modifies "objects," and "unidentified" modifies "flying objects." It diagrammed the sentence. And it took the sentence to a new level: "Unidentified recently sighted flying objects may have conflicting characteristics."
Researchers wrote that "o1 significantly outperformed all others."
"This is very consequential," Beguš said, adding that it advances the debate about whether AI "understands" language or merely mimics it. "It means in these models, we have one of the rare things that we thought was human-only."
He added that the approach they used to study AI's understanding of language is one that linguists can use to assess other advances in AI chatbots. That, in turn, can help sort hype around the technology from the facts about how tools are actually improving.
"Everyone knows what it's like to talk about language," he said. "This paper creates a nice benchmark or criterion for how the model is doing. It is important to evaluate it scientifically."
More information: Gašper Beguš et al, Large linguistic models: Investigating LLMs' metalinguistic abilities, IEEE Transactions on Artificial Intelligence (2025).
Provided by University of California - Berkeley