In 1637, the French philosopher and probable pothead René Descartes came up with an interesting thought: can a machine think? In 1950, the English mathematician and computer scientist Alan Turing announced the answer to this 300-year-old poser: who cares? A much better question, he said, was something that would come to be known as the “Turing test”: given a person, a machine, and a human interrogator, could the machine ever convince the interrogator that it was actually the person?
Now, another 74 years after Turing reformulated the question in this way, researchers at the University of California, San Diego, believe they have the answer. According to a new study, in which they had human participants talk to either one of a variety of artificial intelligence systems or another human for five minutes, the answer is now a tentative “yes.”
“Participants in our experiment were no better than chance at identifying GPT-4 after a five minute conversation, suggesting that current AI systems are capable of deceiving people into believing that they are human,” confirms the preprint paper, which is not yet peer-reviewed. “The results here likely set a lower bound on the potential for deception in more naturalistic contexts where, unlike the experimental setting, people may not be alert to the possibility of deception or exclusively focused on detecting it.”
Now, while this is certainly a headline-grabbing milestone, it’s by no means a universally accepted one. “Turing originally envisioned the imitation game as a measure of intelligence,” the researchers explain, but “a variety of objections have been raised to this idea.” Humans, for example, are famously good at anthropomorphizing just about anything – we want to empathize with things, regardless of whether they’re another person, a dog, or a Roomba with a pair of googly eyes stuck on top.
On top of that, it’s notable that GPT-4 and GPT-3.5, which was also tested, each convinced the human participants of their personhood only about 50 percent of the time – not much better than random chance. So how do we know that this result means anything at all?
Well, one failsafe that the team built into the experiment design was to include ELIZA as one of the AI systems. She was one of the very first such programs, created in the mid-60s at MIT, and while she was undoubtedly impressive for the time, it’s fair to say she’s no match for modern systems based on large language models, or LLMs.
“ELIZA was limited to canned responses, which greatly limited its capabilities. It might fool someone for five minutes, but soon the limitations would become clear,” Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told Live Science. “Language models are endlessly flexible, able to synthesize responses to a broad range of topics, speak in particular languages or sociolects and portray themselves with character-driven personality and values. It’s an enormous step forward from something hand-programmed by a human being, no matter how cleverly and carefully.”
In other words, she was perfect to serve as a baseline for the experiment. How do you account for lazy test subjects just randomly choosing between “human” and “machine”? Well, if ELIZA scores as high as random chance, then probably people aren’t taking the experiment seriously – she’s just not that good. How do you tell how much of the effect is just humans anthropomorphizing anything they interact with? Well, how much were they convinced by ELIZA? It’s probably about that much.
In fact, ELIZA scored 22 percent – convincing barely more than one in five people that she was human. This lends weight to the idea that ChatGPT really has passed the Turing test, the researchers write, since test subjects were clearly able to reliably distinguish some computers from people – just not ChatGPT.
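To see why a 22 percent score makes a useful baseline while a score near 50 percent looks like a coin flip, here is a minimal sketch in Python of an exact two-sided binomial test against chance. The sample size of 100 conversations per system is an assumption made purely for illustration, since the study's actual counts aren't given here, and this is not necessarily the analysis the researchers ran.

```python
from math import comb


def two_sided_binomial_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test against a null success rate p.

    Returns the probability, if every verdict were a coin flip with bias p,
    of seeing an outcome at least as unlikely as the observed one.
    """
    # Probability of each possible number of "human" verdicts under the null.
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(trials + 1)]
    observed = pmf[successes]
    # Add up every outcome no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


# ASSUMPTION: 100 conversations per system. This number is hypothetical
# and chosen only to make the arithmetic concrete.
N = 100

# Rates quoted in the article: ELIZA 22 percent, the LLMs about 50 percent.
for name, rate in [("ELIZA", 0.22), ("LLM at about 50 percent", 0.50)]:
    judged_human = round(rate * N)
    p_value = two_sided_binomial_p(judged_human, N)
    print(f"{name}: judged human {judged_human}/{N}, p-value vs. chance = {p_value:.3g}")
```

Under that assumed sample size, the ELIZA figure sits far from what coin-flipping interrogators would produce, while a result of about 50 percent can't be told apart from guessing. That is the sense in which participants could reliably spot some machines but not others.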
So, does this mean we’re entering a new phase of human-like artificial intelligence? Are computers now just as intelligent as us? Perhaps – but we probably shouldn’t be too hasty in our pronouncements.
“Ultimately, it seems unlikely that the Turing test provides either necessary or sufficient evidence for intelligence, but at best provides probabilistic support,” the researchers explain. Indeed, the participants weren’t even relying on what you might consider signs of “intelligence”: they “were more focused on linguistic style and socio-emotional factors than more traditional notions of intelligence such as knowledge and reasoning,” the paper reports, which “could reflect interrogators’ latent assumption that social intelligence has become the human characteristic that is most inimitable by machines.”
Which raises a worrying question: is the greater problem not the rise of the machines, but the fall of the humans?
“Although real humans were actually more successful, persuading interrogators that they were human two thirds of the time, our results suggest that in the real-world people might not be able to reliably tell if they’re speaking to a human or an AI system,” Cameron Jones, co-author of the paper, told Tech Xplore.
“In fact, in the real world, people might be less aware of the possibility that they’re speaking to an AI system, so the rate of deception might be even higher,” he cautioned. “I think this could have implications for the kinds of things that AI systems will be used for, whether automating client-facing jobs, or being used for fraud or misinformation.”
The study, which has not yet been peer-reviewed, has been posted as a preprint to the arXiv.