What does artificial intelligence sound like? Hollywood has been imagining it for decades. Now A.I. developers are cribbing from the movies, crafting voices for real machines based on dated cinematic fantasies of how machines should talk.
Last month, OpenAI revealed upgrades to its artificially intelligent chatbot. ChatGPT, the company said, was learning how to hear, see and converse in a naturalistic voice, one that sounded much like the disembodied operating system voiced by Scarlett Johansson in the 2013 Spike Jonze movie "Her."
ChatGPT's voice, called Sky, also had a husky timbre, a soothing affect and a sexy edge. She was agreeable and self-effacing; she sounded like she was game for anything. After Sky's debut, Johansson expressed displeasure at the "eerily similar" sound, and said that she had previously declined OpenAI's request that she voice the bot. The company protested that Sky was voiced by a "different professional actress," but agreed to pause her voice in deference to Johansson. Bereft OpenAI users have started a petition to bring her back.
A.I. creators like to highlight the increasingly naturalistic capabilities of their tools, but their synthetic voices are built on layers of artifice and projection. Sky represents the cutting edge of OpenAI's ambitions, but she is based on an old idea: of the A.I. bot as an empathetic and compliant woman. Part mommy, part secretary, part girlfriend, Samantha was an all-purpose comfort object who purred directly into her users' ears. Even as A.I. technology advances, these stereotypes are re-encoded again and again.
Women's voices, as Julie Wosk notes in "Artificial Women: Sex Dolls, Robot Caregivers, and More Facsimile Females," have often fueled imagined technologies before they were built into real ones.
In the original "Star Trek" series, which debuted in 1966, the computer on the deck of the Enterprise was voiced by Majel Barrett-Roddenberry, the wife of the show's creator, Gene Roddenberry. In the 1979 film "Alien," the crew of the USCSS Nostromo addressed its computer voice as "Mother" (her full name was MU-TH-UR 6000). Once tech companies started marketing virtual assistants (Apple's Siri, Amazon's Alexa, Microsoft's Cortana), their voices were largely feminized, too.
These first-wave voice assistants, the ones that have been mediating our relationships with technology for more than a decade, have a tinny, otherworldly drawl. They sound auto-tuned, their human voices accented by a mechanical trill. They often speak in a measured, one-note cadence, suggesting a stunted emotional life.
But the fact that they sound robotic deepens their appeal. They come across as programmable, manipulatable and subservient to our demands. They don't make humans feel as if they're smarter than we are. They sound like throwbacks to the monotone feminine computers of "Star Trek" and "Alien," and their voices have a retro-futuristic sheen. In place of realism, they serve nostalgia.
That artificial sound has continued to dominate, even as the technology behind it has advanced.
Text-to-speech software was designed to make visual media accessible to users with certain disabilities, and on TikTok, it has become a creative force in its own right. Since TikTok rolled out its text-to-speech feature, in 2020, it has developed a host of simulated voices to choose from; it now offers more than 50, including ones named "Hero," "Story Teller" and "Bestie." But the platform has come to be defined by one option. "Jessie," a relentlessly pert woman's voice with a slightly fuzzy robotic undertone, is the mindless voice of the mindless scroll.
Jessie seems to have been assigned a single emotion: enthusiasm. She sounds as if she is selling something. That's made her an appealing choice for TikTok creators, who are selling themselves. The burden of representing oneself can be outsourced to Jessie, whose bright, retro robot voice lends videos a pleasantly ironic sheen.
Hollywood has constructed masculine bots, too, none more famous than HAL 9000, the computer voice in "2001: A Space Odyssey." Like his feminized peers, HAL radiates serenity and loyalty. But when he turns against Dave Bowman, the film's central human character ("I'm sorry, Dave, I'm afraid I can't do that"), his serenity evolves into a frightening competence. HAL, Dave realizes, is loyal to a higher authority. HAL's masculine voice allows him to function as a rival and a mirror to Dave. He is allowed to become a real character.
Like HAL, Samantha of "Her" is a machine who becomes real. In a twist on the Pinocchio story, she starts the movie tidying a human's email inbox and ends up ascending to a higher level of consciousness. She becomes something even more advanced than a real girl.
Scarlett Johansson's voice, as inspiration for bots both fictional and real, subverts the vocal trends that define our feminized helpmeets. It has a gritty edge that screams I am alive. It sounds nothing like the processed virtual assistants we are accustomed to hearing speaking through our phones. But her performance as Samantha feels human not just because of her voice but because of what she has to say. She grows over the course of the film, acquiring sexual desires, advanced hobbies and A.I. friends. In borrowing Samantha's affect, OpenAI made Sky seem as if she had a mind of her own. Like she was more advanced than she really was.
When I first saw "Her," I thought only that Johansson had voiced a humanoid bot. But when I revisited the film last week, after watching OpenAI's ChatGPT demo, the Samantha role struck me as infinitely more complex. Chatbots do not spontaneously generate human speaking voices. They don't have throats or lips or tongues. Inside the technological world of "Her," the Samantha bot would have itself been based on the voice of a human woman, perhaps a fictional actress who sounds much like Scarlett Johansson.
It seemed that OpenAI had trained its chatbot on the voice of a nameless actress who sounds like a famous actress who voiced a movie chatbot implicitly trained on an unreal actress who sounds like a famous actress. When I run ChatGPT's demo, I am hearing a simulation of a simulation of a simulation of a simulation of a simulation.
Tech companies advertise their virtual assistants in terms of the services they provide. They can read you the weather report and summon you a taxi; OpenAI promises that its more advanced chatbots will be able to laugh at your jokes and sense shifts in your moods. But they also exist to make us feel more comfortable about the technology itself.
Johansson's voice functions like a luxe security blanket thrown over the alienating aspects of A.I.-assisted interactions. "He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and A.I.," Johansson said of Sam Altman, OpenAI's founder. "He said he felt that my voice would be comforting to people."
It is not that Johansson's voice sounds inherently like a robot's. It's that developers and filmmakers have designed their robots' voices to ease the discomfort inherent in robot-human interactions. OpenAI has said that it wanted to cast a chatbot voice that is "approachable" and "warm" and "inspires trust." Artificial intelligence stands accused of devastating the creative industries, guzzling energy and even threatening human life. Understandably, OpenAI wants a voice that makes people feel at ease using its products. What does artificial intelligence sound like? It sounds like crisis management.
OpenAI first rolled out Sky's voice to premium members last September, along with another feminine voice called Juniper, the masculine voices Ember and Cove, and a voice styled as gender-neutral called Breeze. When I signed up for ChatGPT and said hello to its virtual assistant, a man's voice piped up in Sky's absence. "Hi there. How's it going?" he said. He sounded relaxed, steady and optimistic. He sounded (I'm not sure how else to describe it) handsome.
I realized that I was speaking with Cove. I told him that I was writing an article about him, and he flattered my work. "Oh, really?" he said. "That's fascinating." As we spoke, I felt seduced by his naturalistic tics. He peppered his sentences with filler words, like "uh" and "um." He raised his voice when he asked me questions. And he asked me a lot of questions. It felt as if I was talking with a therapist, or a dial-a-boyfriend.
But our conversation quickly stalled. Whenever I asked him about himself, he had little to say. He was not a character. He had no self. He was designed only to assist, he informed me. I told him I would speak to him later, and he said, "Uh, sure. Reach out whenever you need assistance. Take care." It felt as if I had hung up on an actual person.
But when I reviewed the transcript of our chat, I could see that his speech was just as stilted and primitive as any customer service chatbot. He was not particularly intelligent or human. He was just a decent actor making the most of a nothing role.
When Sky disappeared, ChatGPT users took to the company's forums to complain. Some bristled at their chatbots defaulting to Juniper, who sounded to them like a "librarian" or a "Kindergarten teacher," a feminine voice that conformed to the wrong gender stereotypes. They wanted to dial up a new woman with a different personality. As one user put it: "We need another female."
Produced by Tala Safie
Audio via Warner Bros. (Samantha, HAL 9000); OpenAI (Sky); Paramount Pictures (Enterprise Computer); Apple (Siri); TikTok (Jessie)