“Rising demand is driving the boom of digital humans,” says Shiyan Li, head of the digital human and robotics business at Baidu, which created the digital model-actor, Gong. “In China alone, there are over 400 million ACGN (animation, comics, games, and novel) fans, and an enterprise market worth hundreds of billions of dollars centered on digital humans.” According to Qichacha, a company that tracks business registrations, China now has more than 280,000 enterprises that engage in digital human-related activities.
A different kind of digital
The debut of Baidu’s digital celebrity may not seem like much at first, as the concept of “virtual idols” has been around for years. For example, US virtual influencer Lil Miquela has been appearing alongside real human celebrities in online advertisements and TV commercials since 2016, gaining over three million Instagram followers. However, there is something different about the virtual Chinese star: a digital human with the ability to listen, speak, and interact with real humans at a level never seen before. And Gong’s digital duties are not limited to singing. On the latest update of Baidu App, China’s leading search-plus-feed app, Gong appears on users’ phones, helping with searches and queries using the model-actor’s real voice. Since this interactive search experience was launched in 2021, it has boosted the number of voice search queries on Baidu App by 18.2%.
Baidu AI Cloud first began developing a digital employee in 2019 in collaboration with Shanghai Pudong Development (SPD) Bank. Subsequently, they focused their efforts on building a digital financial advisor to provide a service equivalent to that of a human bank representative when real-life employees were unavailable. Today, SPD Bank says more than 460,000 customers rely on digital humans for banking services and portfolio management each month. “Access to digital humans outside of regular business hours allows SPD Bank to offer 24/7 customer service at low cost and high efficiency,” says a bank representative.
More recently, a Baidu-created virtual anchor provided live commentary in sign language at the 2022 Beijing Winter Games for hearing-impaired viewers. In addition to looking like a real person, the avatar was empowered with speech recognition and sign-language interpretation abilities to ensure rapid and highly accurate input and output. With approximately 430 million people around the world experiencing “disabling” hearing loss, according to the World Health Organization, there is strong potential for this technology to be used to increase their ability to access a wide range of content.
XiLing: A new generation on an AI platform
From entertainment to public services, digital humans are set to play a greater role in our daily lives. But behind their natural and effortless appearance is a complex web of new and emerging technologies pushing the boundaries of AI innovation.
Baidu AI Cloud’s digital celebrity and virtual sign-language anchors were created through XiLing, a new digital platform launched in 2021. At the Baidu World 2022 event held on June 21, the company announced a new capability on XiLing, which supports the creation of digital humans that can be livestream hosts who can sing, dance, and respond to comments in real time, without ever needing a single break. XiLing is unique in its ability to support the entire process of creating a digital human, from crafting a realistic persona to endowing it with conversational and content-generation skills. One of its most striking attributes is speed. The platform can generate a 3D avatar based on a real person in one to two weeks, while a 2D avatar can be made in just a matter of minutes.
In addition, using XiLing’s intelligent dialogue tools, creators can quickly customize a digital human’s conversational ability, letting it adapt and learn over time. This capability is powered by Baidu’s PLATO, a hundred-billion-parameter dialogue model that enables digital humans to participate in open-domain conversations, that is, to understand any topic and provide relevant responses. Highly accurate speech recognition and lip-syncing with above-98.5% accuracy allow the digital human to have smoother, more human-like interactions. “Use of advanced AI technologies will keep bringing down the cost of building digital humans and significantly improve their interactions with real humans,” says Li.
Just as every real human has their own set of skills and talents, so too does the new generation of digital humans. This can even include giving digital humans the ability to be creative themselves, thanks to the recent progress made by large AI models like Baidu’s ERNIE, which can generate texts and create realistic images when prompted. Digital humans designed to serve as brand spokespersons, for example, can independently create and post on social media, design posters, and perform in videos.