Human-like voicebots? How our conversation designers teach voicebots to speak
Thanks to advances in voice technology, the synthetic voices of voicebots are starting to resemble real human voices, with all the features and tiny details that make human speech unique. Artificial intelligence has enabled us to make robotic voices sound more natural. Today's voicebots can alter their speech tempo or tone of voice. They can also sound like they are breathing or even convey emotion in their voice, just as a human would. All of that is the work of conversation designers.
What we need to teach our voicebots
For conversations with voicebots to feel as natural as possible, the voicebots must learn to understand speech and respond to it quickly and appropriately. AI helps with each step: speech-to-text (STT) converts the caller's speech into text, natural language processing and natural language understanding (NLP/NLU) interpret what the caller means, and text-to-speech (TTS) turns the voicebot's reply back into speech.
However, every conversation has two sides: it is an exchange of information and factual content, but also of emotions. The point of a conversation is not just what the voicebot says, but also how it says it. The “how” is also the responsibility of conversation designers. They humanize voicebots to a certain level and make sure that conversations with voicebots follow the principles of natural human communication.
The biggest challenges of conversational design
Moods and emotions: Understanding sentiment
The biggest challenge in understanding human speech lies in the voicebot's (in)ability to read emotions, that is, to understand sentiment: to identify age, mood, urgency, attitude, level of interest, and other attributes in the caller's speech. For example, if the caller is an elderly person or someone in a stressful situation, it is the voicebot's job to adapt not only the content of the message, but also its speech tempo and communication style. That keeps the conversation pleasant and natural and makes a successful outcome more likely. On the other hand, the fact that voicebots never experience stress can also be an advantage. In situations where even a human operator who is well trained to handle non-standard situations might falter, voicebots keep their 'emotions' in check. This can be useful both in the commercial sector and in social services, healthcare and other areas where voicebots help people.
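To make the idea of adapting tempo and style more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `CallerSignals` and `SpeakingStyle` types, the `choose_style` function and all thresholds are hypothetical and do not describe how any particular voicebot platform works. It only shows the shape of the decision: detected attributes go in, a speaking style comes out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallerSignals:
    """Hypothetical output of an upstream sentiment/voice analysis step."""
    estimated_age: int   # years, as guessed by a classifier
    stress: float        # 0.0 (calm) to 1.0 (very stressed)
    urgency: float       # 0.0 (no hurry) to 1.0 (very urgent)


@dataclass(frozen=True)
class SpeakingStyle:
    rate_percent: int    # 100 means the voice's default tempo
    tone: str            # label handed to the voice layer
    concise: bool        # prefer shorter messages


def choose_style(signals: CallerSignals) -> SpeakingStyle:
    """Map caller attributes to a speaking style (illustrative thresholds)."""
    rate = 100
    tone = "neutral"
    if signals.estimated_age >= 70:
        rate -= 15                 # slow down for elderly callers
    if signals.stress >= 0.6:
        rate -= 10                 # slow down further and soften the tone
        tone = "calm"
    concise = signals.urgency >= 0.7
    rate = max(70, rate)           # never drop below 70% of the default tempo
    return SpeakingStyle(rate_percent=rate, tone=tone, concise=concise)
```

With these made-up thresholds, a relaxed 35-year-old caller gets the default style, while a stressed 80-year-old caller with an urgent request gets a slower, calmer and more concise voicebot.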
'Humanizing' our voicebots
Making voicebots sound more natural and human-like has been a challenging task. At VOCALLS, we have managed to teach voicebots to emphasize selected words and to change the pitch or tone of their voice. We can also alter the dynamics of their speech: adding breathing noises and interjections ("hmm") makes it seem as though they are thinking, and we can make their voice resonate in the environment.
Moreover, we can modify their voice to some extent to show emotion. For example, they can sound apologetic, excited, stern, or angry.
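One common way to express emphasis, pitch, tempo and pauses for a synthetic voice is SSML (Speech Synthesis Markup Language), a W3C standard supported by many text-to-speech engines. The article does not say which tools VOCALLS uses, so the Python sketch below is only an illustration. The helper names (`plain`, `emphasize`, `prosody`, `pause`, `speak`, `build_example`) are made up for this example; the tags and attributes they emit (`<speak>`, `<emphasis level>`, `<prosody rate pitch>`, `<break time>`) are standard SSML. Breathing sounds and emotional styles are not part of standard SSML and are usually vendor-specific, so they are left out.

```python
from xml.sax.saxutils import escape


def plain(text: str) -> str:
    """Escape plain text so characters like & and < are safe inside SSML."""
    return escape(text)


def emphasize(text: str, level: str = "strong") -> str:
    """Wrap text in an SSML <emphasis> element."""
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def prosody(inner: str, rate: str = "medium", pitch: str = "medium") -> str:
    """Wrap already-built SSML in a <prosody> element (tempo and pitch)."""
    return f'<prosody rate="{rate}" pitch="{pitch}">{inner}</prosody>'


def pause(ms: int) -> str:
    """Insert a silent pause of the given length in milliseconds."""
    return f'<break time="{ms}ms"/>'


def speak(*parts: str) -> str:
    """Join SSML fragments under the <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


def build_example() -> str:
    """A 'thinking' interjection, a pause, then a slower, lower-pitched reply."""
    reply = plain("let me check that ") + emphasize("right now") + plain(".")
    return speak(plain("Hmm,"), pause(400), prosody(reply, rate="slow", pitch="low"))
```

Calling `build_example()` returns a single SSML string that a compatible TTS engine could read aloud: the "Hmm," and the pause suggest thinking, and the emphasized words stand out within the slower reply.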
“Teaching a voicebot is significantly more difficult than teaching a human. While a baby will very quickly understand adult behavior and naturally start to imitate it, voicebots cannot do that without our constant assistance. Training them requires a lot of patience and determination. But if you don't give up, they will reward you with speech which sounds so human that it catches you off guard!” says Anna Ješátková, Conversation Designer at VOCALLS.
A voicebot has its own personality
Conversation designers are also tasked with adapting the speech of voicebots to the needs of a specific client or a specific situation. The content and communication style differ between industries such as banking, entertainment or consumer goods. And even within banking, there is a clear difference between discussing debt collection and apologizing to the client for a system outage.
Conversation designers work with the "persona" of the voicebot, much like marketing professionals do. In doing so, they actually specify human characteristics that the voicebot and its voice should have. What are all the things they have to consider when adjusting synthetic voices and creating flows?
External circumstances and environment: the industry; the identity, vision, mission, values and goals of the given company; who the target customer is, their most frequent requests; the most important information provided to customers.
The personal identity of the voicebot: what role they will have in relation to the customer (will they be a trainer/coach/caregiver/teacher?); name, gender and age; appearance, personality and character; personal story; language and dialect.
The voice and its specifics: the type of voice (a generic voice, a clone of a real human voice etc.); the tempo, speed, pitch, tone, energy, style, accent; breathing; various interjections, sounds and phrases typical of the industry or situation.
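The three groups of considerations above could be written down as a structured persona specification. The Python sketch below is one possible way to do that. All class names, field names and example values (including the persona "Eva") are hypothetical and chosen only to mirror the list; they are not an actual VOCALLS format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Context:
    """External circumstances and environment."""
    industry: str
    company_values: Tuple[str, ...]
    target_customer: str
    frequent_requests: Tuple[str, ...]


@dataclass(frozen=True)
class Identity:
    """The personal identity of the voicebot."""
    role: str
    name: str
    gender: str
    age: int
    language: str
    dialect: str = ""


@dataclass(frozen=True)
class Voice:
    """The voice and its specifics."""
    voice_type: str                      # e.g. "generic" or "cloned"
    tempo: str
    pitch: str
    tone: str
    interjections: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Persona:
    context: Context
    identity: Identity
    voice: Voice

    def summary(self) -> str:
        """One-line description a designer could put at the top of a flow."""
        return (f"{self.identity.name}, a {self.identity.role} for "
                f"{self.context.industry} ({self.voice.tone} tone, "
                f"{self.voice.tempo} tempo)")


def example_persona() -> Persona:
    """A made-up persona for a banking use case."""
    return Persona(
        context=Context("banking", ("trust", "clarity"),
                        "retail clients", ("payment schedule", "balance")),
        identity=Identity("debt-collection assistant", "Eva", "female", 35, "English"),
        voice=Voice("generic", "slow", "medium", "calm", ("hmm", "I see")),
    )
```

Keeping the persona in one place like this makes it easier to see that the same bank might need a second, differently tuned persona for another situation, such as apologizing for a system outage.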
"To err is human" also applies to voicebots
Combining all these elements into the final voice of a voicebot is a daunting task. Most synthetic voices are still too robotic and somewhat monotonous, despite the inclusion of various 'emotional' elements. What makes human voices so special is their inconsistency, expressiveness, and ability to deliver the same message in completely different ways depending on the context. Voicebots have not quite mastered that yet, especially with longer messages. Working on these subtle details is currently the biggest challenge for conversation designers, as well as AI developers. This is borne out by a survey on the most common "bot" mistakes reported by companies using these intelligent assistants. The top five errors include “misunderstanding the nuances of human dialogue” and “difficulty understanding accents”.
[Figure: Errors organizations have encountered using intelligent assistants or chatbots in the workplace (among organizations currently using intelligent assistants or chatbots)]
At VOCALLS, we want to stay one step ahead in product development. Our team consists of experienced linguists and enthusiastic conversation designers who have set up a division specializing in the "humanization" of voicebots. We have established collaborations with world-leading experts in voice tech who lead workshops for our conversation designers. At the same time, we are finding our own ways to approach all sorts of challenges. We have cloned our colleague Franta and created his double, FrantaBot (in Czech). We have tried to teach voicebots to breathe by giving them 'lungs'. Last but not least, our voicebots can even sing Christmas carols (in Czech).
Listen to how talented they are! And sign up for our newsletter (in Czech) so you don't miss any more of their songs.