Voice User Interface (VUI) has become one of the most natural and intuitive methods of human-machine interaction. Today, we already use it to control cars, smartphones, and numerous connected home devices. We even communicate with businesses such as banks or insurance companies via self-service voice applications, without using touch-tones (DTMF). Take a look at our previous posts on Connected Customers and Contact Centers where we cover these topics in more detail.
In fact, there are several reasons why voice UI is so cool:
All these topics are really interesting and we will cover them one by one in future blog posts.
Here, I’d like to concentrate on a more general use case, when a system takes natural language as input, processes it, creates some business logic, and properly responds with a natural output. So it looks more like a dialogue or conversation with a machine, which ends with a specific business transaction being performed, e.g. transferring money, ordering pizza or making a medical appointment. The diagram below illustrates the simplified architecture of such a system:
Anna wrote a few words about Lekta NLP in our first ever blog post; essentially, Lekta is an advanced Spoken Dialogue Framework that allows the creation of conversational voice interfaces for business applications. Imagine an automatic system for making an appointment with a dentist, ordering a takeaway or automating banking customer services. We won’t go too deeply into discussing NLP as such just now, rather we will concentrate on Lekta’s interface from the user’s perspective.
One possible model is based on receiving the output from the speech recognizer (ASR – automated speech recognition) and then responding through the speech synthesizer (TTS – text-to-speech). Basically, ASR takes the acoustic signal as an input and tries to determine which words were actually spoken. The output typically consists of a word graph – a lattice made up of word hypotheses.
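To make the idea of a word lattice a bit more concrete, here is a minimal Python sketch. The edge list, the scores and the `best_path` helper are all invented for illustration; they do not reflect the output format of any particular ASR engine, nor how Lekta itself consumes a lattice.

```python
# Toy word lattice: each edge is (from_node, to_node, word, log_score).
# Nodes are numbered so that every edge goes from a lower to a higher number.
# All words and scores are made up for this example.
LATTICE = [
    (0, 1, "transfer", -0.2),
    (0, 1, "transfers", -1.5),
    (1, 2, "money", -0.3),
    (1, 2, "honey", -1.2),
]


def best_path(edges, start, end):
    """Return (score, words) for the highest-scoring path, or None if unreachable."""
    best = {start: (0.0, [])}
    # Sorting by source node means a node's best score is final before we leave it.
    for src, dst, word, score in sorted(edges):
        if src not in best:
            continue
        total = best[src][0] + score
        if dst not in best or total > best[dst][0]:
            best[dst] = (total, best[src][1] + [word])
    return best.get(end)


score, words = best_path(LATTICE, 0, 2)
print(" ".join(words))
```

The point of keeping the whole lattice, rather than only the single best sentence, is that the lower-scoring alternatives stay available to later processing steps.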
In a future post, we will show exactly how Lekta benefits from using such a lattice and can even help to improve the quality of the speech recognizer itself. Here, we omit the communication details, whether it’s a phone call or a mobile app voice interaction.
Actually, Ratel (Contactis Group’s omnichannel business communications operator) fills in the missing piece by providing feature-rich, context-based communication allowing customers to connect to businesses in the most natural and intuitive way. But that’s another topic entirely.
For now, let’s say that the ASR module returns text results, each result with a certain level of confidence. Lekta then takes this data and extracts a meaning representation (NLU – natural language understanding), which is used by the Dialog Manager (DM) responsible for conversation management and business logic integration. The output from the DM, which is a non-linguistic meaning representation, is then taken to the NLG (natural language generation) which converts it into natural language. The final part of the whole process is produced by the TTS module, which converts the natural language into speech. We will cover each step in more detail in future posts.
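The flow described above can be sketched in a few lines of Python. This is only a toy illustration: the `AsrResult` class, the function names and the keyword-based "understanding" are assumptions made for this example and are not Lekta's actual API.

```python
from dataclasses import dataclass


@dataclass
class AsrResult:
    text: str
    confidence: float


def nlu(results):
    """Take the most confident hypothesis and build a toy meaning representation."""
    top = max(results, key=lambda r: r.confidence)
    words = top.text.lower().split()
    intent = "transfer_money" if "transfer" in words else "unknown"
    amount = next((w for w in words if w.isdigit()), None)
    return {"intent": intent, "amount": amount}


def dialog_manager(meaning):
    """Decide the next system act; the result is still non-linguistic."""
    if meaning["intent"] == "transfer_money" and meaning["amount"]:
        return {"act": "confirm_transfer", "amount": meaning["amount"]}
    return {"act": "ask_repeat"}


def nlg(act):
    """Turn the system act into a sentence that would be handed to the TTS."""
    if act["act"] == "confirm_transfer":
        return f"Transferring {act['amount']} euros. Is that correct?"
    return "Sorry, could you repeat that?"


def respond(results):
    return nlg(dialog_manager(nlu(results)))


hypotheses = [AsrResult("transfer 50 euros", 0.82), AsrResult("transfer 15 euros", 0.41)]
print(respond(hypotheses))
```

Each stage only sees the output of the previous one, which is what makes it possible to swap the ASR or TTS vendor without touching the dialogue logic.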
The obvious question is this: what if Lekta receives the wrong speech recognition results from the ASR? For instance, Barcelona and Pamplona could sound very similar in Spanish. Well, here we have a couple of options. The first is based on the business logic – let’s say a client wants to book a flight on a specific date. Lekta can check with the database to confirm whether there are flights to Pamplona on that day, and if there aren’t any, it will assume that the client meant Barcelona. However, if there are flights to both cities on the same day, and we know that these names often cause a recognition problem, the system could ask the customer to confirm the city (using a “yes-no” question). This may worsen the user experience a little but at least we could proceed with their request, which in this case is more important.
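The two options above boil down to a small decision rule. The sketch below is a hypothetical illustration: `resolve_city` and its return values are invented names, and the set of cities with flights stands in for a real database query.

```python
def resolve_city(candidates, cities_with_flights):
    """Decide what to do with confusable city hypotheses.

    candidates: city names ordered from most to least likely according to the ASR.
    cities_with_flights: cities that actually have a flight on the requested day
                         (a stand-in for a database lookup).
    """
    available = [city for city in candidates if city in cities_with_flights]
    if len(available) == 1:
        # Business logic alone settles it: only one hypothesis makes sense.
        return ("accept", available[0])
    if len(available) > 1:
        # Still ambiguous: ask a yes/no question about the top hypothesis.
        return ("confirm", available[0])
    return ("reject", None)


print(resolve_city(["Pamplona", "Barcelona"], {"Barcelona", "Madrid"}))
print(resolve_city(["Pamplona", "Barcelona"], {"Barcelona", "Pamplona"}))
```

In the first call only one city has a flight, so no extra question is needed; in the second call both do, so the system falls back to a confirmation question.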
In general, Lekta tries to solve these kinds of problems by controlling the ASR with so-called expectations. Let’s consider an automated medical appointment system, where, at a certain moment, the system asks for the ID of a caller. In this case, Lekta can inform the ASR that it expects digits by providing the more specific (smaller) grammar. In the end, this improves the overall speed and quality of speech recognition.
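As a rough illustration of what an expectation buys you, the Python sketch below filters a list of ASR hypotheses through a "digits only" grammar. This is a simplification under stated assumptions: in a real deployment the grammar would be handed to the ASR before decoding, whereas here it is applied afterwards, and the `apply_expectation` helper and the sample hypotheses are invented for this example.

```python
import re

# A tiny "grammar" for a caller ID: one or more digits, nothing else.
DIGITS_ONLY = re.compile(r"\d+")


def apply_expectation(hypotheses, grammar):
    """Return the most confident hypothesis that matches the expected grammar.

    hypotheses: list of (text, confidence) pairs.
    Returns None when no hypothesis fits the expectation.
    """
    for text, _confidence in sorted(hypotheses, key=lambda h: h[1], reverse=True):
        if grammar.fullmatch(text):
            return text
    return None


hypotheses = [("for to one nine", 0.62), ("4219", 0.58)]
print(apply_expectation(hypotheses, DIGITS_ONLY))
```

Without the expectation, the slightly more confident but useless word sequence would win; with it, the digit string is selected.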
An important thing about Lekta is that it is language-agnostic and completely independent from the ASR/TTS, so any vendor can be used. It could be more cost effective if a business has already been using an ASR/TTS engine, or uses solutions developed by an affiliated company, which in the case of Lekta is Techmo, a company with some of the best voice technologies in Europe on offer.
As we mentioned at the beginning of this post, our voice contains more than just words. There are solutions available that can detect emotional state, age or gender with a certain level of confidence. Techmo’s solutions can be used for this as well.
According to “Gender recognition from vocal source” research, the male voice recognition probability is found to be 94.7%, and the female voice recognition probability is 95.9%. This is very important for languages with a grammatical gender like Polish or Spanish, and it mostly affects the NLG module.
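Here is a small sketch of how a detected gender could feed into the NLG step for a language with grammatical gender. The `greeting` function, its labels and the 0.9 threshold are assumptions made for this example, not part of Lekta; the Spanish forms are simply "Bienvenido" (masculine), "Bienvenida" (feminine) and a gender-neutral alternative used when the detector is not confident enough.

```python
def greeting(gender, confidence, threshold=0.9):
    """Pick a Spanish welcome phrase based on a (hypothetical) gender detector.

    gender: "male" or "female", as reported by the detector.
    confidence: detector confidence between 0 and 1.
    """
    if confidence < threshold:
        # Not sure enough: use a phrasing that does not depend on gender.
        return "Le damos la bienvenida"
    return "Bienvenida" if gender == "female" else "Bienvenido"


print(greeting("female", 0.96))
print(greeting("male", 0.60))
```

Falling back to a neutral phrasing is a cautious design choice: addressing a caller with the wrong gender is usually worse than a slightly more formal sentence.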
The emotional state (like joy, sadness, anger, fear, surprise, or a neutral state) in a voice channel is harder to determine. According to various reports, the recognition accuracy is around 45% for male voices and around 48% for female voices. Lekta can be configured to adjust the dialog strategy depending on the caller’s emotional state. For example, if the customer is detected to be angry or rude, Lekta can transfer the call to a live agent.
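A configuration like this can be pictured as a simple routing rule. The Python sketch below is hypothetical: the `next_step` function, the emotion labels and the threshold are invented for illustration and say nothing about how Lekta is actually configured.

```python
def next_step(emotion, confidence, threshold=0.8):
    """Choose the next dialogue step from a (hypothetical) emotion detector.

    emotion: one of "joy", "sadness", "anger", "fear", "surprise", "neutral".
    confidence: detector confidence between 0 and 1.
    """
    if emotion == "anger" and confidence >= threshold:
        return "transfer_to_agent"
    return "continue_dialog"


print(next_step("anger", 0.91))
print(next_step("anger", 0.50))
```

Because emotion detection is far less reliable than gender detection, it seems sensible to act only on high-confidence results, which is what the threshold stands for here.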
Also, Lekta can switch the dialog strategy depending on the age of a caller, for example, by making it more informal while talking to younger customers. All these dynamic parameters (age, gender, emotional state, etc.) can be used by Lekta in order to improve the overall user experience. It’s, therefore, quite difficult to show all the powers of Lekta in one article… which has become quite lengthy with my musings on Lekta’s possibilities! I do hope you enjoyed it though :).
To summarize, in this article I’ve tried to give you a short overview of how Lekta can be accessed through a voice interface and what additional features it can leverage. With this in mind, I’d also like to start a series of blog posts in which we are going to cover the details of Lekta NLP.
Today, we as a society feel more and more comfortable with hands-free, non-visual interactions. Voice interfaces will definitely continue extending into other areas of our lives and activities. And thus, so will Lekta.
As we’re enjoying listening to some amazing speakers at FETLT2016, co-organized by our very own CTO – José F. Quesada, there are some things we would like to share with you about another language technologies-related conference, which we attended last week in Brussels – LT-Accelerate.
LT-Accelerate is the premier European conference focusing on building value through language technology. The purpose of the conference is to connect text, speech, social and big data analysis technologies to a spectrum of corporate and public sector applications and also to present the state of language technologies in the industry today.
José had the opportunity to talk about Lekta in the context of Conversational Interaction for Businesses and presented the results of more than 4 years of intensive work focused on the creation of advanced, collaborative and fluent conversational interfaces.
Conversational interfaces have become a hot topic. Many companies have been making huge investments in researching technologies related to artificial intelligence, with a special emphasis on machine learning, deep neural networks and natural language understanding. Their aim is mostly to create intelligent assistants that will enable users to interact with information and services in a more natural and conversational way.
Companies have been using dialogue systems or conversational technologies in general for a number of years, mainly for customer service and typically to replace or assist live agents in call centers or as an alternative to point-and-click interfaces for their websites. But lately, a number of factors have been ushering in a new era of conversational interaction.
Advances in cognitive technologies are making it possible to provide increasingly accurate and relevant automated dialogues. For example, speech recognition software has made advances in reducing word error rates, and machine translation has improved thanks to deep learning techniques. Moreover, improvements in speech and language processing technologies are making conversational interaction more capable, expanding its potential applications across the enterprise.
As technology is evolving faster than ever before, consumer preferences undergo their own fundamental change as well. According to some observers, the app ecosystem appears to be burdened by a kind of “app fatigue”—a declining willingness among consumers to install and use new mobile apps. Quite unexpectedly though, during this shift of the app ecosystem, messaging has emerged as a dominant online activity, with brands trying to take advantage of conversational technologies as a new consumer interaction channel.
Deloitte Global predicts that by the end of 2016 more than 80 of the world’s 100 largest enterprise software companies by revenues will have integrated cognitive technologies into their products. That’s a 25 percent increase on the prior year. By 2020, it’s expected that the number will rise to about 95 out of 100.
Specifically, during 2016 and the next few years, the cognitive technologies that are and will be the most important in the enterprise software market will include advanced Speech Recognition, Natural Language Understanding and Machine Learning Technologies.
Providing computers with the human capability of language understanding has proven to be one of the most complex computational challenges in Artificial Intelligence development. At the same time though, the opportunity created in the industry at large – as we overcome the last technical challenges – cannot be overlooked. Conversational business interaction is already transforming Customer Support, User Experience and Business Intelligence, among other fields. At the same time, new terms like “conversational commerce” are being coined.
In this ever-changing landscape, the only thing that remains clear about the future is that no successful business can afford to ignore this trend.
Alex Waibel, from Carnegie Mellon University and Karlsruhe Institute of Technology, raised this point during his speech after receiving the META Prize at the recent META-FORUM event held in Lisbon on 4/5 July 2016. Perhaps you could consider any of the multiple other languages spoken in Europe.
By the way, have you ever thought about how many languages, or dialects, are spoken world-wide? Although there are some 7,000 languages registered, the top 25 languages cover only around 50% of the world population. Curiously enough, some publications mention that there are 46 languages that have just a single speaker.
And what about Europe? Well, in his presentation about the digital vitality of European language, András Kornai from the Hungarian Academy of Sciences mentioned a list of 283 European languages and dialects.
By the way, the difference between a language and a dialect can sometimes be very blurry. Don’t forget the famous quote on this point: “A language is a dialect with an army and navy”. But even with 283, Europe is not the richest linguistic area in the world. For example, more than 850 languages are spoken in Papua New Guinea alone, a country with fewer than 8 million people.
But in any case, language is currently a major barrier to the economic and social development of Europe. This is the key motto of the Multilingual Digital Single Market (MDSM). Georg Rehm, from DFKI, current META-NET General Secretary, summarized this challenge with the sentence “Don’t understand, won’t buy” during his presentation of a new version (0.9) of The Strategic Agenda for the MDSM.
However, I would like to highlight two key, inspiring ideas mentioned during the two intense working days in Lisbon.
Ryan McDonald from Google focused on Multilingual Europe as a Challenge for Language Technologies.
The key points he presented were quite strong and very relevant for this community:
António Branco, Principal Researcher of one of the most prominent EU-funded projects on Machine Translation (qtleap), used an insightful idea for motivation in his talk.
In the past, with the advent of PCs, companies reached out to their customers with websites. Currently, with the consolidation of smartphones, the strategy for reaching clients is dominated by the use of mobile apps.
Recently, a CEO of a large social network at an annual conference proposed that, in the future, companies will reach their clients using chatbots.
Summing up, Multilinguality, Mobile and Conversational Interfaces will play a critical role in the immediate future. It’s important to create solutions that won’t be limited to English or any other single language.
Fortunately, Lekta has been designed to take into account all these challenges, and now we are ready to put it into action. Stay tuned!
Artificial intelligence (AI) is taking over the world, and intelligent B2C communications are no exception. State of the art Spoken Dialogue Systems are here to stay and they will become smarter and smarter every year.
Here are the 3 ways that NLP Dialog Systems are moving the needle today:
You’re probably familiar with Queen’s song “I Want It All”. You’ve probably also heard the expression “connected customer” or “hyper-connected consumer” (used to describe people who are almost constantly connected to the internet in some way via computers, tablets, smartphones, e-readers, IoT devices, etc.). But do you know what these two things have in common?
Yeah, you guessed it – it’s mostly the fact that some of the lyrics very nicely describe the idea of the connected customer:
So much to do in one lifetime (people do you hear me)
Not a man for compromise and ‘wheres’ and ‘whys’ and living lies
So I’m living it all, yes I’m living it all
And I’m giving it all, and I’m giving it all
It ain’t much I’m asking, if you want the truth
Here’s to the future, hear the cry of youth
I want it all, I want it all, I want it all, and I want it now
So, let’s break those lyrics down and analyze their meaning for businesses that want to stay close to their customers in the era of Everything Connected.
We’re all very busy, and it almost seems like the more life-enhancing technologies there are, the busier we become. We don’t have time to overanalyze every single purchase (although, personally, I’m often guilty of doing just that – and then I realize that I really shouldn’t have wasted several hours on deciding which external hard drive to buy). We turn to people we trust for their advice and when they’re unable to help, we search for reviews online.
Only in the final stage of the buying decision process do we maybe have the need to contact a sales rep or customer support agent. And when we do, we want them to answer our query as quickly and efficiently as possible. “Do you have it in blue? No? OK, what color would you recommend? Great, sold.” And what happens if the person on the other side of the phone or chat isn’t able to help? We probably just move on to a different store.
Not having enough time also means that we use many devices simultaneously, like when we browse Facebook on our smartphones while watching a movie on Netflix. We also spend more and more time on mobile devices – while commuting, exercising, cooking, just before going to sleep, etc. They’ve become a standard part of our lives. Instead of reading a book on a bus, we take out our smartphones, tablets, smartwatches or e-readers. Those are the moments when we can look at product reviews, browse other products and services online and even actually buy something.
We often finish the buying process on a different device than the one we started on. We want this experience to be truly omnichannel – a seamless transition from mobile to desktop website, for example, without losing the content of our cart. Or, being able to start a chat with a customer support agent on the computer while browsing the same e-commerce store on our tablet. And yes, we expect those agents to know the context in which we’re contacting them. The same goes for communicating with businesses via social media, phone or any other channel.
We have so much to do in one lifetime that we don’t want to waste our time talking to agents who not only don’t know who we are but also have no idea what we’re contacting them about or how to answer our question quickly.
If you don’t have it, I’ll get it somewhere else. That’s the customer’s approach these days.
As Brian Solis put it in his e-book Digital Darwinism and the Dawn of Generation C, “digital natives are connected, discerning, informed and elusive.” But “they’re among the most loyal customers you could hope to attract”. Because, once customers are treated in a friendly and highly personalized manner in one place, they have no reason to go anywhere else. Connected consumers expect instant gratification and highly individualized treatment. And they know when you’re only pretending to be friendly.
An amazing customer experience is paramount – not the price or product. It has to be hyper-personalized, timely (even on weekends) and right to the point. It doesn’t mean that all customer support agents have to be available 24/7, though. Customers are very willing to answer their own questions as long as they’re enabled to do so. According to Forrester, web self-service was the communication channel most often used for customer service in 2014 – even more than phone contact.
It’s easy to find everything online and customers know it. Make sure that once they get to your website or app, they don’t have to look any further.
No matter if it’s Snapchat, web chat or phone – customers want to be able to contact all businesses wherever and whenever it is most convenient for them. It’s important to build brand presence at every possible touchpoint and to deliver excellent customer experience everywhere.
These days it’s more about interacting with customers wherever and whenever they feel like interacting, as opposed to bothering them with content they don’t want to see. This means that you have to be ready to interact at any given point in any given channel without ever losing the context or history of that interaction. If someone has been your customer for a couple of years already, they want you to remember that they always buy a pack of socks in January and in June. And they’ll be glad if you remind them about it at the right time.
Today customers are willing to share more information with businesses than ever before but they do it only because they want to get a personalized experience. If they don’t get it, they’ll feel like they’ve been cheated. You can’t afford to let that happen because – as you already know – customers are fickle. And they will take offense. And probably never come back to you.
So, gather only the data you really need and make good use of it. Don’t require customers to provide you with weird information “just in case”. Believe me, it will pay off. Need some stats to prove it?
If you think about it, connected customers don’t ask for much – they just want to be treated like humans. OK, more than just humans. They want businesses to be like good friends they can trust. And all they ask is for that trust never to be broken.
As with every friendship, this one can have its ups and downs. And that’s OK (really!), as long as you, as a business or brand, are always honest with your customers.
Trust, honesty, transparency, humility and a willingness to learn from the customer – those are the five things connected customers expect. In other words, the same things you would expect from a good friend. But let’s add two more things – accessibility and timeliness. Because customers want to be your friends but they don’t want to have to wait for you.
Bear in mind that you have to adjust to how each of your customers wants to interact with you. Don’t try to be best buds with someone who only wants to ask about their order status and really doesn’t want to build any further relationship with you. You can try to talk to that person at a more convenient time and slowly build their profile.
Most customers will be very open with you if you give them a reason for that openness. But people are different and in business, as in life, you can’t make someone like you if they don’t want to. And as long as you know you’ve done everything you could, and you weren’t too bothersome while doing it, it’s fine. Focus on the customers who do appreciate the relationship they have with you.
The digital customer revolution is just starting. If you think you’ve planned everything ahead and are good to go for another five or ten years, then think again. The connected customers of today are nothing compared to the connected customers of tomorrow.
Generation C, as some like to call the connected consumers, are not defined by age but by their connectedness. They use multiple devices, they multitask and they expect an omnichannel experience.
But you should pay attention to the younger group of consumers, who are not only connected but hyper-connected: Generation Y (or: Millennials; born between 1980 and 2000). They adopt the newest technologies quicker than any consumers before them. They know everything about being constantly online, the Internet of Things and Virtual Reality. They’re often driven by a sudden need to own something and their buying decisions tend to be quick and guided by emotion rather than reason.
They expect to get a seamless, hyper-personalized, meaningful customer experience across all channels, both the old and the newly emerging ones, and they feel deprived if they don’t get it. They’re not afraid to voice their opinions. And since they’re hyper-connected to people and things around them, those opinions are always heard loud and clear. I guess I don’t have to explain what implications that might have for your business.
Connected customers, especially the Millennials, are impatient. If they want something, they need to get it right away. That means you have to constantly tend to their needs and be around when they want you. At the very least, you have to be on their mental radar. Otherwise, they’ll quickly find another store or another provider of services.
According to the Huffington Post, 2016 is the Year of the Connected Customer. This is only the beginning and now is the time to get ahead of this trend.
Build trust, build loyalty and use your customers’ data wisely to show them you care – wherever and whenever they need you.
And yes, if you need any help, Lekta is here for you. And so is Ratel (our context-driven business communications platform/operator that integrates perfectly with Lekta). But that’s a topic for another day :).
Feel free to contact me if you want to talk about the right tools for your business to make sure you stay connected to your customers easily and cost-effectively.
And now, to bring everything full circle, let’s listen to some Queen:
The technology landscape is moving fast. In this ever-changing environment, Natural Language has re-surfaced as one of the biggest and most important AI components in the industry today. Here’s what 2017 will bring to AI and B2C communications in general.