
Talking to machines more naturally than ever before: a voice interface for Lekta NLP


Voice User Interface (VUI) has become one of the most natural and intuitive methods of human-machine interaction. Today, we already use it to control cars, smartphones, and numerous connected home devices. We even communicate with businesses such as banks or insurance companies via self-service voice applications, without using touch-tones (DTMF). Take a look at our previous posts on Connected Customers and Contact Centers where we cover these topics in more detail.

In fact, there are several reasons why voice UI is so cool:

  • very often, it’s a more suitable input method for activities that demand our (almost) full attention, such as entering a POI (point of interest) into a navigation system or making a money transfer while driving
  • in most cases, we speak faster than we type
  • our voice carries more than just words, e.g. our emotional state, age, and even gender, all of which business logic can leverage to improve the customer experience
  • voice biometrics is a great way of addressing privacy and security concerns
  • it can significantly simplify the user interface, e.g. by reducing the number of tabs, menus, and other navigation elements. It’s like making responsive web design (RWD) “responsive” in a literal sense.

All these topics are really interesting and we will cover them one by one in future blog posts.

Processing natural language – naturally

Here, I’d like to concentrate on a more general use case, in which a system takes natural language as input, processes it, executes the relevant business logic, and responds with natural language output. It looks like a dialogue or conversation with a machine that ends with a specific business transaction being performed, e.g. transferring money, ordering pizza or making a medical appointment. The diagram below illustrates the simplified architecture of such a system:

Natural language processing system: from ASR to NLP (Lekta) to TTS

Anna wrote a few words about Lekta NLP in our first ever blog post; essentially, Lekta is an advanced Spoken Dialogue Framework that allows the creation of conversational voice interfaces for business applications. Imagine an automatic system for making an appointment with a dentist, ordering a takeaway or automating banking customer services. We won’t go too deeply into discussing NLP as such just now, rather we will concentrate on Lekta’s interface from the user’s perspective.

One possible model is based on receiving the output from the speech recognizer (ASR – automatic speech recognition) and then responding through the speech synthesizer (TTS – text-to-speech). Basically, ASR takes an acoustic signal as input and tries to determine which words were actually spoken. The output typically consists of a word graph – a lattice made up of word hypotheses.
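To make this concrete, here is a minimal sketch (in Python, with illustrative names – this is not Lekta’s actual API) of the simplest useful view of ASR output: an n-best list of hypotheses with confidence scores, which a full word lattice generalizes.

```python
# Toy representation of ASR output: an n-best list of hypotheses,
# each carrying the recognizer's confidence in that word sequence.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str          # the recognized word sequence
    confidence: float  # recognizer's confidence, 0.0-1.0

def best_hypothesis(n_best: list[Hypothesis]) -> Hypothesis:
    """Pick the hypothesis the recognizer is most confident about."""
    return max(n_best, key=lambda h: h.confidence)

n_best = [
    Hypothesis("book a flight to Barcelona", 0.82),
    Hypothesis("book a flight to Pampeluna", 0.74),
]
print(best_hypothesis(n_best).text)  # -> book a flight to Barcelona
```

A dialogue system that keeps the whole n-best list (rather than just the top result) can later re-rank it using context, which is exactly the kind of trick discussed below.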

In a future post, we will show exactly how Lekta benefits from using such a lattice and can even help to improve the quality of the speech recognizer itself. Here, we omit the communication details, whether it’s a phone call or a mobile app voice interaction.

Actually, Ratel (Contactis Group’s omnichannel business communications operator) fills in the missing piece by providing feature-rich, context-based communication allowing customers to connect to businesses in the most natural and intuitive way. But that’s another topic entirely.

For now, let’s say that the ASR module returns text results, each with a certain level of confidence. Lekta then takes this data and extracts a meaning representation (NLU – natural language understanding), which is used by the Dialog Manager (DM), responsible for conversation management and business logic integration. The output of the DM, a non-linguistic meaning representation, is then passed to the NLG (natural language generation) module, which converts it into natural language. The final step is performed by the TTS module, which converts that natural language into speech. We will cover each step in more detail in future posts.
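The pipeline above can be sketched end to end with toy stand-ins for each stage. All function names and template texts are illustrative assumptions, not Lekta’s real API; TTS is omitted since it simply voices the final string.

```python
# Sketch of the NLU -> DM -> NLG chain described above, with toy logic.

def nlu(text: str) -> dict:
    # Toy intent extraction; a real system uses grammars or statistical models.
    if "transfer" in text:
        return {"intent": "transfer_money"}
    return {"intent": "unknown"}

def dialog_manager(meaning: dict) -> dict:
    # Business logic: decide the next dialog act (a non-linguistic representation).
    if meaning["intent"] == "transfer_money":
        return {"act": "ask", "slot": "amount"}
    return {"act": "clarify"}

def nlg(act: dict) -> str:
    # Convert the dialog act back into natural language.
    templates = {
        "ask": "How much would you like to transfer?",
        "clarify": "Sorry, could you rephrase that?",
    }
    return templates[act["act"]]

reply = nlg(dialog_manager(nlu("I want to transfer money")))
print(reply)  # -> How much would you like to transfer?
```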

Recognizing the context

The obvious question is this: what if Lekta receives the wrong speech recognition results from the ASR? For instance, Barcelona and Pampeluna can sound very similar in Spanish. Here we have a couple of options. The first is based on the business logic – let’s say a client wants to book a flight on a specific date. Lekta can check the database to confirm whether there are flights to Pampeluna on that day, and if there aren’t any, it will assume that the client meant Barcelona. However, if there are flights to both cities on the same day, and we know that these names often cause a recognition problem, the system can confirm that the customer really requested that particular city (using a “yes-no” question). This may worsen the user experience a little, but at least we can proceed with the request, which in this case is more important.
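The business-logic fallback described above can be sketched like this; the flight “database” and function names are made up for illustration.

```python
# If only one candidate city has a flight on the requested date, resolve
# silently; if both do, fall back to an explicit yes-no confirmation.

FLIGHTS = {  # toy "database": city -> set of departure dates
    "Barcelona": {"2017-03-01", "2017-03-02"},
    "Pampeluna": {"2017-03-02"},
}

def resolve_city(candidates: list[str], date: str):
    available = [c for c in candidates if date in FLIGHTS.get(c, set())]
    if len(available) == 1:
        return available[0], None                 # resolved by business logic
    return None, f"Did you mean {available[0]}?"  # needs confirmation

city, question = resolve_city(["Barcelona", "Pampeluna"], "2017-03-01")
print(city)  # -> Barcelona  (no flight to Pampeluna that day)
```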

In general, Lekta tries to solve these kinds of problems by controlling the ASR with so-called expectations. Consider an automated medical appointment system in which, at a certain moment, the system asks for the caller’s ID. At that point, Lekta can inform the ASR that it expects digits by providing a more specific (smaller) grammar. This improves both the speed and the quality of speech recognition.
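A minimal way to model expectations is to validate (or constrain) recognition results against the grammar the current dialog state expects. Real engines accept formal grammars (e.g. SRGS) for this; the regex stand-ins below are an illustrative simplification.

```python
# Each dialog state declares which "grammar" it expects, and an ASR result
# is accepted only if it fits that grammar.
import re

GRAMMARS = {
    "digits": r"^\d+$",       # e.g. when asking for a caller ID
    "yes_no": r"^(yes|no)$",  # e.g. for confirmation questions
}

def matches_expectation(result: str, expectation: str) -> bool:
    """Accept an ASR result only if it fits the currently expected grammar."""
    return re.match(GRAMMARS[expectation], result) is not None

print(matches_expectation("123456", "digits"))     # -> True
print(matches_expectation("Barcelona", "digits"))  # -> False
```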

An important thing about Lekta is that it is language-agnostic and completely independent of the ASR/TTS, so any vendor’s engine can be used. This can be more cost-effective if a business is already using an ASR/TTS engine, or it can use solutions developed by an affiliated company – in Lekta’s case, Techmo, which offers some of the best voice technologies in Europe.
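Vendor independence of this kind is typically achieved with an adapter layer: the dialog core depends only on abstract ASR/TTS interfaces, and each vendor engine is wrapped to fit them. A sketch, with hypothetical names:

```python
# The dialog logic only sees these abstract interfaces; swapping vendors
# means writing a new small adapter, not touching the core.
from abc import ABC, abstractmethod

class ASREngine(ABC):
    @abstractmethod
    def recognize(self, audio: bytes) -> str: ...

class TTSEngine(ABC):
    @abstractmethod
    def synthesize(self, text: str) -> bytes: ...

class DummyASR(ASREngine):
    def recognize(self, audio: bytes) -> str:
        return "hello"  # stand-in for a real vendor SDK call

class DummyTTS(TTSEngine):
    def synthesize(self, text: str) -> bytes:
        return text.encode()  # stand-in for real audio synthesis

def run_turn(asr: ASREngine, tts: TTSEngine, audio: bytes) -> bytes:
    text = asr.recognize(audio)  # vendor-specific recognition
    reply = f"You said: {text}"  # dialog logic stays engine-agnostic
    return tts.synthesize(reply)

print(run_turn(DummyASR(), DummyTTS(), b"...").decode())  # -> You said: hello
```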

Adjusting strategy depending on who is calling

As we mentioned at the beginning of this post, our voice contains more than just words. There are solutions available that can detect emotional state, age or gender with a certain level of confidence. Techmo’s solutions can be used for this as well.

According to the “Gender recognition from vocal source” research, male voices are recognized correctly 94.7% of the time and female voices 95.9% of the time. This is very important for languages with grammatical gender, like Polish or Spanish, and it mostly affects the NLG module.
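Here is a tiny illustration of why detected gender matters for NLG: in Polish, the polite past-tense form agrees with the addressee’s gender, so the same confirmation needs two surface forms. The selection logic below is an assumption for illustration; the templates are ordinary formal Polish.

```python
# "You have chosen a flight to X" needs gender agreement in Polish:
# "Wybrał Pan" (to a man) vs "Wybrała Pani" (to a woman).

TEMPLATES_PL = {
    "male":   "Wybrał Pan lot do {city}.",
    "female": "Wybrała Pani lot do {city}.",
}

def render_confirmation(city: str, gender: str) -> str:
    """Pick the surface form matching the caller's detected gender."""
    return TEMPLATES_PL[gender].format(city=city)

print(render_confirmation("Barcelony", "female"))
# -> Wybrała Pani lot do Barcelony.
```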

The emotional state (joy, sadness, anger, fear, surprise, or a neutral state) is harder to determine from a voice channel. According to various reports, accuracy is around 45% for male voices and around 48% for female voices. Lekta can be configured to adjust its dialog strategy depending on the caller’s emotional state. For example, if the customer is detected to be angry or rude, Lekta can transfer the call to a live agent.
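Given the modest accuracy of emotion detection, any strategy switch should also respect the classifier’s confidence. A hedged sketch of such a policy (labels and thresholds are assumptions, not Lekta configuration):

```python
# Route on detected emotion only when the detector is confident enough.

def next_action(emotion: str, confidence: float) -> str:
    """Decide the dialog strategy from a detected emotional state."""
    if emotion in {"anger", "fear"} and confidence >= 0.6:
        return "transfer_to_agent"  # don't frustrate an already upset caller
    if confidence < 0.5:
        return "continue_neutral"   # detection too uncertain to act on
    return "continue_adapted"       # e.g. a more empathetic prompt style

print(next_action("anger", 0.8))  # -> transfer_to_agent
print(next_action("joy", 0.3))    # -> continue_neutral
```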

Lekta can also switch its dialog strategy depending on the caller’s age, for example by being more informal with younger customers. All these dynamic parameters (age, gender, emotional state, etc.) can be used by Lekta to improve the overall user experience. It’s therefore quite difficult to show all of Lekta’s powers in one article… which has already become quite lengthy with my musing on Lekta’s possibilities! I do hope you enjoyed it though :).

More to come

To summarize, in this article I’ve tried to give you a short overview of how Lekta is accessed via a voice interface and what additional features it can leverage. With this in mind, I’d also like to start a series of blog posts covering the details of Lekta NLP.

Today, we as a society feel more and more comfortable with hands-free, non-visual interactions. Voice interfaces will definitely continue extending into other areas of our lives and activities. And thus, so will Lekta.

Stay tuned!

Daniel Slavetskiy
Tech Lead at Lekta.ai
Software dev passionate about real-time communications, AI and NLP.

Recent posts

Conversational interface illustrated with two cog-filled heads
Conversational Interaction for Business: Key takeouts from LT-Accelerate 2016

As we’re enjoying listening to some amazing speakers at FETLT2016, co-organized by our very own CTO – José F. Quesada, there are some things we would like to share with you about another language technologies-related conference, which we attended last week in Brussels – LT-Accelerate.

LT-Accelerate is the premier European conference focusing on building value through language technology. The purpose of the conference is to connect text, speech, social and big data analysis technologies to a spectrum of corporate and public sector applications and also to present the state of language technologies in the industry today.

José had the opportunity to talk about Lekta in the context of Conversational Interaction for Businesses and presented the results of more than 4 years of intensive work focused on the creation of advanced, collaborative and fluent conversational interfaces.

Lekta’s CTO José F. Quesada presenting at LT–Accelerate. Image originally posted on Twitter by @LTInnovate

Here are the key takeouts:

2016 IS THE YEAR OF EVERYTHING CONVERSATIONAL

Conversational interfaces have become a hot topic. Many companies have been making huge investments in researching technologies related to artificial intelligence, with a special emphasis on machine learning, deep neural networks and natural language understanding. Their aim is mostly to create intelligent assistants that will enable users to interact with information and services in a more natural and conversational way.

THE ADVENT OF THE DIALOG SYSTEMS & THE EMERGENCE OF MESSAGING

Companies have been using dialogue systems, or conversational technologies in general, for a number of years – mainly for customer service, and typically to replace or assist live agents in call centers or as an alternative to point-and-click interfaces on their websites. Lately, however, a number of factors have been ushering in a new era of conversational interaction.

Advances in cognitive technologies are making it possible to provide increasingly accurate and relevant automated dialogues. For example, speech recognition software has made advances in reducing word error rates, and machine translation has improved thanks to deep learning techniques. Moreover, improvements in speech and language processing technologies are making conversational interaction more capable, expanding its potential applications across the enterprise.

As technology is evolving faster than ever before, consumer preferences undergo their own fundamental change as well. According to some observers, the app ecosystem appears to be burdened by a kind of “app fatigue”—a declining willingness among consumers to install and use new mobile apps. Quite unexpectedly though, during this shift of the app ecosystem, messaging has emerged as a dominant online activity, with brands trying to take advantage of conversational technologies as a new consumer interaction channel.

COGNITIVE TECHNOLOGIES ARE HERE TO STAY

Deloitte Global predicts that by the end of 2016 more than 80 of the world’s 100 largest enterprise software companies by revenues will have integrated cognitive technologies into their products. That’s a 25 percent increase on the prior year. By 2020, it’s expected that the number will rise to about 95 out of 100.

Specifically, during 2016 and the next few years, the cognitive technologies that are and will be the most important in the enterprise software market will include advanced Speech Recognition, Natural Language Understanding and Machine Learning Technologies.

Summary

Providing computers with the human capability of language understanding has proven to be one of the most complex computational challenges in Artificial Intelligence development. At the same time though, the opportunity created in the industry at large – as we overcome the last technical challenges – cannot be overlooked. Conversational business interaction is already transforming Customer Support, User Experience and Business Intelligence, among other fields. At the same time, new terms like “conversational commerce” are being coined.

In this ever-changing landscape, the only thing that remains clear about the future is that no successful business can afford to ignore this trend.

NLP market is growing
Natural Language Processing is worth it—the market size of one of the biggest AI components

Before we go into why it’s worth investing in NLP from a company’s point of view, let’s take a look at whether all this artificial intelligence talk could be just a passing trend that your business can do without. One way to think about it: If the market is growing, there must be something in it.

So how does the AI/NLP market look? Well, it’s a little complicated to define what exactly constitutes artificial intelligence but if you think of the all-encompassing robots and artificial intelligence market, like Bank of America Merrill Lynch did, you can be looking at a market worth $70bn by 2020 for just the AI part of it. WOW, right?

Global AI and robots market

Now, according to Tractica, natural language processing is “emerging as one of the most highly utilized technologies in the broader field of AI”. This is due mostly to the increasingly recognized value of data and the fact that a lot of it comes from naturally spoken and written words. Because a lot of data is simply human.

Natural language processing is already used to some extent in healthcare, e-commerce, and IT and telecom industries. Those are also the NLP market segments to grow the most over the next few years.

The overall NLP market size is predicted to reach a whopping $16.07bn by 2021, growing at a compound annual growth rate (CAGR) of 16.1%. This will be driven mainly by increasing demand for better customer experience, as well as the increasing use of smart devices. The fastest market growth is expected in Asia and North America.
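As a sanity check on that forecast’s arithmetic: a 16.1% CAGR over the five years from 2016 to 2021, ending at $16.07bn, implies a 2016 market of roughly $7.6bn. The base-year figure below is our back-calculation, not a number from the report.

```python
# Back out the implied base-year market size from a CAGR forecast.

def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Invert compound growth: base = final / (1 + cagr)^years."""
    return final_value / (1 + cagr) ** years

base_2016 = implied_base(16.07, 0.161, 5)  # $bn, 2016 -> 2021
print(round(base_2016, 2))  # -> 7.62
```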

In that prediction, a very broad definition of NLP is used, and it includes information retrieval, information extraction, automatic summarization, machine translation and dialogue systems. So, it can mean anything from a virtual assistant to tools for extracting data from huge amounts of spoken and written words, numbers, phrases and sentences.

At Lekta, we chose to concentrate on conversational interfaces and just that has a whole lot of usage possibilities. But obviously, since Lekta is a framework that is not rigidly defined, it can have other applications as well. And we plan to enable developers to use Lekta to create all sorts of amazing things in the near future.

But let’s get back to the topic at hand. Natural language processing is a great technology for automating human tasks without losing the human touch. And yes, that also means that people who did those jobs will have to move on to doing something else. It isn’t necessarily a bad thing, though.

As one Oxford study predicts, “low-skill workers will reallocate to tasks that are non-susceptible to computerisation – i.e., tasks requiring creative and social intelligence. For workers to win the race, however, they will have to acquire creative and social skills.”

We’ll cover the topic of NLP’s influence on the future of employment in one of our future posts, so please do let me know your thoughts on the topic, in the comments section or in a direct e-mail!

AI predictions for B2C communications – An infographic

What 2017 will bring for NLP-based dialog systems:

The technology landscape is moving fast. In this ever-changing environment, Natural Language has re-surfaced as one of the biggest and most important AI components in the industry today. Here’s what 2017 will bring to A.I. and B2C communications in general.

 

3 ways NLP Dialog Systems are changing B2C communication – An infographic

NLP Dialog Systems are changing the rules of B2C communication

Artificial intelligence (AI) is taking over the world, and intelligent B2C communications are no exception. State-of-the-art spoken dialogue systems are here to stay, and they will become smarter every year.

Here are the 3 ways that NLP Dialog Systems are moving the needle today:

 


