Talking to machines more naturally than ever before—voice interface for Lekta NLP


Voice User Interface (VUI) has become one of the most natural and intuitive methods of human-machine interaction. Today, we already use it to control cars, smartphones, and numerous connected home devices. We even communicate with businesses such as banks or insurance companies via self-service voice applications, without using touch-tones (DTMF). Take a look at our previous posts on Connected Customers and Contact Centers where we cover these topics in more detail.

In fact, there are several reasons why voice UI is so cool:

  • Very often, it’s a more suitable way of, say, entering a POI (point of interest) into a navigation system or making a money transfer while driving a car, or during any other activity that requires our (almost) full attention.
  • In most cases, we speak faster than we type.
  • Our voice carries more than just words, e.g. information about our emotional state, age, and even gender, which business logic can use to improve the customer experience.
  • Voice biometrics can help address privacy and security concerns.
  • It can significantly simplify the user interface, i.e. reduce the number of tabs, menus, and other navigation elements. It’s like making responsive web design (RWD) more “responsive” in a literal sense.

All these topics are really interesting and we will cover them one by one in future blog posts.

Processing natural language – naturally

Here, I’d like to concentrate on a more general use case, in which a system takes natural language as input, processes it, applies business logic, and responds appropriately in natural language. The interaction resembles a dialogue with a machine that ends with a specific business transaction being performed, e.g. transferring money, ordering a pizza or making a medical appointment. The diagram below illustrates the simplified architecture of such a system:

Natural language processing system: from ASR to NLP (Lekta) to TTS

Anna wrote a few words about Lekta NLP in our first ever blog post; essentially, Lekta is an advanced Spoken Dialogue Framework that allows the creation of conversational voice interfaces for business applications. Imagine an automatic system for making an appointment with a dentist, ordering a takeaway or automating banking customer services. We won’t go too deeply into NLP as such just now; instead, we will concentrate on Lekta’s interface from the user’s perspective.

One possible model is based on receiving the output from the speech recognizer (ASR – automatic speech recognition) and then responding through the speech synthesizer (TTS – text-to-speech). Basically, ASR takes the acoustic signal as input and tries to determine which words were actually spoken. The output typically consists of a word graph – a lattice made up of word hypotheses.
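To make the idea of a word lattice more concrete, here is a minimal Python sketch, assuming a lattice stored as a directed acyclic graph whose nodes are points in time and whose edges carry a word hypothesis with a log-probability score. The node numbering, the scores and the `best_path` helper are hypothetical and not taken from any particular ASR engine; real lattices usually also keep acoustic and language-model scores separately.

```python
import math
from collections import defaultdict

# Hypothetical lattice edges: (from_node, to_node, word, log_prob).
# Nodes are numbered in topological (time) order; 0 is the start node.
EDGES = [
    (0, 1, "flight", math.log(0.9)),
    (0, 1, "fight", math.log(0.1)),
    (1, 2, "to", math.log(1.0)),
    (2, 3, "barcelona", math.log(0.6)),
    (2, 3, "pampeluna", math.log(0.4)),
]

def best_path(edges, start, end):
    """Return (words, log_prob) of the highest-scoring path from start to end.

    Viterbi-style dynamic programming: valid because the lattice is acyclic
    and nodes are visited in topological order.
    """
    outgoing = defaultdict(list)
    for src, dst, word, score in edges:
        outgoing[src].append((dst, word, score))

    best = {start: (0.0, [])}  # node -> (best score so far, words on that path)
    for node in sorted({e[0] for e in edges} | {end}):
        if node not in best:
            continue  # node not reachable from start
        score, words = best[node]
        for dst, word, edge_score in outgoing[node]:
            candidate = score + edge_score
            if dst not in best or candidate > best[dst][0]:
                best[dst] = (candidate, words + [word])
    return best[end][1], best[end][0]

words, log_prob = best_path(EDGES, start=0, end=3)
print(" ".join(words), round(math.exp(log_prob), 2))  # flight to barcelona 0.54
```

Keeping the whole lattice, rather than only this single best path, is what lets later stages reconsider alternatives such as "pampeluna" when the dialogue context favours them.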

In a future post, we will show exactly how Lekta benefits from using such a lattice and can even help to improve the quality of the speech recognizer itself. Here, we omit the communication details, whether it’s a phone call or a mobile app voice interaction.

Actually, Ratel (Contactis Group’s omnichannel business communications operator) fills in the missing piece by providing feature-rich, context-based communication allowing customers to connect to businesses in the most natural and intuitive way. But that’s another topic entirely.

For now, let’s say that the ASR module returns text results, each with a certain confidence level. Lekta then takes this data and extracts a meaning representation (NLU – natural language understanding), which is used by the Dialog Manager (DM), responsible for conversation management and business logic integration. The output from the DM, which is a non-linguistic meaning representation, is then passed to the NLG (natural language generation) module, which converts it into natural language. The final step is performed by the TTS module, which converts that text into speech. We will cover each step in more detail in future posts.
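The chain described above (ASR → NLU → DM → NLG → TTS) can be sketched as a sequence of plain functions. This is a minimal Python illustration, assuming the ASR returns an n-best list of `(text, confidence)` pairs; all function names, the intent format and the toy keyword-based NLU are hypothetical placeholders, not Lekta's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Meaning:
    """Hypothetical non-linguistic meaning representation."""
    intent: str
    slots: dict = field(default_factory=dict)

def asr(audio: bytes) -> list[tuple[str, float]]:
    # Placeholder: a real ASR engine would decode the audio signal.
    return [("i want to book a dentist appointment", 0.82),
            ("i want to look a dentist appointment", 0.11)]

def nlu(nbest: list[tuple[str, float]]) -> Meaning:
    # Toy NLU: take the most confident hypothesis and spot keywords.
    text, _ = max(nbest, key=lambda h: h[1])
    if "book" in text and "dentist" in text:
        return Meaning("book_appointment", {"specialty": "dentist"})
    return Meaning("unknown")

def dialog_manager(meaning: Meaning) -> Meaning:
    # The DM decides the next system action; here it asks for a date.
    if meaning.intent == "book_appointment":
        return Meaning("request_slot", {"slot": "date", **meaning.slots})
    return Meaning("clarify")

def nlg(action: Meaning) -> str:
    # Turn the system action back into natural language.
    if action.intent == "request_slot" and action.slots.get("slot") == "date":
        return f"On which day would you like to see the {action.slots['specialty']}?"
    return "Sorry, could you rephrase that?"

def tts(text: str) -> bytes:
    # Placeholder: a real TTS engine would synthesize audio here.
    return text.encode("utf-8")

reply = nlg(dialog_manager(nlu(asr(b""))))
print(reply)  # On which day would you like to see the dentist?
audio = tts(reply)
```

Keeping the DM input and output as meaning representations, rather than raw text, is what lets the NLU and NLG stages change language or wording without touching the business logic.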

Recognizing the context

The obvious question is this: what if Lekta receives the wrong speech recognition results from the ASR? For instance, Barcelona and Pampeluna could sound very similar in Spanish. Well, here we have a couple of options. The first is based on the business logic: let’s say a client wants to book a flight on a specific date. Lekta can query the database to check whether there are flights to Pampeluna on that day; if there aren’t any, it can assume that the client meant Barcelona. However, if there are flights to both cities on the same day, and we know that these names often cause recognition problems, the system can ask the customer to confirm the city with a yes/no question. This may slightly worsen the user experience, but at least we can proceed with the request, which in this case matters more.
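The business-logic strategy above could look roughly like the following Python sketch. It assumes that the ASR returns an n-best list of city hypotheses, that the flight schedule can be queried by date, and that a list of known confusable name pairs is maintained; `FLIGHTS`, `CONFUSABLE` and `resolve_city` are hypothetical names used only for illustration.

```python
from datetime import date

# Hypothetical flight schedule: date -> destinations served that day.
FLIGHTS = {
    date(2016, 9, 12): {"Barcelona"},
    date(2016, 9, 13): {"Barcelona", "Pampeluna"},
}

# Hypothetical pairs of city names the ASR is known to confuse.
CONFUSABLE = {frozenset({"Barcelona", "Pampeluna"})}

def resolve_city(nbest, travel_date):
    """Pick a destination from ASR hypotheses using business logic.

    Returns ("accept", city), ("confirm", city) or ("reprompt", None).
    """
    served = FLIGHTS.get(travel_date, set())
    # Keep only hypotheses that are bookable destinations on that date.
    candidates = [(city, conf) for city, conf in nbest if city in served]
    if not candidates:
        return ("reprompt", None)
    candidates.sort(key=lambda c: c[1], reverse=True)
    top_city = candidates[0][0]
    # If a known confusable alternative is also bookable, ask a yes/no question.
    for city, _ in candidates[1:]:
        if frozenset({top_city, city}) in CONFUSABLE:
            return ("confirm", top_city)
    return ("accept", top_city)

nbest = [("Pampeluna", 0.55), ("Barcelona", 0.45)]
print(resolve_city(nbest, date(2016, 9, 12)))  # ('accept', 'Barcelona')
print(resolve_city(nbest, date(2016, 9, 13)))  # ('confirm', 'Pampeluna')
```

Note that on the first date the less confident hypothesis wins, because the business logic rules out the other one; confirmation is only requested when the ambiguity cannot be resolved from the data.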

In general, Lekta tries to solve these kinds of problems by controlling the ASR with so-called expectations. Let’s consider an automated medical appointment system where, at a certain moment, the system asks for the caller’s ID. In this case, Lekta can tell the ASR that it expects digits by providing a more specific (smaller) grammar. Narrowing down what the recognizer has to consider in this way can improve both the speed and the accuracy of speech recognition.
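One way to picture expectations: each dialog state maps to a narrower grammar that is handed to the ASR before the caller speaks. Real engines apply such grammars inside the decoder (for example as W3C SRGS grammars). The Python sketch below only emulates the effect by filtering an n-best list with a regular expression, and the state names, patterns and `apply_expectation` helper are hypothetical.

```python
import re

# Hypothetical expectations: dialog state -> pattern the answer must match.
EXPECTATIONS = {
    "ask_caller_id": re.compile(r"(?:\d\s?){6,11}"),  # 6 to 11 spoken digits
    "ask_yes_no": re.compile(r"yes|no", re.IGNORECASE),
}

def apply_expectation(state, nbest):
    """Keep only hypotheses that fit the grammar expected in this state.

    Falls back to the unfiltered list when the state has no expectation.
    """
    pattern = EXPECTATIONS.get(state)
    if pattern is None:
        return nbest
    return [(text, conf) for text, conf in nbest if pattern.fullmatch(text.strip())]

nbest = [("for 5 7 8 1 2 3", 0.48), ("4 5 7 8 1 2 3", 0.41), ("four five", 0.11)]
print(apply_expectation("ask_caller_id", nbest))  # [('4 5 7 8 1 2 3', 0.41)]
```

Here the top-scoring hypothesis is discarded because "for" does not fit a digits-only grammar; inside a real decoder, that path would never have been considered in the first place.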

An important thing about Lekta is that it is language-agnostic and completely independent of the ASR/TTS engine, so any vendor can be used. This can be more cost-effective if a business already uses an ASR/TTS engine, or relies on solutions developed by an affiliated company; in Lekta’s case, that is Techmo, a company offering some of the best voice technologies in Europe.
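Vendor independence of this kind is commonly achieved by writing the dialogue logic against small interfaces and wrapping each ASR/TTS product in an adapter. The Python sketch below uses `typing.Protocol` for this. `SpeechRecognizer`, `SpeechSynthesizer`, `EchoVendor` and `handle_turn` are hypothetical names and do not reflect Lekta's or Techmo's actual APIs.

```python
from typing import Protocol

class SpeechRecognizer(Protocol):
    def recognize(self, audio: bytes) -> list[tuple[str, float]]:
        """Return an n-best list of (text, confidence) pairs."""
        ...

class SpeechSynthesizer(Protocol):
    def synthesize(self, text: str) -> bytes:
        """Return audio for the given text."""
        ...

class EchoVendor:
    """Hypothetical stand-in for a vendor SDK; satisfies both protocols."""

    def recognize(self, audio: bytes) -> list[tuple[str, float]]:
        return [(audio.decode("utf-8"), 1.0)]

    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")

def handle_turn(asr: SpeechRecognizer, tts: SpeechSynthesizer, audio: bytes) -> bytes:
    # The dialogue logic depends only on the two interfaces, so switching
    # vendors means swapping adapters, not rewriting the logic.
    text, _ = asr.recognize(audio)[0]
    return tts.synthesize(f"you said: {text}")

engine = EchoVendor()
print(handle_turn(engine, engine, b"hello"))  # b'YOU SAID: HELLO'
```

Because `Protocol` uses structural typing, an adapter only has to provide matching methods; it does not need to inherit from anything vendor-specific.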

Adjusting strategy depending on who is calling

As we mentioned at the beginning of this post, our voice contains more than just words. There are solutions available that can detect emotional state, age or gender with a certain level of confidence. Techmo’s solutions can be used for this as well.

According to the “Gender recognition from vocal source” study, male voices are recognized correctly with a probability of 94.7% and female voices with a probability of 95.9%. This matters a great deal for languages with grammatical gender, such as Polish or Spanish, where the words used to address the caller must agree with the caller’s gender; it mostly affects the NLG module.
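In Spanish, for example, even a simple greeting agrees with the listener's gender ("Bienvenido" for a man, "Bienvenida" for a woman). A minimal NLG sketch in Python, assuming the gender classifier returns a label with a confidence score, could use the gendered form only above a threshold and otherwise fall back to a gender-neutral phrasing. The threshold value and the `greeting` helper are hypothetical.

```python
from typing import Optional

# Hypothetical threshold: below it, the sketch avoids guessing the caller's gender.
GENDER_THRESHOLD = 0.9

# Spanish greeting templates; the neutral form needs no gender agreement.
TEMPLATES = {
    "male": "Bienvenido, {name}.",
    "female": "Bienvenida, {name}.",
    "neutral": "Le damos la bienvenida, {name}.",
}

def greeting(name: str, gender: Optional[str], confidence: float) -> str:
    """Choose a gendered Spanish greeting only when the classifier is confident."""
    if gender in ("male", "female") and confidence >= GENDER_THRESHOLD:
        key = gender
    else:
        key = "neutral"  # safer than addressing the caller with the wrong gender
    return TEMPLATES[key].format(name=name)

print(greeting("Ana", "female", 0.96))  # Bienvenida, Ana.
print(greeting("Ana", "female", 0.60))  # Le damos la bienvenida, Ana.
```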

The emotional state (joy, sadness, anger, fear, surprise, or a neutral state) is much harder to determine over a voice channel. According to various reports, accuracy is only around 45% for male voices and around 48% for female voices. Lekta can be configured to adjust the dialog strategy depending on the caller’s emotional state. For example, if the customer is detected to be angry or rude, Lekta can transfer the call to a live agent.
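With accuracy figures this low, acting on a single detection would be risky. Below is a minimal Python sketch of a more cautious rule, assuming the emotion classifier returns a label and a confidence for each caller turn: the call is transferred to a live agent only after anger has been detected confidently on several consecutive turns. The threshold, the window size and the `EscalationPolicy` class are hypothetical.

```python
from collections import deque

class EscalationPolicy:
    """Hypothetical rule: escalate after N consecutive confident 'anger' detections."""

    def __init__(self, threshold: float = 0.7, turns: int = 2):
        self.threshold = threshold
        self.recent = deque(maxlen=turns)  # keeps only the last `turns` results

    def observe(self, label: str, confidence: float) -> str:
        angry = label == "anger" and confidence >= self.threshold
        self.recent.append(angry)
        if len(self.recent) == self.recent.maxlen and all(self.recent):
            return "transfer_to_agent"
        return "continue"

policy = EscalationPolicy()
turns = [("neutral", 0.8), ("anger", 0.75), ("anger", 0.9)]
print([policy.observe(label, conf) for label, conf in turns])
# ['continue', 'continue', 'transfer_to_agent']
```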

Lekta can also switch the dialog strategy depending on the caller’s age, for example by adopting a more informal tone with younger customers. All these dynamic parameters (age, gender, emotional state, etc.) can be used by Lekta to improve the overall user experience. With so many possibilities, it’s hard to show everything Lekta can do in one article… which has already grown quite long with my musings on Lekta’s possibilities! I do hope you enjoyed it, though :).
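Age-dependent register selection could be sketched in Python as below, assuming the age estimator returns an age band with a confidence score. In Spanish, for instance, this decides between the informal "tú" and the polite "usted". The `CallerProfile` fields, the age bands and the rule itself are hypothetical illustrations, not Lekta configuration.

```python
from dataclasses import dataclass

@dataclass
class CallerProfile:
    """Hypothetical paralinguistic estimate for the current caller."""
    age_band: str          # e.g. "18-29", "30-59", "60+"
    age_confidence: float

def choose_register(profile: CallerProfile, min_confidence: float = 0.8) -> str:
    """Use the informal register only for callers confidently estimated as young."""
    if profile.age_band == "18-29" and profile.age_confidence >= min_confidence:
        return "informal"  # e.g. 'tú' in Spanish
    return "formal"  # default to the polite form, e.g. 'usted'

print(choose_register(CallerProfile("18-29", 0.9)))  # informal
print(choose_register(CallerProfile("18-29", 0.5)))  # formal
```

Defaulting to the formal register when the estimate is uncertain mirrors the cautious approach in the gender and emotion examples: a wrong guess costs more than a neutral choice.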

More to come

To summarize, in this article I’ve tried to give you a short overview of how Lekta can be accessed through a voice interface and which additional features it can leverage. With this in mind, I’d also like to start a series of blog posts covering the details of Lekta NLP.

Today, we as a society feel more and more comfortable with hands-free, non-visual interactions. Voice interfaces will keep spreading into other areas of our lives and activities. And so will Lekta.

Stay tuned!

Daniel Slavetskiy

Tech Lead at Lekta.ai
software dev passionate about real-time communications, AI and NLP.

Recent posts

Conversational Interaction for Business: Key takeouts from LT-Accelerate 2016

As we’re enjoying listening to some amazing speakers at FETLT2016, co-organized by our very own CTO – José F. Quesada, there are some things we would like to share with you about another language technologies-related conference, which we attended last week in Brussels – LT-Accelerate.

LT-Accelerate is the premier European conference focusing on building value through language technology. The purpose of the conference is to connect text, speech, social and big data analysis technologies to a spectrum of corporate and public sector applications and also to present the state of language technologies in the industry today.

José had the opportunity to talk about Lekta in the context of Conversational Interaction for Businesses and presented the results of more than 4 years of intensive work focused on the creation of advanced, collaborative and fluent conversational interfaces.

Lekta’s CTO José F. Quesada presenting at LT–Accelerate. Image originally posted on Twitter by @LTInnovate

Here are the key takeaways:

2016 IS THE YEAR OF EVERYTHING CONVERSATIONAL

Conversational interfaces have become a hot topic. Many companies have been making huge investments in researching technologies related to artificial intelligence, with a special emphasis on machine learning, deep neural networks and natural language understanding. Their aim is mostly to create intelligent assistants that will enable users to interact with information and services in a more natural and conversational way.

THE ADVENT OF THE DIALOG SYSTEMS & THE EMERGENCE OF MESSAGING

Companies have been using dialogue systems, or conversational technologies in general, for a number of years, mainly for customer service: typically to replace or assist live agents in call centers, or as an alternative to point-and-click interfaces on their websites. But lately, a number of factors have been ushering them into a new era of conversational interaction.

Advances in cognitive technologies are making it possible to provide increasingly accurate and relevant automated dialogues. For example, speech recognition software has made advances in reducing word error rates, and machine translation has improved thanks to deep learning techniques. Moreover, improvements in speech and language processing technologies are making conversational interaction more capable and expanding its potential applications across the enterprise.

As technology evolves faster than ever before, consumer preferences are undergoing a fundamental change of their own. According to some observers, the app ecosystem appears to be burdened by a kind of “app fatigue”: a declining willingness among consumers to install and use new mobile apps. Quite unexpectedly, though, during this shift in the app ecosystem, messaging has emerged as a dominant online activity, with brands trying to take advantage of conversational technologies as a new consumer interaction channel.

COGNITIVE TECHNOLOGIES ARE HERE TO STAY

Deloitte Global predicts that by the end of 2016 more than 80 of the world’s 100 largest enterprise software companies by revenues will have integrated cognitive technologies into their products. That’s a 25 percent increase on the prior year. By 2020, it’s expected that the number will rise to about 95 out of 100.

Specifically, the cognitive technologies expected to matter most in the enterprise software market in 2016 and the next few years include advanced speech recognition, natural language understanding and machine learning.

Summary

Giving computers the human capability of understanding language has proven to be one of the most complex computational challenges in Artificial Intelligence development. At the same time, though, the opportunity this creates for the industry at large, as we overcome the remaining technical challenges, cannot be overlooked. Conversational business interaction is already transforming Customer Support, User Experience and Business Intelligence, among other fields. Meanwhile, new terms like “conversational commerce” are being coined.

In this ever-changing landscape, the only thing that remains clear about the future is that no successful business can afford to ignore this trend.

Multilingualism and technology—key takeouts from META-FORUM 2016

Are you wondering which language you should learn in the post-Brexit Europe?


Alex Waibel, from Carnegie Mellon University and Karlsruhe Institute of Technology, raised this point during his speech after receiving the META Prize at the recent META-FORUM event held in Lisbon on 4–5 July 2016. Perhaps you could consider one of the many other languages spoken in Europe.


By the way, have you ever thought about how many languages, or dialects, are spoken worldwide? Although some 7,000 languages are registered, the top 25 languages account for only around 50% of the world population. Curiously enough, some publications mention that there are 46 languages that have just a single speaker.


And what about Europe? Well, in his presentation on the digital vitality of European languages, András Kornai from the Hungarian Academy of Sciences mentioned a list of 283 European languages and dialects.

The line between a language and a dialect can sometimes be very blurry. Don’t forget the famous quote on this point: “A language is a dialect with an army and navy”. But even with 283, Europe is not the richest linguistic area in the world. For example, more than 850 languages are spoken in Papua New Guinea alone, a country of fewer than 8 million people.


In any case, language is currently a major barrier to the economic and social development of Europe. This is the key message behind the Multilingual Digital Single Market (MDSM). Georg Rehm from DFKI, the current META-NET General Secretary, summarized this challenge with the sentence “Don’t understand, won’t buy” during his presentation of a new version (0.9) of The Strategic Agenda for the MDSM.


However, I would like to highlight two key, inspiring ideas mentioned during the two intense working days in Lisbon.

Multilinguality, Mobile and Language Technologies (R. McDonald / Google)

Ryan McDonald from Google focused on Multilingual Europe as a Challenge for Language Technologies.

The key points he presented were quite strong and very relevant for this community:

  • Mobile is the future
  • Language technologies are key to the mobile experience
  • Users demand native language support


From Apps to Chatbots (A. Branco / University of Lisbon)

António Branco, Principal Researcher of one of the most prominent EU-funded projects on Machine Translation (QTLeap), opened his talk with an insightful motivating idea.

In the past, with the advent of PCs, companies reached out to their customers with websites. Currently, with the consolidation of smartphones, the strategy for reaching clients is dominated by the use of mobile apps.

Recently, a CEO of a large social network at an annual conference proposed that, in the future, companies will reach their clients using chatbots.

Multilingual Conversational Interfaces in the Mobile

Summing up, Multilinguality, Mobile and Conversational Interfaces will play a critical role in the immediate future. It’s important to create solutions that won’t be limited to English or any other single language.

Fortunately, Lekta has been designed to take into account all these challenges, and now we are ready to put it into action. Stay tuned!

 


Pictures and graphics:

http://www.meta-net.eu/events/meta-forum-2016/programme

http://voices.nationalgeographic.com/2011/03/01/language_diversity_index_tracks_global_loss_of_mother_tongues/

Contact centers (yesterday, now, tomorrow)

Not everyone feels confident that they are getting the best possible help when they call a contact center for assistance. This skepticism often extends to the technology being used when a customer calls with a question. This matters because the technology used by the person handling the call can determine how well they do their job, as well as the kind and amount of information they have about the caller. Technology that provides context about the customer is still a rarity.

Let’s take a look at how contact centers have changed over the years and the technology, information and tools they have put in the hands of employees.

Contact centers yesterday

Not so long ago, contact centers weren’t that different from a secretary’s office. Agents working there had a telephone and email. Some of them had an IVR system that could be programmed to route certain calls to certain agents.

It’s safe to assume that agents had very little information about callers – who they were or why they were calling. They had to find out everything during the course of the call, which of course made it last much longer than it had to.

Contact centers today

Today, contact centers have more advanced technology that lets them significantly raise the level of customer service they deliver. Apart from telephones, text messages, email and IVR there are chat capabilities and basic bots that can take care of simple questions or set up appointments.

Agents in call centers get help from ACD (automatic call distribution), IVR (interactive voice response) and CTI (computer telephony integration, which links the phone system with the company’s IT systems). Together, these three solutions connect calls to specific agents (ACD), perform a basic check of what the call is about (IVR) and quickly verify the caller (CTI).
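As a rough illustration of what an ACD does, here is a Python sketch of one common routing policy: send each call to the available agent with the required skill who has been idle the longest. The agent data, skill names and `route_call` function are hypothetical; commercial ACDs support many other strategies.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Agent:
    name: str
    skills: set[str]
    idle_since: float  # timestamp when the agent became free
    available: bool = True

def route_call(agents: list[Agent], required_skill: str) -> Optional[Agent]:
    """Pick the available agent with the skill who has been idle the longest."""
    eligible = [a for a in agents if a.available and required_skill in a.skills]
    if not eligible:
        return None  # no match: the caller stays in the queue
    chosen = min(eligible, key=lambda a: a.idle_since)  # earliest timestamp = longest idle
    chosen.available = False
    return chosen

agents = [
    Agent("Ewa", {"loans", "cards"}, idle_since=100.0),
    Agent("Jan", {"cards"}, idle_since=40.0),
    Agent("Ola", {"loans"}, idle_since=70.0),
]
print(route_call(agents, "loans").name)  # Ola
print(route_call(agents, "loans").name)  # Ewa
```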

In addition to these tools, agents also have access to video connections that significantly improve service, especially in IT. The ability to see the caller on a video screen makes setting up the parameters of the situation and solving problems much easier.

Contact centers tomorrow

Some technologies of tomorrow can be found today in certain companies even though not all of them see the full potential of those technologies yet. It’s worth noting though that even the most cutting edge technology has its pluses.

Imagine a situation when you call your bank and you hear “Welcome, how can I help you?” It’s just like when you walk into the local branch, except… you’re not talking to a human. It’s an advanced information system that is able to have a conversation with you just like the friendly agent in the bank would.

The advantages of such a system are obvious:

  • A customer service center, available day and night, with the ability to carry on voice contact and resolve problems over the phone.
  • Callers can ask to speak with an actual representative at any time but most inquiries can be solved with just the human-machine interaction.

How is this possible?

The Lekta NLP system is making its debut. It implements dialogue management logic, easily adapts to the specific topic of a conversation, and steers the conversation in the right direction. Callers stay in control of the conversation; Lekta just shares information or asks for the necessary details.


Lekta can reliably identify problems and deliver relevant information or direct the conversation to the right person along with all the data gathered from the caller.

It may sound like science fiction, but this is simply very advanced technology that makes it possible to talk with machines in a conversation that’s no different from a chat between two people. Lekta is the dialogue interface of the future and is available today.

Support, not competition

Implementing new technologies often causes anxiety among employees since it raises questions about their future. But not every tech solution means reducing staff. More and more systems are being created to support and help the work of employees and that’s the case with Lekta.

The system is able to handle about 80% of the conversations that usually go to call center agents. From a business point of view, this is a huge help and takes small matters out of the hands of agents, letting them concentrate on more complex matters and customer issues.

Lekta frees up time not only for call center employees but also for the management team. It doesn’t require a hiring process or training, and it works all day, every day. It also offers continuous insight into the course of every conversation, making it possible to tailor offers to the needs and expectations of customers.

These features make it worth trusting this technology and integrating it into the customer relations in your business. After all, if you’re not moving forward, you’re moving back.

Feel free to contact us if you want to talk about Lekta.

Natural Language Processing, Artificial Intelligence, Machine Learning, bots —a passing trend or much more?

Artificial intelligence is taking over the world whether we want it or not, so we can either make it work for us or against us. I choose the former. But how? That’s one of the things we talked about with some fellow startuppers last week.

The wonders of the SaaS startup community

It’s good to experience the startup community firsthand from time to time. It helps you keep your finger on the pulse much more effectively than just gathering data online or talking to people on Facebook and Twitter. That’s exactly why we decided to go to the SaaS Meetup, to see what the perceived trends in the startup world are, and to talk to some founders to find out whether these trends actually mean something. We also wanted to introduce some of our ideas and get feedback on how to best develop, launch, and promote our project.

So, we got into a few discussions on AI, NLP, machine learning, bots taking over the world and all that. But… before I go on to the outcomes, maybe I should start by telling you what it is we’re doing at Lekta :).

Lekta – the voice interface of the future

Well, in a nutshell, we’re doing some really awesome stuff. I’m not saying that simply because it’s my job to put Lekta in the best light possible but because I’m honestly and extremely excited about the possibilities that lie ahead of us. And by us, I don’t just mean the Lekta team, I mean the entire world; the world of everything connected: connected customers, connected devices, connected businesses.

But anyway, I’m starting to babble and you probably just want to finally find out what Lekta is. Right?

Lekta is an advanced spoken and written Dialogue System Framework. It redefines the way people communicate with businesses. Lekta’s cutting edge technology can automatically manage millions of conversations with customers, users, and devices.

And what exactly does that mean? It basically means that you can use Lekta as a voice (and also text) interface for anything you want. It can be a 24/7 virtual agent in a contact center, e-commerce store or a doctor’s office. It can add voice control to any app. It can enable your IoT devices to have a real conversation with you. It can help you to easily connect apps to each other, apps to devices, and more.

I call it “Siri on steroids”, but Lekta is so much more than that. I’m not going to go into too many technical details, because I’m not the right person to do it: the creator of Lekta, José Quesada, and our Lead Developer, Daniel Slavetskiy, are far more qualified. And don’t worry, they will. Soon. On this blog.

In the meantime, let me go back to our SaaS Meetup discussions.

Are AI and NLP the way to go?

We mainly discussed AI and NLP and specifically tried to answer the question whether the two actually make our lives easier. Do people prefer touchscreens and clicking a couple of buttons instead of using voice commands? And do they prefer talking to a human instead of a virtual assistant? Well, the opinions vary but we did come to some conclusions*:

  1. Some people just don’t want to talk to robots. However, most people don’t mind talking to a robot, as long as doing so actually speeds up the process of, say, ordering a pizza or scheduling a medical appointment. So, as long as talking to a bot makes things easier and faster, users won’t mind not being able to talk to a customer support agent, a secretary or a sales assistant.
  2. People don’t want to be tricked into thinking they’re talking to a real human. They actually prefer knowing they’re talking to a robot from the very beginning, as opposed to discovering the fact after engaging in the conversation.
  3. There might be an issue with the security of ASR-enabled solutions. When you allow a device to detect your voice at any point in time, the sensible assumption would be that it’s always listening. Take Amazon’s Echo, for example. You put it on a table in your living room and when you want to have a private conversation… are you sure Amazon doesn’t hear you? We obviously don’t want to get paranoid and, personally, I try not to care about stuff like that but, a lot of people can get freaked out if you tell them that the reason why they can voice-control everything around them is that every single device is listening to them. All. The. Time.

So those are the main things we managed to discuss between keynotes and lunch :).

The future of AI

The keynotes, though! The speakers only reinforced our belief that we’re right in the eye of the AI storm. The prediction is that basically everything will be AI-enabled sooner or later, starting with healthcare, marketing and business intelligence.

According to the keynote by Nick Franklin (CEO of ChartMogul), AI will also be crucial in customer service, sales, and bookkeeping. The UI paradigm is also shifting toward natural language, in both text and voice communication.

“Artificial Intelligence: Sub-Industry Heatmap” by CBInsights
Nick Franklin’s keynote at SaaS Meetup in June 2016 in Krakow, Poland

Since Lekta is all about AI and NLP, I’d say we have a great chance of becoming part of something big – bigger than just a passing trend and temporary hype. AI is here to stay, so let’s use its full potential to make our lives… even more enjoyable!


*Remember, these are just conclusions from a bunch of startuppers, based on what they heard, read, saw, and experienced – so not exactly a sample representative of our entire society. But still, I think we managed to touch on some important issues that can be a base for further discussion. So… is there anything AI/NLP-related that you would like to discuss?

Daniel Slavetskiy
Follow me

Daniel Slavetskiy

Tech Lead at Lekta.ai
software dev passionate about real-time communications, AI and NLP.
Daniel Slavetskiy
Follow me
Lekta talking to a human-machine
Talking to machines more naturally than ever before—voice interface for Lekta NLP

Voice User Interface (VUI) has become one of the most natural and intuitive methods of human-machine interaction. Today, we already use it to control cars, smartphones, and numerous connected home devices. We even communicate with businesses such as banks or insurance companies via self-service voice applications, without using touch-tones (DTMF). Take a look at our previous posts on Connected Customers and Contact Centers where we cover these topics in more detail.

In fact, there are several reasons why voice UI is so cool:

  • very often, it’s a more suitable way of, let’s say, providing a POI (point of interest) into navigation or making a money transfer while driving a car, or any other activity that requires our (almost) full attention
  • In most cases, we speak faster than we type
  • our voice contains more than just words, i.e. our emotional state, age, and even gender information, which can be leveraged by business logic to improve the customer experience
  • voice biometrics is a great way of solving privacy and security issues
  • it can significantly simplify the user interface, i.e. decrease the number of tabs, menus, and other navigation elements. It’s like making responsive web design (RWD) more “responsive” in a literal sense.

All these topics are really interesting and we will cover them one by one in future blog posts.

Processing natural language – naturally

Here, I’d like to concentrate on a more general use case, when a system takes natural language as input, processes it, creates some business logic, and properly responds with a natural output. So it looks more like a dialogue or conversation with a machine, which ends with a specific business transaction being performed, e.g. transferring money, ordering pizza or making a medical appointment. The diagram below illustrates the simplified architecture of such a system:

Natural language processing system: from ASR to NLP (Lekta) to TTS

Anna wrote a few words about Lekta NLP in our first ever blog post; essentially, Lekta is an advanced Spoken Dialogue Framework that allows the creation of conversational voice interfaces for business applications. Imagine an automatic system for making an appointment with a dentist, ordering a takeaway or automating banking customer services. We won’t go too deeply into discussing NLP as such just now, rather we will concentrate on Lekta’s interface from the user’s perspective.

One possible model is based on receiving the output from the speech recognizer (ASR – automated speech recognition) and then responding through the speech synthesizer (TTS – text-to-speech). Basically, ASR takes the acoustic signal as an input and tries to determine which words were actually spoken. The output typically consists of a word graph – a lattice made up of word hypotheses.

In a future post, we will show exactly how Lekta benefits from using such a lattice and can even help to improve the quality of the speech recognizer itself. Here, we omit the communication details, whether it’s a phone call or a mobile app voice interaction.

Actually, Ratel (Contactis Group’s omnichannel business communications operator) fills in the missing piece by providing feature-rich, context-based communication allowing customers to connect to businesses in the most natural and intuitive way. But that’s another topic entirely.

For now, let’s say that the ASR module returns text results, each result with a certain level of confidence. Lekta then takes this data and extracts a meaning representation (NLU – natural language understanding), which is used by the Dialog Manager (DM) responsible for conversation management and business logic integration. The output from the DM, which is a non-linguistic meaning representation, is then taken to the NLG (natural language generation) which converts it into natural language. The final part of the whole process is produced by the TTS module, which converts the natural language into speech. We will cover each step in more detail in future posts.

Recognizing the context

The obvious question is this: what if Lekta receives the wrong speech recognition results from the ASR? For instance, Barcelona and Pampeluna could sound very similar in Spanish. Well, here we can, in fact, have a couple of options. The first is based on the business logic – let’s say a client wants to book a flight on a specific date. Lekta can check with the database to confirm if there are flights to Pampeluna on that day, and if there aren’t any, it will be assumed that the client meant Barcelona. However, if there are flights to both cities on the same day, and we know that these names often cause a recognition problem, the system could confirm if the customer requested exactly that particular city (using a “yes-no” question). This may worsen the user experience a little but at least we could proceed with their request, which in this case is more important.

In general, Lekta tries to solve these kinds of problems by guiding the ASR with so-called expectations. Consider an automated medical appointment system which, at a certain point, asks for the caller’s ID. Here, Lekta can tell the ASR that it expects digits by providing a more specific (smaller) grammar. This improves both the speed and the quality of speech recognition.
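
A rough Python sketch of the idea follows. The state names, the `EXPECTATIONS` table and the ID length are hypothetical, and regular expressions stand in for real recognition grammars. Note that filtering an N-best list after recognition only approximates the effect: when the grammar is handed to the ASR before decoding, out-of-grammar hypotheses are never produced at all, which is where the speed gain comes from.

```python
import re

# Hypothetical mapping from dialog state to what the system expects to hear next.
# A real ASR engine would receive a grammar (e.g. in W3C SRGS format) before decoding;
# regular expressions are only a compact stand-in here.
EXPECTATIONS = {
    "ask_caller_id": re.compile(r"\d{6,11}"),  # digits only; the length range is an assumption
    "ask_yes_no": re.compile(r"yes|no", re.IGNORECASE),
}


def apply_expectation(state: str, nbest: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Keep only the hypotheses that fit the current dialog state's expectation."""
    pattern = EXPECTATIONS.get(state)
    if pattern is None:
        return nbest  # no expectation: accept open-vocabulary results
    # Ignore spaces so that "850 312 774" counts as a digit string.
    return [(text, score) for text, score in nbest if pattern.fullmatch(text.replace(" ", ""))]


nbest = [("eight five oh", 0.55), ("850 312 774", 0.52), ("850 312 77 for", 0.40)]
print(apply_expectation("ask_caller_id", nbest))  # [('850 312 774', 0.52)]
```

Without the expectation, the top-scoring (but useless) hypothesis "eight five oh" would win; with it, only the digit string survives.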

An important feature of Lekta is that it is language-agnostic and completely independent of the ASR/TTS engine, so any vendor can be used. This can be more cost-effective when a business already uses an ASR/TTS engine, or uses solutions developed by an affiliated company. In Lekta’s case, that company is Techmo, which offers some of the best voice technologies in Europe.
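
One common way to keep dialogue logic vendor-independent is to hide each engine behind a small interface. The sketch below uses Python’s `typing.Protocol` for this; the protocol names, method signatures and echo adapters are assumptions for illustration, not Lekta’s or Techmo’s actual interfaces.

```python
from typing import Optional, Protocol


class SpeechRecognizer(Protocol):
    def recognize(self, audio: bytes, grammar: Optional[str] = None) -> list[tuple[str, float]]:
        """Return N-best (text, confidence) pairs, optionally constrained by a grammar."""
        ...


class SpeechSynthesizer(Protocol):
    def synthesize(self, text: str, language: str) -> bytes:
        """Return synthesized audio for `text`."""
        ...


class VoiceChannel:
    """Dialogue-side code depends only on the two protocols, never on a vendor SDK."""

    def __init__(self, asr: SpeechRecognizer, tts: SpeechSynthesizer, language: str) -> None:
        self.asr = asr
        self.tts = tts
        self.language = language

    def listen(self, audio: bytes, grammar: Optional[str] = None) -> list[tuple[str, float]]:
        return self.asr.recognize(audio, grammar)

    def say(self, text: str) -> bytes:
        return self.tts.synthesize(text, self.language)


# Toy adapters; a real deployment would wrap a vendor's SDK behind the same methods.
class EchoRecognizer:
    def recognize(self, audio: bytes, grammar: Optional[str] = None) -> list[tuple[str, float]]:
        return [(audio.decode("utf-8"), 1.0)]


class EchoSynthesizer:
    def synthesize(self, text: str, language: str) -> bytes:
        return f"[{language}] {text}".encode("utf-8")


channel = VoiceChannel(EchoRecognizer(), EchoSynthesizer(), language="es-ES")
print(channel.listen(b"hola"))    # [('hola', 1.0)]
print(channel.say("Bienvenido"))  # b'[es-ES] Bienvenido'
```

Because `Protocol` uses structural typing, an adapter only needs matching method names and signatures; it does not have to inherit from anything, which keeps vendor wrappers small.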

Adjusting strategy depending on who is calling

As we mentioned at the beginning of this post, our voice contains more than just words. There are solutions available that can detect emotional state, age or gender with a certain level of confidence. Techmo’s solutions can be used for this as well.

According to the study “Gender recognition from vocal source”, male voices were recognized correctly with a probability of 94.7% and female voices with a probability of 95.9%. This matters for languages with grammatical gender, such as Polish or Spanish, and mostly affects the NLG module.
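
As an illustration of how a gender estimate might reach the NLG module, the sketch below picks a Spanish welcome message. The 0.9 confidence threshold, the message texts and the gender-neutral fallback are hypothetical choices, not Lekta’s behavior.

```python
from typing import Optional

# Spanish welcome messages: the adjective ending follows the listener's grammatical gender.
WELCOME_BY_GENDER = {
    "male": "Bienvenido a nuestro servicio.",
    "female": "Bienvenida a nuestro servicio.",
}
# Gender-neutral phrasing ("We welcome you to our service") used when detection is uncertain.
NEUTRAL_WELCOME = "Le damos la bienvenida a nuestro servicio."


def welcome_message(gender: Optional[str], confidence: float, threshold: float = 0.9) -> str:
    """Pick a gendered form only when the detector is confident enough."""
    if gender in WELCOME_BY_GENDER and confidence >= threshold:
        return WELCOME_BY_GENDER[gender]
    return NEUTRAL_WELCOME


print(welcome_message("female", 0.96))  # Bienvenida a nuestro servicio.
print(welcome_message("male", 0.60))    # Le damos la bienvenida a nuestro servicio.
```

Falling back to a neutral construction avoids addressing a caller with the wrong grammatical gender, which is usually worse than a slightly less personal phrasing.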

The emotional state (joy, sadness, anger, fear, surprise, or a neutral state) is harder to determine from a voice channel. According to various reports, the recognition rate is around 45% for male voices and around 48% for female voices. Lekta can be configured to adjust the dialog strategy depending on the caller’s emotional state. For example, if the customer is detected to be angry or rude, Lekta can transfer the call to a live agent.
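
With recognition rates below 50%, a single emotion reading is a weak signal. The sketch below shows one possible, hypothetical escalation rule that transfers the call only after several consecutive confident detections of anger; the threshold, the number of turns and the `EmotionEstimate` type are assumptions, not Lekta configuration.

```python
from dataclasses import dataclass


@dataclass
class EmotionEstimate:
    label: str         # e.g. "anger", "joy", "neutral"
    confidence: float  # the detector's confidence in `label` (0..1)


ESCALATE_ON = {"anger"}  # emotions that should trigger a transfer (assumption)


def next_step(history: list[EmotionEstimate], threshold: float = 0.8, min_turns: int = 2) -> str:
    """Transfer to a live agent only after `min_turns` consecutive confident detections."""
    recent = history[-min_turns:]
    if len(recent) == min_turns and all(
        e.label in ESCALATE_ON and e.confidence >= threshold for e in recent
    ):
        return "transfer_to_agent"
    return "continue_dialog"


history = [EmotionEstimate("neutral", 0.7), EmotionEstimate("anger", 0.85)]
print(next_step(history))  # continue_dialog
history.append(EmotionEstimate("anger", 0.9))
print(next_step(history))  # transfer_to_agent
```

Requiring agreement across turns trades a slightly later transfer for fewer false escalations caused by a single misclassified utterance.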

Lekta can also switch the dialog strategy depending on the caller’s age, for example by making it more informal when talking to younger customers. All these dynamic parameters (age, gender, emotional state, etc.) can be used by Lekta to improve the overall user experience. It is hard to show everything Lekta can do in one article, which has already grown quite long with my musings on its possibilities! I do hope you enjoyed it though :).

More to come

To summarize, in this article I’ve tried to give you a short overview of how Lekta can be used through a voice interface and which additional features it can leverage there. With this in mind, I’d also like to start a series of blog posts covering the details of Lekta NLP.

Today, we as a society feel more and more comfortable with hands-free, non-visual interactions. Voice interfaces will definitely continue to extend into other areas of our lives and activities, and so will Lekta.

Stay tuned!

Daniel Slavetskiy
Tech Lead at Lekta.ai
Software developer passionate about real-time communications, AI and NLP.
