Axon - Voice Assistant


Axon - Voice Assistant is an application that lets you control a smartphone by voice: calling a contact by name or phone number, messaging (including Viber, Skype, and WhatsApp), starting navigation, dictating notes, etc.

Key features:

  • Voice dialing (contact names or phone numbers)
  • Address book and contact management
  • Managing short text messages
  • Call log management
  • Easy calendar and alarm handling
  • Changing system settings (date/time, network...)
  • Launching applications
  • Support for Serbian and Croatian

How it works

Dialog manager

This module is responsible for the entire behavior of the system. It takes the output of the speech recognizer as its input and performs the appropriate action.

A set of tasks is defined, with a precise specification of the information required for their execution. Some examples of these tasks are: calling a contact, sending a text message, managing the calendar and the log, starting an application, changing system settings, etc.

Should the system fail to recognize the spoken input, it tells the user so with an appropriate message. If the user does not provide all the necessary information at once, the system asks follow-up questions until the task can be executed.
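
The behavior described above resembles what is commonly called slot filling. The following is a minimal sketch of that idea, not the product's actual implementation: the task names, slot names, prompts, and the `step` function are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical task specifications: the slots each task needs before it can run.
REQUIRED_SLOTS = {
    "CALL_CONTACT": ["contact"],
    "SEND_SMS": ["contact", "text"],
}

# Hypothetical follow-up questions, one per slot.
PROMPTS = {
    "contact": "Who is the recipient?",
    "text": "What should the message say?",
}


@dataclass
class DialogState:
    command: Optional[str] = None
    slots: dict = field(default_factory=dict)


def step(state: DialogState, frame: Optional[dict]) -> tuple:
    """Advance the dialog by one user turn.

    `frame` is the understood input (None if recognition failed).
    Returns ("error", message), ("ask", question) or ("execute", command, slots).
    """
    if frame is None:
        return ("error", "Sorry, I did not understand that. Please repeat.")
    if "command" in frame:
        state.command = frame["command"]
    if state.command not in REQUIRED_SLOTS:
        return ("error", "Sorry, I did not understand that. Please repeat.")
    # Store whatever information the user supplied in this turn.
    for name, value in frame.items():
        if name != "command":
            state.slots[name] = value
    # Ask for the first piece of information that is still missing.
    for slot in REQUIRED_SLOTS[state.command]:
        if slot not in state.slots:
            return ("ask", PROMPTS[slot])
    return ("execute", state.command, dict(state.slots))
```

In this sketch, a first turn that names only the recipient of an SMS yields an "ask" result for the message text, and the next turn that supplies the text yields an "execute" result.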

Natural Language Understanding

The NLU module converts the user’s query into a form suitable for the dialog manager. For instance, if the user query is recognised as 'I want to send an SMS message to Vesna Petrović', the dialog manager will receive: 'command: SEND_SMS; contact: Vesna Petrović'.
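
As an illustration of this conversion, here is a small rule-based sketch. It assumes simple regular-expression patterns, which is only one possible approach and not necessarily the one the product uses; the patterns and the `understand` function are hypothetical, and `CALL_CONTACT` is an invented command name alongside the `SEND_SMS` example above.

```python
import re

# Hypothetical patterns: each maps a phrasing to a command and named slots.
PATTERNS = [
    ("SEND_SMS", re.compile(
        r"\bsend (?:an? )?(?:sms|text)(?: message)? to (?P<contact>.+)$",
        re.IGNORECASE)),
    ("CALL_CONTACT", re.compile(r"\bcall (?P<contact>.+)$", re.IGNORECASE)),
]


def understand(utterance: str):
    """Return a dict such as {'command': ..., 'contact': ...}, or None if no rule matches."""
    text = utterance.strip().rstrip(".!?")
    for command, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return {"command": command, **match.groupdict()}
    return None


print(understand("I want to send an SMS message to Vesna Petrović"))
```

A query that matches none of the patterns returns `None`, which a dialog manager could treat as a failed recognition.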

Natural Language Generation

The function of this module is the inverse of NLU: it converts information from the format used by the dialog manager into a natural-language sentence.
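
One simple way to picture this step is template-based generation. The sketch below assumes that approach; the act names, templates, and the `generate` function are hypothetical and are not taken from the product.

```python
# Hypothetical sentence templates, keyed by the kind of message to convey.
TEMPLATES = {
    "CONFIRM_CALL": "Calling {contact}.",
    "CONFIRM_SMS": "Sending your message to {contact}.",
    "ASK_TEXT": "What should the message to {contact} say?",
}


def generate(act: dict) -> str:
    """Turn a structured act such as {'act': 'CONFIRM_CALL', 'contact': ...} into a sentence."""
    template = TEMPLATES[act["act"]]  # raises KeyError for an unknown act
    slots = {name: value for name, value in act.items() if name != "act"}
    return template.format(**slots)


print(generate({"act": "ASK_TEXT", "contact": "Vesna Petrović"}))
```

Templates keep the output predictable, which suits a speech synthesizer; a language with rich inflection such as Serbian or Croatian would additionally need names put into the correct grammatical case, which this sketch does not attempt.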

Implementation of speech technologies on mobile platforms

Until recently, speech recognition was limited to small vocabularies and to the PC platform. The vocabulary of this product is significantly larger, and the software is optimized to fit within the resource limitations of portable devices.

As for speech synthesis, our previous solutions were of high quality but also restricted to the PC. We have now developed a less resource-demanding version compatible with smartphone operating systems. This came at the cost of a slight degradation in synthesis quality, which is acceptable for the target application.

The accuracy of speech recognition

It is well known that a number of recognizers from renowned manufacturers are not accurate enough, even for major languages, which leads to user frustration and dissatisfaction.
To increase recognition accuracy, the language model used in automatic speech recognition (ASR) is tailored specifically to the functionality in use at a given moment. For uncommon words (e.g. infrequent proper nouns), users can always resort to typing.
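
The idea of tailoring recognition to the current functionality can be sketched as follows. A real recognizer would apply its language model during decoding; this example only approximates the effect by rescoring a list of candidate transcriptions against a context-specific vocabulary. The contexts, vocabularies, score range (0 to 1), threshold, and the `pick_hypothesis` function are all assumptions made for illustration.

```python
# Hypothetical vocabularies for two dialog contexts.
CONTEXT_VOCAB = {
    "AWAITING_COMMAND": {"call", "send", "message", "open", "set", "alarm"},
    "AWAITING_CONTACT": {"vesna", "petrović", "marko", "jovanović"},
}


def pick_hypothesis(nbest, context, min_coverage=0.5):
    """Choose the candidate that best fits the active context.

    `nbest` is a list of (text, score) pairs with scores assumed to lie in [0, 1].
    Returns the chosen text, or None to signal that the user should type instead.
    """
    vocab = CONTEXT_VOCAB[context]
    best_text, best_score = None, float("-inf")
    for text, score in nbest:
        words = text.lower().split()
        if not words:
            continue
        # Fraction of the words that are expected in this context.
        coverage = sum(word in vocab for word in words) / len(words)
        if coverage < min_coverage:
            continue
        combined = score * coverage
        if combined > best_score:
            best_text, best_score = text, combined
    return best_text


candidates = [("vest na petrol", 0.6), ("Vesna Petrović", 0.5)]
print(pick_hypothesis(candidates, "AWAITING_CONTACT"))
```

When no candidate fits the context well enough, the function returns `None`, which corresponds to the fallback of letting the user type the word.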