
Details: Published: 26 January 2012

Integration into existing applications

AlfaNum ASR and TTS, as basic components for speech recognition and synthesis, are primarily intended for software development companies and system integrators. In most cases such companies have already deployed fully functional solutions, but ASR and TTS can add functionality to an application or service, or make it more attractive. Speech technologies also open the door to the development of entirely new applications and services that could not be created using conventional methods of communication with the user.

Upgrade of call centres and interactive voice response (IVR) systems

ASR and TTS can be used to upgrade call centres and IVR systems, allowing automatic recognition of sequences of digits (PIN codes), amounts, dates, proper names etc., as well as automatic creation of flexible voice prompts and reading out any textual information using speech synthesis. Furthermore, addressing the users by name and/or company name adds a level of service quality that was previously impossible to achieve.

Extended dialling by voice

Making telephone calls has never been simpler. Each organisation, regardless of its size, can now have its own private telephone operator. It is no longer necessary to memorise dozens of telephone numbers or browse endless menus. One just needs to pick up the receiver, dial a number and say aloud the name of the person or the department of interest. The system is connected to the existing private branch exchange, and is able to transfer the call to the desired person or department, or to initiate an outbound call. Each employee can define their own personalised phonebook using a simple web interface. This functionality can be used by both employees and outside callers.
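The step from a recognised utterance to a call transfer can be illustrated with a minimal sketch. It assumes the recogniser returns plain text and that the phonebook is a simple name-to-extension mapping; the names `PHONEBOOK` and `resolve_extension`, the sample entries and the similarity cutoff are all hypothetical and are not taken from the AlfaNum product.

```python
from difflib import get_close_matches

# Hypothetical phonebook: spoken name -> extension (illustrative values only).
PHONEBOOK = {
    "petar petrovic": "101",
    "sales department": "200",
    "technical support": "210",
}

def resolve_extension(recognised_text, phonebook=PHONEBOOK, cutoff=0.75):
    """Return the extension of the closest phonebook entry, or None."""
    # Normalise case and whitespace before comparing.
    query = " ".join(recognised_text.lower().split())
    matches = get_close_matches(query, list(phonebook), n=1, cutoff=cutoff)
    return phonebook[matches[0]] if matches else None
```

Fuzzy matching is used here only so that small recognition errors still resolve to an entry; a real deployment would more likely constrain the recogniser to the phonebook vocabulary in the first place.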

Information related to timetables

Using an interactive voice response (IVR) system with ASR capabilities, it is easy to obtain the necessary information related to any timetable. For instance, if the caller states the desired destination and the day by voice, the system provides the requested information (the times of departures) through speech synthesis. Such a system can be applied at bus or railway stations, airports...

TV schedule

In the car, on a bus, on a beach, on a hill... Wherever you are, you can find out what is on TV. An efficient solution is for the caller to get all the desired information in one place, through a range of available queries. Another solution is for each broadcasting company to set up its own interactive voice response system.

White and yellow pages

The users can say the names and addresses of private individuals or organisations of interest (e.g., Petar Petrović, Novi Sad), and the system will find the requested data (e.g. telephone number) in the appropriate database and reply by synthesised speech. More flexible searches can also be supported – by business activities, keywords...

Sports results

Many bookmakers want to keep track of sports results, betting odds and timetables of sports events at any moment. By means of a simple phone call all such information becomes available at any time of day or night, through efficient and intuitive communication with an interactive voice response system.

Entertainment

Using ASR or TTS brings new quality to many existing information services, and also allows the creation of many more, such as televoting, lottery, personal ads services, horoscope... The callers do not need to memorise the digit(s) to be keyed in; they just say aloud the word or phrase of interest (e.g. “Scorpio”), and the system recognises it, retrieves the desired information from a database and replies to the caller by synthesised speech.

Medical appointment scheduling

The callers identify themselves by their social security numbers, and in a later phase also by their telephone numbers (possibly with additional verification such as: “Is this Mr Petar Petrović?”). The callers then state the name of the desired department (surgery, orthopaedics...), and/or physician. After making this selection, the callers are offered the list of time slots available for appointment, out of which they choose one.

Adding speech functionality to websites

In the world of global networking and abundance of information it is not enough to be one of many and have all that everybody else has. Adding speech functionality to your website can be a feature that will make the difference. The TTS technology can convert the textual content of the website into speech, offering a new dimension of surfing. The feature is particularly useful to the visually impaired, but also to many other visitors.

Subtitling TV shows

The AlfaNum ASR system can be used for subtitling shows in the Serbian language. At the moment this technology can be applied whenever the textual transcription of the show is available, and the system can then perform automatic synchronisation by comparing the audio content of the show with the transcription. The synchronised subtitles are displayed through teletext to the viewers who opt for it. This feature is of particular interest to the hearing impaired and the elderly, but also to many others who, for some reason, have a need for it.
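Once the audio has been aligned with the transcription, the timed words still have to be grouped into subtitle lines. The following is a minimal sketch of that grouping step only, assuming the alignment yields `(word, start, end)` tuples with times in seconds; the function name `build_cues` and the limits `max_chars` and `max_gap` are hypothetical, and the alignment itself is not shown.

```python
def build_cues(aligned_words, max_chars=40, max_gap=1.0):
    """Group (word, start, end) tuples into (start, end, text) subtitle cues."""
    cues = []
    words, cue_start, cue_end = [], None, None
    for word, start, end in aligned_words:
        text_len = len(" ".join(words + [word]))
        # Close the current cue if it would get too long or after a long pause.
        if words and (text_len > max_chars or start - cue_end > max_gap):
            cues.append((cue_start, cue_end, " ".join(words)))
            words = []
        if not words:
            cue_start = start
        words.append(word)
        cue_end = end
    if words:
        cues.append((cue_start, cue_end, " ".join(words)))
    return cues
```

Splitting on pauses as well as on length keeps each cue on screen only while the corresponding words are spoken.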

Do you have an idea?

Speech technologies are so versatile that it is hard to imagine a branch of human activity where they cannot be applied. On the webpage of the AlfaNum project you can see many other examples, and you should feel free to suggest your own idea to us as well.

Details: Published: 26 January 2012

The Audiomemo recording system can increase the efficiency and security of business activities, and it can be applied e.g. within:

Call centres,
Financial and brokerage institutions,
Government institutions,
Emergency services,
and other applications.

AlfaNum recommends the Audiomemo system for recording telephone conversations to companies that want to minimise the risk of inconvenience that exists when there is no solid evidence of the contents of previous telephone communication. This primarily refers to the risks associated with false calls, business transactions based on speech communication, as well as handling customer complaints.

Recording telephone conversations provides better control over the daily business, and it also contributes to better evaluation of the quality of service that the company provides to customers. It can also save the company from much of the trouble caused by the transfer of inaccurate information, and thus significantly improve its profitability. The scope of recording can vary from several telephone extensions to the entire enterprise network with hundreds of telephones, radio stations or microphones.

Call centres

Call centres are an effective means of communication for all companies that require a complete multimedia contact with customers. The advantages of using the Audiomemo recording system in call centres are manifold. Quality control over telephone conversations is ensured, which is essential for improving the work of agents. Live monitoring enables supervisors to monitor calls using multiple security clearance levels. The system is also able to record calls at multiple locations, with all records stored in a single centralised database, which is also remotely accessible. It is also possible to create and save reports on the results of various user-required analyses. Training users to work with the system is quite simple and not at all time-consuming.

Call centres can also be applied in telemarketing, by companies that need to provide technical support to their customers, for providing various information and entertainment services, by banks and insurance companies, as well as by government institutions such as the Inland Revenue service or the police.

Financial institutions

Financial institutions use the Audiomemo recording system in order to improve the quality of service they offer to their customers. The most common users of Audiomemo include banks and insurance companies, brokerage firms and stock exchanges.

A specific advantage offered by this system is live monitoring, enabling call supervisors to monitor calls using multiple security clearance levels. The recordings of telephone communication may also serve as evidence in case of legal action, and can be verified by independent auditors.

Government institutions

The Audiomemo recording system is also very useful to government institutions due to its contribution to the security of their business activities. Furthermore, under the new legislation, government agencies are obliged to use a reliable system for recording telephone conversations in accordance with current laws and regulations. The Audiomemo recording system achieves high reliability through full system redundancy based on RAID technology. The recorded conversations are easily accessible from remote locations as well, and the system also supports a range of search capabilities. A system of auditory and visual alarms ensures uninterrupted work.

The system is currently used by a number of government institutions, including police and military institutions.

Emergency

The Audiomemo recording system is also used by the public protection sector, where reliability is a highly significant requirement. As a very stable system, Audiomemo does not require any special maintenance and allows continuous recording. Resistance to errors and high reliability ensure uninterrupted communication between citizens and services. The ease and efficiency of access to the recorded calls is also very important, as the system is used by dispatchers in their communication. Furthermore, it is important to note that the system can be integrated with existing databases.

The public protection sector includes emergency services and health facilities, fire and rescue services, as well as security agencies.

Other applications

In addition to the abovementioned areas of application, the Audiomemo recording system, as a ready-made solution for recording telephone calls, is used by a range of companies from different branches of business.

Details: Published: 25 January 2012

The development of speech technologies is especially significant for the disabled:

computers can read books, news from the Internet, e-mail and SMS messages to the visually impaired,
computers can convert what a speech impaired person writes into speech,
the physically disabled can communicate with the devices in their environment,
automatically recognized speech is easily translated into text which can then be read by people with hearing disabilities.

Speech technologies help the disabled overcome their disabilities to a certain degree, enabling them to become more independent and perform tasks they were not able to do before. The first applications in the Serbian-speaking community were created for the visually impaired, for whom a number of aids have been developed.

AnReader

Text-to-speech system primarily intended for the visually impaired.

Audio library

A client-server system which provides the visually impaired with the access to a large database of books and other texts through the local network or the Internet.

Adding speech features to web sites

In the world of global networking and abundance of information it is no longer sufficient to be one of the many and have all that everyone else has.

Details: Published: 25 January 2012

Speech technologies have the potential to introduce essential changes to the way people interact with their environment. If a person is able to speak to a computer, and if the computer is able to speak back, then similar communication with other devices can be realised, starting from home equipment through industrial machines, cars, robots and toys, all the way to remote computers which can retrieve the required information and deliver it by speech.

ASR and TTS represent very complex multidisciplinary problems, and their successful treatment requires not only technical skills, but also expert knowledge in areas such as linguistics, psychoacoustics and speech perception, acoustics and digital signal processing. All these elements need to be combined and implemented within the available hardware and software resources so that a computer can understand human utterances and generate speech of its own.

AlfaNum ASR

Continuous speech recognition system for most South Slavic languages.

AlfaNum TTS

System for conversion of text to highly intelligible speech with elements of natural intonation.

Details: Published: 25 January 2012


Advertising Monitor is a system intended for automatic monitoring of advertisements and musical content broadcast by radio and TV stations.

Specified sound recordings are automatically recognized in the received signal and the exact moments of their appearances are logged. Users can get detailed reports on the broadcasting of any audio material, automatically generated by the system based on recognition data.

The system contains a number of FM and TV tuners which receive signals from different radio and TV stations via antennas.

Each of these tuners passes the audio (not the video) signal to a special sound card able to record multiple channels simultaneously. The recordings are compressed as they arrive and archived on local hard disks. Depending on the number of channels recorded and the sizes of hard disks, the archiving period can extend up to several months.
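How the archiving period depends on the number of channels and the disk size can be shown with a short calculation. This is a sketch under stated assumptions: the compressed bitrate, the disk size and the function name `archive_days` are illustrative and are not figures from the product.

```python
def archive_days(disk_bytes, channels, bitrate_kbps):
    """Estimate how many days of compressed audio fit on the disk."""
    seconds_per_day = 86_400
    # kilobits per second -> bytes per day, for a single channel
    bytes_per_day_per_channel = bitrate_kbps * 1000 / 8 * seconds_per_day
    return disk_bytes / (channels * bytes_per_day_per_channel)

# Assumed example: a 2 TB disk, 16 channels, 64 kbit/s per channel.
days = archive_days(2e12, 16, 64)
```

With these assumed values the result is roughly 181 days, i.e. about six months; doubling the number of channels or halving the disk size halves the period.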

For each search, the administrator defines the search parameters such as target channels, and imports target sound recordings. The search is carried out by independent processes which can be activated on all computers in the system (of which there can be an arbitrary number) and the results are stored in a shared database, which is used to create reports for the clients.
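The search workflow described above (search parameters, a search over the target channels, results in a shared database, reports built from it) can be sketched as follows. The sketch uses an exact match over integer sequences as a stand-in for audio recognition and an in-memory SQLite table as a stand-in for the shared database; the table layout and all function names are hypothetical, and the real system's recognition method is not described in the source.

```python
import sqlite3

def find_occurrences(stream, target):
    """Return start indices where target occurs in stream (toy exact match)."""
    n = len(target)
    return [i for i in range(len(stream) - n + 1) if stream[i:i + n] == target]

def run_search(db, recordings, target_name, target, channels):
    """Search the chosen channels and log every hit in the shared table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS hits (channel TEXT, target TEXT, position INTEGER)"
    )
    for channel in channels:
        for pos in find_occurrences(recordings[channel], target):
            db.execute("INSERT INTO hits VALUES (?, ?, ?)", (channel, target_name, pos))
    db.commit()

def report(db, target_name):
    """Return (channel, position) rows for one target, ordered for a report."""
    rows = db.execute(
        "SELECT channel, position FROM hits WHERE target = ? ORDER BY channel, position",
        (target_name,),
    )
    return rows.fetchall()
```

Because each call to `run_search` only reads recordings and appends rows, several such searches could run as separate processes against one database, which is the property the description relies on.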

Besides searching for advertisements or musical content, Advertising Monitor can be used for other purposes as well.

The software toolkit contains:

Applications for recording of audio material
Applications for automatic search for given audio content in the recorded audio material
Applications for system monitoring and administration.

The system has the following features:

Multichannel recording of audio material with compression in real time,
Automatic recognition of advertisements, jingles, or songs,
Possibility of retroactive monitoring,
Unlimited system expandability in order to provide faster searches and higher capacities,
Automatic deletion of the oldest recordings in order to make room for new ones,
Quick search by time and date of recording or by station or type of material,
Visual representation of the recorded material,
Possibility of combining the software with the Word Spotter application.
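
The automatic deletion of the oldest recordings listed among the features can be illustrated with a minimal sketch. It assumes each recording is described by a `(start_time, size_bytes)` tuple and that a fixed capacity is known; the function name and data layout are hypothetical and only show the oldest-first policy, not the product's implementation.

```python
def recordings_to_delete(recordings, capacity_bytes):
    """Pick the oldest recordings to remove so that the rest fit the capacity.

    recordings: list of (start_time, size_bytes) tuples; start_time is any
    sortable value such as a timestamp.
    """
    total = sum(size for _, size in recordings)
    doomed = []
    for rec in sorted(recordings):  # oldest first
        if total <= capacity_bytes:
            break
        doomed.append(rec)
        total -= rec[1]
    return doomed
```

The function only selects what to delete; removing the files and updating any index would be separate steps.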