AlfaNum is an innovative company bringing together a team of experts who, since 2003, have been striving to bring the world of modern speech technologies closer to a wide range of users.

To date, the company has developed software components for high-quality speech recognition and synthesis in Serbian, Croatian, and Montenegrin. Through the development of these technologies, which were originally intended for the visually impaired, AlfaNum has established itself as the leader in the region. More information on these software components can be found on our products and services pages, and they can also be tested on our demo pages.

Besides speech technologies, AlfaNum also develops and implements products intended for the disabled, as well as call centres, information centres and databases. If you are considering an upgrade of your system based on cutting-edge speech technologies, or you need some other expert service in the field, feel free to contact us and we will justify your confidence.

As a result of recent progress in computer technology, the personal computers of today are efficient enough to recognize speech with high accuracy and to synthesize speech of high quality. Furthermore, their relatively low price allows a wide range of users to enjoy the benefits of speech technologies. On the other hand, South Slavic languages are extremely complex and each one has its own particularities, which makes the development of speech technologies for any of these languages quite a difficult task. Moreover, the relatively small and closed markets for each of these languages are an additional reason why none of the world's leading companies had attempted to develop speech synthesis or recognition for them until recently. Having recognized this potential strategic advantage in time, we began the development ourselves, pursuing the following main objectives and activities:

  • The development of flexible text-to-speech synthesis (TTS) of high quality
  • The development of large vocabulary continuous automatic speech recognition (ASR)
  • The research and development of emotion recognition in speech
  • The development of speech morphing systems
  • The development of natural language processing modules including dialogue management
  • The application of the developed speech technologies in Western Balkan countries:
    • in multimodal human-machine dialogue systems (IVR, smart phones, smart homes)
    • for purposes such as: text reading, text dictation, speech transcription
    • within aids for the physically disabled, visually impaired, speech impaired, hearing impaired.

Scientific research and development within the AlfaNum team are carried out by a group of experts working at the Department of Communications and Signal Processing at the Faculty of Technical Sciences in Novi Sad, Serbia. More details can be found at the AlfaNum project website.

The most important innovative results:

AlfaNum has already developed both small-to-medium vocabulary ASR and high-quality TTS in Serbian, Croatian and Montenegrin.

A number of valuable speech and language resources for Serbian and kindred South Slavic languages have been created within several projects over the last decade. Apart from these resources, a number of expert systems, machine learning systems and mathematical models have been developed and deployed in the first speech-enabled products in Serbia, Croatia, Bosnia and Herzegovina, Montenegro, and North Macedonia - the countries where South Slavic languages are predominantly spoken. For example, one can listen to news at a number of speech-enabled websites (Radio Television of Serbia - RTS, Radio Television of Vojvodina - RTV, eUprava, as well as several municipalities) using a computer or a smart phone. The visually impaired can listen to any text displayed on the screen using the software anReader, based on AlfaNumTTS. The AlfaNumASR and AlfaNumTTS components have provided smart phones with basic speech generation and understanding functionalities in Serbian.

Further development of both large vocabulary ASR and more advanced TTS is based on the aforementioned speech and language resources. Both technologies will enable a much wider range of applications and will contribute to the preservation of Serbian and kindred languages in this new domain of communication – spoken dialogue between humans and machines.

Speech is the basic means of communication between humans. Using speech, humans can convey their thoughts and feelings to others far more intricately than any other animal species, and accordingly the human speech production system is the most complex one, comprising a number of organs: from the lungs, trachea (windpipe), larynx and vocal folds, to the oral cavity with the tongue, teeth and lips, and the nasal cavity.

Speech considered as a sound signal contains a multitude of information. Besides what has been said, it includes information on the speaker that reveals the emotional state, the identity of a known speaker, or the gender and age of an unknown one. We understand the meaning and perceive the speaker's dialect, education level and culture. We understand what has been said by relying on our knowledge of the language and on context; thus, segmentation of the sequence of sounds that we hear is possible only if we are familiar with the language. Speech perception is therefore not an inherited but a learned ability. Furthermore, one can focus on a particular speaker among many, estimate the position of the speech source, and often understand things that have not actually been said, but rather implied.

Acquisition of the sound signal is the first step in speech perception. The brain has to determine whether the received sound indeed originates from speech, because speech is processed in a way fundamentally different from music or ambient noise. The brain also has to identify whether the language used is one the listener is familiar with. A real-time phonetic analysis of the content is then carried out, without waiting for the speaker to finish the utterance, and ignoring non-speech sounds such as filled pauses, throat clearing, etc.

The reconstruction of the entire utterance is performed based on the sequence of the obtained phones, taking into account semantic context as well. The meaning of the utterance will thus most probably be reconstructed correctly even if certain phones are missing or are poorly articulated, which is often the case in spontaneous speech.
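This robustness to missing or poorly articulated phones can be illustrated with a deliberately simple sketch (not AlfaNum code): matching a noisy phone sequence against a small lexicon by edit distance, so that the intended word is still recovered even when a phone is dropped. The lexicon and phone symbols below are invented for illustration.

```python
def edit_distance(a, b):
    # Classic dynamic-programming (Levenshtein) distance between phone lists.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def best_match(phones, lexicon):
    # Pick the lexicon entry whose phone sequence is closest to the input.
    return min(lexicon, key=lambda word: edit_distance(phones, lexicon[word]))

# Toy lexicon: word -> phone sequence (symbols are illustrative only).
lexicon = {
    "speech": ["s", "p", "iy", "ch"],
    "peach":  ["p", "iy", "ch"],
    "speed":  ["s", "p", "iy", "d"],
}

# An input with a missing phone ("iy" dropped) is still resolved correctly.
print(best_match(["s", "p", "ch"], lexicon))  # -> speech
```

A real recognizer weighs candidates probabilistically and also uses semantic context, but the principle is the same: the closest consistent hypothesis wins.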


Word Spotter is a system that enables highly efficient and reliable search for predefined keywords in large quantities of audio material. It is based on automatic speech recognition (ASR) technology, but is optimised for locating particular words and phrases while disregarding the remaining speech, background noise or music.

With Word Spotter it is no longer necessary to listen to all of the existing audio material in search of particular words or phrases. Users specify a list of words or phrases to be detected and import the appropriate sound files. After the time needed for processing, a list of occurrences of the target words or phrases in the sound files is ready. The user then simply goes through the list and selects the occurrences of interest.
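The workflow just described can be sketched in a few lines (a toy illustration only; `spot`, `Occurrence` and the data layout are our own assumptions, not the actual Word Spotter API, and the ASR step is stood in for by pre-transcribed word/time pairs):

```python
from dataclasses import dataclass

@dataclass
class Occurrence:
    keyword: str
    file: str
    time_s: float  # position of the hit within the file, in seconds

def spot(keywords, files):
    """files maps a file name to mock ASR output: (word, start_time) pairs."""
    targets = {k.lower() for k in keywords}
    return [Occurrence(word.lower(), name, t)
            for name, words in files.items()
            for word, t in words
            if word.lower() in targets]

# Mock "imported sound files", already passed through ASR.
files = {
    "call_001.wav": [("hello", 0.0), ("invoice", 1.2), ("number", 1.8)],
    "call_002.wav": [("the", 0.0), ("invoice", 0.5), ("delivery", 1.1)],
}

for hit in spot(["invoice", "delivery"], files):
    print(f"{hit.keyword} in {hit.file} at {hit.time_s:.1f} s")
```

The real system performs the detection directly on the audio, so no full transcript is ever produced; only the keyword hits and their timestamps are returned for manual verification.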

The system has the following features:

  • Search for an arbitrary number of words or phrases in an unlimited quantity of audio material
  • Automatic inflection of key words – the application attempts to find all grammatical forms of a word (if so specified)
  • Support for a range of formats of audio files
  • Support for multiple parallel searches in the background (leaving the user free to do something else in the meantime)
  • Manual verification of the results
  • Support for modern multicore and multiprocessor platforms
  • Possibility of distribution over multiple computers and load balancing, which is of crucial importance in highly demanding environments
  • The software can be obtained in several forms: as a stand-alone application with its own GUI, as an API or library, or integrated into some of our other products (the Audiomemo recording system)
  • The integration with a module for detection of high levels of emotion is also under way, which will contribute to the efficiency and applicability of the system.

Vlado Delić

Vlado Delić, PhD (1964), is a professor, researcher and project manager at the Faculty of Technical Sciences, University of Novi Sad (UNS), Serbia. Prof. Delić has created curricula in acoustics, audio engineering and signal processing, as well as speech technologies, at FTN-UNS. He is the head of the Chair for Telecommunications and Signal Processing.

He has led major projects in the field of speech technologies in Serbia, including the largest ongoing regional R&D project, "Development of spoken dialogue systems for Serbian and other South Slavic languages".

He has (co-)authored several books, 4 patents and 10 acknowledged technical solutions, as well as more than 250 research and technical articles in scientific journals and conference proceedings.



Darko Pekar

Darko Pekar, PhD (1972), graduated from FTN in 1998 in the field of speech technologies. Until 2003 he was the leading expert of a highly successful R&D group in the field of speech technologies at FTN, where he gained wide-ranging experience in speech technologies and their applications, as well as in the management of scientific and technological projects.

In 2003 he became the CEO of AlfaNum, and in cooperation with FTN he continues to manage teams working on the development and application of speech technologies. He is the sole author of several successful products and services based on speech technologies. Although he is mainly focused on the practical development and realisation of market-ready ASR and TTS products, he has also published more than 70 papers in national and international scientific journals and conference proceedings, and has co-authored more than 20 acknowledged technical solutions and 5 patents.



Milan Sečujski

Milan Sečujski, PhD (1975), is an associate professor and researcher at the Faculty of Technical Sciences. His research focuses on speech technologies, particularly their linguistic aspects. He has worked with the research team at the Faculty and at AlfaNum for more than 15 years, mostly in the field of speech synthesis, but has also worked on the development of various linguistic resources used in a number of applications of speech technologies.

His expert knowledge of the phonetics, morphology, syntax and intonational phonology of South Slavic languages is one of the main strengths of the research team and contributes greatly to its leading position in the region. He has (co-)authored more than 100 research and technical articles in scientific journals and conference proceedings, including 10 acknowledged technical solutions. For his work on the introduction of natural intonation into synthesised speech in Serbian, he has received the prestigious Pupin award of Matica srpska.



Goran Đaković

Goran Djakovic (1961) finished the American High School in The Hague, the Netherlands, and obtained a first degree in electrical engineering at the University of Belgrade. In 1989 he established the Saga Company, which grew into the number one system integrator in Serbia for almost a decade (according to official revenue reports).

Mr. Djakovic has long-standing experience in doing business with prominent multinational companies and in dealing with C-level executives, as well as government and political officials. He is a member of the Serbian Association of Managers, chairman of the Board of the Informatics Association of the Belgrade Chamber of Commerce, a Board member of the Belgrade and Serbian Chambers of Commerce, and a member of many other business and professional associations.



Fathy Yassa

Fathy Yassa, PhD (1950), has over 30 years of management and R&D experience with small, medium and large software companies; taking basic ideas and inventions to products and markets has been his lifetime commitment.

Prior to founding SMI, the strategic partner of AlfaNum, he was the CEO of Yvent Networks, a company he founded in 2002. He previously held the positions of Senior Director of Engineering at Neomagic Corporation, Director of Product Development at Pulsent Corporation, Director of Business Development and Director of Engineering at Synopsys, Product Design Manager at Motorola, and Manager of a Coding and Image Processing department. He is the author or co-author of over 40 US patents and two European patents.

He holds a PhD in Electrical Engineering, a Master of Science in Mathematics, a Bachelor of Science in Mathematics, and a Bachelor of Engineering in Electrical & Electronics Engineering.