Significant results have been achieved in the field of speech conversion and speech style change

Over the past year and the current one, the AlfaNum team has been working intensively on innovations that make it possible to synthesize speech with different characteristics, provided that the following are available:

  1. a high-quality acoustic model, i.e. one that synthesizes speech with the initial characteristics;
  2. a small sample of speech with the desired, different characteristics (lasting from a few seconds to a few minutes).

You can listen to the results:

A sample of Donald Trump's original speech:

Trump's synthesized voice reading the same text:

Obama's synthesized voice reading Trump's text:

Changing the speech characteristics can mean:

  • Changing the identity of the speaker (the initial acoustic model corresponds to the voice of one speaker, and after conversion the voice of another speaker is obtained).
  • Changing the speaking style (the initial acoustic model corresponds to an ordinary, neutral speaking style, and after conversion we can obtain, for instance, an expressive style that conveys an emotion such as joy or anger).

Examples of speech style change:

The possible applications of these innovations are enormous. First of all, they enable the creation of new TTS voices. Developing a single TTS voice is very costly, as is evident from the fact that the largest companies in this field offer no more than a few voices per language, and usually only one for "smaller" languages. On the other hand, there is a clear need for a variety of TTS voices in interactive voice systems, video games, e-book reader applications, audio textbooks, and so on. There is also demand for adapting the synthesis to the user's own voice or to the voice of another person. Uses of the user's own voice include reading social network posts, instant messages and e-mails aloud, and speech translation applications. The voice of another person is needed when dubbing films with the voices of the original actors.

Published 15.06.2017.