The system for dictation of medical findings automatically creates medical findings from dictated speech. It is aimed at increasing the efficiency of medical staff and allowing them to focus on more important aspects of their work. The system can be adapted to the vocabulary of any area of medicine, and it can be easily integrated into systems and applications already in use, with minimal additional training of end users.


The system for dictation of medical findings:

  • recognizes naturally delivered speech with hardly any errors, in real time, on a computer of average performance and without a special microphone;
  • recognizes and correctly interprets abbreviations, punctuation and capitalization;
  • recognizes and correctly interprets Latin medical terminology, and handles Latin and Serbian mixed within one utterance (e.g. status post hysterectomiam in October two thousand and twelve);
  • supports special commands according to user requirements ("delete word/sentence" etc.);
  • allows the user to manually correct an incorrectly recognized word.
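
As an illustration of how spoken editing commands such as "delete word/sentence" might be applied to recognized text, here is a minimal Python sketch. It is not the system's actual implementation: the function name `apply_dictation_commands`, the exact command phrases, and the assumption that sentences end with a period are all hypothetical.

```python
def apply_dictation_commands(segments):
    """Apply spoken editing commands to a sequence of recognized segments.

    Each segment is either ordinary recognized text or one of the
    hypothetical command phrases "delete word" / "delete sentence".
    """
    text = ""
    for segment in segments:
        if segment == "delete word":
            # Drop the last whitespace-separated word, if there is one.
            words = text.split()
            text = " ".join(words[:-1])
        elif segment == "delete sentence":
            # Drop the last sentence: ignore its own closing period,
            # then cut back to the period that ends the previous sentence.
            stripped = text.rstrip()
            if stripped.endswith("."):
                stripped = stripped[:-1]
            cut = stripped.rfind(".")
            text = stripped[:cut + 1] if cut >= 0 else ""
        else:
            # Ordinary text is appended, separated by a single space.
            text = (text + " " + segment).strip()
    return text
```

For example, the segments "Patient is stable.", "No fever today", "delete word" would leave "Patient is stable. No fever". A production system would also need to handle abbreviations that contain periods, which this sketch ignores.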

A demo of the system (in Serbian) can be found at:


The system is based on a client-server architecture: recognition is carried out by a centralized server, which is either cloud-based or located on the premises of the institution. The server receives speech recordings from the computers of end users and returns the recognized text. This approach has two significant advantages:

  • when the server is located on the premises of the institution, recordings never reach a public network, which protects their privacy;
  • end users do not need new and more powerful computers, which makes the system significantly cheaper than one in which recognition is performed locally.
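
The request/response flow described above can be sketched with Python's standard library alone. This is only an illustrative sketch, not the system's real protocol: the `/recognize` path, the JSON response shape, and the helper names `start_server` and `send_recording` are assumptions, and `recognize` is a placeholder standing in for the actual speech engine.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def recognize(audio_bytes):
    """Placeholder: a real server would run the speech recognizer here."""
    return f"<recognized text for {len(audio_bytes)} bytes of audio>"


class RecognitionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the uploaded recording and answer with the recognized text.
        length = int(self.headers.get("Content-Length", 0))
        audio = self.rfile.read(length)
        body = json.dumps({"text": recognize(audio)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in this sketch


def start_server(host="127.0.0.1", port=0):
    """Start the central recognition server in a background thread."""
    server = HTTPServer((host, port), RecognitionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_recording(host, port, audio_bytes):
    """Client side: upload a recording and return the recognized text."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request(
            "POST",
            "/recognize",  # hypothetical endpoint name
            body=audio_bytes,
            headers={"Content-Type": "application/octet-stream"},
        )
        response = conn.getresponse()
        return json.loads(response.read().decode("utf-8"))["text"]
    finally:
        conn.close()
```

The client only captures audio and displays text, so all recognition work stays on the server, which is why end-user machines do not need to be upgraded.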

Hardware requirements of the system depend principally on the maximum number of simultaneous requests for service and, to a lesser extent, on the size of the vocabulary. One standard CPU core is typically able to service one recognition channel. The use of graphics processing units (GPUs) significantly increases the number of channels that can be serviced.
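
As a rough sizing aid, the rule of thumb of one recognition channel per CPU core can be written as a short calculation. The function name `cpu_cores_needed` and the `channels_per_core` parameter are hypothetical; the default of 1 reflects the figure stated above, and GPU capacity is left out because no concrete number is given for it.

```python
import math


def cpu_cores_needed(peak_simultaneous_requests, channels_per_core=1.0):
    """Estimate CPU cores for a CPU-only server.

    Assumes each core services `channels_per_core` recognition channels
    (about one, per the rule of thumb) and rounds up to whole cores.
    """
    if peak_simultaneous_requests < 0:
        raise ValueError("peak_simultaneous_requests must be non-negative")
    if channels_per_core <= 0:
        raise ValueError("channels_per_core must be positive")
    return math.ceil(peak_simultaneous_requests / channels_per_core)
```

Under these assumptions, an institution expecting at most 12 doctors to dictate at the same moment would plan for about 12 cores; a larger vocabulary could be modeled by lowering `channels_per_core` below 1.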