Tags: Speech Synthesis

Uncanny Valley Effect and Synthetic Voices

Google Duplex was convincing not only because of its voice, but also because of its emphasis and manner of speaking. It was imperfection that gave rise to perfection. What happens if you take a human-like voice and design the emphasis so that it does not match what the machine says? The result is an effect that can be associated with the uncanny valley. In this example, produced using SSML, Allison tells you something very sad in a euphoric voice. It sounds a little creepy. Then she tells you something positive with an expression of regret. That sounds pretty weird. Of course, one can imagine a context in which this fits. One can assume that the robot would also like to be a human being and is therefore envious of humans and full of sadness about its own existence. Then you would have a jealous, sad robot that does not say what it means. Research is already underway in this area. In 2014, Jan Romportl published the paper "Speech Synthesis and Uncanny Valley". More research is needed to better understand the effect of synthetic voices. One interesting method is to make the emphasis inappropriate.
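The mismatch between content and tone described above could be sketched in a few lines of Python that assemble the SSML. This is only an illustration, not the markup actually used for the example: the function names and the sample sentences are hypothetical, and the `type` values `GoodNews` and `Apology` are assumed to be expressive styles that the IBM Watson engine accepted for the Allison voice at the time.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def express_as(text: str, style: str) -> str:
    """Wrap text in an <express-as> element with the given style (assumed value)."""
    return f"<express-as type={quoteattr(style)}>{escape(text)}</express-as>"


def mismatched_ssml(sad_text: str, happy_text: str) -> str:
    """Pair sad content with a cheerful style and happy content with a regretful one."""
    parts = [
        express_as(sad_text, "GoodNews"),   # sad words, euphoric delivery
        '<break time="1s"/>',               # short pause between the two statements
        express_as(happy_text, "Apology"),  # positive words, regretful delivery
    ]
    ssml = "<speak>" + "".join(parts) + "</speak>"
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    return ssml


if __name__ == "__main__":
    # Hypothetical sample sentences, not the ones from the original recording.
    print(mismatched_ssml("My best friend moved away.", "I won the first prize."))
```

The resulting string would then be sent to a text-to-speech service as the text to synthesize. The service call is left out here because it depends on the engine and account in use.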

Fig.: The emphasis is inappropriate

Reflections on Individual Synthetic Voices

The synthetization of voices, or speech synthesis, has been an object of interest for centuries. It is mostly realized with a text-to-speech (TTS) system, an automaton that interprets text and reads it aloud. Such a system works on text that is available, for instance, on a website or in a book, or that is entered via a pop-up menu on a website. Today, just a few minutes of samples are enough to imitate a speaker convincingly in all kinds of statements. The article "The Synthetization of Human Voices" by Oliver Bendel (published on 26 July 2017) abstracts from actual products and actual technological realization. Instead, after a short historical outline of the synthetization of voices, it gathers exemplary applications of this kind of technology in order to promote its development, and it critically discusses potential applications so that they can be limited if necessary. The ethical and legal challenges should not be underestimated, in particular with regard to informational and personal autonomy and the trustworthiness of media. The article can be viewed via

Fig.: Can you hear my voice?

Declaration of Love by a Robot Woman

In an AI (art) project by Oliver Bendel, a poem was recorded on 26 July 2017 with the help of IBM Watson's text-to-speech engine, using the Speech Synthesis Markup Language (SSML). After the metadata has been conveyed by a normal artificial voice, an evidently lovestruck robot woman addresses the human object of her desire. Tags such as <voice-transformation> and <express-as> were used, and pauses of different lengths were inserted. The poem by Oliver Bendel can be downloaded via in .ogg format and listened to, for example, with the VLC media player. A haiku had already been published in March of that year. Pauses were inserted at the beginning so that the title and the metadata (author, system, voice, date) do not follow one another too quickly, and further pauses were placed between the lines of the poem.
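The structure described here (metadata in the normal voice, separated by pauses, followed by the poem in a transformed voice with pauses between its lines) might be assembled roughly as follows. This is a sketch under assumptions and not the project's actual markup: the function names, pause lengths and placeholder texts are hypothetical, and the `type="Young"` and `strength="80%"` attributes are assumed values for IBM Watson's `<voice-transformation>` tag.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def pause(seconds: float) -> str:
    """Return an SSML <break> of the given length in seconds."""
    return f'<break time="{seconds:g}s"/>'


def poem_ssml(metadata, lines, intro_pause=1.0, line_pause=0.5):
    """Build SSML: metadata in the default voice, poem in a transformed voice."""
    # Each metadata item (title, author, system, voice, date) is followed by a pause
    # so that the items do not run into one another.
    intro = "".join(escape(item) + pause(intro_pause) for item in metadata)
    # Pauses are placed between the lines of the poem, not after the last one.
    body = pause(line_pause).join(escape(line) for line in lines)
    # Attribute values are assumptions, chosen only for illustration.
    transformed = (
        '<voice-transformation type="Young" strength="80%">'
        f"{body}</voice-transformation>"
    )
    ssml = f"<speak>{intro}{transformed}</speak>"
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    return ssml


if __name__ == "__main__":
    # Placeholder texts, not the actual poem or its metadata.
    print(poem_ssml(["Title", "Author", "System"], ["First line", "Second line"]))
```

Keeping the pause lengths as parameters makes it easy to try different rhythms by ear, which matches the post's remark that pauses of different lengths were used.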

Fig.: Robot woman in love