Natural-sounding digital assistants

Nobody likes people who state nonsense in a factual tone, so why should digital assistants act like that? We develop speech models that let digital assistants sound uncertain when they need to. A lot of computation goes into selecting the right utterance in response to a user's request. The assistant tries to determine the user's intent, identify the important keywords, and interpret any quantities or units of measurement. All of this needs to be understood correctly. If some part is misunderstood but the reply gives no indication of that, the user's trust in the assistant decreases.

We as humans communicate this information constantly and consciously, to inform our conversation partner of any (un)certainties or doubts we have. This is called paralinguistic information: information conveyed through, for example, falling or rising intonation, pauses, certain words, or loudness. These sound characteristics are largely consistent within a language and can be reproduced. Think about asking a question: you might pronounce the important part more loudly and end with a rising intonation.

Our research aims to identify the paralinguistic characteristics of speech concepts such as (un)certainty, doubt, and confusion. These characteristics are then used to synthesize speech for digital assistants with a controlled paralinguistic layer that conveys the internal uncertainty of the AI models.
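As a rough illustration of what a controlled paralinguistic layer could look like, the sketch below maps a confidence score to SSML `prosody` and `break` markup. It is not the project's synthesis model: the function name `uncertainty_to_ssml`, the linear mapping, and all numeric values are placeholder assumptions, whereas the real coefficients are what the research aims to learn from recorded speech. Support for these SSML attributes also varies between speech engines.

```python
from xml.sax.saxutils import escape


def uncertainty_to_ssml(text: str, confidence: float) -> str:
    """Wrap `text` in SSML whose prosody reflects a confidence in [0, 1].

    Hypothetical mapping, for illustration only: lower confidence gives
    slower speech, a slightly raised pitch, and a longer initial pause.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    uncertainty = 1.0 - confidence

    # Placeholder coefficients (assumptions, not research results).
    rate_pct = round(100 - 20 * uncertainty)  # 100% down to 80% speaking rate
    pitch_pct = round(10 * uncertainty)       # +0% up to +10% pitch
    pause_ms = round(400 * uncertainty)       # 0 ms up to 400 ms hesitation

    # Escape the text so characters like "<" cannot break the markup.
    body = (
        f'<prosody rate="{rate_pct}%" pitch="+{pitch_pct}%">'
        f"{escape(text)}</prosody>"
    )
    if pause_ms > 0:
        body = f'<break time="{pause_ms}ms"/>' + body
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(uncertainty_to_ssml("The station is 5 km away.", confidence=0.5))
```

A fully confident reply passes through with neutral prosody, while a less confident one is preceded by a pause and spoken more slowly, which mirrors how a human speaker might signal doubt.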

To determine the right coefficients for these synthesis models, we need a lot of speech data. Our current experiment focuses on collecting speech. If you are a native Flemish/Dutch speaker and would like to participate in this research, please follow the link Speech Experiment Sign-up. We are very grateful for your participation.

This work is part of the Flanders AI Research Program (www.airesearchflanders.be), in work package WP3: Interaction, Personalization and Recommendation.