AI for Voice and Vocal Expressions

May 13, 2018

AI for Voice and Vocal Expressions | Blog - Soham

In 2016, Adobe previewed Voco, its Photoshop equivalent for voice [1]. The demo showed that *any* text could be spoken in a particular person's voice, given a sample data set of that voice. [2]

In 2018, Google demonstrated at Google I/O that its artificially intelligent assistant can carry on a normal conversation with human beings in restricted domains [3]. In the demo, the assistant called a restaurant to make a reservation on behalf of someone. The restaurant employee who answered did not recognize that she was speaking to a machine, not a human. However, Google admitted that the technology does not work in all situations; it only works in certain domains.

Now let's look at a not-so-distant future. We know that Google has samples of our voices. It collects them whenever we ask it a question like "Ok Google, how is the weather today?" It's pretty clear at this point where this is going. With enough voice samples, any voice can not only be mimicked to say any text, but the subtle nuances of human speech can also be added to the conversation. An artificially intelligent agent could mimic our exact voice and our speaking style. The person on the other end may never even suspect that a machine is speaking.

At this point, the question is not whether we should build this technology, because someone or other will be able to build it. The more pertinent question is how we can defend against its abuse. Abuse is possible even within a restricted domain, such as politics. We need to think hard about safety and security as computer science technologies advance.