Making the Leap from Speech to Dialogue: The Challenge for Human to Machine Communication

Robots are everywhere and doing virtually everything. We have even begun conversing with them in situations that are beginning to resemble interpersonal communication. Right now these spoken dialogue systems (SDS) tend to be limited to a “command-based” approach, as seen in a number of recently introduced commercial implementations, like Apple’s Siri for iOS, Amazon’s Echo/Alexa, and the social robot Jibo.

The command-based approach to SDS design works reasonably well, as it predetermines much of the semantic context, communicative structure, and social variables by keeping conversational interactions within manageable boundaries. Yet the development of more robust SDS will depend not only on advances in engineering but also on a better understanding and modeling of the actual mechanisms and operations of human-to-human communicative behavior.

Unfortunately, the two disciplines that deal with these subjects, engineering and interpersonal communication, have neither recognized nor exploited this interdisciplinary opportunity and challenge. Engineers, for their part, have either tried to reinvent the wheel themselves or sought advice from researchers in other disciplines, like social linguistics or psychology. Communication scholars have often limited their research efforts and findings to human communication. When they have dealt with computers or bots, they have typically treated the mechanism as a medium of human communicative exchange, what is called “computer mediated communication” or CMC.

From the beginning, it is communication—in the form of conversational interpersonal dialogue—that provides AI with its definitive characterization. This is immediately evident in Alan Turing’s “Computing Machinery and Intelligence,” which was first published in the journal ‘Mind’ in 1950. “The idea of the test,” Turing explained in a BBC interview from 1952, “is that the machine has to try and pretend to be a man, by answering questions put to it, and it will only pass if the pretense is reasonably convincing. A considerable proportion of a jury aren’t allowed to see the machine itself. So, the machine is kept in a faraway room and the jury are allowed to ask it questions, which are transmitted through to it”.

According to Turing, if a computer can simulate a human being in communicative exchanges (albeit exchanges constrained to the rather artificial situation of typewritten questions and answers) so successfully that the jury cannot tell whether they are talking with a machine or another human being, then that device would need to be considered intelligent.

Because they derive from Turing’s original proposal, all chatterbots, irrespective of design, inherit two important practical limitations:

  1. The mode of interaction is restricted to a very narrow range of interpersonal behaviors. Chatterbots have been designed as question answering systems. That is, their social involvement is intentionally limited to situations where human interrogators ask questions and the machine is designed to provide responses.
  2. These Q&A interactions are restricted to typewritten text. For Turing, and the chatterbots that follow his lead, the use of textual interaction is a necessary and deliberate element of the imitation game’s design. The main reason for limiting the interrogation to text form is to level the playing field: “In order that tones of voice may not help the interrogator the answers should be written, or better still, typewritten.”

Recent SDS implementations, especially commercially available products like Siri, Echo/Alexa, and others, are not a single technology but an ensemble of several different but related technological innovations:

  • “automatic speech recognition (ASR), to identify what a human says;
  • dialogue management (DM), to determine what that human wants;
  • actions to obtain the information or perform the activity requested; and
  • text-to-speech (TTS) synthesis, to convey that information back to the human in spoken form.”
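
The components listed above can be sketched as a chain of functions. What follows is a minimal illustration in Python, not a description of how Siri or Alexa are built. Every name in it (recognize_speech, manage_dialogue, perform_action, synthesize_speech, handle_turn, COMMAND_PREFIXES, ACTIONS) is hypothetical, and the ASR and TTS stages are stubs that pass plain strings so the sketch runs without audio hardware or external libraries.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical command vocabulary: the "command-based" approach fixes in
# advance which utterance openings the system is prepared to handle.
COMMAND_PREFIXES: Dict[str, str] = {
    "play ": "play_music",
    "set a timer for ": "set_timer",
}


@dataclass
class Intent:
    name: str      # which command was recognized
    argument: str  # the rest of the utterance


def recognize_speech(audio_transcript: str) -> str:
    """ASR stub (assumption): the audio is treated as already transcribed."""
    return audio_transcript.lower().strip()


def manage_dialogue(text: str) -> Intent:
    """DM: decide what the human wants by matching a known command prefix."""
    for prefix, name in COMMAND_PREFIXES.items():
        if text.startswith(prefix):
            return Intent(name, text[len(prefix):].strip())
    return Intent("unknown", text)


# Hypothetical actions that obtain information or perform the request.
ACTIONS: Dict[str, Callable[[str], str]] = {
    "play_music": lambda arg: f"Playing {arg}.",
    "set_timer": lambda arg: f"Timer set for {arg}.",
}


def perform_action(intent: Intent) -> str:
    """Action stage: run the handler for the intent, if one exists."""
    handler = ACTIONS.get(intent.name)
    if handler is None:
        return "Sorry, I did not understand that."
    return handler(intent.argument)


def synthesize_speech(reply: str) -> str:
    """TTS stub (assumption): return the text that would be spoken aloud."""
    return reply


def handle_turn(audio_transcript: str) -> str:
    """One conversational turn: ASR, then DM, then action, then TTS."""
    text = recognize_speech(audio_transcript)
    intent = manage_dialogue(text)
    reply = perform_action(intent)
    return synthesize_speech(reply)


if __name__ == "__main__":
    print(handle_turn("Play jazz"))
    print(handle_turn("How are you feeling?"))
```

Note that only a string travels from one stage to the next in this sketch. Anything carried by tone of voice is discarded at the first step, and any utterance outside the fixed command vocabulary falls through to the "unknown" reply.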

Despite their apparent complexity and technical advancement beyond text-based chatterbots like ELIZA, SDSs are still designed for and operate mainly with text data.

A good deal of conversational interaction is negotiated through nonverbal elements, which can include visual cues, or “body language.” Right now, commercially available SDS applications, like Siri and Echo/Alexa, attend only to what is said. How it is said, and in what particular fashion it is articulated, is not necessarily part of the current implementations.

Closing this gap requires the development of an interface between the fields of engineering and communication studies. Doing so will involve making theory computable, so that the insights generated by decades of communication research are not just human readable but also machine executable. At the same time, engineers will need to learn to recognize and appreciate how this so-called “soft science” can speak to, and contribute the data necessary to address, many of the open problems in SDS development.

Human-like conversation is generally considered to be a natural, intuitive, robust and efficient means of interaction. The ability to handle phenomena commonly used in human conversations could ultimately make systems more natural and easier for humans to use, but it also has the potential to make things more complex and confusing.

Modeling human-to-machine (h2m) communication on human-to-human (h2h) communication might be the wrong place to begin, just as modeling “machine intelligence” on human cognition turned out to be a significant impediment to progress in artificial intelligence (AI). Identifying this assumption, however, does not militate against the argument for including interdisciplinary collaboration in SDS development.

Ayse Kok
Ayse completed her master’s and doctorate degrees at the University of Oxford (UK) and the University of Cambridge (UK). She has participated in various projects in partnership with international organizations such as the UN, NATO, and the EU. She also served as an adjunct faculty member at Bosphorus University in her home country, Turkey. Furthermore, she is the editor of several international journals, including those for Springer, Wiley and Elsevier Science. She has attended various international conferences as a speaker and published over 100 articles in both peer-reviewed journals and academic books. Having published 3 books in the field of technology & policy, Ayse is a member of the IEEE Communications Society, the IEEE Technical Committee on Security & Privacy, the IEEE IoT Community and the IEEE Cybersecurity Community. She also acts as a policy analyst for Global Foundation for Cyber Studies and Research. Currently, she lives with her family in Silicon Valley, where she has worked as a researcher for companies like Facebook and Google.