Hello, and Welcome to MovieFone

Many customer service tasks can be automated, but a handful are still best handled by a person. Disputing a bill, negotiating a fee, changing a flight booking – for these you still need to get on the phone with someone. The system where you dial in and punch numbers corresponding to menu items is called an IVR (Interactive Voice Response) system, and it was born in the 1960s.

IVR technology began with DTMF (Dual-Tone Multi-Frequency, or touch-tone) signalling and has since, as voice recognition engines matured, evolved to include ASR (Automatic Speech Recognition).1 Unfortunately, it largely remains an archaic system. In most cases it simply listens for the touch tones that correspond to the menu items it reads out.
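To make "touch tones" concrete: each key on the keypad is signalled as the sum of two sine waves, one from a row frequency and one from a column frequency. The sketch below maps a key to its pair using the standard DTMF grid; the function name `dtmf_frequencies` is ours, purely for illustration.

```python
# Standard DTMF grid: one low-group (row) tone plus one high-group (column) tone per key.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (row_hz, col_hz) pair for a single keypad character."""
    if len(key) != 1:
        raise ValueError("expected exactly one keypad character")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"{key!r} is not a DTMF key")
```

An IVR system decoding a key press is effectively running this lookup in reverse: it detects which two frequencies are present and maps them back to a key.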

In theory, IVR systems make perfect sense. From the business point of view, they are useful not only for routing calls efficiently to the proper channels but also for gathering data on customer needs. Call segmentation, 24-hour service and privacy are added benefits.2 From the customer's point of view, they make it possible to perform simpler tasks without waiting to speak to an operator.

In practice – it sucks.

It goes the same way every time: call in, then wade through a multilevel jungle of options. It’s not uncommon to forget earlier options while listening to the later ones. Sitting through as many as 8 options before moving forward imposes a huge cognitive load. When the system has finally figured out how to route the call, you are inevitably put on hold. Periodically, machine speech breaks the hold music to tell you how much waiting time is left.

The Idea

What about accompanying the call with visual feedback? Would that solve some of the pain that comes from dealing with these systems? We took a cue from Apple’s re-imagining of voicemail and came up with an idea that might solve some of the problems we talked about earlier. Illustrated here in the style of existing operating systems, this is the IVR system that we wish we had.

Menus corresponding to Audio

Options at each stage of the call could be displayed on screen so you don’t have to listen to the whole list before moving forward. Selecting an item still sends the corresponding touch tone through.

A Better Hold Screen

Pre-entering information while you wait could speed up the call.

A little more helpful

The helper could serve up relevant content once it knows you are travelling to Japan.

How it might work

Chances are you don’t know the customer service number and need to look it up. That is why we’re guessing that most calls that start from a webpage are initiated through a link. IVR scripts could very easily be provided in an XML schema that accompanies the presentation layer of the call. When a call is made through a browser, this data would be passed to the dialer to inform the helper.
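As a rough sketch of what the dialer might do with such a script, the snippet below parses a small XML document and returns the options to show on screen after a given sequence of key presses. The `<ivr>` and `<option>` elements, their attributes, the phone number and the function `options_after` are all made up for illustration; they are not part of any existing standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical script format (our assumption): nested <option> elements, one per menu item.
SCRIPT = """
<ivr number="18005550100">
  <option key="1" label="Flight status"/>
  <option key="2" label="Bookings">
    <option key="1" label="Change a booking"/>
    <option key="2" label="Cancel a booking"/>
  </option>
  <option key="0" label="Speak to an agent"/>
</ivr>
"""


def options_after(root, pressed):
    """Return (key, label) pairs for the menu reached by pressing `pressed` in order."""
    node = root
    for key in pressed:
        # Only direct children count: each nesting level is one menu level.
        match = [o for o in node.findall("option") if o.get("key") == key]
        if not match:
            raise KeyError(f"no option {key!r} at this level")
        node = match[0]
    return [(o.get("key"), o.get("label")) for o in node.findall("option")]


root = ET.fromstring(SCRIPT.strip())
```

Calling `options_after(root, "")` gives the top-level menu, and `options_after(root, "2")` gives the two booking options, which is all the helper needs to draw the current screen.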

Using existing standards

While researching the idea, we came across VoiceXML (Voice Extensible Markup Language). As the W3C describes it:

“…a modular XML language for creating interactive media dialogs that feature synthesized speech, recognition of spoken and DTMF key input, telephony, mixed initiative conversations, and recording and presentation of a variety of media formats…”

There is already work being done to bring the advantages of Web-based development and content delivery to interactive voice response applications.3 Companies are already benefitting by building applications around this standard to handle calls.

For the customer, using a standard like VoiceXML for structuring IVR scripts could be huge – it enables the data to be platform agnostic. For our idea, scripts would be passed from the browser to the dialer and then to the helper, but they could just as easily be passed from any other application.
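To give a feel for how little a client would need to do, here is a minimal sketch that pulls the touch-tone choices out of a VoiceXML document. It assumes VoiceXML 2.x-style `<menu>` and `<choice>` elements (the specification we cite is the 3.0 draft, so treat the element names as an assumption on our part), and the sample document and the function `menu_choices` are invented for this example.

```python
import xml.etree.ElementTree as ET

# VoiceXML 2.x namespace; ElementTree exposes tags as "{namespace}name".
VXML_NS = "{http://www.w3.org/2001/vxml}"

# Invented sample document, not taken from any real IVR system.
DOC = """
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>For flight status, press 1. For bookings, press 2.</prompt>
    <choice dtmf="1" next="#status">Flight status</choice>
    <choice dtmf="2" next="#bookings">Bookings</choice>
  </menu>
</vxml>
"""


def menu_choices(vxml_text, menu_id):
    """Return (dtmf_key, label, next_target) for each choice in the named menu."""
    root = ET.fromstring(vxml_text.strip())
    for menu in root.iter(VXML_NS + "menu"):
        if menu.get("id") == menu_id:
            return [
                (c.get("dtmf"), "".join(c.itertext()).strip(), c.get("next"))
                for c in menu.findall(VXML_NS + "choice")
            ]
    raise KeyError(menu_id)
```

Because the same document already drives the audio prompts, the on-screen menu and the spoken menu would stay in sync without a second source of truth.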

Why it can’t be done

Presently, it is not possible to implement this functionality on Android, let alone iOS. The reason is that no API exists to generate DTMF tones over the uplink audio path.4 Without this feature, there is no way to advance through the call system during an active call. The good news is that some work has been done in the project pipeline to enable it in a future build of Android.5 One current workaround is to prompt the user to step through the menus before the call, and then inject DTMF tones (with the necessary delays) during the dial process.
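The workaround can be sketched as building a single dial string up front. Android dial strings treat a comma as a pause character (`PhoneNumberUtils.PAUSE`), after which the following digits are sent as DTMF; the exact pause length is platform dependent, so the number of commas per step below is a guess that would need tuning per phone system. The function name `build_dial_string` and the phone number are ours.

```python
PAUSE = ","  # pause character in Android dial strings; its duration varies by platform
DTMF_KEYS = set("0123456789*#")


def build_dial_string(number, keys, pauses_per_step=2):
    """Append each pre-selected menu key to `number`, preceded by pause characters."""
    if pauses_per_step < 1:
        raise ValueError("need at least one pause before each key")
    for key in keys:
        if key not in DTMF_KEYS:
            raise ValueError(f"{key!r} cannot be sent as a DTMF tone")
    return number + "".join(PAUSE * pauses_per_step + key for key in keys)
```

For example, `build_dial_string("18005550100", "21")` yields `18005550100,,2,,1`. The obvious weakness is timing: a fixed pause cannot adapt if a prompt runs long, which is why in-call tone generation would be the better fix.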

With big contributions from: Matt Hodgins, Jonas Naimark and Peter Nitsch


  1. Enhancing the Self-Service Experience
    Richard Feinberg, Ph.D, Dept. Consumer Sciences and Retailing, Purdue
  2. IVR Improvement Strategies 2011
    The Ascent Group
  3. Voice Extensible Markup Language (VoiceXML) 3.0
    The World Wide Web Consortium
  4. ToneGenerator
    Android Developer Reference
  5. Change If60d9f81: Telephony: Add support for sending DTMF codes.
    android-review.googlesource Code Review
Nelson Leung