
Blog entries by Bradley Lehman: IVR Design

These are excerpts from my eleven postings to "The Angel Voice" blog, which was active from autumn 2008 to the end of 2011. The blog was a professional forum sponsored by Angel, my employer at the time.

My topics were aspects of audio design and production for telephone IVR systems.

  • Lily Tomlin's Ernestine, and bad VUI confirmation/re-prompting (9/10/08) - why a badly designed IVR system frustrates users, or comes across as obnoxious

  • Awkward Phrases in the Auto-Attendant (9/17/08) - the cliches and off-putting verbiage that cause customers to hate telephone systems

  • How's the front door? (9/21/08) - perspectives on the importance of a company's phone-answering system

  • Simple editing of WAV files for your phone system (5/05/09) - tools and techniques

  • Speech recognition: a fruit by "any other" name (11/20/09) - why directed dialog works better than "say anything" design

  • Strategies for "Caller First" design: advertising over the phone? (12/11/09) - crafting prompts for best usability

  • Press 4 for "Funner Options", and use our Facebook fan page! (3/04/10) - on flair

  • IVR Design: Just Kiss Her (7/14/11) - using silence

  • IVR Design: What Does the Listener Need to Know, and When? (8/30/11) - organizing a message for speech is profoundly different from writing a piece for print

  • Why Can't I Have Stereo CD-Quality Audio On My Phone System? (9/21/11) - a primer of digital sampling rates and the infrastructure's standards

  • Dial (888)583-2801 for IVR Horrors (10/21/11) - avoiding IVR design errors

Most of my section of the blog is archived in the Internet Archive Wayback Machine here, here, and here, but many of the linked audio resources and navigational links are broken inside those historical snapshots. I have reproduced several of the best postings below, correcting their links for easier reading.

These entries are about thoughtful wording, quality control, and the use of audio-editing strategies to make phone systems more easily usable.

+ - + - + - + - + - + - + - + - + - + - + - +

How's the front door?

By Brad Lehman on September 21st, 2008

How's the front door of your company's phone presence in the world?

Here's a useful little test. Every week, assign two or three people from your own company to call into your own phone line. Not the same people every week. Rotate it.

Take notes on the total user experience.

If there was a transfer to an agent, how long did it take?

Were there any obvious problems with the automated prompts that could be fixed with a common-sense approach? Or, any more profound problems that might require some consulting? (It is your company's own front door, on the phone, so things "should" run correctly!)

If your own CEO leaves a voicemail message in the company's system, how long does it take for the promised callback?

Did the call get dropped at any weird place?

Was there anything that a brand-new customer, or a potential customer, would find confusing or off-putting?

The customers might not be able to report problems to you, and might not bother to do so. Competitors certainly won't.

Have fun! Break your own system and find any problems before your customers do!

+ - + - + - + - + - + - + - + - + - + - + - +

IVR Design: Just Kiss Her

By Brad Lehman on July 14th, 2011

Has this ever happened to you: you call someone's automated phone system, or it robo-calls you, and then it jabbers at you?

The jabbering phone system does any of these things:

  • It blasts you with monologues before you get to do anything.
  • It begs you to "please listen carefully," which is insulting, because it implies that you already weren't listening adequately!
  • It asks 40-word questions with no clear point.
  • It begs you to "please choose from the following options," as if you weren't clever enough to recognize that a menu is coming. (Why not just say "Subscriber Menu," or start the list with a one-second silence or a chime?)

It eventually gets you to some list of options, but then they all run together too fast, not letting you think or respond.

How does this make you feel, as the customer? Phone robots are supposed to be interactive, right, not like TV or radio ads?

This robot, however, is talking so fast, or so much, that you just stop listening. You don't know when it's your turn to do anything. It's not letting you cooperate, even if you could figure out what it wants. It seems that if the company built it badly, they must not really care about treating customers well.

Imagine it like this famous scene in "It's a Wonderful Life":


Jimmy Stewart is walking with Donna Reed in the moonlight. He's prattling on and on, trying to say romantic things to impress her. Donna is obviously distracted, even turning away. Jimmy, oblivious, keeps talking.

The guy who's been watching from the porch next door yells: "Why don't ya kiss her, 'stead of talkin' her to death?"

Jimmy: "How's that?"

Neighbor, slower: "Why don't ya kiss her, 'stead of talkin' her to death?!"

How does this relate to IVR?

Brash young Jimmy still can't take the hint, but maybe IVR designers can.

First, just as with a bad IVR system, Jimmy has lost Donna's attention with his rambling. Does he really want her to participate in anything interactive, or does he just enjoy hearing himself talk?

Second, the neighbor, who first yells quickly, has to slow down his speech to get his point across.

Third, Jimmy continues with yet more talk that intimidates Donna further, and she never gets her kiss. She's the customer who goes away unsatisfied.

Is Jimmy really worth her attention, if he doesn't let her respond?

Now, suppose you're in the IVR equivalent of Jimmy's shoes, and it's your job to prepare a phone robot for your own customers. This is your company's public "face" over the phone. It's important. You want your customers to have an efficient experience, getting what they need. You want them to respond with delight when they deal with your company. You want them to come back to you for more.

How can you avoid the mistakes that the above IVR system (and the characters in "It's a Wonderful Life") made?

Here's an important principle for building your robot:

Whenever you really want to grab or re-grab your caller's attention, stop talking.

Nothing says "it's your turn to respond" like a carefully-controlled piece of dead air. When the line goes silent, the listener thinks: "Hey, what's gonna happen next? I'd better pay attention!"

Don't believe me? Listen to it yourself.

In these recorded samples, I took the same seven-item menu, and changed only the amount of silence that happens between the options. That silence is the time for the caller to accomplish many things: to hear what was said, understand it, make a decision about it, and then press something.

  • Medium Fast -- perky, with a short breathing space after each option.
  • Medium Slow -- with 0.25 second of extra time to think about or respond to each option, before the next one comes.
  • Slow -- only 0.25 second more between options than "Medium Slow," but already enough to provoke a "get on with it!"
  • Fast -- almost no breathing space between options. (This was the actress's original pace of the delivery.)
  • Super Fast -- all options run together, and it sounds as if the key commands belong with the next option!

Which one of those would you most want to hear on the phone, as the customer?

My own preference is for the "Medium Slow" version, but I could also live with "Medium Fast."

The others, especially the fast ones, make me want to slam down the phone. The robot's owner must not want my business if it drags on or doesn't give me time to think.

Consider your caller's (customer's) perspective when the pace is wrong:

  • Customers can't see the menu, or know what's coming next. They are going by only what they hear. So, the pacing had better be immediately intelligible!
  • Callers need clearly distinguishable options, and enough time to decide what to do. They have to figure out which option matches the reason they are calling, and hear what the system expects them to do in response.
  • Then, after they realize it's their turn to do something, and they've made a decision what to do, they also need some time to get a finger to the right button, or to speak a command.
  • Humans can't do all of that instantaneously, especially when new information keeps arriving without a pause.

Isn't it surprising how much difference a quarter-second of silence makes? It changes how the user feels about wanting to respond, or being able to respond.

Best of all, it's easy.

It's vital to make the caller feel respected and easily able to cooperate, if you want that person to remain your customer. If the robot's menu is too fast, or too slow, the caller will get confused, frustrated, impatient, or might just give up.

It took only a few minutes to make these adjustments to the recording, adding some milliseconds of silence using an audio-editing tool. But, those few minutes of work make a huge difference in the way customers will perceive and respond to the robot. The timing is at least as important as the words that are said, and maybe more so.
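A minimal sketch of this kind of edit, using only Python's standard-library `wave` module. It assumes each menu option is already available as its own chunk of uncompressed 16-bit mono PCM audio at 8000 samples per second. The function names, the placeholder clips, and the 250 ms gap are illustrative assumptions, not the tool or settings used for the samples above.

```python
import io
import wave

def silence(ms, rate=8000, sample_width=2, channels=1):
    """Return `ms` milliseconds of silence as raw 16-bit PCM bytes (all zeros)."""
    frames = int(rate * ms / 1000)
    return b"\x00" * (frames * sample_width * channels)

def join_with_gaps(clips, gap_ms=250, rate=8000, sample_width=2, channels=1):
    """Concatenate raw PCM clips, inserting the same silent gap between them."""
    gap = silence(gap_ms, rate, sample_width, channels)
    return gap.join(clips)

def write_wav(target, pcm, rate=8000, sample_width=2, channels=1):
    """Write raw PCM bytes to a WAV file path or file-like object."""
    with wave.open(target, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(rate)
        out.writeframes(pcm)

# Hypothetical stand-ins for recorded menu options: three half-second clips.
options = [silence(500) for _ in range(3)]

# Rebuild the menu with a quarter-second of thinking time between options.
menu = join_with_gaps(options, gap_ms=250)
buffer = io.BytesIO()
write_wav(buffer, menu)
```

With real recordings, each entry in `options` would be the PCM data read from one option's file, and `gap_ms` is the single number to adjust while listening to the result.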

So: when you're designing and developing an IVR system, silence is your friend. Treat it well. In the systems I build, I re-edit every recording with an ear for the pacing, usually to add more silence and cut more words.

A one-second silence grabs my attention and cooperation better than any "please listen carefully" begging will. For example: "Buffalo gals, won't you come out tonight... (silence) ... and dance by the light of the moon?!"

The ultimate goal? Customers just might kiss back if we stop talkin' 'em to death.

+ - + - + - + - + - + - + - + - + - + - + - +

Why Can't I Have Stereo CD-Quality Audio On My Phone System?

By Brad Lehman on September 21st, 2011

I'm occasionally asked: "We hired our own voice talent for our IVR, but why doesn't the speech sound nearly as good over the phone as it did in the recording studio?" They then present a stereo CD-quality recording, and ask how it must be processed to make it work properly on the phone.

There are various relevant issues in digital recording and network transmission, but for simplicity we'll focus on frequency response, sampling rate, and perception of speech.

Human Perception

People are good at making sense of what they hear, even when most of the information is missing. Our brains guess to fill the gaps. To demonstrate this phenomenon, I have simulated a call with cell phone drop-outs, no high frequencies, and the distraction of dogs barking.

You probably understood at least 90 percent of these recordings, and could figure out what the speech instructs you to do. Thanks to the listener's ability to make good guesses, dealing with errors in the signal, the phone's audio transmission doesn't have to be excellent; a clearly-spoken message is still comprehensible, even over distractions and drop-outs.

The samples above have a digital sampling rate of 8000 per second--the industry standard for speech over a phone--while "CD quality" sampling is much higher at 44100 per second. Additionally, I applied "lowpass" filtering, which discards all the high-pitched (treble) content. That's why the result sounds so muffled and un-lifelike.

Digital Sampling Rates and the Frequency Limit

A sound wave in air is a pattern of rapid changes in air pressure. To make a recording of that sound, we have to measure that pattern. It looks like a complicated set of wiggles, if we graph it. Here is a representation of a spoken syllable:


This example has several interlocked patterns that are repeating, but also subtly changing. Notice how the big wiggles have several layers of sub-wiggles within them. The low and high parts of the sound, ranging from bass to treble, are different frequencies of vibration happening simultaneously: slower or faster wiggles. Frequency is measured in Hertz (Hz), cycles per second.

When our recording method is digital, the measurement of the sound wave is done by "sampling," which is capturing the size of the wiggles. At a steady rate of every 1/Nth of a second, we store the measurement at that instant. To have the sound still seem realistic in playback, in a digital recording, we need at least twice as many samples as there are wiggles in the air wave: to catch the wiggle somewhere near its top and bottom points. We can't measure or capture any frequency that is higher than 1/2 of our sampling rate. High-frequency wiggles might be happening between the samples, faster than we're looking at their size, but we can't see that they are happening.

If we're taking 44100 samples per second, we can measure frequencies near 22050 Hz, but just barely. We can see that such a fast wiggle is up or down, but nowhere in between. Now, does that matter to a listener? In frequency range, human hearing falls off at about 16000 to 20000 Hz. If there are errors in playback of any high sounds above 20000 Hz, we can't hear those errors, so we don't care. 44100 is a reasonably good sampling rate for music.
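The arithmetic behind the half-the-sampling-rate limit can be sketched in a few lines of Python. The function names are illustrative; the numbers are the sampling rates discussed in this article.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency (Hz) that a given sampling rate can represent."""
    return sample_rate_hz / 2

def samples_per_cycle(tone_hz, sample_rate_hz):
    """How many measurements land on each wiggle of a tone at this frequency."""
    return sample_rate_hz / tone_hz

for rate in (44100, 8000):
    print(rate, "samples per second -> limit of", nyquist_limit(rate), "Hz")

# A low tone gets many samples per wiggle; a tone near the limit gets barely two.
for tone in (1000, 20000):
    print(tone, "Hz at 44100:", round(samples_per_cycle(tone, 44100), 3), "samples per cycle")
```

Any tone that gets fewer than two samples per cycle is above the limit for that sampling rate and cannot be stored.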

The Phone Network's Technical Requirements

Now, what can be transmitted to a phone? If we take a 44100-sample recording and change it to a lower sampling rate, for example, 8000, we have a smaller signal which takes less storage space and transmits much faster. More than 4/5 of the information is thrown away, to be able to send a signal more quickly and reliably! A sampling rate of 8000 can't store any frequency above 4000 Hz, but that's OK, because most of the sounds in speech happen below 4000 Hz (except for some consonants such as S and F, and some of the tone color within vowels). These parts of the signal get lost between the samples: they can't be played back, because the slower sampling rate couldn't store their measurements. The speech seems muffled over the phone because we're missing most of the information, especially in the high frequency range.
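As a quick check of the "more than 4/5" figure, here is a small Python calculation. The function name is illustrative, and the stereo comparison assumes the studio file has two channels, as a stereo CD-quality recording would.

```python
def fraction_discarded(original_rate, new_rate, original_channels=1, new_channels=1):
    """Share of the original samples dropped when converting to the new format."""
    original = original_rate * original_channels
    new = new_rate * new_channels
    return 1 - new / original

# Mono 44100 -> mono 8000: the sampling-rate change alone.
print(f"{fraction_discarded(44100, 8000):.1%} of the samples are discarded")

# Stereo 44100 -> mono 8000: starting from a two-channel studio file.
print(f"{fraction_discarded(44100, 8000, original_channels=2):.1%} of the samples are discarded")
```

This counts samples only. It says nothing about bit depth or compression, which also affect file size.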

Here are some audio examples for you to compare, all stemming from the same studio recording:

We've all heard phones that sound as bad as that last one, right?

The North American Public Switched Telephone Network (PSTN) has a frequency response range of 300 to 3400 Hz. If there's any bass below 300 Hz, or treble above 3400 Hz, your phone won't receive it across this network. As long as the digital sampling rate is more than double 3400, though, the signal can hold the information. So, with the speech-industry standard of 8000 samples per second, there is a decently clear sound for talking.

Phone lines, wireless networks, phones and speakers are each going to cause more damage to the sound quality. The signal can't be stereo, because the phone has only one speaker, and the network has only one channel rather than two. It's pointless to include much bass, because a phone's tiny speaker can't play it, and the network won't transmit frequencies below 300 Hz. There will also be occasional drop-outs if the network temporarily loses the signal.
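Before handing a studio recording to a phone system, it can help to confirm that the file is mono at 8000 samples per second. The following is a minimal sketch with Python's standard-library `wave` module. The function names are hypothetical, the in-memory files are silent stand-ins for real recordings, and a particular IVR platform may have further requirements (such as a specific encoding) that this check does not cover.

```python
import io
import wave

PHONE_RATE = 8000     # samples per second, the telephone standard described above
PHONE_CHANNELS = 1    # mono: the network carries a single channel

def phone_format_issues(source):
    """List the ways a WAV file (path or file-like object) differs from mono 8000."""
    issues = []
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != PHONE_CHANNELS:
            issues.append(f"{wav.getnchannels()} channels; expected mono")
        if wav.getframerate() != PHONE_RATE:
            issues.append(f"{wav.getframerate()} samples per second; expected {PHONE_RATE}")
    return issues

def make_silent_wav(channels, rate):
    """Build a tiny silent 16-bit WAV in memory (a stand-in for a studio file)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (2 * channels * 100))
    buffer.seek(0)
    return buffer

print(phone_format_issues(make_silent_wav(2, 44100)))  # stereo CD-quality file
print(phone_format_issues(make_silent_wav(1, 8000)))   # already phone-ready
```

A file that reports issues still needs proper conversion in an audio editor. This sketch only inspects the header and does not resample or filter anything.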

What's Left?

Listen again to the first example of "Speech with drop-outs but no dogs." It doesn't sound pretty, but it works. We might not even consciously notice that the sound is terrible, because we're accustomed to it. Our brains fill in missing parts and block out some distractions. Yes, there is a huge loss of audio fidelity, but we can live with it in favor of getting the phone to work reliably, delivering our message.

Our expectation in an IVR system can't be audio fidelity. To send a clear message to a person on a phone, we have only a small frequency range and crude sampling rate to work with. The user will encounter further distractions, and further loss or distortion of the signal. So, we must optimize what we can control: careful organization of the information, good writing of prompts, correct sampling rate, correct volume, timing, diction, and the voice talent's projection of personality.

+ - + - + - + - + - + - + - + - + - + - + - +

Awkward phrases in the auto-attendant

By Brad Lehman on September 17th, 2008

Does your phone attendant have any of these greatest hits?

  • "Please listen carefully, as our menu options have changed." (A one-second silence is better than this cliche. Did the callers really have your old menu memorized? When did it change? Why should they care?)

  • Convoluted, impenetrable, obfuscatory, constipated bureaucratic verbiage. (Keep things simple for your customer.)

  • Three nouns in a row, or three adjectives in a row, on the phone. (Use short sentences with one noun and one verb.)

  • "Momentarily" and "shortly" instead of "soon". (Strive for a third-grade reading level.)

  • "Your call is important to us, so please stay on the line." (Unnecessary and insulting; the caller may be thinking, "If my call were really important to your company, you'd have more agents on duty to help me!")

  • "Eastern Standard Time", as in: "Our business hours are 8:00 a.m. to 6:00 p.m., Eastern Standard Time." (It's wrong half the year because of Daylight Saving time changes. Just say "Eastern Time".)

  • "Seven days a week" = "Every day".

  • "If you know your party's extension, enter it now." (It sounds 40 years out of date to say "your party".)

  • "Visit us on the web at w-w-w, blah-blah-blah-blah" before the caller gets to make any choice. (The caller may be here using the phone because the web site was already inadequate for his or her needs. The phone system should respect the caller's choice and invested time to make a call, and not beg the caller to hang up. In effect, such an advertisement says that the caller was wrong or unwelcome to call.)

  • "Please press" with every number, where the repeated "please" gets annoying. (Keep it short.)

  • "Sorry, I didn't get that." (Computers aren't sorry.)

  • "For all other questions, including fruit bats and breakfast cereals, press 5." (Superfluous "including" clause. If the fruit bats thing really is an option, give it its own earlier number: both for easier usability and to make its usage easier to measure.)

  • "For general information, press 1. For information about dingo's kidneys, press 2...." (General info can be offered as a catch-all option only after all the options of more specific info. If it's first, the menu is useless.)

  • "We are currently assisting other customers. Your call will be answered in the order in which it was received." (It's unnecessary to say any of this.)

  • "For more information, call 847-273-7502 during regular business hours. Thank you." (The caller has no warning to note a phone number, or time to catch it.)

[I annotated each of those lines with further analysis and remarks. I showed more deeply why they don't fit principles of best practices in the industry.]

+ - + - + - + - + - + - + - + - + - + - + - +

IVR Design: What Does the Listener Need to Know, and When?

By Brad Lehman on August 30th, 2011

Recently, I came across a fictitious radio advertisement in Henry Reed's Baby-sitting Service, a children's book from 1966 by Keith Robertson. Two teenagers want to be hired for temporary childcare. The staff at the local radio station had written and produced the ad for their summer business:

("Mary Had a Little Lamb" jingle sung by a girl)

Male announcer: "Folks, Mary wouldn't have much trouble today. Her education wouldn't be interrupted by her little lamb following her to school. Because she would simply telephone the Henry Reed Baby-Sitting Service, and in the twitch of a lamb's tail, a competent baby-sitter would be there.

Henry Reed and his partner Miss Margaret Glass offer reliable, efficient baby-sitting at prices you can afford. Call Henry at HA 9-1234 or Margaret at HA 9-1763.

If your little lamb wants to learn to dance, Margaret can teach him ballet. If you want your child to be able to communicate with your French poodle--or possibly call General de Gaulle--Henry can coach him in French.

If you live in the Grover's Corner area and you need a baby-sitter for a lion or a lamb call Henry Reed's Baby-Sitting Service!"

When I was reading this aloud, my IVR-design antennae went up immediately. Would this be an effective advertisement? Could it be organized better?

The design principles of radio advertisement and IVR service are similar--both must ensure that the person listening to the information receives it in a sequence that is easy to understand and easy to use. The customer cannot see anything.

  • Do I care to pay attention to the rest of this message?
  • If I do, does it tell me what I really need to know?
  • Does it tell me how I can respond, without confusing me?
  • Does it give me enough opportunity to remember or write down the pertinent information?

Let's see how Henry's ad stacks up:

  • The use of a jingle is catchy, grabbing the listener's attention in preparation for the speech.
  • The perspective is the potential customer's needs and problems; that's good. The ad shows how the company's services can solve the listener's problems.
  • The ad mentions the company's name twice, which is good for the listener's retention, but the name isn't the same both times.
  • The ad gives two phone numbers, instead of only one. How is the listener to know which one to choose, or which one is better to write down? Furthermore, if calls come into two places, the company has to do extra work: coordinating any duplicate requests for services, and following up any requests that only one partner knows about.
  • The biggest flaw: the phone numbers are not given again at the end of the message. After the listener's interest has been captured, perhaps by the "Grover's Corner area" or the French and ballet offers, there is no way to go back to hear the contact information. The ad would work OK in print, because the reader could go back to look for the phone number, but it doesn't work in speech.
  • The "Grover's Corner area" is only in the last sentence. This crucial piece of information should be much closer to the beginning. Listeners need to know that, if they are too far away for the company's services, they should not call. (As expected, in the story, Henry and Margaret do get some outside-area calls that can't be handled by their business. The fault lies with the poorly-worded radio ad.)
  • The part about the ballet lessons and the French adds some human interest to the business, making this service stand out memorably from any competitors. That part is also (unfortunately) too long and it uses too many irrelevant words.
  • The use of a female announcer might be more effective than a male announcer here, since the intended audience for the ad (in 1966!) is mostly women who are taking care of children. A female voice would show more empathy with the customer's problem and draw more interest to the proposed solution.


Let's try a revision, organizing the advertisement differently, so the information will be more usable by the listeners:

("Mary Had a Little Lamb" jingle sung by a girl)

Female announcer: "Folks, Mary wouldn't have much trouble today if she lived in the Grover's Corner area of New Jersey. Her education wouldn't be interrupted by her little lamb following her to school.

To take good care of her little lamb, she would simply telephone Henry Reed's Baby-Sitting Service, and in the twitch of a lamb's tail, a competent baby-sitter would be there.

Henry Reed and his partner Miss Margaret Glass offer reliable, efficient baby-sitting at prices you can afford. They can even coach your little lamb or lion in ballet or in French.

So, if you live in the Grover's Corner area and you need a baby-sitter for a lion or a lamb, call Henry Reed's Baby-Sitting Service at HA 9-1234! Again, that phone number is: HA 9-1234."

This could be improved further, but the first round of revisions has already corrected the most important problems of structure. It is now organized for speech and listening, rather than print.

  • Grab the Grover's Corner people in the first sentence. "Hey, they want business from me, and I should pay attention."
  • Use one phone number.
  • The phone number is last, and spoken twice, to give customers a chance to jot it or remember it.
  • Describe the services and company in a memorable way.
  • Use a voice that suggests empathy with the customer, rather than an authoritarian lecture.

Further revisions could tighten it up more, or answer the listener's questions in some better way:

  • Is the service only for people who are in school, with a child tagging along? Probably not.
  • "A twitch of a lamb's tail" implies that the company will immediately serve emergency requests. Is that the desired core of the business, or would it be preferred to have the work scheduled ahead?
  • Is the service available evenings and weekends, or only during weekday times when the parent needs to run errands?
  • Reconsider "competent", "reliable", "efficient" -- best adjectives describing the business? What is "efficient" baby-sitting? Are "competent" and "reliable" too similar to each other?

Whether it's an IVR system or a radio advertisement, the main point is: 15 minutes invested in reorganizing the material for usability can make a huge difference in the results. In the speech medium, write for speech, not print. Customers will find it easier to understand the business's case and to respond to its offers.

About the Author: Brad Lehman is a Professional Services Consultant for Angel. In that role, he designs and develops customized systems to match the client's business requirements. He brings more than 20 years of professional experience in developing data-driven user interfaces. Brad's other background is in music, with a doctoral degree in harpsichord performance. Harpsichord players use carefully-controlled silence in the right places to clarify the music. Using musical listening skills and audio-production skills, Brad brings that same care to the crafting of IVR prompts. The pace of the IVR's recorded speech has to be exactly right, so the caller will have enough time to understand what was said, and will respond at the right moment with a confident decision. That's the obsession that Brad likes to write about in this forum: making it easy for the caller to get through the questions, through principles of well-organized wording and perfect pacing.
