Collaboration Using Robust Speech Recognition
This system is motivated by the need to support spoken dialog.
Because speech is easy to produce but, unlike written information, hard to
take in at a glance (in general, at least among the better educated), this
approach would be most useful for augmenting or replacing the common and
much-despised endless meeting. It could also greatly enrich conversations
that reach a large audience, such as call-in programs on radio and
television. Speech recognition is used to control the system so that
participants can take part by telephone alone, although the same general
approach could be used by Internet participants with considerably less
investment in new hardware.
A small set of control phrases (e.g., "back, forward, time?, sssh! I'm
thinking, what are my options?, yes, no, I disagree,
that's wrong, I have a question, I have a comment, that is
interesting/silly/stupid/unimportant/irrelevant/redundant, who said that?, who
disagrees?, who laughs") may be used to build and asynchronously filter
conversational turns in order to create an interesting dialog.
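To make the idea of a small control vocabulary concrete, the following is a minimal sketch of how a recognized utterance might be mapped to a command. It assumes the speech recognizer has already produced a text transcript; the command identifiers (`BACK`, `RATE`, and so on) and the function names are hypothetical, not part of any existing system.

```python
import re
from typing import Optional, Tuple

# Hypothetical mapping from the spoken control phrases listed above to
# illustrative command identifiers (the identifiers are assumptions).
CONTROL_PHRASES = {
    "back": "BACK",
    "forward": "FORWARD",
    "time": "TIME",
    "sssh i'm thinking": "PAUSE",
    "what are my options": "HELP",
    "yes": "YES",
    "no": "NO",
    "i disagree": "DISAGREE",
    "that's wrong": "DISAGREE",
    "i have a question": "QUESTION",
    "i have a comment": "COMMENT",
    "who said that": "WHO_SAID",
    "who disagrees": "WHO_DISAGREES",
    "who laughs": "WHO_LAUGHS",
}

# "That is interesting/silly/..." is treated as one rating command with an argument.
RATING = re.compile(
    r"that is (interesting|silly|stupid|unimportant|irrelevant|redundant)"
)


def normalize(utterance: str) -> str:
    """Lower-case, drop punctuation except apostrophes, collapse whitespace."""
    text = re.sub(r"[^a-z' ]+", " ", utterance.lower())
    return " ".join(text.split())


def match_command(utterance: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (command, argument) for a recognized phrase, or None if unknown."""
    text = normalize(utterance)
    if text in CONTROL_PHRASES:
        return CONTROL_PHRASES[text], None
    m = RATING.fullmatch(text)
    if m:
        return "RATE", m.group(1)
    return None
```

For example, `match_command("Who said that?")` yields `("WHO_SAID", None)`, and anything outside the vocabulary yields `None`. Matching whole normalized phrases against a short list, rather than interpreting free speech, is one way to keep recognition robust over a telephone line.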
The following outline shows how the process could augment the call-in format
commonly used in radio and television:
- An initial program is recorded (e.g., by a show host and guest), and the
topic/idea transitions are marked in real time to help callers navigate.
- The initial program is broadcast live on radio or television. The program is preceded by an announcement such as:
"This program is part of a series being continuously improved by
feedback from callers to (phone number). If you would like to ask a question
or disagree please note the exact time as it will help you to re-find the
context for your interaction. The time now is ____. If you get a
busy tone, call after the program has ended. The most asked questions
and best comments will be integrated into a rebroadcast of this program at
______."
- Callers are asked to identify themselves, find the point at which they
wish to interject, and speak their piece. The system instructs callers
in the control language a few words at a time.
- Contributors are then asked to classify each piece as a
question, disagreement, comment or restatement. They are also asked
whether they would like to record a short "earcon" (also known as an audio
icon) that conveys their opinion. This would be the kind of sound
one hears from a live studio audience, such as an empathetic word, a cough or
a hurrah. If the option to record a
personal or instance-specific earcon is declined, a default sound matching
the indicated type will be
used (e.g., throat clearing for a question, a cough for disagreement, "mmm"
for support, "why not" for a restatement). The
earcon will be mixed into subsequent playback of the program to alert
listeners, who have been told that they may interact with the rest of the
audience at will.
- The contributor then hears other contributions to the program and is asked
for his opinion of them. These opinions are weighed collectively and determine
the subsequent prominence (principally the volume) of each contribution's earcon.
Interesting and popular questions and comments will tend to rise in
prominence.
- The program staff monitor for emerging hotspots and can queue
those questions and comments for the live program. Alternatively, this can be
done automatically to save costs.
- The guest may stay after the live program ends, or call in himself, in
order to have more time to talk with the audience.
- The show is rebroadcast some hours or days later with a much higher
proportion and quality of audience participation. The system gives the
audience the ability to direct the program and, once the system has been
primed by an interview with a Christopher Lydon, to take over the hosting.
In other words, this system aims to do for radio talk shows what blogging is
doing to news reporting: bypassing the usual media outlets and their
pundits.
- A reward system is instituted to motivate high-value contributions.
Rewards could range from kudos to the kind of cash payments that would
encourage the occasional Ollie North to defect.
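The earcon and prominence steps in the outline above can be sketched as follows. This is a minimal illustration under stated assumptions, not a specification: the sound file names are hypothetical, each listener opinion is reduced to a simple favourable/unfavourable vote, prominence is a playback gain derived from a smoothed approval fraction, and "hotspots" are taken to be well-rated contributions with enough votes. Only the default sound for each type comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default earcons by contribution type, following the examples in the outline.
# The file names are hypothetical placeholders.
DEFAULT_EARCONS = {
    "question": "throat_clearing.wav",
    "disagreement": "cough.wav",
    "support": "mmm.wav",
    "restatement": "why_not.wav",
}


@dataclass
class Contribution:
    caller_id: str
    timestamp_s: float                   # point in the program where the caller interjected
    kind: str                            # e.g. "question", "disagreement", "support"
    custom_earcon: Optional[str] = None  # personal earcon, if the caller recorded one
    up: int = 0                          # favourable listener opinions (assumed binary votes)
    down: int = 0                        # unfavourable listener opinions

    def earcon(self) -> Optional[str]:
        # A personal earcon wins; otherwise fall back to the default for the type.
        return self.custom_earcon or DEFAULT_EARCONS.get(self.kind)

    def approval(self) -> float:
        # Laplace-smoothed share of favourable votes; 0.5 when nobody has voted.
        return (self.up + 1) / (self.up + self.down + 2)

    def volume(self, floor: float = 0.2) -> float:
        # Map approval onto a playback gain between `floor` and 1.0.
        return floor + (1.0 - floor) * self.approval()


def hotspots(contribs: List[Contribution], k: int = 3,
             min_votes: int = 5, min_approval: float = 0.5) -> List[Contribution]:
    """Pick up to k well-attended, well-rated contributions to queue for the live show."""
    eligible = [c for c in contribs
                if c.up + c.down >= min_votes and c.approval() >= min_approval]
    return sorted(eligible, key=lambda c: (c.approval(), c.up), reverse=True)[:k]
```

For instance, a question with 9 favourable and 1 unfavourable vote plays at a gain of about 0.87, an unrated contribution at 0.6, and a poorly received one (2 to 6) at 0.44. Smoothing keeps a single early vote from pushing a new contribution to full volume or silence; the thresholds for hotspots are arbitrary and would be tuned by program staff.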
Notes:
As a general principle, the novice user is exposed to a minimal spoken
command set. As users become more experienced, they are offered
opportunities to use and become fluent in more complex command
vocabularies. This approach may also be used in conjunction with a WWW
interface or Internet broadcast.
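The principle of starting novices on a minimal command set can be illustrated with a small sketch. The grouping of phrases into tiers, the promotion thresholds, and the class and method names are all assumptions made for illustration; the text does not specify how experience is measured, so here it is simply a count of successfully recognized commands.

```python
from typing import List, Set

# Hypothetical vocabulary tiers drawn from the control phrases listed earlier.
TIERS: List[Set[str]] = [
    {"back", "forward", "yes", "no"},
    {"time", "what are my options", "i have a question", "i have a comment"},
    {"i disagree", "that's wrong", "who said that", "who disagrees", "who laughs"},
]
# Successful commands required to unlock each tier (assumed thresholds).
PROMOTE_AFTER = [0, 10, 30]


class CallerProfile:
    def __init__(self) -> None:
        self.successful_commands = 0

    def level(self) -> int:
        # Highest tier whose threshold the caller has reached.
        return max(i for i, n in enumerate(PROMOTE_AFTER)
                   if self.successful_commands >= n)

    def vocabulary(self) -> Set[str]:
        # Union of all tiers unlocked so far.
        vocab: Set[str] = set()
        for tier in TIERS[: self.level() + 1]:
            vocab |= tier
        return vocab

    def record_success(self) -> List[str]:
        """Count a recognized command; return newly unlocked phrases to announce."""
        before = self.vocabulary()
        self.successful_commands += 1
        return sorted(self.vocabulary() - before)
```

Returning only the newly unlocked phrases lets the system introduce the control language a few words at a time. Limiting the recognizer to the caller's current vocabulary should also help keep recognition reliable, though that benefit is an assumption here rather than a measured result.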
The eventual main use of collaboration with robust speech recognition will be to
augment live meetings with asynchrony, allowing more flexible scheduling and
filtering that make better use of participants' time and skills and yield
better and more timely resolutions.
Document Manager: Bruce McHenry, bruce@discussionsystems.com
This document is covered by discussIT.org's Terms
of Use.
Principals are asked to expand upon this document in a non-public forum.
Patent may be applied for by Discussion Systems.
Revision History
Created 2/10/2003
with thanks to Tamar H. Miller for motivating this document.
Link from discussit.org table of contents page corrected; edited for clarity 6/19/03, 1/30/04.