In the Service of Collaboration that Balances Openness and Accuracy


Collaboration Using Robust Speech Recognition

This system is motivated by the need to support spoken dialog.  Speech is easy to produce but much harder to take in at a glance than written information (at least among the better educated, in general), so this approach would be most useful for augmenting or supplanting the common and much-despised endless meeting.  It could also greatly enrich conversations that reach a large audience, such as call-in programs on radio and television.  Speech recognition is used to control the system so that people can participate by telephone alone, although the same general approach could serve Internet participants with considerably less investment in new hardware.

A small set of control phrases (e.g., "back", "forward", "time?", "sssh! I'm thinking", "what are my options?", "yes", "no", "I disagree", "that's wrong", "I have a question", "I have a comment", "that is interesting/silly/stupid/unimportant/irrelevant/redundant", "who said that?", "who disagrees?", "who laughs?") may be used to build and asynchronously filter conversational turns in order to create an interesting dialog.
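As a rough illustration of how such a control vocabulary might be handled once a recognizer has produced text, the sketch below maps a transcribed utterance to a command. The phrases come from the list above; the command names (`NAV_BACK`, `RATE`, and so on), the function names, and the normalization rules are assumptions made for this example, and the speech recognizer itself is outside the sketch.

```python
import re

# Phrases are taken from the control set above; the command names on the
# right are hypothetical labels invented for this sketch.
CONTROL_PHRASES = {
    "back": "NAV_BACK",
    "forward": "NAV_FORWARD",
    "time": "ASK_TIME",
    "sssh i'm thinking": "PAUSE",
    "what are my options": "LIST_OPTIONS",
    "yes": "YES",
    "no": "NO",
    "i disagree": "DISAGREE",
    "that's wrong": "DISAGREE",
    "i have a question": "QUESTION",
    "i have a comment": "COMMENT",
    "who said that": "ASK_SPEAKER",
    "who disagrees": "ASK_DISSENTERS",
    "who laughs": "ASK_LAUGHERS",
}

# Words accepted after "that is ..." to rate what was just heard.
RATINGS = {"interesting", "silly", "stupid", "unimportant", "irrelevant", "redundant"}


def normalize(utterance):
    """Lowercase the text, drop punctuation except apostrophes, collapse spaces."""
    text = re.sub(r"[^a-z' ]+", " ", utterance.lower())
    return re.sub(r"\s+", " ", text).strip()


def parse_command(utterance):
    """Return (command, argument), or None if the utterance is not a control phrase."""
    text = normalize(utterance)
    if text in CONTROL_PHRASES:
        return (CONTROL_PHRASES[text], None)
    match = re.fullmatch(r"that is (\w+)", text)
    if match and match.group(1) in RATINGS:
        return ("RATE", match.group(1))
    return None  # treated as ordinary speech, e.g. part of a recorded contribution
```

Anything that does not match a control phrase falls through as ordinary speech, which is one plausible way to keep the command set small while still letting callers talk freely.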

The following outline shows how the process could augment the call-in format commonly used in radio and television:

  1. An initial program is recorded (e.g., by a show host and guest) and the topic/idea transitions are marked in real time to help callers navigate.
  2. The initial program is distributed live on radio or television.  The program is preceded by an announcement like, "This program is part of a series being continuously improved by feedback from callers to (phone number).  If you would like to ask a question or disagree please note the exact time as it will help you to re-find the context for your interaction.  The time now is ____.  If you get a busy tone, call after the program has ended.  The most asked questions and best comments will be integrated into a rebroadcast of this program at ______."
  3. Callers are asked to identify themselves, find the point at which they wish to interject, and speak their piece.  The system teaches callers the control language a few words at a time.
  4. Contributors are then asked to classify each piece as a question, disagreement, comment or restatement.  They are also asked whether they would like to record a short "earcon" (aka audio icon) that suggests their opinion.  This would be the kind of sound one hears from a live studio audience, such as an empathetic word, a cough or a hurrah.  If the option to record a personal or instance-specific earcon is declined, a default sound matching the indicated type will be used (e.g., throat clearing for question, cough for disagreement, "mmm" for support, "why not" for restatement).  The earcon will be inserted/mixed into subsequent playback of the program to alert listeners, who have been instructed to interact with the rest of the audience at will.
  5. The contributor then hears other contributions to the program and is asked for an opinion of each.  These opinions are collectively weighed and influence the subsequent prominence (principally volume) of those contributions' earcons.  Interesting and popular questions and comments will tend to rise in prominence.
  6. The program staff monitors for emergent hotspots and can queue those questions and comments for the live program.  Alternatively, this can be done automatically to save on costs.
  7. The guest may stay after the live program ends, or call in himself, in order to have more time to talk with the audience.
  8. The show is rebroadcast some hours or days later with a much higher proportion and quality of audience participation.  The system gives the audience the ability to direct the program and, once the system has been primed by an interview with a Christopher Lydon, to take over the hosting.  In other words, this system aims to do for radio talk shows what blogging is doing to news reporting: bypassing the usual media outlets and their pundits.
  9. A reward system is instituted to motivate high-value contributions.  This could range from kudos to the kind of cash rewards that would encourage the occasional Ollie North to defect.
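Steps 4 and 5 of the outline can be sketched in a few lines, assuming a very simple weighting rule. The default sounds below follow the examples given in step 4; everything else is hypothetical: the class and field names, the pairing of the "mmm" support sound with the "comment" type, and the linear mapping from the share of approving votes to playback volume. The outline itself says only that opinions are "collectively weighed".

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Default earcons per contribution type, following the examples in step 4.
# Pairing "comment" with the "mmm" support sound is an assumption.
DEFAULT_EARCONS = {
    "question": "throat_clearing",
    "disagreement": "cough",
    "comment": "mmm",
    "restatement": "why_not",
}


@dataclass
class Contribution:
    caller: str
    offset_seconds: float                 # position in the program where the caller interjected
    kind: str                             # "question", "disagreement", "comment" or "restatement"
    custom_earcon: Optional[str] = None   # caller-recorded sound, if one was provided
    votes: List[int] = field(default_factory=list)  # +1 / -1 opinions from other callers

    def earcon(self) -> str:
        """Use the caller's own earcon if recorded, else the default for the type."""
        return self.custom_earcon or DEFAULT_EARCONS[self.kind]

    def volume(self, min_gain: float = 0.2, max_gain: float = 1.0) -> float:
        """Hypothetical rule: scale volume linearly with the share of approving votes."""
        if not self.votes:
            return min_gain
        approval = sum(1 for v in self.votes if v > 0) / len(self.votes)
        return min_gain + (max_gain - min_gain) * approval


def rank_by_prominence(contributions):
    """Order contributions so that the loudest (most approved) come first."""
    return sorted(contributions, key=lambda c: c.volume(), reverse=True)
```

A ranking like this could also feed step 6, where staff (or an automatic process) pick hotspots to queue for the live program; how hotspots are actually detected is left open here.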

 

Notes: 

As a general principle, the novice user is exposed to a minimal spoken command set.  As users become more experienced, they are offered opportunities to use and become fluent in more complex command vocabularies.  This approach may also be used in conjunction with a WWW interface or Internet broadcast.
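One way to picture this gradual exposure is a tiered vocabulary that unlocks with experience. In the minimal sketch below, the tier boundaries, the choice of phrases per tier, and the use of completed sessions as the measure of experience are all assumptions for illustration.

```python
# Each tier is (minimum completed sessions, phrases unlocked at that level).
# Thresholds and groupings are hypothetical.
TIERS = [
    (0, ["yes", "no", "back", "forward"]),
    (3, ["i have a question", "i have a comment", "i disagree"]),
    (10, ["who said that", "who disagrees", "what are my options"]),
]


def vocabulary_for(sessions_completed):
    """Return the phrases a caller with this much experience may be offered."""
    vocabulary = []
    for min_sessions, phrases in TIERS:
        if sessions_completed >= min_sessions:
            vocabulary.extend(phrases)
    return vocabulary
```

Restricting the active vocabulary this way might also help recognition accuracy for newcomers, since the recognizer has fewer phrases to tell apart.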

The eventual main use for collaboration with robust speech recognition will be to augment live meetings with asynchrony, which allows more flexible scheduling, and with filtering, which makes better use of participants' time and skills.  Together these should yield better and more timely resolutions.

 

Document Manager: Bruce McHenry, bruce@discussionsystems.com
This document is covered by discussIT.org's Terms of Use.
Principals are asked to expand upon this document in a non-public forum.
Patent may be applied for by Discussion Systems.

Revision History
Created 2/10/2003 with thanks to Tamar H. Miller for motivating this document.
Link from discussit.org table of contents page corrected; edited for clarity 6/19/03, 1/30/04.