handlers/usecases/speech


Speech Use Cases for Expert Handlers



Draft 1: Speech Use Cases for Expert Handlers (Janina Sajka, author)

Computer users who are blind or severely visually impaired often use assistive technology (AT) built around synthetic text to speech (TTS). These AT applications are commonly called "screen readers." Screen
reader users listen to a synthetic voice rendering of on screen content because they are physically unable to see this content on a computer display monitor.

Because synthetic voice rendering is intrinsically temporal, whereas on screen displays are (or can easily be made) static, various strategies are provided by screen readers to allow users to tightly control the alternative TTS rendering. Screen reader users often find it useful, for instance, to skim through content until a particular portion is located and then examine that portion in a more controlled manner, perhaps word by word or even character by rendered character. It is almost never useful to wait for a synthetic voice rendering that begins at the upper left of the screen and proceeds left to right, row by row, until it reaches the bottom because such a procedure is temporally inefficient, requiring the user to strain to hear just the portion desired in the midst of unsought content. Thus, screen readers provide mechanisms that allow the user to focus anywhere in the content and examine only that content which is of interest.

Screen readers have proven highly effective at providing their users access to content which is intrinsically textual and linear in nature. It is not hard to provide mechanisms to focus synthetic voice rendering paragraph by paragraph, sentence by sentence, word by word, or character by character.

Access to on screen widgets have also proven effective by rendering that static content in list form, where the user can pick from a menu of options using up and down arrow plus the enter key to indicate a selection, in liue of picking an icon on screen using a mouse.

Access to content arrayed in a table can also succeed by allowing the AT to simulate the process a sighted user employs to consider tables. In other words, mechanisms are provided to hear the contents of a cell and also the row and column labels for that cell (which define the cell's meaning).

Similar "smart" content rendering and navigation strategies are required by screen reader users in more complex, nonlinear content such as mathematical (chemical, biological, etc) expressions, music, and graphical renderings. Because such content is generally the province of knowledge domain experts and students, and not the domain of most computer users, screen readers do not invest the significant resources necessary to serve only a small portion of their customer base with specialized routines for such content. Furthermore, the general
rendering and navigation strategies provided for linear (textual), menu, and tabular content are woefully insufficient to allow users to examine specific portions of such domain specific expressions effectively. On the other hand domain specific markup often does provide sufficient specificity so that the focus and rendering needs of the screen reader can be well supported.

In order to gain effective access to such domain specific content screen reader users require technology that can:

  • Synthetically voice the expression in a logical order
  • Allow the user to focus on particular, logical portions of
  • expressions possibly at several layers of granularity
  • Appropriately voice specialized symbols and symbolic expressions



Groups: