Unified Use Cases for Expert Handlers (Draft 1)
Braille Use Cases for Expert Handlers: Structure to Braille Conversion
Original Section Author: Vladimir Bulatov
An Expert Handler should be able to provide braille data for output on a braille display by generic Assistive Technology (AT). Custom braille output is needed because generic AT has no knowledge of how specific data can be represented in braille. An example is MathML: many different braille codes are used to represent mathematics in different countries and by different agencies.
An Expert Handler should provide a way to allow the user to select different types of braille conversion.
There is a finite number of possible braille dot patterns because there are only 6 or 8 possible dots per Braille symbol (64 or 256 patterns, respectively), and simple ASCII strings are usually used to communicate Braille.
However, there are many specific ASCII-to-dot-pattern encodings in various countries. It makes sense to use the Unicode symbols from 0x2800 to 0x28FF to communicate the Braille pattern to the AT. Alternatively, an Expert Handler could communicate via a Braille table to the AT or could request that a specific, specialized Braille table be loaded in-process for use by the AT.
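As a minimal sketch of the Unicode approach, the helpers below convert between a set of raised dot numbers and the corresponding character in the Unicode Braille Patterns block (U+2800 to U+28FF), where dot n sets bit n-1 of the offset from U+2800. The function names `dots_to_unicode` and `unicode_to_dots` are hypothetical and used only for illustration; they are not part of any proposed handler API.

```python
BRAILLE_BASE = 0x2800  # first code point of the Unicode Braille Patterns block


def dots_to_unicode(dots):
    """Return the Unicode braille character for an iterable of dot numbers (1-8)."""
    offset = 0
    for dot in dots:
        if not 1 <= dot <= 8:
            raise ValueError(f"dot number out of range: {dot}")
        offset |= 1 << (dot - 1)  # dot n is stored in bit n-1
    return chr(BRAILLE_BASE + offset)


def unicode_to_dots(cell):
    """Return the sorted dot numbers raised in a single braille character."""
    offset = ord(cell) - BRAILLE_BASE
    if not 0 <= offset <= 0xFF:
        raise ValueError("not a Unicode braille pattern")
    return [n for n in range(1, 9) if offset & (1 << (n - 1))]
```

Because the encoding is independent of any national ASCII-to-dots convention, the AT can drive the display directly from the dot numbers without loading a translation table.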
There may be a need for braille output at various levels of granularity. For example, at a low level of granularity the output could be an overall description of the mathematical expression or image, and at a high level of granularity it could be a complete braille translation of the whole math expression or a list of all labeled components of the image.
Some data may need more advanced tactile output than braille. For example, graphical data would greatly benefit from being embossed on paper or shown on a 2D braille display, combined with an input device (such as a touchpad or camera) that allows the user to tell the computer which parts of the graphic they are interested in. Such interactive functionality should be left completely to the expert handler. This means that the expert handler has to have an interactive mode, and the AT needs a way to turn this mode on. In such a mode, the AT should provide a way for the expert handler to produce speech and braille output via the AT's devices (using the same TTS engine and/or braille display).
Magnification Use Cases
Original Use Case Author: Neil Soiffer
The most obvious use of magnification is for rendering the entire content larger. For text-based (or more generally, font-based) applications, this means that Assistive Technology (AT) software should be able to request rendering with larger sized fonts or a certain amount of magnification relative to some baseline magnification. Applications beyond standard text-based ones include math, music, and labeled plots/graphics. For non-text-based applications such as graphics and chemical structures, magnification could be based on a certain percentage of the normal size or given by "fill this area". I believe these two ideas can always be mapped onto each other. In all of these cases, the magnification may be due to having the entire document magnified or it may be due to a request to magnify an individual instance (such as an equation).
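The mapping between a percentage request and a "fill this area" request can be sketched as follows. This assumes that magnification is uniform (the aspect ratio is preserved) and that the renderer knows the natural, unmagnified size of the content. The function names are hypothetical.

```python
def percent_to_size(natural_w, natural_h, percent):
    """Size of the content after magnifying it to `percent` of its natural size."""
    scale = percent / 100.0
    return natural_w * scale, natural_h * scale


def fill_area_to_percent(natural_w, natural_h, area_w, area_h):
    """Largest uniform magnification (in percent) that still fits inside the area."""
    if natural_w <= 0 or natural_h <= 0:
        raise ValueError("natural size must be positive")
    # The tighter of the two dimensions limits the magnification.
    scale = min(area_w / natural_w, area_h / natural_h)
    return scale * 100.0
```

For example, content with a natural size of 200 by 100 asked to fill a 600 by 400 area is limited by its width, so the equivalent request is 300 percent.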
There are two other uses for magnification:
- While navigating or speaking, it might be desirable to magnify the part being navigated/spoken to make it easier to see. For example, while playing some music, the current measure and next measure might be magnified to ease reading while leaving the rest unmagnified so that the amount of screen space used is minimized. There would also need to be a method to reset the magnification.
- Math and chemical notation shrink fonts for superscripts and subscripts. In math, these are further reduced for nested scripts. One common feature of math renderers is to set a minimum font size. Typically, this is 50% of the base font size and corresponds to the size used for doubly nested scripts. It is potentially useful to allow the AT to control the maximum percent shrinkage used by renderers. Another possibility is to have a feature that says "don't shrink at all".
Although the rendering would not be considered high-quality typesetting, it does make scripts more readable to those with some vision impairment.
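The script-shrinking control described above can be sketched as a clamp on the computed font size. The sketch assumes that each nesting level multiplies the font size by a constant factor; 0.71 is used here as an assumed value (0.71 squared is roughly 0.5, which matches the 50% figure for doubly nested scripts). The function and parameter names are hypothetical.

```python
def script_font_size(base_size, depth, multiplier=0.71, min_percent=50.0):
    """Font size for a script nested `depth` levels deep (0 = base text).

    min_percent is the smallest allowed size as a percentage of base_size;
    a value of 100 corresponds to "don't shrink at all".
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    shrunk = base_size * (multiplier ** depth)   # size without any limit
    floor = base_size * min_percent / 100.0      # limit requested by the AT
    return max(shrunk, floor)
```

An AT that wants larger scripts would raise `min_percent`; the renderer's layout code is otherwise unchanged.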
Speech Use Cases for Expert Handlers
Original Use Case Author: Janina Sajka
Computer users who are blind or severely visually impaired often use assistive technology (AT) built around synthetic text-to-speech (TTS). These AT applications are commonly called "screen readers." Screen reader users listen to a synthetic voice rendering of on-screen content because they are physically unable to see this content on a computer display monitor.
Because synthetic voice rendering is intrinsically temporal, whereas on screen displays are (or can easily be made) static, various strategies are provided by screen readers to allow users to tightly control the alternative TTS rendering. Screen reader users often find it useful, for instance, to skim through content until a particular portion is located and then examine that portion in a more controlled manner, perhaps word by word or even character by rendered character. It is almost never useful to wait for a synthetic voice rendering that begins at the upper left of the screen and proceeds left to right, row by row, until it reaches the bottom because such a procedure is temporally inefficient, requiring the user to strain to hear just the portion desired in the midst of unsought content. Thus, screen readers provide mechanisms that allow the user to focus anywhere in the content and examine only that content which is of interest.
Screen readers have proven highly effective at providing their users access to content which is intrinsically textual and linear in nature. It is not hard to provide mechanisms to focus synthetic voice rendering paragraph by paragraph, sentence by sentence, word by word, or character by character.
Access to on-screen widgets has also proven effective by rendering that static content in list form, where the user can pick from a menu of options using the up and down arrow keys plus the Enter key to indicate a selection, in lieu of picking an icon on screen using a mouse.
Access to content arrayed in a table can also succeed by allowing the AT to simulate the process a sighted user employs to consider tables. In other words, mechanisms are provided to hear the contents of a cell and also the row and column labels for that cell (which define the cell's meaning).
Similar "smart" content rendering and navigation strategies are required by screen reader users in more complex, nonlinear content such as mathematical (chemical, biological, etc.) expressions, music, and graphical renderings. Because such content is generally the province of knowledge domain experts and students, and not the domain of most computer users, screen readers do not invest the significant resources necessary to serve only a small portion of their customer base with specialized routines for such content. Furthermore, the general rendering and navigation strategies provided for linear (textual), menu, and tabular content are woefully insufficient to allow users to examine specific portions of such domain-specific expressions effectively. On the other hand, domain-specific markup often does provide sufficient specificity so that the focus and rendering needs of the screen reader can be well supported.
In order to gain effective access to such domain-specific content, screen reader users require technology that can:
- Synthetically voice the expression in a logical order
- Allow the user to focus on particular, logical portions of expressions possibly at several layers of granularity
- Appropriately voice specialized symbols and symbolic expressions
Note: The Navigability section will need to address multiple levels of navigability. The following describes the most generic layer of navigability.
Assistive Technology (AT) users need to be able to navigate within sub-components of documents containing specialized content, such as math, music or chemical markup. Typically these specialized components have content which needs to receive "focus" at different levels of granularity, e.g. a numerator within a numerator, an expression, a term, a bar of music, etc.
Within each level, functions are needed in response to AT commands to inspect and navigate to and from "items" (e.g., by word, bar, expression, clause, term, depending upon the type of content being expressed) for a particular level of granularity:
- previous/current/next item
- all items with user-defined characteristics
- all items in an author-defined category
- first/last item on a line
- first/last item within next higher or lower level of granularity
- first/last item in the document
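The navigation functions listed above can be sketched for a single level of granularity as follows. This is only an illustration under simplifying assumptions: the items of one level are held in a flat list in document order, each item records the line it appears on and an optional author-defined category, and movement across levels of granularity is left out. All class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    line: int
    category: str = ""  # author-defined category, if any


class LevelNavigator:
    """Navigates the items of one level of granularity, in document order."""

    def __init__(self, items):
        if not items:
            raise ValueError("at least one item is required")
        self.items = list(items)
        self.index = 0

    def current(self):
        return self.items[self.index]

    def next_item(self):
        # Stay on the last item rather than wrapping around.
        self.index = min(self.index + 1, len(self.items) - 1)
        return self.current()

    def previous_item(self):
        self.index = max(self.index - 1, 0)
        return self.current()

    def first_in_document(self):
        self.index = 0
        return self.current()

    def last_in_document(self):
        self.index = len(self.items) - 1
        return self.current()

    def _indices_on_current_line(self):
        line = self.current().line
        return [i for i, item in enumerate(self.items) if item.line == line]

    def first_on_line(self):
        self.index = min(self._indices_on_current_line())
        return self.current()

    def last_on_line(self):
        self.index = max(self._indices_on_current_line())
        return self.current()

    def matching(self, predicate):
        """All items with user-defined characteristics."""
        return [item for item in self.items if predicate(item)]

    def in_category(self, category):
        """All items in an author-defined category."""
        return self.matching(lambda item: item.category == category)
```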
There are two scenarios to consider, a read-only scenario and a scenario where the user is editing the document.
There are three system components that need to interact: the user agent, e.g. a browser, the AT, and the plugin/handler.
In the read-only case, the AT responds to some sort of "Point of Regard" change event and depending on the "role" of the object which received focus, the AT fetches accessibility information pertinent to that role and then formats/outputs a response tailored to an AT user, e.g. TTS/Braille. In the case of specialized content, a handler needs to be used by the AT because the AT doesn't know how to deal with such specialized content directly.
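The read-only flow can be sketched as a dispatch on the role of the object that received focus. The sketch below is not an existing accessibility API: the class `AssistiveTechnology`, the dictionary-based object description, and the role and key names are all assumptions made for illustration.

```python
class AssistiveTechnology:
    """Toy AT that routes specialized content to registered handlers."""

    def __init__(self, output_mode):
        self.output_mode = output_mode  # e.g. "tts" or "braille"
        self.handlers = {}              # role -> handler callable

    def register_handler(self, role, handler):
        self.handlers[role] = handler

    def on_point_of_regard_changed(self, obj):
        """Called when the Point of Regard moves to `obj`."""
        handler = self.handlers.get(obj["role"])
        if handler is not None:
            # Specialized content: the AT cannot interpret it directly.
            return handler(obj, self.output_mode)
        # Ordinary content: the AT uses the accessible name itself.
        return obj.get("name", "")


def math_handler(obj, output_mode):
    """Placeholder handler; a real one would translate the markup."""
    prefix = "math: " if output_mode == "tts" else "[math] "
    return prefix + obj["description"]
```

A usage example: after `at.register_handler("math", math_handler)`, a focus change to an object with role "math" is answered by the handler, while a focus change to a button is answered by the AT itself.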
In order to meaningfully interact with the specialized content, the user needs to be able to execute the following actions:
- change level of granularity up/down
- read all from top
- read all from Point of Regard (POR)
- goto and read first/last item on the current line
- goto and read first/last item within the next less/more granular item
- goto and read first/last item in the document
- goto and read previous/current/next item
In the case of editable content there may also be a desire to have separate cursors, e.g. one to remain at the POR (the caret, if editing), and one to move around for review purposes.
The AT will already have UI input commands for most of the above functions, but probably not for changing to higher/lower levels of granularity. Let's assume ATs add that and in response the AT would call the handler to change the mode of granularity. The AT will handle the UI commands and in turn call the handler to return an item at the current level of granularity. The AT would have told the handler about the output mode, e.g. Braille or TTS. Armed with those three things: level of granularity, mode of output, and which item (first, last, previous, current, next), the handler knows what to do.
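The three inputs named above (level of granularity, mode of output, and which item) suggest a small handler interface, sketched below. This is a hypothetical shape, not a proposed API: the content is assumed to be pre-split into one list of items per level (level 0 being the coarsest), changing level restarts at the first item of the new level, and rendering is reduced to tagging the item with the output mode.

```python
from enum import Enum


class Which(Enum):
    FIRST = "first"
    LAST = "last"
    PREVIOUS = "previous"
    CURRENT = "current"
    NEXT = "next"


class SketchHandler:
    def __init__(self, levels):
        self.levels = levels      # levels[0] is the coarsest granularity
        self.level = 0
        self.index = 0
        self.output_mode = "tts"

    def set_output_mode(self, mode):
        if mode not in ("tts", "braille"):
            raise ValueError(f"unknown output mode: {mode}")
        self.output_mode = mode

    def change_granularity(self, delta):
        """Move `delta` levels finer (positive) or coarser (negative)."""
        new_level = min(max(self.level + delta, 0), len(self.levels) - 1)
        if new_level != self.level:
            self.level = new_level
            self.index = 0  # simplification: restart at the first item
        return self.level

    def get_item(self, which):
        """Move as requested and return (output_mode, item)."""
        last = len(self.levels[self.level]) - 1
        if which is Which.FIRST:
            self.index = 0
        elif which is Which.LAST:
            self.index = last
        elif which is Which.PREVIOUS:
            self.index = max(self.index - 1, 0)
        elif which is Which.NEXT:
            self.index = min(self.index + 1, last)
        # Which.CURRENT leaves the index unchanged.
        return self.output_mode, self.levels[self.level][self.index]
```

In this division of labor the AT keeps ownership of the keyboard commands and output devices, and the handler only answers "what is the item" for the current state.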
In the case of editable content, the UA provides the input UI for the user. This editing capability would most likely be provided via a plugin. We need an example of such a plugin so we can evaluate what a11y features need to be added to the existing editors.
Issues, Questions and Concerns
- braille - capitalized "b" or lower-case "b"?
- in what order should the use cases be listed?
- are speech input and speech output a single use case?
- Are there terms in mathematics, for example, that can be used to define each level of granularity? If not is it sufficient to just increment/decrement the level? (question retained from Pete Brunet's original draft)
- How does one distinguish between identical characters used in a specific specialized markup dialect for multiple purposes? How does a human communicate the difference to a machine, verbally and non-verbally? An inquiry into the granularity levels of each major discipline for which a markup language has been defined will have to be made.
- Is it possible to broaden/generalize discipline-specific conventions of granularity? Is it possible to use the International Scientific Vocabulary (ISV) -- the successor to Interlingua -- for this purpose? How widely used is ISV in educational and research settings?