Smart Device Aural Interface and Hardware Design

Overview

The smart device began as an effort to design an interface for a multimodal personal digital assistant. Given the state of the local world, the user's to-do list, calendar, statistically inferred habits, etc., what assistance and information could an always-on, personally-attached device provide and in what contexts?

The input methods and hardware details were purposefully left abstract and “magic.” Persistent network connectivity was presumed. Location awareness was presumed. Omnipresence in the user's life and daily activities was presumed. While data input is a nontrivial exercise, it was finally presumed that the user would have an efficient and worthwhile method of entering, maintaining and sharing their to-do lists, calendar data, etc., whether using existing software and plugins or using yet-to-be-developed infrastructures. The interface work was concerned entirely with the presentation of the data and the interaction with this on-the-fly assistance.

A large potential feature set was initially brainstormed, including travel assistance, proximity-based reminders, mood-sensing music playing, etc. To narrow down the problem domain, a step-by-step analysis of every event and action over the course of two or more days was undertaken by each member of the group. We slept, awoke, ate, worked, attended classes, met with friends, etc. Common situations spanning both days and individuals were distilled and used as the basis for a feature set. The final user interface was tailored specifically for this feature set.

Commonalities

G. is a twenty-two-year-old male computer science major. He is taking twelve credit hours, works part-time and lives off-campus, but within walking distance. His days were dominated by classes, work and socializing.

E. is a twenty-year-old female computer science major. She is taking eighteen credit hours, does not work and lives off-campus, but within walking distance. Her days were dominated by classes and meeting with classmates to work on projects.

B. is a twenty-five-year-old geography major. He is taking twelve credit hours, works part-time and lives off-campus, necessitating use of a campus shuttle or car. His days were dominated by classes, work and driving between the two.

All three members of the group wear glasses, but not all of them wear them regularly. All three members of the group carry a cell phone, but none of them check it consistently. All three members of the group own portable digital music players, but none of them carry them consistently. Only G. wears a watch.

The most common situations of conflict within each person's schedule revolved around reminders and travel. All three members of the group overslept at one point or another, indicating traditional alarms failed them. All three members of the group forgot a due date, an assignment, a meeting or a to-do item, indicating traditional planning techniques failed them. All three members of the group functioned inefficiently when traveling, either getting stuck in traffic, doubling back due to a forgotten event or being ill-attired for the weather. Therefore, we chose reminders and travel as our areas of focus.

Hardware

As none of the group members used existing technology consistently, we were unable to piggyback the UI on an existing modality. That is, while we all have the potential to use existing graphical displays to assist us with these scenarios – smart watches, cell phone applications, PDAs – we have elected not to, indicating that these devices are lacking in convenience, usability or portability.

Specifically, irregular wearing of glasses ruled out a heads-up display. Two group members do not wear watches, ruling out a wrist-based display. Inconsistent cell phone checking ruled out a phone-based application. No group members used a PDA, and only one carried a pad and pen with any frequency, ruling out that form factor as well. Reminders and travel being our focus areas meant we could not rely on the presence of a computer monitor, television or display surface such as a mirror or wall. Reminders would have to be receivable anywhere, and more reliably than on a cell phone turned to silent and stuck in a pocket, and travel information would need to be presented while driving.

These requirements all but eliminated graphical, display-based solutions. A new device, carried on the person at all times, with a new interface style was required. We decided on an aural interface: a display can be ignored, packed in a bag or moved out of your field of view, but someone whispering in your ear cannot.

Aural interfaces are increasing in their public presence. White iPod earbuds are all but ubiquitous. Bluetooth cell phone headsets are almost common. However, both of these solutions block outside noise, which is rude at best and dangerous at worst. We settled on a somewhat obscure solution: bone conduction.

Bone conduction headphones transmit sound by vibrating the bones of your skull rather than sending it through your ear canal. They do not block outside noise, only add to it. For this reason, they are often used by military and police. They are also used for hearing aids when the ear canal is damaged. Common placement locations are the bones beneath your temple, behind your ear and at the back of your head.

As nominal pressure needs to be maintained to keep the headset pressed against the head, even monaural headsets traditionally have bands that wrap around the head. There were both fashion and hair-related reasons to discount this method, and a stereo system was also discounted for fashion reasons. We settled on a monaural device that sits on the temple, with a spring-loaded ear hook to maintain pressure. At approximately 3.5 cm by 2.3 cm, it is of average size when compared to existing higher-end cell phone headsets.

Finally, given the variety of usage scenarios, an external buzzer is also present for when the device must function as an alarm without being worn.

Interface

With an aural interface, we wanted the device to prompt us unobtrusively, starting with a chime or vibration. As with a display, we need to pay attention to understand it, so it only speaks when we acknowledge the prompt. Because speaking can be slower than reading, we also split the event notifications into the basic event and additional details, such as travel directions and other instructions.

The need for travel information, such as weather or traffic details, most often came into play as part of another appointment reminder. Splitting the events into two parts allowed us to offload the travel data into the appointment itself, constructing it as part of the user's existing calendaring system, and removing the need for a special UI for conveying it.
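As a rough illustration of the two-part split described above, a reminder could be modeled as a basic event plus optional details that are only spoken on request. This is a minimal sketch, not the project's actual data model: the class name, field names and sample appointment text are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reminder:
    """Hypothetical two-part reminder: the basic event plus optional details."""

    summary: str  # the basic event, spoken first
    # Additional details (travel directions, weather, other instructions),
    # spoken only when the user asks for more information.
    details: List[str] = field(default_factory=list)

    def speak_summary(self) -> str:
        return self.summary

    def speak_details(self) -> str:
        if not self.details:
            return "No further details."
        return " ".join(self.details)


# Invented sample data: travel information rides along inside the appointment.
reminder = Reminder(
    summary="Class in thirty minutes.",
    details=["Light rain expected.", "Allow ten extra minutes for traffic."],
)
```

Because the travel data lives in the `details` list of the appointment itself, no separate travel interface is needed in this sketch.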

Three behaviors were identified for dealing with reminders: acknowledgment (“yes,” “okay”), postponement (“snooze,” “remind me later”) and a request for additional information (“more,” “details”). In addition, given the timescale of using the device throughout the day, the need to be able to skim through past and future events was also identified (“rewind,” “fast forward”).

A minimum of input was paramount. Acknowledging events, the most common action, is done with a single button press. Two consecutive presses signal a request for additional information. Three consecutive presses signal postponement. Button presses are cyclic in case of an errant press, so a fourth tap would be acknowledgment again.
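The cyclic press mapping can be sketched in a few lines. The function and action names here are hypothetical, and the behavior beyond four presses is an assumption extrapolated from the statement that a fourth tap wraps back to acknowledgment.

```python
# Order follows the text: one press, two presses, three presses.
ACTIONS = ("acknowledge", "details", "postpone")


def action_for_presses(press_count: int) -> str:
    """Map a number of consecutive presses to an action, wrapping around."""
    if press_count < 1:
        raise ValueError("at least one press is required")
    # Subtract one so a single press lands on index 0, then wrap with modulo.
    return ACTIONS[(press_count - 1) % len(ACTIONS)]
```

With this mapping a user who overshoots by one press can keep tapping until the intended action comes around again, rather than having to cancel and start over.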

However, fast-forward and rewind do not naturally map onto a single button. We decided on a momentary rocker switch (on-off-on), such that the user has to hold down the appropriate side for at least half a second to initiate fast forwarding or rewinding through events. Taps of less than half a second in duration, each within half a second of the last, on either side of the switch, service the other input events.
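One way to picture the hold-versus-tap distinction is as a classifier over a recorded list of presses. The sketch below makes several assumptions not stated in the text: the two sides of the rocker are labeled "forward" and "back", the half-second gap between taps is measured from one release to the next press, and presses are processed after the fact rather than in real time. All names are hypothetical.

```python
from typing import List, Optional, Tuple

HOLD_SECONDS = 0.5  # from the text: a hold lasts at least half a second
GAP_SECONDS = 0.5   # from the text: grouped taps fall within half a second

# (side, pressed_at, released_at), times in seconds; side is "forward" or "back"
Press = Tuple[str, float, float]


def interpret(presses: List[Press]) -> List[str]:
    """Turn raw rocker presses into an ordered list of gestures."""
    gestures: List[str] = []
    tap_count = 0
    last_release: Optional[float] = None

    def flush() -> None:
        # Emit the pending group of taps, if any, as one gesture.
        nonlocal tap_count
        if tap_count:
            gestures.append(f"taps:{tap_count}")
            tap_count = 0

    for side, pressed_at, released_at in presses:
        if released_at - pressed_at >= HOLD_SECONDS:
            # A hold ends any pending tap group and scans through events.
            flush()
            gestures.append("fast_forward" if side == "forward" else "rewind")
        else:
            # A tap after too long a pause starts a new group.
            if last_release is not None and pressed_at - last_release >= GAP_SECONDS:
                flush()
            tap_count += 1  # taps on either side count toward the same group
        last_release = released_at

    flush()
    return gestures
```

For example, `interpret([("forward", 0.0, 0.1), ("back", 0.3, 0.4)])` yields a single two-tap gesture, since taps on either side are treated alike. A real device would also need a timeout to decide when a tap group has ended.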

In addition, bone conduction works for audio input as well. With a limited control vocabulary of three commands, the device also supports speaking the three trained command words at any point between reminder chimes or vibrations. Fast forward and rewind still require physically manipulating the device.
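A small lookup table is enough to illustrate the limited spoken vocabulary. The text does not say which three words the device is actually trained on, so the phrases below are simply the examples quoted earlier for each behavior, and the function name is hypothetical. Speech recognition itself is out of scope here; the sketch assumes the recognizer hands over a text string.

```python
from typing import Optional

# Example phrases from the text, mapped to the three reminder behaviors.
# The real trained vocabulary is not specified; this table is an assumption.
PHRASES = {
    "yes": "acknowledge",
    "okay": "acknowledge",
    "snooze": "postpone",
    "remind me later": "postpone",
    "more": "details",
    "details": "details",
}


def behavior_for_phrase(heard: str) -> Optional[str]:
    """Return the behavior for a recognized phrase, or None if unknown."""
    return PHRASES.get(heard.strip().lower())
```

Phrases such as "rewind" deliberately have no entry, matching the requirement that skimming through events is done with the physical switch.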

Finally, the expected usage for the device calls for it to be donned at the beginning of the day, and removed at the very end of the day, hinting at a bedside cradle. To serve as an alarm clock, this bedside charger and dock would also contain a speaker and large rocker switch.

Additional Development

This project was a collaboration between two Computer Science students and myself. Much additional work was done and presented in an extensive paper. Various styles of interaction were analyzed and selected for use by the device. The nature of the audience was considered along with fashion, cost, compatibility and network effects. Potential users were considered and thirteen scenarios were developed and tested using physical and software mockups.

Vitorio Miliano
December 6, 2006