At the renowned Fraunhofer Institute, they may have built a killer gesture app: gesture control that lets surgeons manipulate a 3D display of, say, a head during surgery while remaining sterile (touching buttons would break sterility, I guess).

Rotate the 3D image by gesturing (source)

Press Release: Non-contact image control

As if by magic, the three-dimensional CAT scan image rotates before the physician’s eyes – merely by pointing a finger. This form of non-contact control is ideal in an operating room, where it can deliver useful information without compromising the sterile work environment.

The physician leans back in a chair and studies the three-dimensional image floating before his eyes. After a little reflection, he raises a finger and points at a virtual button, likewise floating in the air. At the physician’s command, the CAT scan image rotates from right to left or up and down – precisely following the movement of his finger. In this way, he can easily detect any irregularities in the tissue structure. With another gesture, he can click through to the next image. Later, in the operating room, the surgeon can continue to refer to the scanner images. Using gesture control to rotate the images, he can look at the scan of the patient’s organs from the same perspective as he sees them on the operating table. There is no risk of contaminating his sterile gloves, because no mouse or keyboard is involved.

But how does the system know which way the finger is pointing? “There are two cameras installed above the display that projects the three-dimensional image,” explains Wolfgang Schlaak, who heads the department at the Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut HHI in Berlin that developed the display. “Since each camera sees the pointing finger from a different angle, image processing software can then identify its exact position in space.” The cameras record one hundred frames per second.

A third camera, integrated in the frame of the display, scans the user’s face and eyes at the same frequency. The associated software immediately identifies the inclination of the person’s head and the direction in which the eyes are focused, and generates the appropriate pair of stereoscopic images, one for the left eye and one for the right. If the person moves their head a couple of inches to the side, the system instantly adapts the images. “In this way, the user always sees a high-quality three-dimensional image on the display, even while moving about. This is essential in an operating theater, and allows the physician to act naturally when carrying out routine tasks,” says Schlaak.

“The unique feature of this system is that it combines a 3-D display screen with a non-contact user interface.” The three-dimensional display costs significantly less than conventional 3-D screens of comparable quality. Schlaak is convinced that “this makes our gesture-controlled 3-D display an affordable option even for smaller medical practices.” The research team will be presenting its prototype at the MEDICA trade fair from November 14 to 17, 2007, in Düsseldorf (Hall 16, Stand D55). Schlaak hopes to be able to commercialize the system within a year or so.
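The press release doesn’t say how the finger position is actually computed, but the textbook approach with two calibrated cameras is stereo triangulation: each camera reduces the fingertip to a 2D pixel, and intersecting the two viewing rays recovers the 3D point. Here is a minimal sketch of the linear (DLT) method, with made-up camera parameters purely for illustration:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Recover a 3D point from two pixel observations (x1, x2) seen by
    cameras with 3x4 projection matrices P1 and P2. Each view contributes
    two linear constraints; the point is the null vector of A, taken from
    the SVD row for the smallest singular value."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # homogeneous -> Euclidean coordinates

# Two toy cameras: same intrinsics, second camera shifted 20 cm to the right.
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

finger = np.array([0.05, -0.02, 1.0])  # "true" fingertip position (meters)
proj = lambda P, X: (P @ np.append(X, 1))[:2] / (P @ np.append(X, 1))[2]
est = triangulate(P1, P2, proj(P1, finger), proj(P2, finger))
print(np.round(est, 3))  # → [ 0.05 -0.02  1.  ]
```

With noise-free pixels this recovers the point exactly; in a real system the two rays never quite intersect, so the SVD gives a least-squares compromise. All the camera numbers above are invented for the demo, not taken from the HHI system.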

Things like this are probably the best bet for the near future of gesture recognition: niche applications that exploit some specific benefit of using gestures instead of, or alongside, other, more mundane interface technology. The biggest hit in gesture land is without a doubt the Nintendo Wii, which exploits another unique selling point of gestures: a higher (or more representative) physical involvement leading to a better ‘experience’ of a game. It specifically targets gamers who are interested in fun and exercise in a social context. I doubt that hardcore gamers, intent on getting to higher levels of killing sprees, will be very keen on the Wii.

And so it will probably remain for the near future. As with speech recognition, gesture recognition will have to find some nice niches to live in and multiply. Maybe one day the general conditions will change (ubiquitous camera viewpoints? intention-aware machines?) and gestures could become the dominant form of HCI, driving buttons into niche applications of their own. I wouldn’t bet on it right now, though.