Various enterprises and personal interests, such as Man-Machine Interaction (MMI), gesture studies, signs, language, social robotics, healthcare, innovation, music, publications, etc.

Month: January 2007

eSign Editor Turns HamNoSys into SIGML for Signing Avatars

Have a seat and let me tell you the tale of Virtual Guido, the Signing Avatar, one step at a time.

A long time ago, in movie studios far away, Man first used Motion Capture to record Human movement. Having caught the movements, we could play with them. We could apply the movements to artificial shapes instead of the originals, and create talking heads, robots, clowns, dogs, anything we fancied.

Smeagol of the LOTR: high point of motion capture?

So, when a group of people started to work on sign language synthesis they first turned to motion capture. It was a project called Visicast. But the problem with motion capture is easy to see: you need to capture every sign to build sentences. And if signs are changed by their use in a context, and you cannot apply general rules to the captured data to account for that, then you must record signs in different contexts as well.

So, motion capture technology does make it possible to synthesize signing but not in a very intelligent way.

Now, this problem was solved earlier in the development of speech synthesis by creating a phoneme-based synthesizer (instead of ‘word capture’ techniques). Actually, most speech synthesizers used diphones or even triphones, but the point is this: a system was designed that could build speech utterances from a language’s collection of small building blocks. And then all you need to do to make a computer talk is to give it the text it must speak: text-to-speech.

But lest we forget: the best speech synthesizers out there are now combining this diphone method with complete recordings of words (comparable to motion capture). The one does not exclude the other.
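To make the building-block idea concrete, here is a minimal sketch of diphone concatenation in Python. It is only an illustration of the principle, not how any particular synthesizer works: the function names, the `_` silence marker and the `unit_inventory` dictionary of recorded units are all assumptions made up for this example.

```python
def to_diphones(phonemes):
    """Split a phoneme sequence into overlapping pairs (diphones).

    A silence marker "_" (an assumption of this sketch) is padded on both
    ends so the first and last phonemes also get a transition unit.
    """
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def synthesize(phonemes, unit_inventory):
    """Concatenate recorded units from a hypothetical inventory.

    unit_inventory maps a diphone name to a list of audio samples.
    A real system would also smooth the joins; this sketch does not.
    """
    samples = []
    for diphone in to_diphones(phonemes):
        samples.extend(unit_inventory[diphone])
    return samples


if __name__ == "__main__":
    print(to_diphones(["h", "e", "l", "o"]))
```

The inventory only needs one recording per phoneme pair, which is why a fixed, finite set of units can cover any text. A sign synthesizer built on the same idea would need an equivalent inventory of sign building blocks.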

Logically, for sign language synthesis the same route should be taken. So, what are the essential building blocks of a sign language? Now we enter muddy waters. Phonological systems have been proposed by the dozens for ASL, NGT and other sign languages. But most of those systems seem to underspecify the signs, if you compare them with spoken language phonology. Perhaps the most complete system of specification was offered by HamNoSys.

What is HamNoSys? The University of Hamburg has a good online explanation of HamNoSys 3.0, their Hamburg Notation System for sign languages. The website about HamNoSys mentions that it is already at version 4.0, and there is a document that describes the changes in version 4, but the online explanation concerns version 3. The original specification of HamNoSys 2, which had a widespread following, is now only available from the publisher Signum. It is not online anywhere as far as I can tell. (Prillwitz, Siegmund et al: HamNoSys. Version 2.0; Hamburg Notation System for Sign Languages. An introductory guide. (International Studies on Sign Language and Communication of the Deaf; 5) Hamburg : Signum 1989 – 46 p.)

Next, the signal needs to be produced. In other words, someone or something needs to visualize the signing. It cannot be created out of virtual thin air. Signing needs a virtual body with hands and such. Well, avatars and embodied conversational agents were well established by the year 2000. These virtual puppets could surely be made to sign?

Rea: fair avatress? (source)

Televirtual Ltd took the job and created avatars that could sign. They did so in eSign, the project that followed Visicast. There are a few signing avatars, most notably Virtual Guido, Tessa and Visio. These avatars are controlled by input that is like HamNoSys. But HamNoSys input is of course not the same as text input.

And this is where we are now. One basically has to be able to annotate signs in HamNoSys to operate the signing avatars. Of course, as time goes by, a lexicon of ready-made signs can be built, making life a little easier.

Finally, it had to be online of course. The Deaf community was simply screaming for an online signing avatar to get access to all sorts of important information. Right? Wrong!

The project fared badly: because no Deaf people were involved, because it was seen as an alternative to making recordings of signers, which is much better (just less flexible), and finally because it was not very good. The signing was not very natural. Nevertheless, they created a system for the web with a client-side agent (a plug-in for a browser), and the input was wrapped in web-friendly SIGML files.

What is SIGML: Signing Gesture Markup Language? The HamNoSys notation is the basis, but not the sole component of SIGML. The eSign editor creates SIGML that feeds the Avatar. It is this small XML text file that travels the internet from the servers to the signing avatar clients. This avoids having to download large video clips. The SIGML feeds the Avatar that resides on your PC.
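To give a feel for how small such a file is, here is a rough sketch in Python that builds a SIGML-like document for one sign. Treat every name in it as an assumption: the element names (`sigml`, `hns_sign`, `hamnosys_manual`) and the symbol names are only loosely modeled on the HamNoSys-based flavor of SIGML and were not checked against the actual DTD, and the symbol sequence is not a real transcription of any sign.

```python
import xml.etree.ElementTree as ET


def build_sigml(gloss, hamnosys_symbols):
    """Wrap a sequence of HamNoSys symbol names in a SIGML-like XML document.

    All element names here are illustrative assumptions, not a verified schema.
    """
    root = ET.Element("sigml")
    sign = ET.SubElement(root, "hns_sign", {"gloss": gloss})
    manual = ET.SubElement(sign, "hamnosys_manual")
    for name in hamnosys_symbols:
        # One empty element per notation symbol.
        ET.SubElement(manual, name)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up symbol sequence, for illustration only.
    doc = build_sigml("HELLO", ["hamflathand", "hamextfingeru", "hampalmd"])
    print(doc)
    print(len(doc.encode("utf-8")), "bytes")
```

Even with a few dozen symbols per sign, a document like this stays far smaller than a video clip of the same sign, which is the whole point of sending notation to a client-side avatar.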

Guido; hairy and scary? (source)

How does the story end? Well, I got the software (the HamNoSys editor and the Signing Avatar) from Inge Zwitserlood. She was one of the main people behind eSign here in the Netherlands. After some trying I think I will be able to put some examples up soon.

Inge also told me how the development of the avatar software has basically stopped. Any new requirements laid down concerning more natural signing are not worked on anymore. And as far as I know nobody is actually using Virtual Guido or his brothers and sisters on the web?

There is a website at that is about Guido and uses him as well. They even provide a free training to create signs and use them on the web. So where will all of this go from here? Can some other group take it from here? If you have such an ambition, call the people at Televirtual or contact me. I would love to see progress on Sign Language Synthesis.

Update 9 Sep 2007: After exchanging more info with Dr. Zwitserlood it should be added that the success of eSign and Virtual Guido may have had a more complicated background than the one sketched above. Deaf people actually were involved in the UK, Holland and Germany, and the scientists did not intend to provide an alternative to video recordings. This was perhaps not sufficiently clear or obvious to the larger Deaf audience (who have every right to be skeptical). Furthermore, the focus on a lifelike avatar instead of a simplified one may have misdirected people’s expectations. The WFD 2005 promo movie also featured sign synthesis but was very well received, perhaps because the avatars resembled cartoon characters more.

Liwei Zhao and Laban Movement Analysis

Liwei Zhao (DBLP) defended his thesis in 2001. It was called Synthesis and Acquisition of Laban Movement Analysis Qualitative Parameters for Communicative Gestures (Penn Library, pdf). He studied under Norman Badler at the Center for Human Modeling and Simulation. Laban Movement Analysis (LMA) is used to annotate movement.

Rudolf Laban proposed a theory called Effort/Shape to describe human movement. LMA is used by dancers, athletes, and physical and occupational therapists. Laban created Labanotation specifically for dance notation. As such it makes use of and incorporates LMA. I was told not to confuse the two, so there you have it. A school called LIMS guards the precious LMA treasure. You need to be brainwashed, I mean study, there to become a CMA (certified movement analyst).

The Effort of a movement consists of four factors. Each factor can be given one of two extreme values on its scale, or no value. The graph (source) shows the way these values are indicated in a single figure:

* Space: Direct / Indirect
* Weight: Strong / Light
* Time: Quick / Sustained
* Flow: Bound / Free
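As a small illustration, the four Effort factors can be sketched as a data structure in Python. This is my own encoding, not Zhao's or LIMS's: I assume each factor is stored as -1, 0 or +1, where 0 means no value, and the choice of which extreme gets +1 is arbitrary.

```python
from dataclasses import dataclass

# Factor name -> (label for -1, label for +1). The sign convention is an
# assumption of this sketch, not part of Laban Movement Analysis itself.
FACTORS = {
    "space": ("Indirect", "Direct"),
    "weight": ("Light", "Strong"),
    "time": ("Sustained", "Quick"),
    "flow": ("Free", "Bound"),
}


@dataclass(frozen=True)
class Effort:
    """One Effort annotation: each factor is -1, 0 (no value) or +1."""
    space: int = 0
    weight: int = 0
    time: int = 0
    flow: int = 0

    def __post_init__(self):
        for name in FACTORS:
            if getattr(self, name) not in (-1, 0, 1):
                raise ValueError(f"{name} must be -1, 0 or 1")

    def describe(self):
        """List only the factors that have a value."""
        parts = []
        for name, (low, high) in FACTORS.items():
            value = getattr(self, name)
            if value != 0:
                label = high if value > 0 else low
                parts.append(f"{name.capitalize()}: {label}")
        return ", ".join(parts) or "neutral"


if __name__ == "__main__":
    print(Effort(space=1, time=-1).describe())
```

With three possible settings for each of four factors, this gives 81 combinations, including the fully neutral one.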

Anyway, Liwei Zhao’s research there focused on interesting aspects of gesture technology. In his thesis he starts by quoting Kendon on the missing link of gesture studies: what makes a movement a gesture? Kendon wrote in 1980 about how one can identify gestures, and judging by his 2004 book, science has not made much progress in this area since.

Unfortunately, Liwei Zhao’s thesis seems not to spend too much effort on that question either. He seems more focussed on getting the modeling of gestures right. And to that end Zhao studied if and how he could apply the theory of Laban Movement Analysis. Zhao and Badler worked on both the synthesis of (qualities of) gestures and on the automatic recognition of (qualities of) gestures.

Trajectories of motion styles with varying Time factors: left=sustained; middle=neutral; right=quick (source).

Zhao reports that he was fairly well able to automatically extract the right Effort from movements that were made by an actor. Using motion capture was more reliable than using cameras.
