Have a seat and let me tell you the tale of Virtual Guido, the Signing Avatar, one step at a time.
A long time ago, in movie studios far away, Man first used Motion Capture to record Human movement. Having caught the movements, we could play with them. We could apply the movements to artificial shapes instead of the originals, and create talking heads, robots, clowns, dogs, anything we fancied.
Smeagol of the LOTR: high point of motion capture?
So, when a group of people started to work on sign language synthesis they first turned to motion capture. It was a project called Visicast. But the problem with motion capture is easy to see: you need to capture every sign to build sentences. And if signs are changed by their use in a context, and you cannot apply general rules to the captured data to account for that, then you must record signs in different contexts as well.
So, motion capture technology does make it possible to synthesize signing but not in a very intelligent way.
Now, this problem had already been solved in the development of speech synthesis, by creating a phoneme-based synthesizer (instead of ‘word capture’ techniques). Actually, most speech synthesizers used diphones or even triphones, but the point is this: a system was designed that could build speech utterances from a language’s collection of small building blocks. And then all you need to do to make a computer talk is to provide it with the text it must speak: text-to-speech.
But lest we forget: the best speech synthesizers out there are now combining this diphone method with complete recordings of words (comparable to motion capture). The one does not exclude the other.
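To make the building-block idea a bit more concrete, here is a minimal sketch in Python. It is not how any real synthesizer is implemented: the function names, the toy pronunciations and the `word:` prefix are all made up for illustration. It only shows the planning step, where a whole-word recording is used when one exists and diphones are used otherwise.

```python
from typing import Dict, List, Set

SILENCE = "_"  # hypothetical marker for the pause before and after a word


def to_diphones(phonemes: List[str]) -> List[str]:
    """Turn a phoneme sequence into overlapping phoneme pairs (diphones)."""
    padded = [SILENCE] + phonemes + [SILENCE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def plan_utterance(words: List[str],
                   pronunciations: Dict[str, List[str]],
                   recorded_words: Set[str]) -> List[str]:
    """List the stored units to play, preferring whole-word recordings."""
    units: List[str] = []
    for word in words:
        if word in recorded_words:
            # A complete recording exists (the 'word capture' route).
            units.append(f"word:{word}")
        else:
            # Otherwise build the word from small building blocks.
            units.extend(to_diphones(pronunciations[word]))
    return units


# Toy data, invented for illustration only.
pronunciations = {"hi": ["h", "ai"], "guido": ["g", "w", "i", "d", "o"]}
recorded_words = {"hi"}

plan = plan_utterance(["hi", "guido"], pronunciations, recorded_words)
print(plan)
```

A real system would then fetch the audio snippet for each unit and smooth the joins between them, which is where most of the hard work is.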
Logically, for sign language synthesis the same route should be taken. So, what are the essential building blocks of a sign language? Now we enter muddy waters. Phonological systems have been proposed by the dozens for ASL, NGT and other sign languages. But most of those systems seem to underspecify the signs, if you compare them with spoken language phonology. Perhaps the most complete system of specification was offered by HamNoSys.
What is HamNoSys? The University of Hamburg has a good online explanation of HamNoSys 3.0, their Hamburg Notation System for sign languages. The HamNoSys website mentions that the notation is already at version 4.0, and there is a document that describes the changes in version 4, but the online explanation still concerns version 3. The original specification of HamNoSys 2, which had a widespread following, is now only available from the publisher Signum. It is not online anywhere as far as I can tell. (Prillwitz, Siegmund et al: HamNoSys. Version 2.0; Hamburg Notation System for Sign Languages. An introductory guide. (International Studies on Sign Language and Communication of the Deaf; 5) Hamburg : Signum 1989 – 46 p.)
Next, the signal needs to be produced. In other words, someone or something needs to visualize the signing. It cannot be created out of virtual thin air. Signing needs a virtual body with hands and such. Well, avatars and embodied conversational agents were well established by the year 2000. These virtual puppets could surely be made to sign?
Rea: fair avatress? (source)
Televirtual Ltd took the job and created avatars that could sign. They did so in eSign, the project that followed Visicast. There are a few signing avatars, most notably Virtual Guido, Tessa and Visio. These avatars are controlled by input that is like HamNoSys. But HamNoSys input is of course not the same as text input.
And this is where we are now. One basically has to be able to annotate signs in HamNoSys to operate the signing avatars. Of course, as time goes by, a lexicon of ready-made signs can be built, making life a little easier.
Finally, it had to be online of course. The Deaf community was simply screaming for an online signing avatar to get access to all sorts of important information. Right? Wrong!
The project was badly received: because no Deaf people were involved, because it was seen as an alternative to making recordings of signers, which is much better (just less flexible), and finally because it was not very good. The signing was not very natural. Nevertheless, they created a system for the web with a client-side agent (a browser plug-in), and the input was wrapped in web-friendly SIGML files.
What is SIGML, the Signing Gesture Markup Language? The HamNoSys notation is the basis, but not the sole component, of SIGML. The eSign editor creates the SIGML, a small XML text file that travels the internet from the servers to the signing avatar clients and feeds the Avatar that resides on your PC. This avoids having to download large video clips.
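To give a feel for why such a file stays small, here is a rough Python sketch that wraps a sign description in XML and reads it back. Please note that the element and attribute names (`sigml`, `sign`, `gloss`, `symbol`, `name`) and the symbol names are placeholders I made up for illustration; they are not taken from the SIGML specification or from HamNoSys.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_sigml(signs: Dict[str, List[str]]) -> str:
    """Wrap each sign (gloss -> list of notation symbols) in a small XML document."""
    root = ET.Element("sigml")  # hypothetical root element
    for gloss, symbols in signs.items():
        sign = ET.SubElement(root, "sign", {"gloss": gloss})
        for name in symbols:
            # One empty element per notation symbol, in signing order.
            ET.SubElement(sign, "symbol", {"name": name})
    return ET.tostring(root, encoding="unicode")


def read_sigml(text: str) -> Dict[str, List[str]]:
    """What a client would do: parse the XML back into gloss -> symbols."""
    root = ET.fromstring(text)
    return {
        sign.get("gloss"): [s.get("name") for s in sign.findall("symbol")]
        for sign in root.findall("sign")
    }


# Invented example sign; the symbol names are not real HamNoSys symbols.
signs = {"HELLO": ["flat_hand", "palm_out", "move_right"]}

document = build_sigml(signs)
print(document)
print(len(document.encode("utf-8")), "bytes")

# The description survives the round trip from server to client.
assert read_sigml(document) == signs
```

A whole sign fits in a line or two of text, while a video clip of the same sign would be many times larger. The price is that the client has to do all the animation work itself.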
Guido; hairy and scary? (source)
How does the story end? Well, I got the software (the HamNoSys editor and the Signing Avatar) from Inge Zwitserlood. She was one of the main people behind eSign here in the Netherlands. After some trying I think I will be able to put some examples up soon.
Inge also told me that the development of the avatar software has basically stopped. New requirements for more natural signing are no longer being worked on. And as far as I know nobody is actually using Virtual Guido or his brothers and sisters on the web?
There is a website at Gebarennet.nl that is about Guido and uses him as well. They even provide free training in creating signs and using them on the web. So where will all of this go from here? Can some other group pick it up? If you have such an ambition, call the people at Televirtual or contact me. I would love to see progress on Sign Language Synthesis.
Update 9 Sep 2007: After exchanging more info with Dr. Zwitserlood, I should add that the reception of eSign and Virtual Guido may have had a more complicated background than the one sketched above. Deaf people actually were involved in the UK, Holland and Germany, and the scientists did not intend to provide an alternative to video recordings. This was perhaps not sufficiently clear or obvious to the larger Deaf audience (who have every right to be skeptical). Furthermore, the focus on a lifelike avatar instead of a simplified one may have raised the wrong expectations. The WFD 2005 promo movie also featured sign synthesis but was very well received, perhaps because the avatars resembled cartoon characters more.