A Nice Gesture by Jeroen Arendsen

Various personal interests and public info, gesture, signs, language, social robotics, healthcare, innovation, music, publications, etc.

Category: Sign Language Synthesis

In Love with SiSi

A wonderful bit of news has been hitting the headlines:

BBC News: Technique links words to signing: Technology that translates spoken or written words into British Sign Language (BSL) has been developed by researchers at IBM. The system, called SiSi (Say It Sign It) was created by a group of students in the UK. SiSi will enable deaf people to have simultaneous sign language interpretations of meetings and presentations. It uses speech recognition to animate a digital character or avatar.
IBM says its technology will allow for interpretation in situations where a human interpreter is not available. It could also be used to provide automatic signing for television, radio and telephone calls.

Read the full story at IBM: IBM Research Demonstrates Innovative ‘Speech to Sign Language’ Translation System


Demo or scripted scenario?

Serendipity. Just this week a man called Thomas Stone inquired whether he could get access to the signing avatars of the eSign project. I passed him on to Inge Zwitserlood. She first passed him on to the eSign coordinator at Hamburg University, which was a dead end. Finally, he was pointed to the University of East Anglia, to John Glauert. And who is the man behind the sign synthesis in SiSi?

From the press release from IBM:

John Glauert, Professor of Computing Sciences, UEA, said: “SiSi is an exciting application of UEA’s avatar signing technology that promises to give deaf people access to sign language services in many new circumstances.”
This project is an example of IBM’s collaboration with non-commercial organisations on worthy social and business projects. The signing avatars and the award-winning technology for animating sign language from a special gesture notation were developed by the University of East Anglia and the database of signs was developed by RNID (Royal National Institute for Deaf People).

Well done, Professor Glauert, and thank you for keeping the dream alive.

Now for some criticism: the technology is not very advanced yet. It is not at a level where I think it is wise to make promises about useful applications. The signing is not very natural, and I think much still needs to be done to achieve a basic level of acceptability for users. But it is good to see that the RNID is on board, although they choose their words of praise carefully.

It is amazing how a nice technology story gets so much media attention so quickly. Essentially these students have just linked a speech recognition module to a sign synthesis module. The inherent problems with machine translation (between any two languages) are not even discussed. And speech recognition only works under very limited conditions and produces limited results.

IBM says: “This type of solution has the potential in the future to enable a person giving a presentation in business or education to have a digital character projected behind them signing what they are saying. This would complement the existing provision, allowing for situations where a sign language interpreter is not available in person”.

First, speech recognition is incredibly poor in a live event like a business presentation (just think of interruptions, sentences being rephrased, all the gesturing that is linked to the speech, etc.) and second, the idea that it will be (almost) as good as an interpreter is ludicrous for at least the next 50 years. The suggestion alone will probably be enough to put off some Deaf people. They might (rightly?) see it as a way for hearing people to try to avoid the costs of good interpreters.

I think the media just fell in love at first sight with the signing avatar and the promises it makes. I also love SiSi, but as I would like to say to her and to all the avatars I’ve loved before: My love is not unconditional. If you hear what I say, will you show me a sign?

XV Congreso Mundial de la WFD

The World Federation of the Deaf will be hosting its 15th World Congress in Madrid in July 2007.

Logo of the congress

The World Federation of the Deaf (WFD) is an international non-governmental organisation comprising national associations of Deaf people. It watches over the interests of more than 74 million Deaf people worldwide, more than 80% of whom live in developing countries. The WFD was founded in 1951, during the First World Congress of the WFD, held in Rome. Such an early date makes the WFD one of the oldest international disability organisations in the world. Currently, the WFD has a membership of 127 national associations from five continents.

Here is a very nice video with avatars signing what I think is LSE (Lengua de signos o señas española, or Spanish Sign Language). Whoever made these animations did a very good job. Both manual and non-manual features are synthesized quite nicely. The message is about the (apparently successful) Spanish candidacy for the 2007 congress.

iCommunicator solves nothing at $6499?

“Well Jim, good to see you and what have you got for us today?” “Same here, John, and I can tell you I have something really amazing, just watch this!”

It listens, it types, it signs, it speaks: “iCommunicator is the finest software ever developed for people who are deaf or hard of hearing”

Here is some ‘honest advice’ from EnableMart:

Training to Ensure Positive Outcomes. … Systematic professional training is strongly encouraged to maximize use of the unique features… The end user must be completely trained … to achieve positive outcomes. Managers of the system should … provide training for both end users and speakers … Additional time may be required to customize … Contact EnableMart for information about professional training opportunities. 

At first glance this seems a fair bit of warning before you spend $6499 on an iCommunicator 5.0 kit. However, the advised training is not free: EnableMart sells it for an additional $125 an hour. I think this entire thing is a bit suspicious. I have worked with speech recognition, including Dragon NaturallySpeaking, and it makes recognition errors (period). I have also fooled around with or seen most sign synthesis technology available today, and it is far from natural. And the same is true for speech recognition.

These technologies have yet to make good on their promises. If you ignore actual user experiences you can imagine they will solve many communication problems. But in practice, little errors cause big frustrations. Using speech recognition can be very tiring and irritating. It only works if the entire interaction is designed well and the benefits outweigh the costs.

Just imagine you are a deaf person using this iCommunicator with some teacher and a simple speech recognition error occurs: how is that error handled? Usually, when a speaker dictates to Dragon NaturallySpeaking he will spot the error and correct it. In this case your teacher will not spot the error (assuming he doesn’t monitor your screen) and the dialogue will continue with the error in place (unless there is enough context for you to spot the error and understand what the speaker actually said).

A second problem is that you have to persuade people to wear your microphone to enter into a conversation with you. In a weird and cynical way you are asking them to suffer the same techno-torture as you. Not something you want to do more than twice a day, I imagine. And only with people whose affection you can afford to lose.

The sign synthesis is fairly straightforward sign concatenation. A dictionary of 30,000 signs is accessed to get a video for every word. The videos are then played one by one, without any further sentence prosody. That means it looks terrible, like a gun firing signs at you. It also means it does not sign ASL, but signed English at best. Good enough, you might say, but I think the benefit of artificial signed English over typed text is not big. So, the signing is pretty much worthless.

Jim the tell-sell guy further claims you can use it to improve your speaking. I do not believe speech recognition technology can give the proper feedback to improve articulation difficulties. It may be able to judge whether you pronounced something correctly (or at least similarly to what it knows), but that’s about it. Although there is something in the specs about pronunciation keys, the video doesn’t show details. I simply do not think a computer can reliably tell you what sort of error you made.

So what does that leave? You can type text and your iCommunicator reads it out loud with text-to-speech. You can get that sort of software for the price of a cheap dinner from any of these sites.
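To make concrete how crude that kind of playback is, here is a little sketch of word-for-word sign concatenation; the miniature dictionary, file names and "player" are made up for the example:

```python
# Toy illustration of word-for-word sign concatenation: look up a video clip
# per English word and play the clips back to back. The dictionary, file
# names and "player" are invented for this example.

from dataclasses import dataclass

@dataclass
class SignClip:
    gloss: str        # the English word the clip is glossed with
    video_file: str   # path to a pre-recorded video of the isolated sign

# A miniature stand-in for the 30,000-sign dictionary.
SIGN_DICTIONARY = {
    "hello": SignClip("hello", "signs/hello.mp4"),
    "how":   SignClip("how",   "signs/how.mp4"),
    "you":   SignClip("you",   "signs/you.mp4"),
}

def concatenate_signs(english_text: str) -> list[SignClip]:
    """Return the clips to play, one per recognised word, in text order.

    Note what is missing: no grammar, no sentence prosody, no transitions
    between signs, and unknown words are silently dropped (or would have
    to be fingerspelled)."""
    clips = []
    for word in english_text.lower().split():
        clip = SIGN_DICTIONARY.get(word.strip(".,?!"))
        if clip is not None:
            clips.append(clip)
    return clips

if __name__ == "__main__":
    for clip in concatenate_signs("Hello, how are you?"):
        print(f"play {clip.video_file}")   # 'are' is not in this toy dictionary
```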

Finally, the iCommunicator v5.0 lets you search for a word on Google with a single click. That’s pretty neat I admit. If you also think that that is worth a couple of thousand dollars, please contact me. I can supply an iBrowser v6.1 for only $2999, and will supply the necessary training for free. What the hell, I’ll even throw in a professional designer microphone v7.2 🙂 Unfortunately, the business case of the iCommunicator may actually rest on sales to hearing people who wish to reduce or entirely avoid the cost of interpreters:

HighBeam Encyclopedia: …The iCommunicator also enables government workers to provide equal access to information and services to the hearing impaired in compliance with the Americans with Disabilities Act and Section 508… 

Sometimes, you can only hope the future will prove you wrong.

Bristol has best signing on web

I came across a bundle of websites that is like a city full of British Sign Language (BSL). And it all seems to come from the University of Bristol, Centre for Deaf Studies (CDS). They managed to create a great signing experience on their sites, on just about every page you care to look at. If you are thinking about using (videos of) sign language on your website, you simply must check these out:

Deafresource: Find out about the Deaf community, sign language and Deaf studies. Deafresource

Deafstation: News and information service for BSL users. Deafstation provides a daily news programme in BSL. Deafstation

Signstation: Want to learn about British Sign Language (BSL)? Signstation has video, interactive exercises, pictures, graphics and explanations about BSL and Deaf people at work. Signstation

I would have to check how they did everything, but it is clear that a team of people with good technological skills (Flash, Shockwave, scripting, streaming video, etc.) and good web design skills is working on these sites. They did require me to download and install a more recent Shockwave player, but I guess I was due for an upgrade and it’s free anyway 🙂

For any community of Deaf people, sites like these can be extremely valuable, I believe. The power of the internet can be put to use to share and enjoy without interference (by the hearing cultural majority or otherwise). But it takes skill to get the websites suited to a primarily signing audience. I think the CDS did a good job here, and I wouldn’t be surprised if these sites are already playing a large role in UK Deaf society.

Please put aside any cultural integration reservations for the length of this post and while browsing through the links. I once attended a lecture by one of Bristol’s senior researchers, Paddy Ladd, at the International Conference for the Education of the Deaf in Maastricht, 2005. It was the most memorable event of the entire conference. Mr. Ladd unleashed his anger, frustration and fears on the unsuspecting audience. He was angry about the history of abuse associated with the conference in question (after its first edition in 1880, deaf education was all but limited to oral methods for almost a century). He feared that developments in cochlear implantation would cause doctors and people in deaf education to regress into methods treating deafness as a curable medical condition. And he warned against the cultural genocide that would take place should sign language teaching suffer, even going so far as to compare withholding sign language as a primary language from CI kids to child abduction. It was a great speech, arousing much passion and support from Deaf attendees. The next day the British association for the education of the Deaf (or something like that) distanced itself from Paddy Ladd’s views. His speech dominated discussions afterward.

It is not my place to take part in discussions on educational methods or the impact of CI, nor am I an expert judge on the value of Deaf sign language subculture in comparison to a potentially fuller social integration of individuals. But I will say that these websites constitute a fine part of the world wide web. They are examples of how you can make video work on the web. It would be a shame if they were exterminated.

Update 8 Mar ’07: I was so impressed I forgot the news sparking my interest: CDS launched the world’s first sign language dictionary for mobile phones at www.mobilesign.org. Again it is a very well designed service. Two important things: it is simple enough to use on a small screen, and the download size of the (video) files is fairly small (and thus affordable in a pay-per-bit environment like the mobile internet). Unfortunately, prior scientific work on how to reduce bit rates without harming sign language understandability was not applied, as far as I can tell:
* Cavender et al. (2005). MobileASL: Intelligibility of Sign Language Video as Constrained by Mobile Phone Technology (pdf)
* Parish et al. (1990). Intelligent Temporal Subsampling of American Sign Language Using Event Boundaries (pdf)
* Sperling et al. (1985). Intelligible encoding of ASL image sequences at extremely low information rates. (Might have been used? ACM, DOI)
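As a rough back-of-the-envelope illustration of why clip size matters on pay-per-bit mobile data (the duration and bit rates below are assumptions for the example, not measurements of mobilesign.org):

```python
# Back-of-the-envelope check of why clip size matters on pay-per-bit mobile
# data. All numbers are assumptions for illustration, not measurements.

def clip_megabytes(seconds: float, kbit_per_s: float) -> float:
    """Approximate download size of a clip encoded at a given bit rate."""
    return seconds * kbit_per_s / 8 / 1024  # kbit -> kB -> MB

if __name__ == "__main__":
    for rate in (300, 100, 30):  # hypothetical encodings, in kbit/s
        size = clip_megabytes(seconds=4, kbit_per_s=rate)
        print(f"4 s sign clip at {rate} kbit/s is roughly {size:.2f} MB")
```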

Update 9 Mar ’07: Tim Tolkt is pretty good at integrating sign videos (in Flash) as well.

SAMSARA! I mean… samsaraah..

Or: Samsara, the Amazing Motion-capture of Signing for Avatars that are Really Angry. Alexis Heloir wrote me an email the other day about a project called Samsara. The acronym is quite amazing and telling: Synthesis and Analysis of Motions for Simulation and Animation of Realistic Agents. The demo videos of the SIGNE (sub)project are of a signing avatar, or of a naked skeleton with hands. The special thing is that they make the puppet sign in an angry way or as if he were somewhat tired. And this they do by applying rules to the motion-captured data. A neat trick, if they really manage it. Check out the videos and see for yourself whether they got it right.
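For what it’s worth, here is my own minimal sketch of the general idea of restyling captured motion by rule, say exaggerating and speeding up movement for “angry” and damping and slowing it for “tired”. This is only a guess at the kind of transformation involved, not the Samsara/SIGNE method itself:

```python
# A minimal sketch of restyling motion-captured signing by rule: exaggerate
# and speed up movement to suggest anger, damp and slow it to suggest
# tiredness. This is an illustration of the general idea, not the
# Samsara/SIGNE method.

import numpy as np

def restyle(frames: np.ndarray, amplitude: float, tempo: float) -> np.ndarray:
    """frames: (n_frames, n_joints, 3) joint positions of a neutral performance.

    amplitude > 1 exaggerates excursions around the mean pose ("angry"),
    amplitude < 1 damps them ("tired"); tempo resamples the timeline so the
    motion plays faster or slower."""
    mean_pose = frames.mean(axis=0, keepdims=True)
    scaled = mean_pose + amplitude * (frames - mean_pose)

    # Linear resampling of the timeline to change the playback tempo.
    n_out = max(2, int(round(len(frames) / tempo)))
    src = np.linspace(0, len(frames) - 1, n_out)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, len(frames) - 1)
    w = (src - lo)[:, None, None]
    return (1 - w) * scaled[lo] + w * scaled[hi]

if __name__ == "__main__":
    # Hypothetical neutral capture: 120 frames of 20 joints in 3D.
    rng = np.random.default_rng(0)
    neutral_frames = rng.standard_normal((120, 20, 3))
    angry = restyle(neutral_frames, amplitude=1.4, tempo=1.3)
    tired = restyle(neutral_frames, amplitude=0.7, tempo=0.8)
    print(angry.shape, tired.shape)
```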

Personally, I did not find it very convincing (sorry Alexis). But being told that I was supposed to see an angry signer helped, though :-). I guess the emotion is there if you are willing to see it, or if the context (speech, situation) matches it. To be fair, I am not very convinced that emotions are easily read from people’s gestures at all, even if Frank Pollick says so. Making a recognizable movie of a neatly enacted emotion is something else than being able to do live recognition of someone’s emotional state from his behavior.

But perhaps creating a gesturing avatar that matches the emotional state of the moment (as dictated by situation or speech) is enough. Maybe all that is possible is all that is necessary. Do we ever need to rely solely on an avatar’s gestures to project emotions? A frantically waving guy in the distance? Too small to see his face, too far away to hear him, no idea what he wants. Why is he waving? Is he mad at us or warning us? Is he in need of assistance? If the guy were human I would be wondering as well.

The Signing Skeleton of Samsara

“Hi” – “Wait ’till I get you” – “Help!” – “Watch out!”

The members of Samsara are Sylvie Gibet, Jean-Francois Kamp, Nicolas Courty, Alexis Heloir, Alexandre Bouënard, and Charly Awad. Together these researchers have an impressive collection of publications on the topic, I must say. The work is done at the VALORIA Research Laboratory in Computer Science, Brittany, France. They also seem to have a connection to the wonderful Elephants Dream project, which is an open source animated movie.

Elephants Dream Guy

What emotional state is visible from this gesture? (source)

P.S. According to Wikipedia, the word Samsara can refer to many things, most importantly to the religious concept of reincarnation in Hinduism. I wonder if that plays a part in the name for this project: the reincarnation of signs, which come to life after being motion-captured and lying dead as data on a disk somewhere.

Demo Video Visicast

As an illustration of the use of sign language synthesis this video was made for the Visicast project:

(More videos are available on the Visicast site)

It is a combination of speech recognition, automatic translation and sign language synthesis. The avatar is Tessa. Unfortunately, it does not work the other way. So, if the signer needs to ask a question it gets a bit difficult. But then again, it is better than nothing. As these things go, it is but a demo video. I believe no such system exists at the moment.
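For what it is worth, here is a minimal sketch of that one-way pipeline; every function is a placeholder of my own, but it shows how errors in each stage feed straight into the next, and how nothing flows back from the signer:

```python
# A minimal sketch of the one-way pipeline the Visicast demo illustrates:
# recognise speech, translate the text, synthesise signing. Every function
# here is a placeholder; no real recogniser, translator or avatar is used.

def recognise_speech(audio: bytes) -> str:
    """Placeholder for a speech recogniser (returns an English transcript)."""
    return "where do you want to travel today"

def translate_to_sign_glosses(english: str) -> list[str]:
    """Placeholder for machine translation into a sign-language gloss order."""
    return ["TRAVEL", "WHERE", "WANT", "YOU"]   # invented gloss sequence

def synthesise_signing(glosses: list[str]) -> None:
    """Placeholder for driving an avatar such as Tessa from the glosses."""
    for gloss in glosses:
        print(f"avatar signs: {gloss}")

if __name__ == "__main__":
    transcript = recognise_speech(b"...")            # stage 1: speech recognition
    glosses = translate_to_sign_glosses(transcript)  # stage 2: translation
    synthesise_signing(glosses)                      # stage 3: sign synthesis
```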

eSign Editor Turns HamNoSys into SIGML for Signing Avatars

Have a seat and let me tell you the tale of Virtual Guido, the Signing Avatar, one step at a time.

A long time ago, in movie studios far away, Man first used Motion Capture to record Human movement. Having caught the movements, we could play with them. We could apply the movements to artificial shapes instead of the originals, and create talking heads, robots, clowns, dogs, anything we fancied.

Smeagol of the LOTR: high point of motion capture?

So, when a group of people started to work on sign language synthesis they first turned to motion capture. It was a project called Visicast. But the problem with motion capture is easy to see: you need to capture every sign to build sentences. And if signs are changed by their use in a context, and you cannot apply general rules to the captured data to account for that, then you must record signs in different contexts as well.

So, motion capture technology does make it possible to synthesize signing but not in a very intelligent way.

Now, this problem was solved previously in the development of speech synthesis by creating a phoneme-based synthesizer (instead of ‘word capture’ techniques). Actually, most speech synthesizers use diphones or even triphones, but the point is this: a system was designed that could build speech utterances from a language’s collection of small building blocks. And then all you need to do to make a computer talk is to provide it with the text it must speak: text-to-speech.

But lest we forget: the best speech synthesizers out there are now combining this diphone method with complete recordings of words (comparable to motion capture). The one does not exclude the other.
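As a toy illustration of that building-block idea (the phoneme strings and the unit “recordings” below are invented for the example, not taken from any real synthesizer):

```python
# Toy illustration of the building-block idea behind diphone synthesis: an
# utterance is assembled from recorded phoneme-to-phoneme transitions rather
# than from whole-word recordings. The phoneme names and the unit database
# are invented for this example.

def to_diphones(phonemes: list[str]) -> list[str]:
    """Split a phoneme sequence into overlapping diphone units, e.g.
    ['sil', 'h', 'ai', 'sil'] -> ['sil-h', 'h-ai', 'ai-sil']."""
    return [f"{a}-{b}" for a, b in zip(phonemes, phonemes[1:])]

# A miniature unit database: diphone name -> (pretend) recorded waveform.
UNIT_DATABASE = {
    "sil-h": b"...", "h-ai": b"...", "ai-sil": b"...",
}

def synthesise(phonemes: list[str]) -> bytes:
    """Concatenate the stored waveforms for each diphone in turn."""
    return b"".join(UNIT_DATABASE[d] for d in to_diphones(phonemes))

if __name__ == "__main__":
    print(to_diphones(["sil", "h", "ai", "sil"]))       # the units for 'hi'
    print(len(synthesise(["sil", "h", "ai", "sil"])))   # length of the joined 'audio'
```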

Logically, for sign language synthesis the same route should be taken. So, what are the essential building blocks of a sign language? Now we enter muddy waters. Phonological systems have been proposed by the dozens for ASL, NGT and other sign languages. But most of those systems seem to underspecify the signs, if you compare them with spoken language phonology. Perhaps the most complete system of specification was offered by HamNoSys.

What is HamNoSys? The University of Hamburg has a good online explanation of HamNoSys 3.0, their Hamburg Notation System for sign languages. The website about HamNoSys mentions that it is already at version 4.0, and there is a document that describes the changes in version 4; the online explanation concerns version 3. The original specification of HamNoSys 2, which had a widespread following, is now only available from publisher Signum and is not online anywhere as far as I can tell. (Prillwitz, Siegmund et al.: HamNoSys. Version 2.0; Hamburg Notation System for Sign Languages. An introductory guide. (International Studies on Sign Language and Communication of the Deaf; 5) Hamburg: Signum 1989 – 46 p.)
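To give a feel for the kind of information such a notation pins down, here is a small illustrative data structure of my own; the field names and the ‘hello’ entry are simplifications, not actual HamNoSys symbols:

```python
# An illustrative data structure for the kind of parameters a notation like
# HamNoSys records for a sign (handshape, orientation, location, movement,
# non-manuals). Field names and values are a simplification for illustration,
# not HamNoSys symbols.

from dataclasses import dataclass, field

@dataclass
class SignDescription:
    gloss: str
    handshape: str            # e.g. "flat hand", "index finger extended"
    orientation: str          # palm/finger orientation, e.g. "palm down"
    location: str             # place of articulation, e.g. "chest", "chin"
    movement: str             # e.g. "straight forward", "circular"
    non_manual: dict = field(default_factory=dict)  # mouthing, eyebrows, ...

# Hypothetical entry, loosely modelled on a greeting sign:
HELLO = SignDescription(
    gloss="HELLO",
    handshape="flat hand",
    orientation="palm outward",
    location="side of forehead",
    movement="small arc away from the head",
    non_manual={"mouthing": "hello"},
)

if __name__ == "__main__":
    print(HELLO)
```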

Next, the signal needs to be produced. In other words, someone or something needs to visualize the signing. It cannot be created out of virtual thin air. Signing needs a virtual body with hands and such. Well, avatars and embodied conversational agents were well established by the year 2000. These virtual puppets could surely be made to sign?

Rea: fair avatress? (source)

Televirtual Ltd took the job and created avatars that could sign. They did so in eSign, the project that followed Visicast. There are a few signing avatars, most notably Virtual Guido, Tessa and Visio. These avatars are controlled by input that is like HamNoSys. But HamNoSys input is of course not the same as text input.

And this is where we are now. One basically has to be able to annotate signs in HamNoSys to operate the signing avatars. Of course, as time goes by, a lexicon of ready-made signs can be built, making life a little easier.

Finally, it had to be online of course. The Deaf community was simply screaming for an online signing avatar to get access to all sorts of important information. Right? Wrong!

The project was received badly: because no Deaf people were involved, because it was seen as an alternative to making recordings of signers, which is much better (just less flexible), and finally because it was not very good; the signing was not very natural. Nevertheless, they created a system for the web with a client-side agent (a plug-in for a browser), and the input was wrapped in web-friendly SIGML files.

What is SIGML, the Signing Gesture Markup Language? The HamNoSys notation is the basis, but not the sole component, of SIGML. The eSign editor creates SIGML that feeds the avatar: it is this small XML text file that travels the internet from the servers to the signing avatar clients, which avoids having to download large video clips. The SIGML is then rendered by the avatar that resides on your PC.
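As a rough sketch of the idea, the snippet below builds a tiny SIGML-like file in Python. The element and attribute names are my approximation for illustration; the authoritative schema is the one published by UEA:

```python
# A rough sketch of the kind of XML a SIGML file contains: a sign entry whose
# manual behaviour is spelled out with HamNoSys-derived elements. The element
# and attribute names below are approximations for illustration, not the
# authoritative UEA schema.

import xml.etree.ElementTree as ET

def build_sigml(gloss: str, manual_symbols: list[str]) -> str:
    """Wrap a gloss and a list of HamNoSys-style symbol names in SIGML-like XML."""
    root = ET.Element("sigml")
    sign = ET.SubElement(root, "hns_sign", {"gloss": gloss})
    manual = ET.SubElement(sign, "hamnosys_manual")
    for symbol in manual_symbols:
        ET.SubElement(manual, symbol)          # e.g. <hamflathand/>
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Symbol names are illustrative, not checked against the real inventory.
    print(build_sigml("HELLO", ["hamflathand", "hampalmu", "hamforehead"]))
```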

Guido; hairy and scary? (source)

How does the story end? Well, I got the software (the HamNoSys editor and the Signing Avatar) from Inge Zwitserlood. She was one of the main people behind eSign here in the Netherlands. After some trying I think I will be able to put some examples up soon.

Inge also told me that the development of the avatar software has basically stopped. Any new requirements concerning more natural signing are not being worked on anymore. And as far as I know nobody is actually using Virtual Guido or his brothers and sisters on the web.

There is a website at Gebarennet.nl that is about Guido and uses him as well. They even provide free training to create signs and use them on the web. So where will all of this go from here? Can some other group take it from here? If you have such an ambition, call the people at Televirtual or contact me. I would love to see progress on sign language synthesis.

Update 9 Sep 2007: After exchanging more info with dr. Zwitserlood it should be added that the (lack of) success of eSign and Virtual Guido may have had a more complicated background than the one sketched above. Deaf people actually were involved in the UK, Holland and Germany, and the scientists did not intend to provide an alternative to video recordings. This was perhaps not sufficiently clear or obvious to the larger Deaf audience (who have every right to be skeptical). Furthermore, the focus on a lifelike avatar instead of a simplified one may have misdirected people’s expectations. The WFD 2005 promo movie also featured sign synthesis but was very well received, perhaps because the avatars resembled cartoon characters more.
