INTEROB


2. PRESENTATION OF NATIONAL AND INTERNATIONAL CONTEXT WITHIN THE MENTIONED THEMATIC FIELDS:

Human-Computer Interaction (HCI) aims at the development of interactive systems that improve the user's interaction experience. The goal of the project is a natural gesture interaction system. The objective is not only to acquire human motion but to interpret, segment and identify semantically significant gestures. The gesture acquisition technology is video information only, coming from two video cameras (aiming for stereoscopic viewing and analysis).

Gesture recognition is a complex research problem that includes specific aspects:

  • psycholinguistic studies
  • gesture acquisition (where technological differences appear – magnetic, mechanical, acoustic, inertial, video based, hybrid devices, etc.)
  • gesture posture recognition as part of pattern recognition and machine learning
  • analysis and recognition of the gesture trajectory (gesture being analysed in its 2D/3D dynamics)

All these aspects imply fundamental research, the development of new theories and leading-edge technological concepts that will allow the quick and efficient transfer of the designed system into real applications.

2.1. Psycholinguistic views on human gestures

Gestures represent movements of the hands, arms and head which express ideas, sentiments and intentions, sometimes replacing words and enhancing speech [DEX'98]. Gestures convey information and carry content and semantics.

Various psycholinguistic studies have been conducted on the understanding of gesture communication. All these studies provide excellent starting material for the domain of human-computer interaction [Kendon 1986, Cadoz 1994, McNeill 1992, etc.]. [Cadoz 1994] uses the term gesture communication channel and identifies 3 types of gestures (or 3 different functional roles associated with gestures), hence a classification by gesture function:

  • ergotic gestures, which act on the environment and derive from the notion of modeling the real world. These are the gestures to consider for interacting with the virtual objects of a virtual environment.
  • epistemic gestures, which gather information about temperature, pressure, shape, orientation and weight (the tactile sense). Discovery of the environment is accomplished through tactile experience.
  • semiotic gestures, which produce an informational message for the environment, with the role of conveying information. These are the yes/no, approve/deny gestures of the human-computer dialogue.

Taking the psycholinguistic studies further for gesticulation (spontaneous gestures associated with speech), [McNeill 1992] defined 4 types of gestures:

  • iconic gestures, which reproduce a series of characteristics of the object, action or phenomenon being described
  • metaphoric gestures, which represent a common metaphor
  • beat gestures, usually associated with emphasizing and supporting speech
  • deictic gestures, i.e. pointing

It has been noticed that gesticulation, or spontaneous gestures, represents about 90% of all human gestures. Studies conducted on iconic, metaphoric and deictic gestures show their segmentation into 3 phases [Kendon, Cassell, Wilson], an aspect that is very important for the gesture acquisition process of an information system: shifting into the gesture space (the preparation phase), the quick stroke, and shifting back into the resting position (the retraction phase).
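The three-phase structure above can be sketched in code. The following is a minimal illustration, not the segmentation method of the cited studies: it assumes a tracked 2D hand trajectory and a heuristic speed threshold (an assumed parameter) that separates the quick stroke from the slower preparation and retraction movements.

```python
import numpy as np

def segment_phases(positions, stroke_speed=2.0):
    """Label each frame of a hand trajectory as 'prep', 'stroke' or 'retract'.

    positions: (N, 2) array of hand coordinates, one row per frame.
    stroke_speed: heuristic speed threshold (assumed) above which the
    movement is considered part of the quick stroke phase.
    """
    # Speed of the hand between consecutive frames.
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    n = len(positions)
    fast = np.where(speeds >= stroke_speed)[0]
    if fast.size == 0:                 # no stroke detected at all
        return ["prep"] * n
    first, last = fast[0], fast[-1]
    labels = []
    for i in range(n):
        if i <= first:                 # moving into the gesture space
            labels.append("prep")
        elif i <= last + 1:            # the quick stroke itself
            labels.append("stroke")
        else:                          # pulling back to rest
            labels.append("retract")
    return labels
```

This per-frame labeling is exactly what a gesture acquisition front end needs in order to isolate the stroke, the phase that carries the semantic content.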

2.2. Current technologies for gesture interaction. Considerations with regards to visual gesture recognition

A quick presentation of current technologies for gesture acquisition and interaction, especially those aimed at immersion inside virtual environments, includes: magnetic devices (Ascension's Flock of Birds), mechanical (Fakespace's BOOM Tracker), acoustic (Logitech's Fly Mouse), inertial (InterSense IS300), video based (video cameras, possibly with additional colored markers attached to the tracked object) or hybrid devices (InterSense IS600).

Compared to other gesture acquisition technologies, video sources have the advantage of being non-intrusive; the user is not required to use or wear additional equipment or instruments (for example sensor gloves), which creates the feeling of natural interaction. Video gesture recognition thus appears as the ideal technology for human-computer interaction, eliminating the inconveniences of other methods (the awkwardness of the keyboard or the traditional mouse in virtual environments, wearing sensor gloves, etc.).

Nevertheless, video processing systems have limitations, such as: video resolution is not sufficient for detecting high-fidelity finger movements; the 30 frames per second of conventional video capture devices are usually not enough for capturing quick hand movements (the hand is quicker than the eye); and finger detection may become challenging because of occlusion.

2.3. State of the art for video gesture recognition

Basic research has been conducted under the general term of gesture recognition, and it can be grouped into two major categories: (1) visual gesture acquisition (using techniques specific to video processing, image processing and artificial intelligence) and (2) actual gesture recognition (using techniques specific to pattern recognition).

2.3.1. Visual gesture acquisition

Gesture acquisition covers the detection and tracking of an object of interest (for example the hand and fingers, or the face and eyes for head movements). Detection techniques include video segmentation as a function of several characteristics: color, motion or mixed cues (for example edges resulting from an edge detection process). Tracking [Chen et al. 2003] is the process of continuously determining a series of characteristics of the object of interest in video sequences (such as position, orientation, etc.). A few criteria define a tracking system: accuracy (the error between the real location and the measured one), the number of degrees of freedom, the domain or maximum working area, etc.

An important research direction is centred on hand/face detection using skin color detection algorithms [Caetano et al. 2001, 2003].
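To illustrate the idea behind such detectors, the sketch below classifies pixels by normalized r-g chromaticity, which is largely invariant to illumination intensity. This is a deliberately simplified rule-based stand-in for the probabilistic skin models of [Caetano et al.]; the threshold ranges are assumed illustrative values, not values from the cited work.

```python
import numpy as np

def skin_mask(rgb, r_range=(0.36, 0.55), g_range=(0.26, 0.36)):
    """Binary skin mask from normalized r-g chromaticity.

    rgb: (H, W, 3) float array with channel values in [0, 255].
    r_range, g_range: assumed chromaticity intervals for skin tones.
    Returns a (H, W) boolean mask (True = skin-colored pixel).
    """
    s = rgb.sum(axis=2) + 1e-6            # avoid division by zero
    r = rgb[..., 0] / s                   # normalized red chromaticity
    g = rgb[..., 1] / s                   # normalized green chromaticity
    return ((r >= r_range[0]) & (r <= r_range[1]) &
            (g >= g_range[0]) & (g <= g_range[1]))
```

In a full system the mask would be cleaned up with morphological operations and connected-component analysis before handing candidate hand/face regions to the tracker.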

Face detection has received a lot of attention lately and several approaches have been proposed [Starovoitov et al. 2002, Lienhart et al. 2002, Li et al. 2002]. We must note the important contribution of [Viola & Jones 2001], who implemented a real-time face detection system running at 15 fps on conventional desktop PCs using Haar-like features (the code is available as part of Intel's open source OpenCV project).
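A key ingredient of the Viola & Jones detector is the integral image (summed-area table), which lets any rectangular pixel sum, and therefore any Haar-like feature, be evaluated in constant time from four array lookups. The following is a minimal numpy sketch of that mechanism only, not of the full boosted cascade.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[0..y, 0..x] (inclusive)."""
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def rect_sum(ii, y, x, h, w):
    """Sum of the h x w rectangle with top-left corner (y, x),
    computed from four lookups in the integral image."""
    total = ii[y + h - 1, x + w - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0 and x > 0:
        total += ii[y - 1, x - 1]
    return total

def haar_two_rect(ii, y, x, h, w):
    """A two-rectangle (left/right) Haar-like feature: the difference
    between the sums of two adjacent w/2-wide rectangles."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)
```

The real detector evaluates thousands of such features per window, selected and weighted by AdaBoost and arranged in an attentional cascade; the constant-time feature evaluation above is what makes the 15 fps figure possible.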

2.3.2. Gesture recognition techniques

Markov models [Starner, 1995] have been given a lot of attention for gesture recognition. The use of this type of model is motivated by a series of observations on the nature of gestures: gestures vary with location, time and social factors; gestures have semantic content attached; and gestures present a series of regularities that make them appropriate for linguistic methods. The idea behind using Hidden Markov Models is to build multi-dimensional models representing the different gestures, with each model's parameters determined from the training data set.
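The one-model-per-gesture scheme can be sketched as follows: score an observation sequence under every gesture's HMM with the (scaled) forward algorithm and pick the model with the highest log-likelihood. This is a generic discrete-HMM sketch, not the specific feature set or models of [Starner, 1995].

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi:  (S,) initial state probabilities
    A:   (S, S) transition matrix, A[i, j] = P(state j | state i)
    B:   (S, K) emission matrix,  B[i, k] = P(symbol k | state i)
    obs: sequence of symbol indices
    """
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()                  # scale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        loglik += np.log(c)
        alpha /= c
    return loglik

def classify(models, obs):
    """Pick the gesture whose HMM assigns the sequence the highest likelihood.

    models: dict mapping gesture name -> (pi, A, B) triple.
    """
    return max(models, key=lambda name: forward_loglik(*models[name], obs))
```

In practice the parameters (pi, A, B) of each gesture model are estimated from the training set with the Baum-Welch algorithm, and the observation symbols come from quantized hand features.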

Markov models have been used by [Starner, 1995] for the recognition of ASL (American Sign Language). The system uses a color camera to detect hands wearing colored gloves.

[Hong, Turk et al.] describe a 2D gesture recognition technique in which each gesture is modeled as a finite state machine (FSM) in spatio-temporal space. Using the continuous gesture information, the movement is separated into phases using only the spatial information.
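The spirit of this approach can be illustrated with a toy recognizer: each gesture is an ordered sequence of spatial states (here circular regions around learned centers), and the FSM advances whenever the hand enters the next expected region. The circular-region shape and the radius are assumed simplifications for illustration.

```python
import numpy as np

def fsm_recognize(states, trajectory, radius=1.0):
    """Match a 2D hand trajectory against a gesture modeled as an
    ordered sequence of spatial states.

    states: (S, 2) array of state centers, to be visited in order.
    trajectory: (N, 2) array of observed hand positions.
    radius: assumed acceptance radius of each spatial state.
    Returns True if every state is reached in sequence.
    """
    current = 0                           # index of the next expected state
    for p in trajectory:
        if current < len(states) and \
                np.linalg.norm(p - states[current]) <= radius:
            current += 1                  # advance the state machine
    return current == len(states)
```

Because each gesture has its own FSM, several recognizers can be run in parallel over the continuous input stream, which is what makes the method fast.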

[Kumar et al.] propose a method of gesture classification based on temporal motion templates and wavelet transforms. The method relies on the SWT (Stationary Wavelet Transform). They use a temporal representation of gestures built from differences between consecutive images (motion templates).
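For the wavelet part, the sketch below computes one level of an undecimated (stationary) Haar transform of a 1-D signal: unlike the decimated DWT, the SWT keeps the output the same length as the input, which makes it shift-invariant and suitable for feature extraction from motion templates. This is a minimal textbook Haar-SWT illustration (periodic boundary assumed), not the exact filters or feature pipeline of [Kumar et al.].

```python
import numpy as np

def haar_swt1(x):
    """One level of the stationary (undecimated) Haar wavelet transform.

    x: 1-D signal, e.g. a row or projection of a motion template.
    Returns (approx, detail), each the same length as x, using
    periodic boundary handling (an assumed convention).
    """
    x = np.asarray(x, dtype=float)
    shifted = np.roll(x, -1)                 # x[n+1] with wrap-around
    approx = (x + shifted) / np.sqrt(2.0)    # low-pass: local average
    detail = (x - shifted) / np.sqrt(2.0)    # high-pass: local difference
    return approx, detail
```

The detail coefficients concentrate the signal's transitions, so their statistics (energy per subband, for instance) make compact, shift-tolerant descriptors of the motion template.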

For [James William Davis, MIT], the starting point is the observation that a human observer can instantaneously recognize gestures without great effort, even in low-resolution images. A Binary Motion Region (BMR) image is computed to act as an index into the gesture library. The BMR describes the spatial distribution of motion for a given angle and a given gesture.
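A BMR-style image can be sketched as the union of thresholded absolute differences between consecutive frames, collapsing a whole gesture's motion into one binary picture; a Hamming distance between such images then serves as a simple library index. The threshold value and the XOR-based distance are assumed illustrative choices, not necessarily those of the cited report.

```python
import numpy as np

def binary_motion_region(frames, threshold=15):
    """Union of thresholded frame differences over a gesture's duration.

    frames: (T, H, W) grayscale sequence covering one gesture.
    threshold: assumed pixel-difference threshold for 'motion occurred'.
    Returns a (H, W) boolean image of where any motion happened.
    """
    frames = frames.astype(np.int16)          # signed differences
    diffs = np.abs(np.diff(frames, axis=0))   # (T-1, H, W) motion maps
    return (diffs > threshold).any(axis=0)    # union over time

def bmr_distance(a, b):
    """Hamming distance between two binary motion regions,
    usable for nearest-neighbor lookup in a gesture library."""
    return int(np.logical_xor(a, b).sum())
```

Matching a query BMR against stored prototypes by smallest distance gives the coarse, low-resolution indexing step the approach is built on.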

2.4 Organizations and interest groups in the field, both on national and international levels

On the national level, one can identify the activity of the local ACM SIGCHI group for Romania, RoCHI, founded in 2000, with the main objective of developing an interdisciplinary forum for the exchange of ideas in the field of human-computer interaction (ACM SIGCHI Curricula for HCI). The group has also organized the national conferences on human-computer interaction held in Bucharest in 2004 and 2005, distributes the tutorials of past CHI conferences, and publishes the series of "Human Computer Interaction" volumes launched by RoCHI in 2003.

On the international level, one can identify the major conferences in the field, such as the annual ACM CHI conferences, as well as the HCI workshops: the International Conference on Intelligent User Interfaces, the Conference on Human Computer Interaction, Human Work Interaction Design, the annual conference on Human-Robot Interaction, the IEEE Symposium on 3D User Interfaces, etc.

2.5 Potential users

The system will allow natural interaction with information and robotic systems by means of gestures, without requiring the user to wear additional devices or equipment. The system will certainly be integrated into virtual reality applications as well as augmented reality ones. Regarding robot interaction, the system is absolutely necessary in situations where radio or wire-based communication is impossible or unwanted (military applications, polluted or dangerous environments). By adding up the categories of users (students, designers, home users) of all these information and robotic systems, one can easily see that the proposed system will benefit from wide market demand, especially due to the newly introduced paradigm for human-computer interaction. The great number of potential users has brought Microsoft into the field as well [Hong et al.].

--------------------------------------- References ---------------------------------------

[1] Claude Cadoz. Le geste canal de communication homme/machine. In Technique et Science Informatique, Vol. 13, No 1, pp. 31-61, 1994.

[2] Adam Kendon. Current issues in the study of gesture. In The Biological Foundation of Gestures: Motor and semiotic Aspects, pp. 23-47, Lawrence Erlbaum Associate, Hillsdale, NJ, 1986.

[3] David McNeill. Hand and mind: What gestures reveal about thought. University of Chicago Press, 1992.

[4] Andrew Wilson, Aaron Bobick, Justine Cassell. Recovering the temporal structure of natural gesture. In Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition, 1996

[5] Fakespace Labs, http://www.fakespacelabs.com ; Pegasus Technologies Ltd. http://www.pegatech.com; Ascension Technology Corporation http://www.ascension-tech.com/products/flockofbirds.php; Logitech Inc., www.logitech.com; Intersense, http://www.isense.com/products

[6] Chen, F.S, Fu, C.M., Huang, C.L., Hand gesture recognition using a real-time tracking method and hidden Markov models, IVC(21), No. 8, August 2003, pp. 745-758.

[7] T. S. Caetano, D. A. C. Barone, A probabilistic model for human skin color, IAP Conf. 2001, pp. 279-283

[8] V.V.Starovoitov, D.I.Samal, D.V.Briliuk, Three Approaches For Face Recognition, IPRAI Conf. 2002, Russia

[9] Paul Viola, Michael Jones, Rapid Object Detection using a Boosted Cascade of Simple Features, CVPR Conf. 2001

[10] R.Lienhart, J.Maydt, An Extended Set of Haar-like Features for Rapid Object Detection, Intel Labs,2002, Intel

[11] A.Mulder, Hand Gestures for HCI, Hand Centered Studies of Human Movement Project, TR 96-1, 1996

[12] James William Davis, Appearance-Based Motion Recognition of Human Actions, MIT Media Lab, TR 387, 1996

[13] P.Hong, M.Turk, T.S.Huang, Constructing Finite State Machines for Fast Gesture Recognition, Microsoft Research

[14] T. Starner, A. Pentland, Real-time American Sign Language recognition from video using hidden Markov models, TR 375, MIT Media Laboratory, 1995
