INTEROB


4. SCIENTIFIC AND TECHNICAL PRESENTATION OF THE PROJECT:

4.1. Design of the gesture dictionary

The design of the gesture dictionary covers a series of objectives, as follows:

    § proposal for a gesture dictionary including hand postures, gesture trajectories and head movements
    § gathering of voluntary users for collecting video data containing gesture commands
    § ergonomic studies regarding the users' capacity to interact with an information system by means of gestures, with considerations on the complexity, naturalness, flexibility and efficiency of the proposed gestures.

Considering virtual reality systems as an example, one can identify three types of commands with respect to the possible interactions (a data-structure sketch summarizing these commands is given after Figure 3):

    § general application commands (such as yes/no/menu activation, etc.)
    § commands for interacting with virtual objects (translation, rotation, scale, etc.)
    § commands for travel in the virtual environment (change of camera's point of view, travel inside a scene, zoom operations, etc.)

General application commands do not address objects or the virtual environment, but the application in a general manner:

    § activate / deactivate the gesture interaction technology – a gesture is needed to express the decision of using the interaction technology, a decision that must be made known to the system
    § working with menus (a proposal consisting of 3 operations is presented in Figure 1 below) – absolutely necessary in the existing WIMP metaphor (Windows, Icons, Menus, Pointer)
    § yes / no – simple gestures are needed to confirm or negate at several stages of the interaction (validating an action, answering a question from the application, etc.)
    § undo / redo – simple gestures to cancel an operation or to redo it.

Figure 1. Gesture proposals for working with menus (panels: menu activation; selecting an option from a contextual menu attached to a virtual object; closing the menu)

Regarding the commands that relate to working with virtual objects, we can identify:

    § selection (simple or multiple) – see an example in the figure below
    § translation of a virtual object inside a scene
    § rotation of an object (with considerations on whether the rotation should be performed using one or two hands)
    § scaling of a virtual object

   
Figure 2. Gesture proposals for the selection / rotation of a virtual object

The commands for travel inside a virtual environment include (two examples that use natural head movements are presented in Figure 3 below):

    § change of the camera's point of view
    § travel along a previously chosen direction (normal speed, increased speed, etc.)
    § zoom operations on the scene

   
Figure 3. Head movements to indicate change in the travel direction
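
For illustration only, the gesture commands enumerated above can be gathered into a first draft of the dictionary as a simple data structure (Python sketch below). The category and command names follow the lists in this section; the actual gesture encodings (hand postures, trajectories, head movements) remain to be defined within the project.

    # Illustrative sketch of the gesture dictionary as a data structure.
    # Command names follow the lists in section 4.1; the bindings to concrete
    # hand / head movements are to be defined later in the project.
    GESTURE_DICTIONARY = {
        "general": [
            "activate_gesture_interaction", "deactivate_gesture_interaction",
            "menu_open", "menu_select_option", "menu_close",
            "yes", "no", "undo", "redo",
        ],
        "virtual_objects": [
            "select_single", "select_multiple",
            "translate", "rotate", "scale",
        ],
        "travel": [
            "change_viewpoint", "travel_along_direction", "zoom",
        ],
    }

    def lookup_category(command):
        """Return the command category a given gesture command belongs to."""
        for category, commands in GESTURE_DICTIONARY.items():
            if command in commands:
                return category
        raise KeyError("unknown gesture command: " + command)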

4.2. Visual acquisition of hand and head gestures

For hand gesture detection and acquisition we propose a segmentation technique based on complementary information obtained using (a minimal sketch combining these cues follows the list):

    § color features (color based segmentation)
    § motion detection and background subtraction methods
    § derivative features (such as edges resulting from an edge detection process)
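
As a minimal sketch (in Python, using the OpenCV library already planned for in section 4.3), the three complementary cues can be combined as follows; all threshold values are illustrative placeholders, not results of the project.

    # Sketch: combine color, motion and edge cues for hand segmentation.
    import cv2

    backsub = cv2.createBackgroundSubtractorMOG2()   # motion / background model

    def segment_hand(frame_bgr):
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

        # 1) color cue: pixels whose hue / saturation / value fall inside
        #    (placeholder) skin-color limits
        skin_mask = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))

        # 2) motion cue: foreground pixels according to the background model
        motion_mask = backsub.apply(frame_bgr)

        # 3) derivative cue: edges from a Canny edge detector
        edges = cv2.Canny(gray, 80, 160)

        # combine the cues: skin-colored moving regions, plus edges for refinement
        combined = cv2.bitwise_and(skin_mask, motion_mask)
        return combined, edges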

The emphasis is on color segmentation and on designing a dynamic adaptive filter for skin color detection in the HSV color space (see Figure 4 below). The current video frame is segmented using the following condition:

 

Skin(x, y) = 1,  if  H_min ≤ H(x, y) ≤ H_max  and  S_min ≤ S(x, y) ≤ S_max;  Skin(x, y) = 0 otherwise,

where the color thresholds [H_min, H_max] and [S_min, S_max] for hue / saturation are automatically determined as a function of the particularities of the video frame being analysed (brightness, background objects, etc.).


Figure 4. Three consecutive video frames and their associated 2D hue / saturation histogram
(Vatavu et al., Advances in Electrical and Computer Engineering, 1/2005)

The necessity of this solution can be easily observed from the pictures above, which present three consecutive video frames. One can easily observe the considerable variation of the 2D hue / saturation histograms between consecutive frames. The larger rectangle marks the maximum limits within which the skin color may vary, while the smaller rectangle marks the actual skin color limits for the given video frame.
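
A minimal sketch of this adaptive filtering idea is given below, assuming the per-frame hue / saturation limits are estimated from the central portion of the values that fall inside fixed maximum skin-color limits; the ranges and the percentile used here are illustrative assumptions, not the actual parameters of the proposed filter.

    # Sketch: per-frame (adaptive) hue / saturation limits inside fixed maximum limits.
    import cv2
    import numpy as np

    H_MAX_RANGE = (0, 30)      # maximum limits the skin hue may vary in (assumed)
    S_MAX_RANGE = (30, 200)    # maximum limits the skin saturation may vary in (assumed)

    def adaptive_skin_mask(frame_bgr, keep=0.90):
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        h, s = hsv[..., 0], hsv[..., 1]

        # restrict attention to the maximum skin-color rectangle (larger rectangle in Figure 4)
        inside = ((h >= H_MAX_RANGE[0]) & (h <= H_MAX_RANGE[1]) &
                  (s >= S_MAX_RANGE[0]) & (s <= S_MAX_RANGE[1]))
        if not inside.any():
            return np.zeros(h.shape, np.uint8)   # no candidate skin pixels in this frame

        # per-frame limits: central `keep` fraction of the candidate values
        # (smaller rectangle in Figure 4)
        lo_p, hi_p = (1 - keep) * 50, 100 - (1 - keep) * 50
        h_lo, h_hi = np.percentile(h[inside], [lo_p, hi_p])
        s_lo, s_hi = np.percentile(s[inside], [lo_p, hi_p])

        # segmentation condition: a pixel is skin if H and S fall inside the adaptive limits
        skin = (h >= h_lo) & (h <= h_hi) & (s >= s_lo) & (s <= s_hi)
        return skin.astype(np.uint8) * 255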

The pre-processing algorithms for skin color detection will first be developed on desktop systems; we shall then consider a future implementation on a hardware System-on-Chip (SoC) architecture in order to reduce the total processing time and to relieve the main system of additional computation. We are considering a SoC architecture that will:

    § capture video information in real time using a camera attached to the device
    § process the captured video frames in real time using skin color and motion detection methods
    § receive working parameters from the PC, such as: frames per second, image resolution, compression type, etc.
    § transfer the obtained results (motion-detected areas, skin color regions of interest) to the PC

The aim of the hardware architecture is to reduce the amount of processing power required on the desktop machine by performing a series of preprocessing steps on specialized hardware devices. The architecture includes: (1) a 32-bit microprocessor (Xilinx MicroBlaze); (2) an Ethernet interface for PC communication; (3) UART and JTAG interfaces for debug operations; (4) an SDRAM/FLASH interface for data storage; (5) a VGA/LCD graphical interface for displaying results; (6) a USB interface for video camera communication. The device presents a high level of novelty both at the implementation level and in its software interfacing. Although current research in gesture recognition is progressing rapidly, there are no references in the literature regarding similar devices.
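
Purely as a hypothetical illustration of the PC-side interfacing (the actual protocol will be defined together with the hardware implementation), the parameter / result exchange over the Ethernet link could look as follows; the address, port and message format are assumptions made only for this sketch.

    # Hypothetical sketch of the PC <-> SoC exchange over Ethernet.
    import json
    import socket

    DEVICE_ADDRESS = ("192.168.1.50", 5000)     # placeholder address of the SoC board

    def configure_device(fps=25, width=640, height=480, compression="none"):
        """Send working parameters (frames per second, resolution, compression) to the board."""
        params = {"fps": fps, "width": width, "height": height, "compression": compression}
        with socket.create_connection(DEVICE_ADDRESS) as conn:
            conn.sendall(json.dumps({"set_params": params}).encode())

    def read_regions_of_interest():
        """Receive the motion-detected areas / skin-color regions found on the board."""
        with socket.create_connection(DEVICE_ADDRESS) as conn:
            conn.sendall(b'{"get_rois": true}')
            return json.loads(conn.recv(65536).decode())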

Regarding face detection, an algorithm based on [Viola & Jones, 2001] will be developed, using Haar-like features and AdaBoost learning methods. Boosting methods aim at improving the individual performance of weak learning algorithms: given a training set and a series of classifiers considered weak (with poor individual performance), the goal is to combine all these weak classifiers into a strong one. The weak classifiers we are going to use are of the form h(x) = 1 if p·f(x) < p·θ and h(x) = 0 otherwise, where f represents a Haar feature, θ a threshold, p a parity indicator giving the sense of the inequality, and x a sample from the training set. The classifiers selected by the AdaBoost method are organized in a cascade structure in which the positive result of a classifier triggers the computation of the next one. The basic idea is that most of the analysed sub-windows will be rejected after computing only a few classifiers, so the total computing time can be greatly reduced. We can train a simple classifier to reach a detection rate of 100% on the training set, but the percentage of false positives will then be higher as well; however, these false positive examples will be eliminated by the subsequent levels of the cascade.
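
The sketch below illustrates the weak classifier form and the cascade evaluation described above; the Haar feature evaluation is abstracted behind a callable, and the stage decision rule is simplified for illustration, since the actual classifiers and stage thresholds result from the AdaBoost training.

    # Sketch: Viola & Jones style weak classifiers organized in a rejection cascade.
    from dataclasses import dataclass
    from typing import Callable, List

    import numpy as np

    @dataclass
    class WeakClassifier:
        feature: Callable[[np.ndarray], float]   # Haar feature f evaluated on a sample window x
        theta: float                              # threshold
        parity: int                               # +1 or -1, sense of the inequality

        def __call__(self, x):
            # h(x) = 1 if p * f(x) < p * theta, else 0
            return 1 if self.parity * self.feature(x) < self.parity * self.theta else 0

    def cascade_detect(x, stages: List[List[WeakClassifier]]):
        """Reject a window as soon as one stage fails, so most windows are
        discarded after computing only a few classifiers."""
        for stage in stages:
            # simplified stage rule: a majority of the stage's weak classifiers must fire
            if sum(h(x) for h in stage) <= len(stage) / 2:
                return False
        return True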

 

4.3. The software architecture of the interaction system

In order to implement the interaction system, the following modules will be developed:

    § a video capture module, implemented as a DirectShow filter (VideoGrabber.ax)
    § a hand gesture acquisition module, implemented as a DLL library (using OpenCV)
    § a face and head movements module (DLL, using OpenCV)
    § the gesture dictionary


Figure 5. Software system architecture

The modules will be developed using Microsoft Visual Studio .NET 7.0, the Microsoft DirectX 9.0 SDK and the Intel OpenCV image processing library. In order to develop the application in conformity with the current Microsoft standards for media streaming on Windows platforms, the DirectShow component of DirectX 9.0 will be used. DirectShow supports video capture from WDM (Windows Driver Model) sources as well as from older devices using Video for Windows; another advantage is the automatic detection of video hardware acceleration. For the video capture module, a DirectShow filter will be developed that can be used in the main filter graph, with a very simple interface consisting of a callback function that delivers video frames to the main application.
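
Although the capture module itself will be a DirectShow filter written for the Windows platform, the callback-based contract it exposes to the main application can be illustrated with a short Python / OpenCV sketch; the device index and the example callback are placeholders.

    # Sketch of the capture module's contract: deliver each frame through a callback.
    import cv2

    def run_capture(on_frame, device_index=0):
        """Grab frames from the camera and pass each one to the `on_frame` callback."""
        capture = cv2.VideoCapture(device_index)
        try:
            while True:
                ok, frame = capture.read()
                if not ok:
                    break
                if on_frame(frame) is False:    # the callback may stop the capture loop
                    break
        finally:
            capture.release()

    # usage example: run_capture(lambda frame: print(frame.shape))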

4.4. Several implementations of the gesture interaction system

4.4.1. Working in virtual environments

The gesture interaction system will be implemented for Spin3D, a collaborative virtual environment currently under development at LIFL (Laboratoire d'Informatique Fondamentale de Lille). The goal is to ensure a natural manipulation of the virtual objects (translations, rotations, scale changes, changes of a virtual object's properties and behaviour) using gestures similar to the ones proposed in paragraph 4.1 on the gesture dictionary. Gesture-based interaction with virtual objects is already the subject of a PhD thesis conducted in collaboration between the University of Suceava and LIFL of the Universite des Sciences et Technologies de Lille.

4.4.2. Working in augmented reality

The augmented reality application will use a video projector that projects virtual objects on a real working table, next to real objects (see Figure 6 below). Both real and virtual objects will be manipulated in the same manner, hence using the same gestures (see the specification of the gesture dictionary, a very important part of the project proposal).


Figure 6. Working with virtual objects in an augmented reality application
The virtual objects are projected on a real working table

4.4.3. Interfacing with a static arm type robotic system

The Hercules robot system has been developed at the University of Suceava over several years as a computer vision system. It is an arm-type robot with pincers that can execute commands within an action sphere of 501 mm range. Commands are sent to the robot by an image processing application developed at the University of Suceava. The system is composed of a visual analysis component that processes objects in the robot's working area and issues commands for grabbing and moving them. Research will be conducted on introducing a gesture-based interface for controlling the robot, taking into account its specific type of operations.

4.4.4. Interfacing with a mobile robotic system

The mobile robot Centaure is the result of the collaboration between the University of Suceava and the automatics laboratories of Polytech'Lille of the Universite des Sciences et Technologies de Lille. The robot is capable of patrolling a previously defined perimeter and of gathering information about the environment. The specifications of the Centaure robot include: a rectangular base of 70 cm x 40 cm; wheels placed on the inside with rear traction: two 20 cm wheels and two small jockey-type wheels; a speed controller for 12/24 V motors; an OOPic-R microcontroller; position sensors; contact sensors; infrared sensors; infrared markers for the patrolling perimeter; a video camera. Research will be conducted towards a gesture command interface, taking into consideration specific gestures (waving for getting attention, launching commands, changing the travel direction, etc.).

4.5. The novelty and the complexity of the proposal consist in:

    § designing a gesture dictionary that can become a standard for gesture-based interfaces, taking into account ergonomics, naturalness, flexibility and the ease of memorizing gestures. We must consider the fact that the literature contains proposals of gestures for specific actions, but there is no attempt to standardize a common interaction gesture dictionary
    § analyzing the gesture trajectories by correlating the visual information from two video cameras. There are attempts and approaches concerning stereoscopic analysis, but they are centred on simple and specific actions. The proposal aims at embedding in the same analysis the tracking of hand trajectories as well as of head movements, in a viable model that meets the requirements of an interaction interface, positioning itself within the current trend of top computer vision research
    § experimenting and validating the system by implementing it for four different scenarios (inside a virtual reality system for virtual object manipulation, inside an augmented reality system, for commanding the static arm-type robot Hercules and for controlling the mobile robot Centaure)
