Indian Institute of Information Technology, Allahabad Department of Information Technology Sign Language to Text Conversion for Dumb and Deaf September 10, 2018
This report presents the progress of our mini project, “Sign Language Conversion to Text for Dumb and Deaf”. Our project aims to create an application and train a model which, when shown a real-time video of American Sign Language hand gestures, displays the corresponding text output on the screen. Although many projects and much research on hand gesture recognition do exist, very few actually use it to produce an application for American Sign Language.
Sign language is the oldest and most natural form of language for communication. Communication is the process of exchanging thoughts and messages in various ways, such as speech, signals, behavior and visuals. Deaf and dumb (D&M) people use hand gestures to express their ideas to other people. Gestures are nonverbally exchanged messages, and these gestures are understood through vision. This nonverbal communication of D&M people is called sign language. Sign language is a visual language and consists of 3 major components: fingerspelling, word-level sign vocabulary, and non-manual features such as facial expressions and body posture. In our project we focus on producing a model which can recognise fingerspelling-based hand gestures in order to form a complete word by combining each gesture. The gestures we aim to train are as given in the image below.
For interaction between normal people and D&M people, a language barrier is created, as the structure of sign language is different from that of normal text. So D&M people depend on vision-based communication for interaction. If there were a common interface that converts sign language to text, the gestures could be easily understood by other people. So research has been done on vision-based interface systems where D&M people can communicate without really knowing each other's language.
The aim is to develop a user-friendly human computer interface (HCI) where the computer understands human sign language. There are various sign languages all over the world, such as American Sign Language (ASL), French Sign Language, British Sign Language (BSL) and Japanese Sign Language, and work has been done on many other languages around the world.
In recent years there has been tremendous research on hand gesture recognition. From the literature survey we did, we found that the basic steps in hand gesture recognition are: data acquisition, data preprocessing and feature extraction, and gesture classification.
Data about the hand gesture can be acquired in the following ways:
Use of sensory devices: electromechanical devices are used to provide the exact hand configuration and position. Different glove-based approaches can be used to extract this information, but they are expensive and not user friendly.
Vision-based approach: in vision-based methods, a computer camera is the input device for observing the information of hands and fingers. Vision-based methods require only a camera, thus realizing a natural interaction between humans and computers without any extra devices. These systems tend to complement biological vision by describing artificial vision systems that are implemented in software and/or hardware.
The main challenge of vision-based hand detection is coping with the large variability of the human hand's appearance, due to the huge number of hand movements, different skin-colour possibilities, and variations in viewpoint, scale, and speed of the camera capturing the scene. Data preprocessing and feature extraction for the vision-based approach:
In , the approach for hand detection combines threshold-based colour detection with background subtraction. We can use an AdaBoost face detector to differentiate between faces and hands, as both have similar skin colour.
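The combination of colour thresholding and background subtraction can be sketched as follows. This is a minimal NumPy illustration, assuming frames arrive as RGB arrays; the colour bounds and the difference threshold are illustrative values, not tuned skin-colour parameters.

```python
import numpy as np

def hand_mask(frame, background, lo=(45, 30, 20), hi=(255, 180, 150), diff_thresh=25):
    """Combine a colour threshold with background subtraction.

    frame, background: HxWx3 uint8 RGB arrays.
    The colour bounds here are illustrative, not tuned skin-colour values.
    """
    # Pixels whose RGB values fall inside the (assumed) skin-colour box.
    in_range = np.all((frame >= lo) & (frame <= hi), axis=2)
    # Pixels that differ noticeably from the static background frame.
    moving = np.abs(frame.astype(int) - background.astype(int)).max(axis=2) > diff_thresh
    # A candidate hand pixel must satisfy both tests.
    return in_range & moving
```

In a real system the background frame would be captured once before the hand enters the scene, and the colour test would usually be done in a space such as HSV or YCrCb rather than raw RGB.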
We can also extract the necessary image to be trained by applying a filter called Gaussian blur. The filter can be easily applied using Open Computer Vision, also known as OpenCV, and is described in .
We tried hand segmentation of an image using colour segmentation techniques, but as mentioned in the research paper, skin colour and tone are highly dependent on lighting conditions, so the segmentation outputs we obtained were not very good. Moreover, we have a huge number of symbols to train for our project, many of which look similar to each other, such as the gesture for the symbol 'V' and the digit '2'. Hence we decided that, in order to produce better accuracy for our large number of symbols, rather than segmenting the hand out of a random background, we keep the background of the hand a stable single colour so that we do not need to segment it on the basis of skin colour. This should help us get better results.
In , Hidden Markov Models (HMMs) are used for the classification of gestures. This model deals with the dynamic aspects of gestures. Gestures are extracted from a sequence of video images by tracking the skin-colour blobs corresponding to the hand in a body-face space centered on the face of the user. The goal is to recognize two classes of gestures: deictic and symbolic. The image is filtered using a fast look-up indexing table. After filtering, skin-colour pixels are gathered into blobs. Blobs are statistical objects based on the location (x, y) and the colourimetry (Y, U, V) of the skin-colour pixels, used to determine homogeneous areas.
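The blob idea above, treating a skin-colour region as a statistical object over position (x, y) and colour (Y, U, V), can be sketched in a few lines of NumPy. This assumes a boolean skin mask and a YUV image are already available, and summarises all skin pixels as a single blob for simplicity.

```python
import numpy as np

def blob_statistics(mask, yuv):
    """Summarise a skin-colour blob as a statistical object: the mean and
    covariance of its pixels' position (x, y) and colour (Y, U, V).

    mask: HxW boolean skin mask; yuv: HxWx3 array (assumed already in YUV).
    """
    ys, xs = np.nonzero(mask)                 # row (y) and column (x) indices
    feats = np.column_stack([
        xs, ys,                               # position of each skin pixel
        yuv[ys, xs, 0], yuv[ys, xs, 1], yuv[ys, xs, 2],  # its Y, U, V values
    ]).astype(float)
    return feats.mean(axis=0), np.cov(feats, rowvar=False)
```

A full implementation would first split the mask into connected components so that each blob (e.g. each hand and the face) gets its own statistics.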
In , a Naïve Bayes classifier is used, which is an effective and fast method for static hand gesture recognition. It classifies the different gestures according to geometric invariants obtained from image data after segmentation. Thus, unlike many other recognition methods, this method is not dependent on skin colour. The gestures are extracted from each frame of the video, with a static background. The first step is to segment and label the objects of interest and to extract geometric invariants from them. The next step is the classification of gestures using a k-nearest-neighbour algorithm aided with a distance weighting algorithm (KNNDW) to provide suitable data for a locally weighted Naïve Bayes classifier.
According to the paper “Human Hand Gesture Recognition Using a Convolution Neural Network” by Hsien-I Lin, Ming-Hsiang Hsu, and Wei-Kai Chen of the Institute of Automation Technology, National Taipei University of Technology, Taipei, Taiwan, they construct a skin model to extract the hand from an image and then apply a binary threshold to the whole image. After obtaining the thresholded image, they calibrate it about the principal axis in order to center the image. They feed this image to a convolutional neural network model in order to train it and predict the outputs. They trained their model on 7 hand gestures, and using their model they produce an accuracy of around 95% for those 7 gestures.
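A simplified sketch of the centering step is given below: it translates a binary hand image so its centroid sits at the image centre. This is only part of the calibration described above; the paper's full method also rotates the image to align it with its principal axis.

```python
import numpy as np

def center_on_centroid(binary):
    """Shift a binary (0/1) image so its centroid lies at the image centre.

    A simplified sketch of the calibration step; principal-axis rotation
    is omitted for brevity.
    """
    ys, xs = np.nonzero(binary)
    cy, cx = ys.mean(), xs.mean()             # centroid of the foreground
    h, w = binary.shape
    dy, dx = int(round(h / 2 - cy)), int(round(w / 2 - cx))
    centered = np.zeros_like(binary)
    ys2, xs2 = ys + dy, xs + dx               # translate every foreground pixel
    keep = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)
    centered[ys2[keep], xs2[keep]] = 1
    return centered
```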
This project is designed to create a user-friendly interface that converts sign language to text. The main reason for choosing ASL is that it is a static gesture language and has a standard database. The system uses a vision-based approach. All signs are represented with bare hands, which eliminates the need for any artificial devices for interaction.
Data Set Generation
For the project we tried to find ready-made datasets, but we could not find a dataset of raw images that matched our requirements; all we could find were datasets in the form of RGB values. Hence we decided to create our dataset ourselves. The steps we followed to create our dataset are as follows. 1. We used the Open Computer Vision (OpenCV) library in order to produce our dataset. 2. We captured around 800 images of each symbol in ASL for training purposes and around 200 images per symbol for testing purposes. 3. We capture each frame shown by the webcam of our machine. In each frame we define a region of interest (ROI), which is denoted by a blue bounded square as shown in the image below. 4. From this whole image we extract our ROI, which is RGB, and convert it into a grayscale image as shown below. 5. Finally we apply a Gaussian blur filter to our image, which helps us extract various features of the image. The image after applying the Gaussian blur looks as shown below.
After proper dataset generation we plan to read more about CNNs (Convolutional Neural Networks), because various papers we read mention that they are a very good method for obtaining more than 90% accuracy on image classification problems in deep learning. We will give our dataset as input to our CNN model, train the model, and then test its accuracy.
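Since the CNN is central to our plan, the core operation a convolutional layer repeats can be illustrated in plain NumPy. This is a sketch of a single 2D convolution (valid padding, stride 1), not a full model; in practice the layers would be built with a deep learning library.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small kernel over a grayscale image and take a weighted sum
    at each position: the basic operation of a convolutional layer.

    image: HxW array, kernel: khxkw array; valid padding, stride 1.
    """
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the patch under the kernel at (i, j).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

A CNN learns the kernel values during training and stacks many such filters, interleaved with nonlinearities and pooling, before a final classifier over the gesture labels.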