Abstract:
Sign language is one of the oldest and most natural forms of communication, but because most people do not know sign language and interpreters are difficult to come by, we have developed a real-time recognition method using Convolutional Neural Networks. Learning aids for hearing- and speech-impaired people exist, but their usage is limited. The proposed system is a real-time system in which live sign gestures are processed using image processing; classifiers then differentiate the various signs, and the translated output is displayed as text. A deep learning algorithm, a CNN, is trained on the dataset. In this project we aim to develop a cognitive system that is responsive and robust enough to be used in day-to-day applications by hearing- and speech-impaired people.
We created our own dataset of 29 classes based on the American Sign Language alphabet. It contains 2,900 images in '.png' format, 100 for each of the 29 classes. The algorithm used in the proposed system is a Convolutional Neural Network (CNN), a popular deep learning method and the state of the art for image recognition. The model performs feature extraction from images through multiple layers; the extracted features are then used to train the model and thereby recognize characters.
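As a minimal sketch, such a dataset could be loaded with Keras as shown below, assuming one sub-folder per class; the directory name "asl_dataset", the 80/20 train/validation split, and the 227x227 resize are illustrative assumptions rather than details of the system described here.

```python
import tensorflow as tf

# Assumed layout: asl_dataset/<class_name>/*.png
# (29 sub-folders: A-Z plus 'space', 'delete', 'nothing'), 100 images each.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "asl_dataset",
    validation_split=0.2,
    subset="training",
    seed=42,
    image_size=(227, 227),  # resize to AlexNet's expected input size
    batch_size=32,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "asl_dataset",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=(227, 227),
    batch_size=32,
)

# Scale pixel values to [0, 1] before feeding the CNN.
normalize = tf.keras.layers.Rescaling(1.0 / 255)
train_ds = train_ds.map(lambda x, y: (normalize(x), y))
val_ds = val_ds.map(lambda x, y: (normalize(x), y))
```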
Here we focus on an application that recognizes sign language characters. The application takes an image as input; the image contains one of the English alphabet letters or the special gestures 'space', 'delete', and 'nothing'. The application recognizes and outputs the character in the image, and from these characters a sentence can be built. It processes the image using a CNN and displays the characters found. The model is implemented using the AlexNet architecture, which consists of 5 convolutional layers and 3 fully connected layers. The hand image is first passed through the classifier, which predicts the class of the hand gesture. Our method achieves 99% accuracy across the 29 classes.
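A minimal sketch of such an AlexNet-style network in Keras follows; the filter counts and layer sizes follow the original AlexNet design, while the optimizer, dropout rate, and 29-way softmax output are assumptions for this task rather than the confirmed training configuration.

```python
from tensorflow.keras import Sequential, layers

# AlexNet-style CNN: 5 convolutional layers followed by 3 fully connected layers.
model = Sequential([
    layers.Conv2D(96, 11, strides=4, activation="relu", input_shape=(227, 227, 3)),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Conv2D(256, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(256, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(29, activation="softmax"),  # one output per gesture class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

After training with `model.fit(train_ds, validation_data=val_ds, epochs=...)`, the predicted class index for a new hand image is simply the argmax of the 29-way softmax output.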