Communicating effectively with fully able people, and with each other, has long been a challenge for people with larynx or vocal cord conditions, for the deaf and blind, and for those temporarily unable to speak. This solution addresses that challenge. When such a person speaks, a webcam captures their lip movement and stores it as a sequence of images. Those inputs are then processed and rendered as plain text or audio, so the audience can understand the speaker from mouth movement alone.
This solution narrows the gap between fully able people and people with vocal impairments. Such a system can be deployed at any public place, such as bus stations, railway stations, airports, and hospitals, as a web application or a device with a simple webcam, so that these users can carry on with their day-to-day tasks without depending on another person.
The above diagram outlines the overall methodology of this system; we will walk through it step by step:
- First, a camera captures a video segment of a word uttered by the disabled person and stores it in memory.
- The video segment is then sent for face localization and split into multiple images at a given frame rate.
- From this set of images, the lip region is localized, and noise is removed to improve the output quality.
- Features of the lip area are extracted from these images and stored for the classification step.
- Using machine learning algorithms, these features are matched against a predefined dataset and classified into a word, with a language model refining the result.
- Finally, the output is presented to the audience as text or audio.
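The pipeline above can be sketched in a few lines of Python. This is a minimal, illustrative skeleton, not the system's actual implementation: the frame data is synthetic, the "lip localization" is a naive lower-face crop (a real system would use a face detector such as OpenCV's Haar cascades or facial landmarks), noise removal is a plain 3×3 mean filter, and classification is a nearest-neighbour match against a toy predefined dataset. All function names and the sample words are hypothetical.

```python
import numpy as np

def crop_lip_region(frame):
    # Placeholder lip localization: take the lower third of the frame,
    # where the mouth would sit after face localization.
    h, _ = frame.shape
    return frame[2 * h // 3:, :]

def denoise(roi):
    # Simple 3x3 mean filter standing in for the noise-removal step.
    padded = np.pad(roi, 1, mode="edge")
    out = np.zeros(roi.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + roi.shape[0], dx:dx + roi.shape[1]]
    return out / 9.0

def extract_features(frames):
    # Stack the cleaned per-frame lip regions into one feature vector.
    return np.concatenate([denoise(crop_lip_region(f)).ravel() for f in frames])

def classify(features, dataset):
    # Nearest-neighbour match against a predefined word dataset;
    # a trained model plus a language model would replace this.
    return min(dataset, key=lambda word: np.linalg.norm(features - dataset[word]))

# Toy example: two 6x6 grayscale "frames" for a one-word utterance.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, (6, 6)) for _ in range(2)]
features = extract_features(frames)

# Hypothetical stored feature vectors for two pre-trained words.
dataset = {"hello": features + 0.1, "stop": features + 50.0}
print(classify(features, dataset))  # → hello
```

In a deployed version, the frames would come from the webcam capture stage, and the classifier would be a model trained on the predefined dataset rather than a distance comparison.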
There are existing methods for reading the lip movement of a disabled person, which work by examining the speech patterns, movements, gestures, and expressions of the speaker. However, these require a fully able person to go through a learning process before they can communicate in such a way. The proposed solution cuts out that extra effort and offers a more flexible way to communicate with the speaker.
- This system depends entirely on the quality of the image input at the first stage, so performance varies with camera quality
- There is a slight delay compared to real-time conversation
- The system is trained for only one language at a time
- Words with identical mouth movements cannot be distinguished
Overall, this system can make a significant contribution to disabled people once implemented with better processing equipment. If the particular speaker is pre-trained in the system, it is possible to reach an accuracy above 90%.
Do contact us for further study materials such as the code, and do comment with your feedback and suggestions for enhancing this system. Thank you!