New computer software programme excels at lip reading


“A new computer software programme has the potential to lip-read more accurately than people and to help those with hearing loss, Oxford University researchers have found. Watch, Attend and Spell (WAS) is a new artificial intelligence (AI) software system that has been developed by Oxford in collaboration with the company DeepMind. The AI system uses computer vision and machine learning methods to learn how to lip-read from a dataset made up of more than 5,000 hours of TV footage, gathered from six different programmes including Newsnight, BBC Breakfast and Question Time. The videos contained more than 118,000 sentences in total, and a vocabulary of 17,500 words. The research team compared the ability of the machine and a human expert to work out what was being said in the silent video by focusing solely on each speaker’s lip movements. They found that the software system was more accurate than the professional. The human lip-reader correctly read 12 per cent of words, while the WAS software recognised 50 per cent of the words in the dataset without error. The machine’s mistakes were small, including things like missing an ‘s’ at the end of a word, or single-letter misspellings.”

Link to article
