“When we talk to each other, we don’t just rely on sound,” says Dr Andrew Abel from the Department of Computer Science and Software Engineering at Xi’an Jiaotong-Liverpool University.
“We look at each other’s faces, we look at each other’s body language, and we all lip-read to an extent,” he continues. “So far, we’ve been unable to incorporate these things into hearing aid technology. That’s ultimately what we’re looking to change.”
Dr Abel and his fellow researchers are investigating how to use technology to replicate the brain’s natural abilities to process speech. The goal is to greatly improve hearing aids, but the technology would have applications in other areas as well.
“In the long term we’re moving towards what’s being called a ‘cognitive hearing aid’ – one inspired by the brain,” he says.
WHAT PERCEPTUAL PHENOMENA REVEAL ABOUT HUMAN HEARING
Quirks of human hearing, or perceptual phenomena, have for decades served as starting points for researchers, and have inspired Dr Abel and his colleagues, too.
For example, people unconsciously raise or lower the volume of their voice depending on the level of background noise. This is known as the Lombard effect, and it is one of our instinctive processes for making ourselves heard and understood, and for overcoming what is known as ‘the cocktail party problem’.
Current hearing aids are often ineffective in noisy environments such as parties, where people with full hearing are still able to communicate effectively, thanks to something known simply as the ‘cocktail party effect’.
A listener with full hearing has the ability to focus on a particular person’s speech and ‘tune out’ others. Sounds and speech that the brain judges to be less relevant are more easily ignored.
“This isn’t just due to the mechanics of hearing, but is also about how our brains process that sound,” says Dr Abel.
A particularly curious phenomenon known as the McGurk effect demonstrates how visual information impacts on the brain’s ability to process speech.
The effect occurs when a speech sound, for example, the syllable ‘ba’, is heard along with a visual component such as lip movements for another sound, for example, the syllable ‘fa’. The sound will be perceived as ‘fa’ or even sometimes a third sound such as ‘va’ (see video above).
“This happens because your brain receives information that doesn’t make sense,” explains Dr Abel.
“It is seeing one thing and hearing another, and it tries to process it as best as it can. Even if you know about the McGurk effect, it is an audio-visual illusion which will still happen to you.
“This shows the importance of visual information in processing speech, even to the extent that what we see can override what we hear.”
Psychologists and hearing scientists have studied these effects for decades, and biologists have also worked to determine how the brain processes sound and which neurons are involved in these effects.
“My fellow researchers and I have used these perceptual phenomena to inspire our work,” says Dr Abel, “identifying aspects of human hearing and speech processing, and determining ways to replicate them with technology. The question is, how can we develop machines that can receive all the input that we use, and ‘hear’ like we do?”
DR ABEL’S RESEARCH
Many conventional hearing aids work by amplifying certain frequencies that the user has trouble hearing. Some hearing aids have noise-cancelling algorithms that reduce the volume of frequencies not used in human speech, or directional microphones that pick up sound only from specific directions.
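As a rough illustration of the two ideas in the paragraph above (boosting frequencies a user struggles with, and turning down frequencies outside the speech range), here is a minimal Python sketch. It is not taken from any real hearing aid: the function name `shape_spectrum`, the 300 to 3400 Hz "speech band", and all gain values are assumptions chosen for the example, not clinical figures.

```python
# Assumed rough speech range in Hz; illustrative only, not a clinical value.
SPEECH_BAND_HZ = (300.0, 3400.0)


def shape_spectrum(band_levels_db, user_gain_db, out_of_band_cut_db=12.0):
    """Return new per-band levels (dB) after amplification and simple noise reduction.

    band_levels_db: {centre frequency in Hz: measured level in dB}
    user_gain_db:   {centre frequency in Hz: extra gain in dB for this user}
    """
    low, high = SPEECH_BAND_HZ
    shaped = {}
    for freq_hz, level_db in band_levels_db.items():
        if low <= freq_hz <= high:
            # Inside the speech range: boost bands the user has trouble hearing.
            shaped[freq_hz] = level_db + user_gain_db.get(freq_hz, 0.0)
        else:
            # Outside the speech range: turn the band down.
            shaped[freq_hz] = level_db - out_of_band_cut_db
    return shaped


# Hypothetical input: four frequency bands and a user who needs help at 2 kHz.
levels = {125: 60.0, 1000: 55.0, 2000: 50.0, 8000: 58.0}
gains = {2000: 15.0}
print(shape_spectrum(levels, gains))
```

In this toy example the 2 kHz band is raised, the 1 kHz band is left alone, and the bands at 125 Hz and 8 kHz are reduced because they fall outside the assumed speech range.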
The next step is to design hearing aids that can use additional information, for example visual information provided by a camera, to improve the filtering.
Prior to joining XJTLU, Dr Abel worked with Professor Amir Hussain at the University of Stirling, Scotland, to develop technology for a potential hearing aid linked to a small wearable camera.
“The idea is that an audio signal from a microphone and visual information from a camera can be fed into the same system which then processes the information, using the visual information to filter the ‘noise’ from the audio signal,” says Dr Abel.
Above: mouth-tracking methods developed by XJTLU researchers
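To make the idea of combining the two signals concrete, the sketch below shows one very simple way a visual cue could steer audio filtering: frames of audio recorded while the tracked mouth is closed are turned down. This is a toy illustration only, not the method developed at Stirling. The function name `visually_gated_frames`, the frame layout, and the `closed_gain` value are all assumptions.

```python
def visually_gated_frames(audio_frames, mouth_open, closed_gain=0.25):
    """Attenuate audio frames during which the tracked mouth is closed.

    audio_frames: list of frames, each a list of audio samples
    mouth_open:   one boolean per frame from a (hypothetical) mouth tracker
    closed_gain:  factor applied to frames where the mouth is closed
    """
    if len(audio_frames) != len(mouth_open):
        raise ValueError("need exactly one visual flag per audio frame")
    gated = []
    for frame, is_open in zip(audio_frames, mouth_open):
        # Keep the frame as it is while the speaker is talking; otherwise
        # treat it as likely background noise and reduce its volume.
        gain = 1.0 if is_open else closed_gain
        gated.append([sample * gain for sample in frame])
    return gated


# Hypothetical two-frame signal: the speaker's mouth is open, then closed.
frames = [[1.0, -1.0], [0.5, 0.5]]
print(visually_gated_frames(frames, [True, False]))
```

A real system would need a far richer link between what the camera sees and what the microphone hears; the point here is only that both inputs feed one processing step.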
Inspired by this earlier work, Dr Abel is focussed on studying the fundamentals of how such visual input can be effective. The first step is processing images to isolate relevant information about lip movement.
A new system devised by Dr Abel, XJTLU graduate Chengxiang Gao, and researchers from the University of Stirling can track the movements of a subject’s mouth, determining whether the mouth is open or closed, and the width and depth of the mouth when it is open (referred to as ‘lip features’).
“The information is used to build up three-dimensional representations of what is going on in the mouth,” explains Dr Abel.
“This 3D representation can be used to estimate volume and pitch characteristics of speech, which could then potentially be applied to the noise-reduction function of a hearing aid. They can also be used for lip reading, and this is also part of our research.”
The team’s new approach can successfully extract lip features with minimal processing, and from a range of different speakers. The system is designed to be quick, simple, lightweight, robust, and able to recover if, for example, the speaker turns away from the camera.
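The ‘lip features’ described above (open or closed, width, and depth) can be pictured with a short Python sketch. It assumes a tracker has already found four points on the mouth in a single image, and it treats ‘depth’ as the vertical gap between the lips. Both are assumptions made for illustration: the function `lip_features`, its inputs, and the `open_ratio` threshold are hypothetical and are not the team’s published method, which builds a three-dimensional representation.

```python
import math


def lip_features(left, right, top, bottom, open_ratio=0.1):
    """Compute simple lip features from four (x, y) mouth points in pixels.

    left, right: the mouth corners
    top, bottom: the mid-points of the upper and lower lip
    open_ratio:  assumed threshold on depth/width above which the mouth
                 counts as open
    """
    width = math.dist(left, right)   # distance between the mouth corners
    depth = math.dist(top, bottom)   # gap between upper and lower lip
    is_open = width > 0 and depth / width > open_ratio
    return {"open": is_open, "width": width, "depth": depth}


# Hypothetical tracked points for an open mouth and a nearly closed one.
print(lip_features((0, 0), (40, 0), (20, -5), (20, 5)))
print(lip_features((0, 0), (40, 0), (20, -1), (20, 1)))
```

Dividing depth by width makes the open/closed decision less sensitive to how close the speaker is to the camera, which is one simple way to cope with a range of different speakers.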
Another area Dr Abel is looking into is the use of image-recognition technology to improve noise filtering. For example, if a camera can recognise what type of environment a user is in, be it a quiet office environment or a noisy bar, appropriate noise filters can then be applied.
“This is something our brains do naturally – we can take account of distractions and ignore them without really thinking about it,” says Dr Abel.
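The environment-recognition idea can be sketched as a lookup from a recognised scene to a set of filter settings. In the Python example below, the scene labels, the preset values, and the function `choose_filter` are all hypothetical; the image-recognition step itself is assumed to exist and to return a label with a confidence score.

```python
# Hypothetical filter presets per recognised environment; values are illustrative.
FILTER_PRESETS = {
    "quiet_office": {"noise_reduction_db": 3, "directional_mic": False},
    "noisy_bar": {"noise_reduction_db": 12, "directional_mic": True},
}

# Settings used when the scene is unknown or the recogniser is unsure.
DEFAULT_PRESET = {"noise_reduction_db": 6, "directional_mic": False}


def choose_filter(environment_label, confidence, min_confidence=0.6):
    """Pick noise-filter settings from a recognised environment label."""
    if confidence < min_confidence:
        # Do not switch filters on a weak guess from the camera.
        return DEFAULT_PRESET
    return FILTER_PRESETS.get(environment_label, DEFAULT_PRESET)


print(choose_filter("noisy_bar", 0.9))
print(choose_filter("noisy_bar", 0.3))
```

Falling back to a default when confidence is low is a design choice in this sketch: a wrongly applied heavy filter could be worse for the listener than a moderate one.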
There are varied applications for his research. For example, he is currently overseeing a Final Year Project about applying lip reading technology to the study of Chinese as a second language.
“With the right audio and visual input, a system could give automatic correction and feedback for learners trying to distinguish between different sounds in spoken Chinese,” he says.
“Chinese is a tonal language, with sounds and voicing different from other languages, and it can be difficult for people to learn how to get the sounds right.”
Language itself is key to speech processing, and is another aspect Dr Abel and his fellow researchers are trying to incorporate.
“We use our knowledge of language when we process speech,” he says.
“A lot of our speech processing is prediction-based. That’s why if one unexpected word is used in a sentence we can find the whole sentence difficult to understand.”
Ultimately, Dr Abel and his colleagues hope to one day incorporate word recognition and prediction-based speech processing, as well as environment recognition and other visual information, into an improved hearing aid that thinks like we do.
“When we can understand and replicate what actually happens when we listen, we will not only be able to improve hearing aids, but we will learn so much about ourselves, and how our minds work,” he says.
By Danny Abbasi; image provided by the Department of Computer Science and Software Engineering; additional images from Shutterstock