Team wins Judges’ Award at international contest with acoustic scene classification system

09 Dec 2024

Helping smart devices identify sound environments enhances safety and user experience in applications such as hearing aids and self-driving vehicles. For instance, hearing aids can automatically adjust noise reduction and amplification based on the wearer's surroundings, and a vehicle's self-driving system can make safer, more accurate decisions by listening to its environment.

A team of researchers from Xi’an Jiaotong-Liverpool University (XJTLU) and Nanjing University of Posts and Telecommunications (NJUPT) has developed an acoustic scene classification system that can monitor, identify, and analyse various sounds and automatically classify the different scenes where the sound sources are collected, such as streets, parks, airports, and subway stations. This project earned the team the Judges’ Award at the Detection and Classification of Acoustic Scenes and Events (DCASE) contest.

The team comprises two students from XJTLU: Yiqiang Cai, a PhD student at the School of Advanced Technology, and Minyu Lin, a Year Four student of BEng Telecommunications Engineering; and two supervisors: Dr Shengchen Li from the Department of Intelligent Science at XJTLU, and Professor Xi Shao from NJUPT.

Dr Shengchen Li (left) and Yiqiang Cai (right) at the DCASE award ceremony

“We overcame two main challenges in designing our acoustic scene classification system. One is that the system needed to be deployed in small devices such as headphones and microphones, with limited memory and power consumption, so the algorithms couldn’t be too large or too complex,” says Cai, the project lead.

“The other challenge lay in the new requirements of the contest regarding the system’s training method. Currently, most such systems are based on deep learning models, which need a large amount of manually labelled data in training to tell the system what category a certain sound belongs to. This method is costly. Therefore, it was required that all participating teams use limited labelled data for training to reduce human resources needed and improve the efficiency of algorithm training,” he says.

“To tackle these challenges, we utilised self-supervised learning to train the system. We needed to design effective self-supervised tasks so that the model could automatically learn useful features from audio data,” explains Cai. Self-supervised learning is a type of machine learning that allows models to learn meaningful representations from unlabelled data, saving the time it would take to label data manually.
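To illustrate the general idea (this is a hypothetical sketch, not the team's actual code), a self-supervised pretext task can generate training labels for free from unlabelled audio. One classic example is temporal-proximity prediction: the model learns to judge whether two short segments are adjacent in time, and the labels come from the recording's timeline rather than from human annotators. The function name and frame shapes below are assumptions for illustration.

```python
import numpy as np

def make_ssl_pairs(frames, rng):
    """Build (pair, label) examples from unlabelled audio frames.

    Label 1: two temporally adjacent frames (likely the same scene).
    Label 0: two frames sampled far apart in time.
    No human annotation is needed; the labels come from time itself.
    """
    pairs, labels = [], []
    n = len(frames)
    for i in range(n - 1):
        # Positive pair: neighbouring frames.
        pairs.append((frames[i], frames[i + 1]))
        labels.append(1)
        # Negative pair: a frame at least n // 2 steps away.
        j = (i + n // 2 + rng.integers(0, n // 2)) % n
        pairs.append((frames[i], frames[j]))
        labels.append(0)
    return pairs, labels

# Unlabelled "audio": 10 frames of 64 spectrogram bins each.
rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 64))
pairs, labels = make_ssl_pairs(frames, rng)
print(len(pairs), sum(labels))  # 18 pairs, 9 labelled positive
```

A model trained to solve this binary task must learn features that capture what a scene sounds like, and those features can then be reused for the actual classification problem with only a small amount of labelled data.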

The team also applied model compression techniques to make the system lightweight enough to run smoothly on small devices.
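One common compression technique of this kind is post-training quantisation, illustrated below as a minimal NumPy sketch (hypothetical, not the team's implementation): storing weights as 8-bit integers plus a scale factor cuts memory roughly fourfold compared with 32-bit floats.

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights to int8 plus a per-tensor scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for inference."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
q, scale = quantize_int8(w)

# int8 storage is a quarter the size of float32...
print(q.nbytes, w.nbytes)  # 1000 4000
# ...at a small cost in precision (rounding error bounded by scale/2).
err = np.abs(dequantize(q, scale) - w).max()
print(err <= scale / 2)  # True
```

Techniques like this, often combined with pruning or smaller architectures, are what allow a deep model to fit within the memory and power budget of a headphone-class device.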

Yiqiang Cai designing the classification system in his office

Dr Shengchen Li says: “The team’s innovative approach, including the application of self-supervised learning, impressed the contest organisers and highlighted the system’s ability to produce accurate classification results without manual sound feature extraction.”

Cai’s journey with DCASE has led to continuous learning, problem-solving, and the development of critical thinking skills.

Before pursuing his PhD, Cai studied MSc Financial Computing at the School of Advanced Technology. During that period, he participated in DCASE several times. Although his academic background was unrelated to the audio field, he developed a strong interest in acoustic scene classification, which encouraged him to pursue this research in his PhD studies.

A group photo of the organisers and participants after the awarding ceremony

DCASE is an international contest that this year attracted 17 teams from around the world, including universities from Germany, France, Singapore, and Australia, as well as top universities in China. It is a platform for promoting interdisciplinary research in audio signal processing and machine learning.

“Communicating and exchanging ideas with researchers from all over the world broadened my international horizons and made me realise the charm of scientific research that knows no borders,” says Cai.

 

By Huatian Jin

Translated by Xiangyin Han

Edited by Patricia Pieterse

Photos courtesy of the School of Advanced Technology


