DiffCRNN: A Novel Approach for Detecting Sound Events in Smart Home Systems Using Diffusion-based Convolutional Recurrent Neural Network

Maryam M. Al Dabel

Abstract

This paper presents an approach to sound event detection that combines a latent diffusion model with a convolutional recurrent neural network, fusing the advantages of the different networks to advance security applications and smart home systems. To mitigate the challenge of limited data availability, the proposed approach is first trained on extensive datasets and then adapted to the target task through transfer learning. It employs the latent diffusion model to obtain a discrete representation compressed from the mel-spectrogram of the audio. A convolutional neural network (CNN) then serves as the front-end of a recurrent neural network (RNN) and produces a feature map. From this feature map, an attention module predicts attention maps along the temporal and spectral dimensions. The input spectrogram is subsequently multiplied by the generated attention maps for adaptive feature refinement. Finally, trainable scalar weights aggregate the fine-tuned features from the back-end RNN. Experimental findings on three datasets, DCASE2016-SED, DCASE2017-SED and URBAN-SED, show that the proposed method performs better than the state of the art. On the first dataset, DCASE2016-SED, the approach reached a peak F1 of 66.2% and an error rate (ER) of 0.42. On the second dataset, DCASE2017-SED, the F1 and ER were 68.1% and 0.40, respectively. On the third dataset, URBAN-SED, the proposed approach outperforms existing alternatives with an F1 of 74.3% and an ER of 0.44.
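
The abstract reports F1 and ER but does not say how they are computed. As a minimal sketch, assuming the segment-based definitions commonly used in DCASE sound event detection evaluations (an assumption, not something the abstract confirms), the two scores could be derived from binary activity matrices as shown below. The function name `segment_based_scores` and the toy data are hypothetical and serve only as illustration.

```python
import numpy as np


def segment_based_scores(reference, estimate):
    """Return (F1, ER) for binary activity matrices.

    Both inputs have shape (n_segments, n_classes); an entry of 1 means
    the class is active in that segment. Assumes segment-based scoring,
    which the abstract does not explicitly state.
    """
    ref = np.asarray(reference, dtype=bool)
    est = np.asarray(estimate, dtype=bool)
    if ref.shape != est.shape:
        raise ValueError("reference and estimate must have the same shape")

    # Counts per segment (summed over classes).
    tp = np.sum(ref & est)
    fp_seg = np.sum(~ref & est, axis=1)
    fn_seg = np.sum(ref & ~est, axis=1)

    # Within a segment, a false positive paired with a false negative
    # counts as one substitution; leftovers are insertions or deletions.
    substitutions = np.minimum(fp_seg, fn_seg).sum()
    deletions = np.maximum(0, fn_seg - fp_seg).sum()
    insertions = np.maximum(0, fp_seg - fn_seg).sum()

    fp = fp_seg.sum()
    fn = fn_seg.sum()
    n_ref = ref.sum()  # total number of active reference entries

    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    er = (substitutions + deletions + insertions) / n_ref if n_ref else 0.0
    return float(f1), float(er)


# Toy example: 4 segments, 3 classes (made-up data, not from the paper).
toy_reference = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
toy_estimate = [[1, 0, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0]]
toy_f1, toy_er = segment_based_scores(toy_reference, toy_estimate)
print(f"F1 = {toy_f1:.2f}, ER = {toy_er:.2f}")
```

Under these definitions a higher F1 and a lower ER are better, and ER can exceed 1 when a system produces many insertions.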

Article Details

Section
Special Issue - Recent Advance Secure Solutions for Network in Scalable Computing