Academic paper

      Improved stacked autoencoder for Chinese speech emotion recognition

      Abstract:
      To further improve the recognition rate of Chinese speech emotion recognition, an improved stacked autoencoder is proposed that builds on three deep learning network structures: the autoencoder, the denoising autoencoder, and the sparse autoencoder. The first layer of the structure uses a denoising autoencoder to learn hidden features whose dimension is larger than that of the input features, and the second layer uses a sparse autoencoder to learn sparse features. Finally, a softmax classifier performs the classification. Training first applies layer-wise pre-training to initialize all network parameters and then fine-tunes the whole network. Experiments on Chinese speech databases show that the proposed structure achieves a better recognition rate than stacked denoising autoencoders or stacked sparse autoencoders used alone. In addition, comparative experiments on the CASIA database show that the recognition rate of the structure is 53.7%, 29.8%, 14.3%, and 1.9% higher than that of the K-nearest neighbor algorithm, the sparse representation method, the traditional support vector machine, and the artificial neural network, respectively. On a self-recorded speech database, the recognition rate of the structure is 1.64% higher than that of the artificial neural network.
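The structure described in the abstract can be sketched roughly as follows. This is only an illustrative NumPy outline, not the authors' implementation: the layer sizes, the masking-noise corruption, the squared-error reconstruction loss, the KL-divergence sparsity penalty, and the values of `drop_prob`, `rho`, and `beta` are all assumptions, since the abstract does not specify them. The sketch evaluates the two layer-wise pre-training objectives and the forward pass through the stack; the gradient updates for pre-training and fine-tuning are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_layer(n_in, n_out, rng):
    # Uniform initialization scaled by layer size (an assumed choice).
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out)), np.zeros(n_out)

def corrupt(x, drop_prob, rng):
    # Masking noise (assumed): zero each entry with probability drop_prob.
    return x * (rng.random(x.shape) >= drop_prob)

def dae_loss(x, enc, dec, drop_prob, rng):
    # Layer 1 objective: encode the corrupted input, reconstruct the clean input.
    (W, b), (W_dec, b_dec) = enc, dec
    h = sigmoid(corrupt(x, drop_prob, rng) @ W + b)
    x_rec = sigmoid(h @ W_dec + b_dec)
    return np.mean(np.sum((x_rec - x) ** 2, axis=1))

def kl_sparsity(h, rho):
    # KL divergence between the target activation rho and each unit's mean activation.
    rho_hat = np.clip(h.mean(axis=0), 1e-8, 1.0 - 1e-8)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

def sae_loss(h1, enc, dec, rho, beta):
    # Layer 2 objective: reconstruct layer-1 features plus a sparsity penalty.
    (W, b), (W_dec, b_dec) = enc, dec
    h2 = sigmoid(h1 @ W + b)
    h1_rec = sigmoid(h2 @ W_dec + b_dec)
    return np.mean(np.sum((h1_rec - h1) ** 2, axis=1)) + beta * kl_sparsity(h2, rho)

def forward(x, enc1, enc2, clf):
    # Full stack used after pre-training: two encoders, then softmax.
    h1 = sigmoid(x @ enc1[0] + enc1[1])
    h2 = sigmoid(h1 @ enc2[0] + enc2[1])
    return softmax(h2 @ clf[0] + clf[1])

rng = np.random.default_rng(0)
# Hypothetical sizes; the only constraint taken from the abstract is n_h1 > n_in.
n_in, n_h1, n_h2, n_classes = 40, 64, 32, 6
enc1, dec1 = init_layer(n_in, n_h1, rng), init_layer(n_h1, n_in, rng)
enc2, dec2 = init_layer(n_h1, n_h2, rng), init_layer(n_h2, n_h1, rng)
clf = init_layer(n_h2, n_classes, rng)

x = rng.random((8, n_in))                 # stand-in for normalized acoustic features
h1 = sigmoid(x @ enc1[0] + enc1[1])       # clean layer-1 features fed to layer 2
loss1 = dae_loss(x, enc1, dec1, drop_prob=0.3, rng=rng)
loss2 = sae_loss(h1, enc2, dec2, rho=0.05, beta=3.0)
probs = forward(x, enc1, enc2, clf)       # one row of class probabilities per utterance
```

In a full training run, `loss1` and `loss2` would each be minimized in turn to initialize the two encoders, after which the decoders are discarded and the whole stack in `forward` is fine-tuned on labeled data with a cross-entropy loss.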
      Authors: Zhu Fangmei (朱芳枚) [1], Zhao Li (赵力) [1], Liang Ruiyu (梁瑞宇) [2], Wang Qingyun (王青云) [3], Zou Cairong (邹采荣) [1]
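      Affiliations: [1] Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096; [2] Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, and School of Communication Engineering, Nanjing Institute of Technology, Nanjing 211167; [3] School of Communication Engineering, Nanjing Institute of Technology, Nanjing 211167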
      Year, volume (issue): 2017, 47(4)
      Classification number: TP391.42
      Online publication date: August 15, 2017
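      Funding: National Natural Science Foundation of China; Qing Lan Project of Jiangsu Province; Jiangsu Province Postdoctoral Research Funding Program; "Six Talent Peaks" Project of Jiangsu Province; China Postdoctoral Science Foundation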