
***1, ***1, ***2, ***1. Dual-Stage Gated Segmented Multimodal Emotion Recognition Method [J/OL]. Chinese Journal of Intelligent Science and Technology, 2025.
Multimodal emotion recognition has broad applications in psychological health monitoring and affective computing. However, most existing studies rely on either global or local features, neglecting the joint modeling of both, which limits recognition performance. To address this, a dual-stage gated segmented fusion architecture is proposed, consisting of an interaction stage and a dual-level gating stage. The interaction stage leverages the OAGL fusion strategy to model global-local cross-modal interactions, improving the efficiency of feature fusion; the gating stage integrates local and global features to fully exploit the available emotional information. Additionally, to resolve the misalignment of local temporal features across modalities, a scaled dot-product-based sequence alignment method is designed to enhance fusion accuracy. Experimental results demonstrate that the proposed method outperforms representative algorithms on multiple datasets, validating its ability to capture emotional details and its strong generalization capability.
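The abstract names two concrete mechanisms: scaled dot-product alignment of unaligned modality sequences, and two-level gating over local and global features. The following is a minimal PyTorch sketch of both ideas as described above; the module names (SequenceAligner, DualGateFusion), tensor shapes, and the exact gating equations are illustrative assumptions, not the authors' implementation, and the paper's OAGL strategy and full architecture are not reproduced here.

```python
# Minimal sketch, assuming feature dimension D is shared across modalities
# after unimodal encoding; all module/variable names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SequenceAligner(nn.Module):
    """Re-express a source modality's frames on a target modality's time
    axis via scaled dot-product attention (assumed form of the paper's
    sequence alignment)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target: (B, T_t, D), source: (B, T_s, D); T_t != T_s in general
        q, k, v = self.q(target), self.k(source), self.v(source)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # (B, T_t, T_s)
        # Each target step attends over all source steps, yielding a
        # source representation aligned to the target's T_t time steps.
        return F.softmax(scores, dim=-1) @ v


class DualGateFusion(nn.Module):
    """Two-level gating: first balance local vs. global evidence per
    feature, then gate the fused vector (assumed form)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate1 = nn.Linear(2 * dim, dim)  # local/global balance gate
        self.gate2 = nn.Linear(dim, dim)      # output gate

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate1(torch.cat([local_feat, global_feat], dim=-1)))
        fused = g * local_feat + (1 - g) * global_feat
        return torch.sigmoid(self.gate2(fused)) * fused


if __name__ == "__main__":
    B, T_text, T_audio, D = 2, 50, 120, 64
    text = torch.randn(B, T_text, D)    # e.g. token-level text features
    audio = torch.randn(B, T_audio, D)  # e.g. frame-level audio features
    aligned_audio = SequenceAligner(D)(text, audio)  # (B, T_text, D)
    local_feat = aligned_audio.mean(dim=1)           # pooled local view
    global_feat = text.mean(dim=1)                   # pooled global view
    print(DualGateFusion(D)(local_feat, global_feat).shape)  # torch.Size([2, 64])
```

One plausible reading of the design: alignment re-expresses the longer audio sequence on the text time axis before fusion, which is a common way to handle unaligned multimodal sequences, and the two sigmoid gates let the model down-weight whichever of the local or global views carries less emotional evidence for a given sample.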