
***1, ***1, ***2, ***1. Dual-Stage Gated Segmented Multimodal Emotion Recognition Method [J/OL]. Chinese Journal of Intelligent Science and Technology, 2025.
Multimodal emotion recognition has broad applications in psychological health monitoring and affective computing. However, most existing studies rely on either global or local features, neglecting the joint modeling of both, which limits recognition performance. To address this, a dual-stage gated segmented fusion architecture is proposed, consisting of an interaction stage and a dual-level gating stage. The interaction stage leverages the OAGL fusion strategy to model global-local cross-modal interactions, improving the efficiency of feature fusion; the gating stage integrates local and global features to fully exploit the available emotional information. Additionally, to resolve the misalignment of local temporal features across modalities, a scaled dot-product-based sequence alignment method is designed to enhance fusion accuracy. Experimental results demonstrate that the proposed method outperforms representative algorithms on multiple datasets, validating its ability to capture emotional details and its strong generalization capability.
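The abstract names two concrete mechanisms: scaled dot-product alignment of unaligned modality sequences, and two-level gating over local and global features. The following is a minimal PyTorch sketch of both ideas as described above; the module names (SequenceAligner, DualGateFusion), tensor shapes, and the exact gating equations are illustrative assumptions, not the authors' implementation, and the paper's OAGL strategy and full architecture are not reproduced here.

```python
# Minimal sketch, assuming feature dimension D is shared across modalities
# after unimodal encoding; all module/variable names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SequenceAligner(nn.Module):
    """Re-express a source modality's frames on a target modality's time
    axis via scaled dot-product attention (assumed form of the paper's
    sequence alignment)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target: (B, T_t, D), source: (B, T_s, D); T_t != T_s in general
        q, k, v = self.q(target), self.k(source), self.v(source)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # (B, T_t, T_s)
        # Each target step attends over all source steps, yielding a
        # source representation aligned to the target's T_t time steps.
        return F.softmax(scores, dim=-1) @ v


class DualGateFusion(nn.Module):
    """Two-level gating: first balance local vs. global evidence per
    feature, then gate the fused vector (assumed form)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate1 = nn.Linear(2 * dim, dim)  # local/global balance gate
        self.gate2 = nn.Linear(dim, dim)      # output gate

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate1(torch.cat([local_feat, global_feat], dim=-1)))
        fused = g * local_feat + (1 - g) * global_feat
        return torch.sigmoid(self.gate2(fused)) * fused


if __name__ == "__main__":
    B, T_text, T_audio, D = 2, 50, 120, 64
    text = torch.randn(B, T_text, D)    # e.g. token-level text features
    audio = torch.randn(B, T_audio, D)  # e.g. frame-level audio features
    aligned_audio = SequenceAligner(D)(text, audio)  # (B, T_text, D)
    local_feat = aligned_audio.mean(dim=1)           # pooled local view
    global_feat = text.mean(dim=1)                   # pooled global view
    print(DualGateFusion(D)(local_feat, global_feat).shape)  # torch.Size([2, 64])
```

One plausible reading of the design: alignment re-expresses the longer audio sequence on the text time axis before fusion, which is a common way to handle unaligned multimodal sequences, and the two sigmoid gates let the model down-weight whichever of the local or global views carries less emotional evidence for a given sample.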