School of Electrical and Information Engineering, Changsha University of Science and Technology, Changsha 410114, Hunan, China
YANG Aonan (1999- ), male, master's student at the School of Electrical and Information Engineering, Changsha University of Science and Technology; his main research interests include digital image processing and machine vision.
MO Hong (1972- ), female, professor at the School of Electrical and Information Engineering, Changsha University of Science and Technology; her main research interests include smart healthcare, fuzzy AI, and the management and control of complex systems.
ZHAO Shili (2000- ), male, master's student at the School of Electrical and Information Engineering, Changsha University of Science and Technology; his main research interests include digital image processing and machine vision.
OUYANG Yuqi (2000- ), male, master's student at the School of Electrical and Information Engineering, Changsha University of Science and Technology; his main research interests include digital image processing and machine vision.
Received: 2025-05-11
Revised: 2025-05-29
Published in print: 2025-06-15
YANG Aonan, MO Hong, ZHAO Shili, et al. HCANet: a micro expression recognition model based on hierarchical Transformer architecture[J]. Chinese Journal of Intelligent Science and Technology, 2025, 07(02): 277-286. DOI: 10.11959/j.issn.2096-6652.202525.
Facial micro expressions are subtle, involuntary facial movements that reveal true emotions through brief muscular motion. To exploit the spatial correlations inherent among facial landmarks and improve recognition accuracy, a hierarchical continuous attention network (HCANet) was proposed, which leverages the self-attention mechanism to capture relationships between landmarks in sequences. HCANet models the optical flow difference between the onset and apex frames, which avoids overlooking local details when features are extracted directly from full video frames and reduces interference from identity information. HCANet consists of a Transformer layer and an aggregation layer. First, the face is divided into four regions; within the Transformer layer, a continuous attention block (CAB) focuses on the subtle local muscular movements of each individual region to extract local temporal features. Next, through a cross-layer attention transfer mechanism, the aggregation layer learns the interactions among regions to extract global semantic facial features. Finally, HCANet was compared against six other algorithms using leave-one-out cross-validation on four publicly available micro expression datasets (CASME Ⅱ, CASME Ⅲ, SMIC, SAMM). Experimental results show that HCANet improves classification accuracy on the CASME Ⅲ, SMIC, and SAMM datasets, and exhibits stronger robustness in complex scenarios (e.g., low frame rates, background noise).
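The pipeline described in the abstract (motion between the onset and apex frames, a four-region face split, and attention across region tokens) can be sketched roughly as follows. This is a toy numpy illustration, not the authors' implementation: a plain frame difference stands in for the optical-flow field, random projections stand in for learned weights, and the single attention step only hints at how the aggregation layer lets regions interact.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def region_tokens(onset, apex, grid=2):
    """Split a motion field into grid x grid face-region tokens.

    HCANet models optical flow between the onset and apex frames;
    here a simple frame difference is used as a stand-in motion field.
    """
    motion = apex.astype(np.float64) - onset.astype(np.float64)
    h, w = motion.shape
    rh, rw = h // grid, w // grid
    tokens = [motion[i*rh:(i+1)*rh, j*rw:(j+1)*rw].ravel()
              for i in range(grid) for j in range(grid)]
    return np.stack(tokens)          # shape (4, rh*rw) for grid=2

def self_attention(tokens, d_k=16, seed=0):
    """Single-head scaled dot-product attention across region tokens,
    loosely mimicking inter-region interaction in the aggregation layer."""
    rng = np.random.default_rng(seed)
    d_in = tokens.shape[1]
    # Random projections in place of learned Q/K/V weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d_in, d_k)) / np.sqrt(d_in)
                  for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (4, 4) region-to-region weights
    return attn @ V, attn

# Synthetic example: a small "muscle movement" in the lower-left region.
onset = np.zeros((64, 64))
apex = np.zeros((64, 64))
apex[40:50, 10:20] = 1.0
tokens = region_tokens(onset, apex)          # (4, 1024)
out, attn = self_attention(tokens)           # out: (4, 16)
```

Only the third token (lower-left quadrant) carries nonzero motion here, and each row of `attn` is a probability distribution over the four regions, which is the sense in which regions "attend" to one another.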