1. School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu 730000
2. Lanzhou Public Security Bureau, Lanzhou, Gansu 730000
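QI Ruiyan (祁瑞艳, 2001- ), female, master's student at the School of Information Science and Engineering, Lanzhou University. Her main research interests include deep learning and natural language processing.
LI Longjie (李龙杰, 1981- ), male, doctoral student at the School of Information Science and Engineering, Lanzhou University. His main research interests include big data analysis and applications, graph machine learning, and deep learning.
XU Shicheng (徐世琤, 2000- ), male, master's student at the School of Information Science and Engineering, Lanzhou University. His main research interests include deep learning and natural language processing.
MA Ligong (马笠恭, 1969- ), male, bachelor's degree, Cyber Security Detachment of the Lanzhou Public Security Bureau. His main research interests include network security management and network security protection.
MA Zhixin (马志新, 1973- ), male, PhD, professor at the School of Information Science and Engineering, Lanzhou University. His main research interests include big data analysis and applications, complex network analysis, and cyberspace security.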
Received: 2024-08-13
Revised: 2024-09-24
Published in print: 2024-12-15
QI Ruiyan, LI Longjie, XU Shicheng, et al. Named entity recognition based on span and category enhancement for Chinese news[J]. Chinese Journal of Intelligent Science and Technology, 2024, 06(04): 495-508. DOI: 10.11959/j.issn.2096-6652.202437.
In news text, named entity recognition is complicated by complex syntactic structures and long entity names, which make entity boundaries hard to determine and cause sequence labeling methods to break off prematurely when predicting long entities. To address these challenges, this paper proposes SpaCE, a Chinese news named entity recognition model based on span and category enhancement. Built on the pre-trained model BERT (bidirectional encoder representations from Transformers), SpaCE improves recognition performance through span prediction and category description enhancement. When encoding news text, the model incorporates category descriptions to enrich semantic knowledge and uses span-based decoding to avoid interrupted predictions of long entities. In addition, word boundary information is introduced through precise labeling, and the entity matching strategy is optimized, which effectively reduces the non-entity matches caused by span decoding. Compared with baseline models, SpaCE improves performance on all three datasets. SpaCE also retains strong named entity recognition capability on disordered text, indicating good robustness.
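The contrast between sequence labeling and span-based decoding described in the abstract can be illustrated with a minimal sketch. This is not the SpaCE decoder: the function names, the `threshold` and `max_len` parameters, the hand-written scores, and the nearest-end matching rule are all assumptions made for illustration, and the abstract does not specify the paper's optimized matching strategy. The example only shows why a single mislabeled character truncates a long entity under BIO decoding, while a start/end span decoder can still recover it.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, inclusive end index, label)


def decode_bio(tags: List[str]) -> List[Span]:
    """Collect spans from BIO tags; an I- tag with no open span is ignored."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # the current entity keeps going
        if start is not None:
            spans.append((start, i - 1, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def decode_spans(start_scores: List[float], end_scores: List[float],
                 label: str, threshold: float = 0.5,
                 max_len: int = 20) -> List[Span]:
    """Pair each predicted start with the nearest predicted end (assumed rule)."""
    spans: List[Span] = []
    for s, p_start in enumerate(start_scores):
        if p_start < threshold:
            continue
        for e in range(s, min(s + max_len, len(end_scores))):
            if end_scores[e] >= threshold:
                spans.append((s, e, label))
                break
    return spans


# A 13-character organization name taken from the author affiliations.
text = "兰州大学信息科学与工程学院"

# Hypothetical BIO output with one wrong tag at index 6 ("学" labeled "O").
bio_tags = ["B-ORG"] + ["I-ORG"] * 5 + ["O"] + ["I-ORG"] * 6

# Hypothetical start/end scores for the ORG category.
start_scores = [0.9] + [0.0] * 12
end_scores = [0.0] * 12 + [0.8]

print(decode_bio(bio_tags))                            # [(0, 5, 'ORG')]
print(decode_spans(start_scores, end_scores, "ORG"))   # [(0, 12, 'ORG')]
```

Under these assumed inputs, BIO decoding returns only the first six characters of the name, whereas the span decoder returns the full 13-character entity. Pairing every start with the nearest end can also produce spans that are not entities, which is the kind of non-entity match the abstract says the word boundary information and optimized matching strategy are meant to reduce.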