School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China

CAO Maojun (1978- ), male, Ph.D., associate professor at the School of Computer and Information Technology, Northeast Petroleum University. His research interests include artificial intelligence for the oil and gas domain, domain software engineering, well-logging industrial software, knowledge graphs, and large language models.

TIAN Mingjia (2002- ), male, master's student at the School of Computer and Information Technology, Northeast Petroleum University. His research interests include natural language processing, knowledge graphs, and large language models.

XIAO Yang (1998- ), male, master's student at the School of Computer and Information Technology, Northeast Petroleum University. His research interests include natural language processing and knowledge graphs.

Received: 2025-06-04; Revised: 2025-09-16; Accepted: 2025-09-18; Published in print: 2025-12-15
CAO Maojun, TIAN Mingjia, XIAO Yang. Research on well logging text generation method combining K-BERT and KG-BART[J]. Chinese Journal of Intelligent Science and Technology, 2025, 7(4): 444-453. DOI: 10.11959/j.issn.2096-6652.202538.
Well-logging text generation is a key task in oil and gas exploration and development, and its quality directly affects the efficiency and accuracy of subsequent geological structure interpretation. Existing approaches mainly include template-based rule strategies, statistical summarization techniques, and small-scale data-driven models based on recurrent neural network (RNN) or Transformer architectures. However, these methods generally suffer from insufficient utilization of domain knowledge, poor contextual and logical consistency in long-text generation, and the absence of multi-task collaborative learning mechanisms. To address the high specialization and complexity of Chinese well-logging texts, this paper proposes a multi-task model, K2-KGLogGen, which integrates the semantic understanding capability of knowledge-enhanced bidirectional encoder representations from Transformers (K-BERT) with the text generation strength of the knowledge graph-enhanced bidirectional and auto-regressive Transformer (KG-BART). The model incorporates a well-logging domain knowledge graph to enhance semantic awareness, uses a classification module to provide category-specific contextual guidance, and employs a self-attention fusion mechanism to jointly optimize classification and generation. Experimental results show that, in the classification task, the proposed model achieves significant F1-score improvements over existing mainstream models: approximately 2.2% over single-task K-BERT, 3.2% over BERT, 4.7% over a text convolutional neural network (TextCNN), and 9.3% over a support vector machine with term frequency-inverse document frequency features (SVM+TF-IDF). In the generation task, it attains ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.63, 0.41, and 0.54, respectively, significantly outperforming the Transformer, text-to-text transfer Transformer (T5), unified language model (UniLM), pointer-generator network (PGN), and BART baselines. Ablation studies confirm that the self-attention mechanism and the knowledge injection module are the key contributors to these performance gains. The results demonstrate the effectiveness of K2-KGLogGen for professional well-logging text generation and its potential applicability to other highly specialized technical text generation tasks.
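The generation results above are reported as ROUGE-1, ROUGE-2, and ROUGE-L scores. As general background (not the paper's evaluation code, whose tokenization and exact settings are not specified here), a minimal pure-Python sketch of recall-oriented ROUGE-N and ROUGE-L, assuming whitespace tokenization:

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(reference, candidate, n):
    """ROUGE-N recall: overlapping n-grams / n-grams in the reference."""
    ref, cand = ngrams(reference, n), ngrams(candidate, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(c, ref[g]) for g, c in cand.items() if g in ref)
    return overlap / total

def rouge_l_recall(reference, candidate):
    """ROUGE-L recall: longest common subsequence length / reference length."""
    m, k = len(reference), len(candidate)
    dp = [[0] * (k + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, k + 1):
            if reference[i - 1] == candidate[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][k] / m if m else 0.0

# Illustrative (hypothetical) reference/candidate pair:
ref = "the reservoir shows high porosity and low resistivity".split()
cand = "the reservoir shows low resistivity and high porosity".split()
print(rouge_n_recall(ref, cand, 1))  # 1.0: identical unigram multisets
print(rouge_n_recall(ref, cand, 2))  # lower: word order changes break bigrams
print(rouge_l_recall(ref, cand))     # LCS rewards preserved subsequences
```

The example illustrates why the three scores diverge: reordering content leaves ROUGE-1 untouched but penalizes ROUGE-2 and ROUGE-L, which is why all three are typically reported together.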