基于 Deep Belief+Nets+的中文名实体关系抽取

  • 格式:pdf
  • 大小:863.66 KB
  • 文档页数:14

下载文档原格式

  / 14
  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

软件学报ISSN 1000-9825, CODEN RUXUEW E-mail: jos@

Journal of Software,2012,23(10):2572−2585 [doi: 10.3724/SP.J.1001.2012.04181]

+86-10-62562563 ©中国科学院软件研究所版权所有. Tel/Fax:

基于Deep Belief Nets的中文名实体关系抽取

陈宇, 郑德权+, 赵铁军

(哈尔滨工业大学计算机科学与技术学院,黑龙江哈尔滨 150001)

Chinese Relation Extraction Based on Deep Belief Nets

CHEN Yu, ZHENG De-Quan+, ZHAO Tie-Jun

(School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)

+ Corresponding author: E-mail: dqzheng@,

Chen Y, Zheng DQ, Zhao TJ. Chinese relation extraction based on Deep Belief Nets. Journal of Software,

2012,23(10):2572−2585 (in Chinese). /1000-9825/4181.htm

Abstract: Relation extraction is a fundamental task in information extraction, which is to identify the semantic

relationships between two entities in the text. In this paper, deep belief nets (DBN), which is a classifier of a

combination of several unsupervised learning networks, named RBM (restricted Boltzmann machine) and a

supervised learning network named BP (back-propagation), is presented to detect and classify the relationships

among Chinese name entities. The RBM layers maintain as much information as possible when feature vectors are

transferred to next layer. The BP layer is trained to classify the features generated by the last RBM layer. The

experiments are conducted on the Automatic Content Extraction 2004 dataset. This paper proves that a

character-based feature is more suitable for Chinese relation extraction than a word-based feature. In addition, the

paper also performs a set of experiments to assess the Chinese relation extraction on different assumptions of an

entity categorization feature. These experiments showed the comparison among models with correct entity types and

imperfect entity type classified by DBN and without entity type. The results show that DBN is a successful

approach in the high-dimensional-feature-space information extraction task. It outperforms state-of-the-art learning

models such as SVM and back-propagation networks.

Key words: DBN (deep belief nets); neural network; relation extraction; deep architecture network;

character-based feature

摘要: 关系抽取是信息抽取的一项子任务,用以识别文本中实体之间的语义关系.提出一种利用DBN(deep

belief nets)模型进行基于特征的实体关系抽取方法,该模型是由多层无监督的RBM(restricted Boltzmann machine)网

络和一层有监督的BP(back-propagation)网络组成的神经网络分类器. RBM网络以确保特征向量映射达到最优,最

后一层BP网络分类RBM网络的输出特征向量,从而训练实体关系分类器.在ACE04语料上进行的相关测试,一方

面证明了字特征比词特征更适用于中文关系抽取任务;另一方面设计了3组不同的实验,分别使用正确的实体类别

信息、通过实体类型分类器得到实体类型信息和不使用实体类型信息,用以比较实体类型信息对关系抽取效果的影

响.实验结果表明,DBN非常适用于基于高维空间特征的信息抽取任务,获得的效果比SVM和反向传播网络更好.

关键词: DBN(deep belief nets);神经网络;关系抽取;深层网络;字特征

∗基金项目: 国家自然科学基金(61073130); 国家高技术研究发展计划(863)(2011AA01A207)

收稿时间:2011-06-16; 修改时间: 2011-08-09; 定稿时间: 2012-01-16