Journal on Communications ›› 2010, Vol. 31 ›› Issue (8A): 44-47. doi: 1000-436X(2010)8A-0044-04

• Academic Paper •

基于概念簇的文本向量构建方法

冯扬,罗森林,潘丽敏,刘莉莉,陈开江   

  1. 北京理工大学 信息与电子学院 信息安全与对抗技术实验室,北京 100081
  • 出版日期:2010-08-25 发布日期:2017-07-03
  • 基金资助:
    国家 242 计划基金资助项目;北京理工大学基础研究基金资助项目;北京理工大学研究生科技创新基金资助项目

Method of text vector construction based on concept cluster

Yang FENG, Sen-lin LUO, Li-min PAN, Li-li LIU, Kai-jiang CHEN

  1. Information Security and Countermeasures Laboratory, School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
  • Online: 2010-08-25 Published: 2017-07-03
  • Supported by:
    The National 242 Project; The Basic Research Foundation of Beijing Institute of Technology; Graduate Student Science & Technology Innovation Project of Beijing Institute of Technology

摘要:

为提高文本向量对文本概念的逼近程度,通过将具有相同语法语义特征的词进行聚类,提取概念簇,利用空间变换将文本向量由词空间变换到概念簇空间上来表达文本。实验比较了基于TF-IDF、IG、TF-IDF-IG、LSA以及它们结合概念簇后对文本分类的效果,证明了基于概念簇的文本向量构建方法能提高文本向量对文本概念逼近的准确程度,同时也提高了不同类型文本之间的区分度。

关键词: 中文信息处理, 文本向量, 概念簇, 文本分类

Abstract:

To make a text vector approximate the concepts of a text more closely, terms that share syntactic and semantic features are clustered to form concept clusters, and the text vector is transformed from term space into concept-cluster space to represent the original text. Experiments compare text classification based on TF-IDF, IG, TF-IDF-IG, and LSA with the same methods combined with concept clusters. The results show that text vectors built on concept clusters approximate text concepts more accurately and improve the discrimination between different types of text.

Key words: Chinese information processing, text vector, concept cluster, text classification
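
The transformation from term space to concept-cluster space described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the clusters are obtained or how the transformation is defined, so everything here is an assumption made for illustration: the toy vocabulary, the hard term-to-cluster assignment, the function names `membership_matrix` and `to_concept_space`, and the choice to sum term weights within each cluster.

```python
import numpy as np

def membership_matrix(term_to_cluster, n_clusters):
    """Build a 0/1 matrix M of shape (n_terms, n_clusters).

    M[i, k] = 1 when term i is assigned to concept cluster k.
    A hard, one-cluster-per-term assignment is an assumption of this sketch.
    """
    n_terms = len(term_to_cluster)
    M = np.zeros((n_terms, n_clusters))
    M[np.arange(n_terms), term_to_cluster] = 1.0
    return M

def to_concept_space(term_vectors, M):
    """Map document vectors from term space (n_docs, n_terms)
    to concept-cluster space (n_docs, n_clusters) by summing the
    weights of all terms that belong to the same cluster."""
    return np.asarray(term_vectors, dtype=float) @ M

# Hypothetical vocabulary: ["car", "automobile", "bank", "finance"]
# Hypothetical clusters: 0 = vehicles, 1 = money
term_to_cluster = np.array([0, 0, 1, 1])
M = membership_matrix(term_to_cluster, n_clusters=2)

# Term-space weights (for example TF-IDF values), one row per document.
docs = np.array([
    [1.0, 0.0, 0.0, 0.0],  # mentions only "car"
    [0.0, 1.0, 0.0, 0.0],  # mentions only "automobile"
    [0.0, 0.0, 1.0, 1.0],  # mentions "bank" and "finance"
])
concept_docs = to_concept_space(docs, M)
print(concept_docs)
```

In this toy case the first two documents share no terms, so their dot product in term space is zero, yet they map to the same point in concept-cluster space. That is the kind of effect the abstract attributes to concept clusters, although the authors' actual construction may differ.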
