Journal on Communications ›› 2013, Vol. 34 ›› Issue (Z2): 157-162. doi: 10.3969/j.issn.1000-436x.2013.Z2.030

• Digital Campus and Typical Applications •

Similar text positioning method based on slope-density cluster

Du ZOU¹, Wen-jun TANG¹, Wei-jiang LONG², Ling ZHANG³

  ¹ Information Network Engineering and Research Center, South China University of Technology, Guangzhou 510640, China
  ² School of Science, South China University of Technology, Guangzhou 510640, China
  ³ School of Computer Science & Engineering, South China University of Technology, Guangzhou 510640, China
  • Published: 2013-12-25 Online: 2017-06-16
  • Supported by:
    The National Natural Science Foundation of China

Abstract:

Similar text positioning is an important step in plagiarism detection. Most existing positioning methods directly merge matched text or fingerprints, so their accuracy is strongly affected by interfering information (noise) in the matches. To address this limitation, the semantic features of matched fingerprint pairs are analyzed, and a clustering method based on slope density is proposed for similar text positioning. The method converts the text-merging problem into a problem of clustering dense sample points, improving both the efficiency and the accuracy of positioning. In experiments on the PAN public corpus, its main metrics were better than those of the top three PAN10 entries. The method is now used for homework plagiarism detection on the specialty-program teaching platform of South China University of Technology.

Key words: plagiarism detection, similar text positioning, clustering, fingerprint
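
Illustrative sketch (not the authors' algorithm): the abstract does not define "slope density", so the example below only shows the general idea of treating matched fingerprint pairs as sample points and grouping dense runs of them. It assumes each match is a point (x, y), where x is the fingerprint offset in the suspicious text and y is the offset in the source text, and that copied passages appear as dense chains of points whose step slope is close to 1. The function name `cluster_matches` and the parameters `max_gap`, `slope_tol`, and `min_points` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (offset in suspicious text, offset in source text)


def cluster_matches(points: List[Point], max_gap: int = 50,
                    slope_tol: float = 0.5,
                    min_points: int = 3) -> List[Tuple[Point, Point]]:
    """Greedily chain matched pairs into dense, roughly diagonal runs.

    Assumed criterion (not taken from the paper): a point joins a run when it
    lies within max_gap of the run's last point along x and the slope of that
    step is within slope_tol of 1.
    """
    clusters: List[List[Point]] = []
    for x, y in sorted(set(points)):
        for cluster in clusters:
            last_x, last_y = cluster[-1]
            dx, dy = x - last_x, y - last_y
            if 0 < dx <= max_gap and abs(dy / dx - 1.0) <= slope_tol:
                cluster.append((x, y))
                break
        else:
            # No existing run accepts the point, so it starts a new one.
            clusters.append([(x, y)])
    # Sparse runs are treated as coincidental matches and dropped.
    return [(c[0], c[-1]) for c in clusters if len(c) >= min_points]


matches = [(10, 110), (20, 120), (30, 131), (40, 140), (35, 900),
           (300, 20), (310, 30), (320, 40), (500, 5)]
print(cluster_matches(matches))
# [((10, 110), (40, 140)), ((300, 20), (320, 40))]
```

Each returned pair gives the first and last point of a run, that is, the start and end offsets of a candidate similar passage in both texts. The isolated matches (35, 900) and (500, 5) are discarded as noise, which is the kind of interference that direct merging of matches would be sensitive to.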
