Telecommunications Science ›› 2011, Vol. 27 ›› Issue (11): 62-65. doi: 10.3969/j.issn.1000-0801.2011.11.018

• Cloud computing column •

Research on Term Weighting Based on MapReduce

Kai Wang (1), Shuicai Shi (1,2), Tao Wang (1,2), Xueqiang Lv (1,2)

  1. Beijing Information Science and Technology University, Chinese Information Processing Research Center, Beijing 100101, China
  2. Beijing TRS Information Technology Co., Ltd., Beijing 100101, China
  • Online: 2011-11-15 Published: 2011-11-15

Abstract:

Term recognition is widely used in ontology construction, dictionary construction, and other fields, and term weighting is a key step in term recognition. This paper makes several improvements to the TF-IDF algorithm: the weighting takes term length into account, along with each term's correlation with the document set. Candidate term weights are computed in a distributed manner using MapReduce on Hadoop. Experimental results show that the proposed method both simplifies the steps of term weighting and improves the efficiency of the algorithm.

Key words: term weight, TF-IDF, MapReduce, distributed
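
To make the idea of map/reduce-style term weighting concrete, the following is a minimal in-memory sketch in Python. It is not the paper's implementation: the abstract does not give the improved formula, so the code uses standard TF-IDF, and the `length_factor` function (one plus the log of the word count) is a purely hypothetical stand-in for "term length is considered". The function names and the tiny two-document corpus are likewise illustrative assumptions, and no Hadoop API is used.

```python
import math
from collections import defaultdict


def map_term_counts(doc_id, terms):
    """Map step: emit ((term, doc_id), 1) for every candidate term occurrence."""
    for term in terms:
        yield (term, doc_id), 1


def reduce_sum(pairs):
    """Reduce step: sum the values that share the same key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def length_factor(term):
    """Hypothetical length bonus (an assumption, not the paper's formula)."""
    return 1.0 + math.log(len(term.split()))


def term_weights(docs):
    """Weight each (term, doc_id) pair as TF * IDF * length_factor."""
    # Pass 1: count occurrences per (term, document).
    emitted = [
        pair
        for doc_id, terms in docs.items()
        for pair in map_term_counts(doc_id, terms)
    ]
    counts = reduce_sum(emitted)

    # Pass 2: document frequency, one emission per distinct (term, document).
    doc_freq = reduce_sum((term, 1) for (term, _doc_id) in counts)

    n_docs = len(docs)
    weights = {}
    for (term, doc_id), count in counts.items():
        tf = count / len(docs[doc_id])
        idf = math.log(n_docs / doc_freq[term])
        weights[(term, doc_id)] = tf * idf * length_factor(term)
    return weights


# Illustrative corpus: each document is a list of candidate terms.
docs = {
    "d1": ["map reduce", "term", "term"],
    "d2": ["term", "hadoop"],
}
weights = term_weights(docs)
```

In this sketch a term that occurs in every document, such as "term" above, gets an IDF of zero and therefore a weight of zero, while the two-word candidate "map reduce" receives the hypothetical length bonus. On a real cluster each pass would be a separate MapReduce job keyed the same way.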
