Utilization of Global Ranking Information in Graph-based Biomedical Literature Clustering


In this paper, we explore how a global ranking method, used in conjunction with a local density method, helps identify meaningful term clusters from an ontology-enriched graph representation of a biomedical literature corpus. A major problem in document clustering is how to discount the effect of class-unspecific general terms while strengthening the effect of class-specific core terms. We claim that a well-constructed term graph can help improve the global ranking of class-specific core terms. We first apply PageRank and HITS to a directed abstract-title term graph to target class-specific core terms. Then k dense term clusters (graphs) are identified from these terms. Finally, each document is assigned to its closest core term graph. A series of experiments is conducted on a document corpus collected from PubMed. Experimental results show that our approach is very effective at identifying class-specific core terms and thus helps document clustering.
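
As a rough illustration of the global ranking step only, the sketch below builds a small directed term graph and ranks its terms with a plain power-iteration PageRank. It is not the implementation used in the paper. The edge direction (each abstract term points to each title term of the same document), the toy documents, the damping factor, and the function names `build_term_graph` and `pagerank` are all assumptions made for the example. Ontology enrichment, HITS, the dense-cluster search, and the document assignment are omitted.

```python
from collections import defaultdict


def build_term_graph(docs):
    """Build a directed term graph from (title_terms, abstract_terms) pairs.

    Assumption for this sketch: every abstract term links to every
    title term of the same document (self-links are skipped).
    """
    nodes = set()
    edges = defaultdict(set)
    for title_terms, abstract_terms in docs:
        nodes.update(title_terms)
        nodes.update(abstract_terms)
        for a in abstract_terms:
            for t in title_terms:
                if a != t:
                    edges[a].add(t)
    return nodes, edges


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; dangling mass is spread uniformly."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by terms with no outgoing links.
        dangling = sum(rank[v] for v in nodes if not edges.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            targets = edges.get(u)
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Toy corpus (hypothetical): (title terms, abstract terms) per document.
docs = [
    (["asthma"], ["asthma", "inflammation", "patient"]),
    (["asthma", "therapy"], ["patient", "therapy", "study"]),
    (["diabetes"], ["diabetes", "insulin", "patient"]),
]

nodes, edges = build_term_graph(docs)
rank = pagerank(nodes, edges)

# Print terms from highest to lowest rank.
for term in sorted(rank, key=rank.get, reverse=True):
    print(f"{term}: {rank[term]:.3f}")
```

In this toy graph the general term "patient" occurs in every abstract but receives no incoming links, so it ranks below the title terms "asthma" and "diabetes". This mirrors the intended effect of discounting class-unspecific general terms.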

Proceedings of the 9th International Conference on Data Warehousing and Knowledge Discovery - DaWaK '07