Addressing the variability of natural language expression in sentence similarity with semantic structure of the sentences


In this paper, we present a new approach that incorporates the semantic structure of sentences, in the form of verb-argument structure, to measure semantic similarity between sentences. The variability of natural language expression makes it difficult for existing text similarity measures to accurately identify semantically similar sentences, since sentences conveying the same fact or concept may differ lexically and syntactically. Conversely, sentences that share many words may not convey the same meaning. This significantly affects the performance of many text mining applications that involve sentence-level judgment. Our evaluation shows that processing sentences at the semantic level significantly improves the performance of similarity measures.

Proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining - PAKDD ’09