We propose a ranking model to diversify answers of non-factoid questions based on an inverse notion of graph connectivity. By representing a collection of candidate answers as a graph, we posit that novelty, a measure of diversity, is inversely …
We introduce our QA system AskDragon which employs a novel lightweight local context analysis technique to handling two broad classes of factoid questions, entity and numeric questions. The local context analysis module dramatically improves the …
In this paper, we present a new approach that incorporates semantic structure of sentences, in a form of verb-argument structure, to measure semantic similarity between sentences. The variability of natural language expression makes it difficult for …
Digital reference services normally rely on human experts to provide quality answers to the user requests via online communication tools. As the services gain more popularity, more experts are needed to keep up with a growing demand. Alternatively, …
The ability to accurately judge the similarity between natural language sentences is critical to the performance of several applications such as text mining, question answering, and text summarization. Given two sentences, an effective similarity …
Document representation is one of the crucial components that determine the effectiveness of text classification tasks. Traditional document representation approaches typically adopt a popular bag-of-word method as the underlying document …
We describe the Image Tagger system - a web-based tool for supporting collaborative image indexing by students. The tool has been used in three successive graduate-level classes on content representation. To fully satisfy the class' requirements and …
In this paper, we explore how global ranking method in conjunction with local density method help identify meaningful term clusters from ontology enriched graph representation of biomedical literature corpus. One big problem with document clustering …
Large quantities of historical newspapers are being digitized and OCRd. We describe a framework for processing the OCRd text to identify articles and extract metadata for them. We describe the article schema and provide examples of features that …
Content-based implicit user modeling techniques usually employ a traditional term vector as a representation of the user's interest. However, due to the problem of dimensionality in the vector space model, a simple term vector is not a sufficient …