Large quantities of historical newspapers are being digitized and OCR'd. We describe a framework for processing the OCR'd text to identify articles and extract metadata for them. We describe the article schema and provide examples of features that …
Content-based implicit user modeling techniques usually employ a traditional term vector as a representation of the user's interest. However, due to the problem of dimensionality in the vector space model, a simple term vector is not a sufficient …
This paper proposes some expanded rough set models with maximal compatible classes as primitive granules, introduces two new granules for extending the rough set model, and designs algorithms to compute maximal compatible classes, to find the lower and …
We developed a simple web-based prototype to familiarize students with digital library tools. To assist the students with the indexing task, the prototype provided basic functionality, including a metadata input form and a photo search interface. The …
Electronic voting is slowly making its way into American politics. At the same time, more voters and potential voters are using online news and political information sources to help them make voting choices. We conducted a mock-voting study, using …