A framework for text processing and supporting access to collections of digitized historical newspapers

Large quantities of historical newspapers are being digitized and OCR'd. We describe a framework for processing the OCR'd text to identify articles and extract metadata for them. We describe the article schema and provide examples of features that …

Semantically Enhanced User Modeling

Content-based implicit user modeling techniques usually employ a traditional term vector as a representation of the user's interest. However, due to the problem of dimensionality in the vector space model, a simple term vector is not a sufficient …

Algorithms for different approximations in incomplete information systems with maximal compatible classes as primitive granules

This paper proposes expanded rough set models with maximal compatible classes as primitive granules, introduces two new granules for extending the rough set model, and designs algorithms to compute maximal compatible classes, to find the lower and …

A tool for teaching principles of image metadata generation

We developed a simple web-based prototype to familiarize students with digital library tools. To assist the students with the indexing task, the prototype provided basic functionality, including a metadata input form and a photo search interface. The …

Voting and political information gathering on paper and online

Electronic voting is slowly making its way into American politics. At the same time, more voters and potential voters are using online news and political information sources to help them make voting choices. We conducted a mock-voting study, using …