About the Project
The MOA Collection
The complete MOA collection includes over 1.5 million images, representing approximately 5,000 volumes of primary source materials. The selection process at Cornell focused on the major journal literature of the period, ranging from general interest publications to those with more targeted audiences (such as agriculture). The Michigan process focused on monographs in the subject areas of education, psychology, American history, sociology, science and technology, and religion.
The thematic focus of the initial phase--the antebellum period through Reconstruction, 1850-1877--was chosen for several reasons:
- the extant literature is manageable, so that a cohesive body of material in digital form can be assembled quickly
- publications from this period are not covered by copyright protection
- scholarly and general interest in this period of American history remains high, thus increasing the potential of the collection as a research and teaching tool at the partner institutions
- this core collection will serve as the foundation for an extended distributed collection as the project grows
- much of the literature of this period is deteriorating rapidly, so the materials must be reformatted to preserve their informational content
The Conversion Process
The materials in the MOA collection were scanned from the original paper sources; the volumes were disbound locally because many of the items are brittle. The conversion was outsourced to Northern Micrographics, Inc., a service vendor in La Crosse, WI. The images were captured at 600 dpi in TIFF format and compressed using CCITT Group 4. Minimal document structuring occurred at the point of conversion, primarily linking image numbers to pagination and tagging self-referencing portions of the text. For serials, low-level indexing was added post-conversion by the partner institutions; Cornell and Michigan staff are collaborating to determine low-level indexing guidelines for this complex group of serial titles.
Further conversion included both optical character recognition of the page images and SGML encoding of the resulting text.
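
The linking of image numbers to pagination described above can be pictured as a simple lookup table. The Python sketch below is illustrative only: the project's actual data format is not described here, and the names PageImage, build_page_map, and image_for_page, along with the assumption that unnumbered front matter precedes a run of consecutively numbered pages, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PageImage:
    """One scanned image and the printed page it shows (hypothetical structure)."""
    image_number: int          # sequential scan number within the volume, starting at 1
    page_label: Optional[str]  # printed page number, or None for unnumbered pages


def build_page_map(first_numbered_image: int, first_page: int,
                   image_count: int) -> List[PageImage]:
    """Assign printed page labels to sequentially numbered images.

    Assumes images before first_numbered_image (title page, front matter)
    carry no printed number, and that pagination is consecutive afterwards.
    """
    pages = []
    for n in range(1, image_count + 1):
        if n < first_numbered_image:
            label = None
        else:
            label = str(first_page + (n - first_numbered_image))
        pages.append(PageImage(n, label))
    return pages


def image_for_page(pages: List[PageImage], page_label: str) -> Optional[int]:
    """Return the image number showing the given printed page, or None if absent."""
    for page in pages:
        if page.page_label == page_label:
            return page.image_number
    return None


# A made-up ten-image volume whose printed page 1 appears on the fifth image.
volume = build_page_map(first_numbered_image=5, first_page=1, image_count=10)
print(image_for_page(volume, "1"))
```

A table of this kind lets a reader ask for a printed page number and be taken to the corresponding image, even though the image sequence also contains unnumbered pages.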