Fall 2016 Research

During Fall 2016, the Overseas Pension Project explored several facets of digital curation and historical research.

Relational Database 

For their database design course, members Jennifer Proctor, Scott Harkless, and Paridhi Mathur developed a relational database for the pension records relating to a specific pensioner. The database was designed to organize the important details from a collection of documents relating to pensioned veterans of early American wars living overseas and to provide a searchable system for the array of items in this collection. The collection is diverse. The terms of the pensions required the veterans to be recertified by a doctor, which produced medical files, letters between Pension Office officials and the State Department, letters from consulate officials on behalf of veterans, and a variety of other records.
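To make the idea of a searchable relational layout concrete, here is a minimal sketch of how documents and pensioners might be linked through a join table. It is not the students' actual schema: the table names, column names, and rows are invented for illustration, and Python's built-in SQLite module stands in for MySQL so the example runs without a server.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative only.
SCHEMA = """
CREATE TABLE pensioner (
    pensioner_id INTEGER PRIMARY KEY,
    full_name    TEXT NOT NULL
);
CREATE TABLE document (
    document_id INTEGER PRIMARY KEY,
    doc_type    TEXT NOT NULL,   -- e.g. 'medical file', 'letter'
    summary     TEXT
);
-- Join table: one document may concern several pensioners, and vice versa.
CREATE TABLE document_pensioner (
    document_id  INTEGER NOT NULL REFERENCES document(document_id),
    pensioner_id INTEGER NOT NULL REFERENCES pensioner(pensioner_id),
    PRIMARY KEY (document_id, pensioner_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Placeholder data, not taken from the real collection.
    conn.execute("INSERT INTO pensioner VALUES (1, 'Example Pensioner')")
    conn.executemany(
        "INSERT INTO document VALUES (?, ?, ?)",
        [
            (1, "medical file", "Placeholder recertification exam"),
            (2, "letter", "Placeholder consular letter"),
        ],
    )
    conn.executemany(
        "INSERT INTO document_pensioner VALUES (?, ?)",
        [(1, 1), (2, 1)],
    )
    conn.commit()
    return conn


def documents_for(conn, name):
    """Return the types of all documents linked to the named pensioner."""
    rows = conn.execute(
        """
        SELECT d.doc_type
        FROM document AS d
        JOIN document_pensioner AS dp ON dp.document_id = d.document_id
        JOIN pensioner AS p ON p.pensioner_id = dp.pensioner_id
        WHERE p.full_name = ?
        ORDER BY d.document_id
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(documents_for(build_demo_db(), "Example Pensioner"))
```

The join table is what lets a single letter or medical file be attached to more than one veteran without duplicating the document record.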

Using SQL and MySQL Workbench, the students developed a working relational database, though not without challenges.

The pension documents proved difficult because the data contained within and about the records are vast and varied. How does one connect a letter from an individual person to statistics on government payments to veterans in New York and Florida? Transcription had its own challenges. Most of the documents are handwritten letters, which had to be transcribed before much information could be drawn from them. For some documents, an attempt was made to use Optical Character Recognition (OCR) software to transcribe the digitized images. However, although our OCR program, ABBYY FineReader, can convert many digitized documents to a machine-readable format, this usually requires relatively clear lettering and layout. A handwritten document in archaic cursive is quite beyond it. Even manual transcription was often difficult. Some of the documents were scanned at fairly low resolution to reduce scanning time and storage requirements, and the handwriting was sometimes cramped and hard to read. As a result, transcription often required multiple readings.

The documents also posed structural challenges. These records required a large number of join tables and foreign keys, which increased the need to move a table's primary key from an attribute that was also a foreign key to a new attribute. The program indexes primary keys and names those indexes automatically. When a primary key was changed, its attempts to delete and recreate the indexes caused same-name errors with already existing foreign keys. To get around this, we had to delete the associated foreign keys, create the new primary key, switch the old primary keys to non-primary-key attributes, and then recreate the foreign keys.
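The order of operations in that workaround can be sketched as a small Python helper that generates MySQL statements. This is an illustration under stated assumptions, not the students' actual script: the helper name, table name, column names, and constraint name are hypothetical, and the statements are only built as strings rather than executed against a server.

```python
def primary_key_swap_statements(table, new_pk, foreign_keys):
    """Return MySQL statements, in order, that move the primary key of
    `table` to a new auto-increment column named `new_pk`.

    foreign_keys is a list of tuples:
        (constraint_name, column, parent_table, parent_column)
    describing the foreign keys on `table` that must be dropped and recreated.
    """
    statements = []

    # Step 1: drop the associated foreign keys first, so that rebuilding
    # the primary key index cannot collide with them.
    for name, _column, _parent_table, _parent_column in foreign_keys:
        statements.append(f"ALTER TABLE {table} DROP FOREIGN KEY {name};")

    # Step 2: drop the old primary key (its column remains as an ordinary
    # attribute) and add the new surrogate primary key in one statement.
    statements.append(
        f"ALTER TABLE {table} DROP PRIMARY KEY, "
        f"ADD COLUMN {new_pk} INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;"
    )

    # Step 3: recreate the foreign keys on the now non-primary-key columns.
    for name, column, parent_table, parent_column in foreign_keys:
        statements.append(
            f"ALTER TABLE {table} ADD CONSTRAINT {name} "
            f"FOREIGN KEY ({column}) REFERENCES {parent_table} ({parent_column});"
        )

    return statements


if __name__ == "__main__":
    # Hypothetical example: medical_file.document_id was both the primary
    # key and a foreign key to document.document_id.
    for statement in primary_key_swap_statements(
        "medical_file",
        "medical_file_id",
        [("fk_medical_file_document", "document_id", "document", "document_id")],
    ):
        print(statement)
```

Keeping the three steps in this order matters: the foreign keys are gone before the primary key index is rebuilt, and they are restored only after the new key exists.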

We encourage you to read their full report!