Network Analytics and Graph Databases

Selected blog by Claire McDonald

This week I’ve been exploring how network analysis/graph databases might be applied to two of my interest areas—oral histories and genealogy research.

Oral histories are a prime candidate for network analysis and graph databases. Mitchell states, “social network analysis can best be thought of as a study of human relationships through their presentation using graphs and the application of graph theory” (Ch. 7, SNA sec.). OrgNet takes the definition further: “Social network analysis [SNA] is the mapping and measuring of relationships and flows between people, groups, organizations, computers, URLs, and other connected information/knowledge entities” (OrgNet, n.d.). I like their addition of the word measuring, because the analysis can go beyond identifying a relationship to determining its intensity (perhaps as measured by frequency of contact or transaction, or by the relative positions of individuals in the network). I believe the International Consortium of Investigative Journalists (ICIJ) took this approach in identifying “The Power Players” in the Panama Papers, and Facebook certainly measures relationships when it tells me who my BFFs are.

What does this mean for oral histories? When I processed oral histories for our county archives, I read all the transcripts and quickly noticed several names (of people and organizations) that came up repeatedly, regardless of whether the interviewee was a former political leader, a business leader, or a community member. Anecdotally, I could see relationships and, in some cases, identify the “power players,” or influential people and organizations (“influencers”), in our county. With just 60 oral histories, this was manageable (and selection bias can’t be ruled out). But what if I wanted to analyze the 10,000 oral histories at Columbia University (or a subset based on specific topics) and identify relationships and influencers across a wide range of individuals, organizations, and subjects? Building on last week’s post, I could potentially begin with the available transcripts, use named-entity recognition to identify and extract the names of people and organizations, construct data tables, and use SNA tools (such as Neo4j or OrgNet) to analyze the relationships through various queries. The results might yield interesting insights (and prompt further inquiry) into who the influencers are, which influencers cut across disciplines, how social networks shape individual paths, how social movements develop, and so on.
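To make the analysis step concrete, here is a minimal Python sketch of what could happen after named-entity recognition, using only the standard library rather than Neo4j or OrgNet's tools. It assumes the NER output is a list of names per transcript (the names below are hypothetical placeholders), treats two entities as connected when they are mentioned in the same transcript, and weights each tie by the number of transcripts they share, in the spirit of OrgNet's emphasis on measuring relationships. Co-occurrence is a crude stand-in for a real relationship, so the rankings are prompts for further reading, not conclusions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical entity lists standing in for NER output, one list per
# transcript; these are placeholders, not real oral-history data.
transcripts = [
    ["County Council", "Chamber of Commerce", "J. Smith"],
    ["J. Smith", "Chamber of Commerce", "Library Board"],
    ["J. Smith", "County Council"],
]


def build_cooccurrence(transcripts):
    """Map each sorted (entity_a, entity_b) pair to the number of
    transcripts in which both entities are mentioned."""
    edges = Counter()
    for entities in transcripts:
        # set() so repeated mentions within one transcript count once.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Normalized degree centrality: distinct neighbors / (n - 1)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def weighted_degree(edges):
    """Sum of edge weights per entity, a rough measure of tie intensity."""
    strength = Counter()
    for (a, b), w in edges.items():
        strength[a] += w
        strength[b] += w
    return strength


edges = build_cooccurrence(transcripts)
print(degree_centrality(edges))
print(weighted_degree(edges).most_common())
# → [('J. Smith', 5), ('Chamber of Commerce', 4), ('County Council', 3), ('Library Board', 2)]
```

In this toy example "J. Smith" has the highest weighted degree, the kind of result that might flag a possible influencer worth checking against the transcripts themselves. A graph database such as Neo4j would let the same ties be stored and queried at the scale of a collection like Columbia's.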

Another application comes in the genealogy world. While attempting to learn more about Neo4j, I came across an interesting five-minute video in which Rik Bruggen, a Regional VP for Sales at Neo4j, shows how he created a graph database with his mother’s genealogy data. He intended it as a fun way to explain how graph databases work.

More importantly, the video got me thinking beyond my own genealogy research. In introducing graph databases, Mitchell states: “As data mining and social networking companies started to collect disparate information about people, places, and things and how they are related to each other, the need for storing information organized using a graph topology became apparent” (Ch. 6, Graph DBs sec.). Borrowing Mitchell’s language, exploring and identifying relationships between “disparate information about people, places, and things” is also a good description of genealogy research. You can see what I mean on the database webpage of the Maryland State Archives’ (MSA) Legacy of Slavery project, a great resource for African-American genealogy. The site shows 31 different record series that are searchable online; the record series are diverse, including census records, real estate records, city directories, Certificates of Freedom, court records, and more. Wouldn’t it be interesting to apply network analysis and graph databases in this context, using different record series such as “Census 1850-1860” and “Census 1870-1880” to examine changes in family structures, migration, or occupations over time and, ultimately, better understand the stories of slaves in Maryland? It’s a question that will be answered through MSA’s partnership with the UMD Digital Curation Innovation Center, I hope…
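A similar sketch for the genealogy case, again hypothetical: it models people and households as a bipartite graph, with one household node per record series, and compares whom a person lived with, and where, across two census series. The records, IDs, and function names are illustrative assumptions, not the Legacy of Slavery database's actual structure. The sketch also assumes record linkage (deciding that two entries describe the same person) has already been done, which in practice is often the hardest part.

```python
from collections import defaultdict

# Hypothetical records for illustration only; not data from the Legacy of
# Slavery database. Assumes each person already has a stable ID ("P1", ...)
# linking them across record series.
RECORDS = [
    {"series": "Census 1850-1860", "household": "H1", "person": "P1",
     "county": "Anne Arundel", "occupation": "laundress"},
    {"series": "Census 1850-1860", "household": "H1", "person": "P2",
     "county": "Anne Arundel", "occupation": "laborer"},
    {"series": "Census 1850-1860", "household": "H1", "person": "P3",
     "county": "Anne Arundel", "occupation": None},
    {"series": "Census 1870-1880", "household": "H7", "person": "P1",
     "county": "Baltimore City", "occupation": "cook"},
    {"series": "Census 1870-1880", "household": "H7", "person": "P3",
     "county": "Baltimore City", "occupation": "seamstress"},
    {"series": "Census 1870-1880", "household": "H7", "person": "P4",
     "county": "Baltimore City", "occupation": None},
]


def build_graph(records):
    """Bipartite graph linking person nodes to household nodes.

    A household node is keyed by (series, household id), so the same
    household id in two different series stays distinct.
    """
    adj = defaultdict(set)
    attrs = {}  # (person, series) -> (county, occupation)
    for r in records:
        hh = (r["series"], r["household"])
        adj[r["person"]].add(hh)
        adj[hh].add(r["person"])
        attrs[(r["person"], r["series"])] = (r["county"], r["occupation"])
    return adj, attrs


def co_residents(adj, person, series):
    """People sharing any household with `person` in the given series."""
    people = set()
    for hh in adj.get(person, set()):
        if hh[0] == series:
            people |= adj[hh]
    people.discard(person)
    return people


def household_change(adj, person, series_a, series_b):
    """Compare a person's co-residents between two record series."""
    before = co_residents(adj, person, series_a)
    after = co_residents(adj, person, series_b)
    return {"stayed": before & after, "left": before - after,
            "joined": after - before}


adj, attrs = build_graph(RECORDS)
change = household_change(adj, "P1", "Census 1850-1860", "Census 1870-1880")
print(change)
# County and occupation for the same person in each series (migration, work).
print(attrs[("P1", "Census 1850-1860")], "->", attrs[("P1", "Census 1870-1880")])
```

Keying households by series keeps each census snapshot separate, so questions about change over time become comparisons between neighborhoods of the same person node. Other record series (Certificates of Freedom, court records) could, under the same assumptions, be added as further node types.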

Possible Take-Away for CAS/MLIS Education. Marciano et al. point out that “graph analytics are being used increasingly by the users of archives” and cite the need for CAS students to learn more about graph theory, databases, and analytics (p. 8). I would expand on this idea and, drawing on contextual learning theory (Imel, 2000), propose a practicum requirement in which CAS/MLIS students work together to apply one or more computational methods to a real-life archival issue.

Bruggen, R. (2014, Jan 13). “The Making of my genealogy graph database.” Accessed at

Columbia Oral History Center. Accessed at

Imel, S. (2000). Contextual Learning in Adult Education. Practice Application Brief Number 12. Center on Education and Training for Employment. College of Education. The Ohio State University, p. 3. Accessed at

International Consortium of Investigative Journalists. The Panama Papers. Accessed at

Marciano, R., Lemieux, V., Hedges, M., Esteva, M., Underwood, W., Kurtz, M., & Conrad, M. (In press). Archival Records and Training in the Age of Big Data. Advances in Librarianship – Re-Envisioning the MLIS: Perspectives on the Future of Library and Information Science Education, eds. Sarin, L.C., Percell, J., Jaeger, P.T., & Bertot, J.C.

Maryland State Archives. Legacy of Slavery. Accessed at

Mitchell, E.T. (2015). Metadata Standards and Web Services in Libraries, Archives, and Museums: An Active Learning Resource. (Kindle Ed.) Santa Barbara, CA: Libraries Unlimited.

OrgNet. (n.d.). Social Network Analysis: An Introduction. Accessed at