IEEE Big Data 2017: 2nd CAS workshop

Workshop Title: 2nd Computational Archival Science (CAS) workshop
Wednesday, December 13, 2017
Westin Copley Plaza
10 Huntington Avenue, Boston, MA 02116, USA

PART OF: IEEE Big Data 2017
http://cci.drexel.edu/bigdata/bigdata2017/
*** There is a 1-day registration option ***


PROGRAM:
14 presentations from France, Netherlands, UK, Canada, US, Taiwan; 2 demos from GE, US; Student panel on new curricula.


9:00 – 9:15 Welcome

  • Workshop Chairs:
    Mark Hedges (KCL), Victoria Lemieux (UBC), Richard Marciano (U. Maryland)


    • The “COMPUTATIONAL ARCHIVAL SCIENCE (CAS)” Portal http://dcicblog.umd.edu/cas
      • Join our Google Group at: computational-archival-science@googlegroups.com
    • Foundational Paper: Dec. 2017, “Archival records and training in the Age of Big Data”, Marciano, Lemieux, Hedges, Esteva, Underwood, Kurtz, Conrad, accepted for publication. See: LINK.
      • 8 topics: (1) Evolutionary prototyping and computational linguistics, (2) Graph analytics, digital humanities and archival representation, (3) Computational finding aids, (4) Digital curation, (5) Public engagement with (archival) content, (6) Authenticity, (7) Confluences between archival theory and computational methods: cyberinfrastructure and the Records Continuum, and (8) Spatial and temporal analytics.
      • [In: “Advances in Librarianship – Re-Envisioning the MLIS: Perspectives on the Future of Library and Information Science Education”, Editors: Lindsay C. Sarin, Johnna Percell, Paul T. Jaeger, & John Carlo Bertot.]

9:15 – 10:35 Session 1: Exploring Archival Data (talks: 20 mins each)

  • #1: Building new knowledge from distributed scientific corpus; HERBADROP & EUROPEANA: two concrete case studies for exploring big archival data
    [Pascal Dugenie, Nuno Freire, Daan Broeder — CINES, FR & MEERTENS Institut, NL & INESC-ID/Europeana DSI, NL]


    • Computational Methods: EUDAT automated scalable e-infrastructure, integrated computation services

    • Archival Concepts: Trusted digital repositories (TDR), OCR, cultural heritage platforms
  • #2: An Infrastructure and Application of Computational Archival Science to Enrich and Integrate Big Digital Archival Data: Using Taiwan Indigenous Peoples Open Research Data (TIPD) as Example
    [Ji-Ping Lin – Academia Sinica, TW]


    • Computational Methods: Topic Modelling for concept extraction from large EC archival holdings

    • Archival Concepts: Support accessibility to large historical European Commission archival holdings
  • #3: Computational Curation of a Digitized Record Series of WWII Japanese-American Internment
    [William Underwood, Richard Marciano, Sandra Laib, Carl Apgar, Luis Beteta, Waleed Falak, Marisa Gilman, Riss Hardcastle, Keona Holden, Yun Huang, David Baasch, Brittni Ballard, Tricia Glaser, Adam Gray, Leigh Plummer, Zeynep Diker, Mayanka Jha, Aakanksha Singh, and Namrata Walanj — University of Maryland, USA]


    • Computational Methods: Topic Modelling for concept extraction from large EC archival holdings

    • Archival Concepts: Support accessibility to large historical European Commission archival holdings
  • #4: The Cybernetics Thought Collective Project: Using Computational Methods to Reveal Intellectual Context in Archival Material
    [Bethany Anderson, Christopher Prom, Kevin Hamilton, James Hutchinson, Mark Sammons, and Alex Dolski — University of Illinois at Urbana-Champaign, USA]


    • Computational Methods: Annotation, entity extraction, NLP, machine learning

    • Archival Concepts: Archival materials contextual discovery

10:35 – 10:45 Questions and Discussion


10:45 – 11:05 Coffee break


11:05 – 12:25 Session 2: Curation and Appraisal (talks: 20 mins each)

  • #5: Towards Automated Quality Curation of Video Collections from a Realistic Perspective
    [Todd Goodall, Maria Esteva, Sandra Sweat, and Alan Bovik — University of Texas, USA]


    • Computational Methods: Feature computing from video records, automated quality prediction, scalable HPC

    • Archival Concepts: Collection assessment, quality-aware metadata for video collections to inform appraisal, preservation, and access decisions, quality detection in videos

  • #6: Line Detection in Binary Document Scans: A Case Study with the International Tracing Service Archives
    [Benjamin Lee – United States Holocaust Memorial Museum, USA]


    • Computational Methods: Line detection, image segmentation

    • Archival Concepts: Classification of archival images
  • #7: Auto-Categorization & Future Access to Digital Archives
    [Nathaniel Payne and Jason Baron – University of British Columbia, CAN & Of Counsel, Drinker Biddle & Reath LLP, USA]


    • Computational Methods: Auto-categorization, auto-classification, e-discovery, machine learning

    • Archival Concepts: Recordkeeping
  • #8: Heuristics for Assessing Computational Archival Science (CAS) Research: The Case of the Human Face of Big Data Project
    [Myeong Lee, Yuheng Zhang, Shiyun Chen, Edel Spencer, Jhon Dela Cruz, Hyeonggi Hong, and Richard Marciano — University of Maryland, USA]


    • Computational Methods: Heuristics for CAS research

    • Archival Concepts: Iterative design, value-sensitive design

12:25 – 12:45 Session 3: CAS Methods (talk: 20 min)

  • #9: What Can a Knowledge Complexity Approach Reveal About Big Data and Archival Practice?
    [Nicola Horsley – The Netherlands Institute for Permanent Access to Digital Research Resources, NL]


    • Computational Methods: Digital narrative with big data

    • Archival Concepts: Knowledge complexity in archives

12:45 – 2:00 Lunch


    2:00 – 3:00 Session 3: CAS Methods, cont. (talks: 20 mins each)

    • #10: Protecting Privacy in the Archives: Preliminary Explorations of Topic Modeling for Born-Digital Collections
      [Tim Hutchinson – University of Saskatchewan Library, CAN]


      • Computational Methods: NLP, NER, sentiment analysis

      • Archival Concepts: PII
    • #11: Identifying Epochs in Text Archives
      [Tobias Blanke and Jon Wilson — King’s College London, UK]


      • Computational Methods: Cultural analytics, topic modeling

      • Archival Concepts: Classification of time-coded collections of texts into epochs and periods
    • #12: GraphQL for Archival Metadata: An Overview of the EHRI GraphQL API
      [Mike Bryant – King’s College London, UK]


      • Computational Methods: APIs for cultural heritage materials, graph databases

      • Archival Concepts: Structured data interfaces to archival materials

    3:00 – 3:40 Session 4: Creation and Management of Current Records (talks: 20 mins each)

    • #13: The Blockchain Litmus Test
      [Tyler Smith – Adventium Labs, USA]


      • Computational Methods: Blockchain, secure computing, trustworthiness

      • Archival Concepts: Decentralized recordkeeping
    • #14: A Typology of Blockchain Recordkeeping Solutions and Some Reflections on their Implications for the Future of Archival Preservation
      [Victoria Lemieux – University of British Columbia, CAN]


      • Computational Methods: Blockchain, computational validation, distributed ledger, computational trust

      • Archival Concepts: Recordkeeping, digital preservation, archival trust

    3:40 – 4:05 Questions and Discussion


    4:05 – 4:25 Coffee break


    4:25 – 4:55 Demos


    4:55 – 5:15 Student Session:

    • Moderator: Michael Kurtz
    • Students: Jennifer Proctor, Claire McDonald, Will Thomas

      Seven graduate students at the U. Maryland participated in a fall 2017 seminar exploring the eight case studies proposed in the 2017 Foundational Paper: “Archival records and training in the Age of Big Data”, Marciano, Lemieux, Hedges, Esteva, Underwood, Kurtz, Conrad, LINK, to be published in “Advances in Librarianship – Re-Envisioning the MLIS: Perspectives on the Future of Library and Information Science Education”, Editors: Lindsay C. Sarin, Johnna Percell, Paul T. Jaeger, & John Carlo Bertot.

      The students offered to discuss educational takeaways and ways of incorporating CAS into Master of Library and Information Science (MLIS) education, so that programs better serve today’s MLIS graduates who want to combine ‘traditional’ archival principles with computational methods.

    5:15 Closing Remarks




    Introduction to workshop:
    The large-scale digitization of analog archives, the emerging diverse forms of born-digital archives, and the new ways in which researchers across disciplines (as well as the public) wish to engage with archival material are resulting in disruptions to traditional archival theories and practices. Increasing quantities of ‘big archival data’ present challenges for the practitioners and researchers who work with archival material, but also offer enhanced possibilities for scholarship through the application of computational methods and tools to the archival problem space, and, more fundamentally, through the integration of ‘computational thinking’ with ‘archival thinking’.

    Our working definition of Computational Archival Science (CAS) is:

    Contributing to the development of the theoretical foundations of a new trans-discipline of computer and archival science

    This workshop will explore the conjunction (and its consequences) of emerging methods and technologies around big data with archival practice and new forms of analysis and historical, social, scientific, and cultural research engagement with archives. We aim to identify and evaluate current trends, requirements, and potential in these areas, to examine the new questions that they can provoke, and to help determine possible research agendas for the evolution of computational archival science in the coming years. At the same time, we will address the questions and concerns scholarship is raising about the interpretation of ‘big data’ and the uses to which it is put, in particular appraising the challenges of producing quality – meaning, knowledge and value – from quantity, tracing data and analytic provenance across complex ‘big data’ platforms and knowledge production ecosystems, and addressing data privacy issues.

    This is the 2nd workshop at IEEE Big Data addressing Computational Archival Science (1st CAS workshop at: http://dcicblog.umd.edu/cas/ieee_big_data_2016_cas-workshop/). It builds on three earlier workshops on ‘Big Humanities Data’ organized by the same chairs at the 2013-2015 conferences, and more directly on a symposium held in April 2016 at the University of Maryland (http://dcicblog.umd.edu/cas/dcickcl-invited-cas-symposium-apr-2016/).

    Research topics covered:
    Topics covered by the workshop include, but are not restricted to, the following:

    • Application of analytics to archival material, including text-mining, data-mining, sentiment analysis, network analysis.
    • Analytics in support of archival processing, including e-discovery, identification of personal information, appraisal, arrangement and description.
    • Scalable services for archives, including identification, preservation, metadata generation, integrity checking, normalization, reconciliation, linked data, entity extraction, anonymization and reduction.
    • New forms of archives, including Web, social media, audiovisual archives, and blockchain.
    • Cyber-infrastructures for archive-based research and for development and hosting of collections
    • Big data and archival theory and practice
    • Digital curation and preservation
    • Crowd-sourcing and archives
    • Big data and the construction of memory and identity
    • Specific big data technologies (e.g. NoSQL databases) and their applications
    • Corpora and reference collections of big archival data
    • Linked data and archives
    • Big data and provenance
    • Constructing big data research objects from archives
    • Legal and ethical issues in big data archives

     
    Program Chairs:
    Dr. Mark Hedges
    Department of Digital Humanities (DDH)
    King’s College London, UK

    Prof. Victoria Lemieux
    School of Library, Archival and Information Studies
    University of British Columbia, Canada

    Prof. Richard Marciano
    Digital Curation Innovation Center (DCIC)
    College of Information Studies
    University of Maryland, USA

    Program Committee Members:
    The program chairs will serve on the Program Committee, as will the following:

    Dr. Maria Esteva
    Data Intensive Computing
    Texas Advanced Computing Center (TACC), USA

    Dr. Bill Underwood
    Digital Curation Innovation Center (DCIC)
    College of Information Studies
    University of Maryland, USA

    Prof. Michael Kurtz
    Digital Curation Innovation Center (DCIC)
    College of Information Studies
    University of Maryland, USA

    Mark Conrad
    National Archives and Records Administration (NARA)

    Dr. Tobias Blanke
    Department of Digital Humanities
    King’s College London, UK