Finding light in dark archives: using AI to connect context and content in email


Email archives are important historical resources, but providing access to them poses a unique archival challenge: many born-digital collections remain dark, and the question of how to make them available effectively is unresolved. This paper contributes to the growing interest in preserving access to email by addressing the needs of users, in readiness for when such collections become more widely available. We argue that for the content of email to be meaningfully accessed, the context of email must form part of this access. In exploring this idea, we focus on discovery within large, multi-custodian archives of organisational email, where emails’ network features are particularly apparent. We introduce our prototype search tool, which uses AI-based methods to support user-driven exploration of email. Specifically, we integrate two distinct AI models that generate systematically different types of results: one based on simple phrase matching and the other on more complex BERT embeddings. Together, these provide a new pathway to contextual discovery that accounts for the diversity of future archival users, their interests and their levels of experience.

Divisions: College of Business and Social Sciences > Aston Business School
Additional Information: © The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit
The authors gratefully acknowledge funding support by the Arts & Humanities Research Council (UK) and National Endowment for the Humanities (USA) as part of the US-UK Partnership Development Grants, Grant AH/T013060/1.
Uncontrolled Keywords: Email archives, Born-digital collections, Computational archive studies, Contextual email discovery
Publication ISSN: 1435-5655
Last Modified: 15 Apr 2024 07:40
Date Deposited: 31 Jan 2022 16:41
Related URLs: https://link.sp ... 146-021-01369-9 (Publisher URL)
PURE Output Type: Article
Published Date: 2021-12-31
Published Online Date: 2021-12-31
Accepted Date: 2021-11-23
Authors: Decker, Stephanie (ORCID Profile 0000-0003-0547-9594)
Kirsch, David A.
Venkata, Santhilata Kuppili
Nix, Adam



Version: Published Version

License: Creative Commons Attribution
