From tree to network: reordering an archival catalogue. (1st July 2020)
- Record Type:
- Journal Article
- Title:
- From tree to network: reordering an archival catalogue. (1st July 2020)
- Main Title:
- From tree to network: reordering an archival catalogue
- Authors:
- Bell, Mark
- Abstract:
- Purpose: This paper presents the results of a number of experiments performed at the National Archives, all related to the theme of linking collections of records. It aims to present a methodology for translating a hierarchy into a network structure, using a number of methods for deriving statistical distributions from records' metadata or content and then aggregating them. Simple similarity metrics are then used to compare and link collections of records with similar characteristics.
Design/methodology/approach: The approach taken is to consider a record at any level of the catalogue hierarchy as a summary of its children. A distribution (e.g. word counts or a date distribution) is created for each child record and averaged or summed with those of the other children. This process is repeated up the hierarchy to find a representative distribution of the whole series. By doing this, the authors can compare record series with one another and create a similarity network.
Findings: The summarising method was found to be applicable not only to a hierarchical catalogue but also to web archive data, which is by nature stored in a hierarchical folder structure. The case studies raised many questions worthy of further exploration, such as how to present distributions and uncertainty to users and how to compare methods that produce similarity scores on different scales.
Originality/value: Although the techniques used to create distributions, such as topic modelling and word frequency counts, are not new and have been used to compare documents, to the best of the authors' knowledge applying the averaging approach to the archival catalogue is new. This provides an interesting method for zooming in and out of a collection, creating networks at different levels of granularity according to user needs.
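The summarising and linking steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes whitespace-tokenised word frequencies as the per-record distribution, averages the normalised child distributions at each level of the hierarchy, and uses cosine similarity with an arbitrary threshold of 0.5 as the "simple similarity metric". The `Record` class, the function names and the sample records are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Record:
    """Hypothetical catalogue entry at any level of the hierarchy."""
    ref: str
    text: str = ""  # leaf content or description (assumed plain text)
    children: list["Record"] = field(default_factory=list)


def leaf_distribution(text: str) -> dict[str, float]:
    """Normalised word-frequency distribution for a single leaf record."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def summarise(record: Record) -> dict[str, float]:
    """Treat a record as the average of its children's distributions, recursively."""
    if not record.children:
        return leaf_distribution(record.text)
    child_dists = [summarise(child) for child in record.children]
    summed: Counter = Counter()
    for dist in child_dists:
        summed.update(dist)  # Counter.update adds values key by key
    n = len(child_dists)
    return {w: v / n for w, v in summed.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse distributions."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_network(series: list[Record], threshold: float = 0.5):
    """Edges (ref_a, ref_b, score) between series whose summaries are similar enough."""
    summaries = {s.ref: summarise(s) for s in series}
    edges = []
    for a, b in combinations(summaries, 2):
        score = cosine(summaries[a], summaries[b])
        if score >= threshold:
            edges.append((a, b, round(score, 3)))
    return edges


# Hypothetical example: two revenue-related series and one unrelated series.
series_a = Record("A", children=[Record("A/1", "tax customs revenue"),
                                 Record("A/2", "customs excise revenue")])
series_b = Record("B", children=[Record("B/1", "revenue tax excise")])
series_c = Record("C", children=[Record("C/1", "navy ship admiralty")])
print(similarity_network([series_a, series_b, series_c]))  # [('A', 'B', 0.73)]
```

Because each level is an average of normalised child distributions, a summary at any depth is itself a distribution. The same comparison can therefore be run at the item, piece or series level, which mirrors the "zooming in and out" idea in the abstract.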
- Is Part Of:
- Records management journal. Volume 30:Number 3(2020)
- Journal:
- Records management journal
- Issue:
- Volume 30:Number 3(2020)
- Issue Display:
- Volume 30, Issue 3 (2020)
- Year:
- 2020
- Volume:
- 30
- Issue:
- 3
- Issue Sort Value:
- 2020-0030-0003-0000
- Page Start:
- 379
- Page End:
- 394
- Publication Date:
- 2020-07-01
- Subjects:
- Archives -- Network analysis -- Record linkage -- Topic modelling
Records -- Management -- Periodicals
651.5
- Journal URLs:
- http://www.emeraldinsight.com/journals.htm?issn=0956-5698
http://www.emeraldinsight.com/
- DOI:
- 10.1108/RMJ-09-2019-0051
- Languages:
- English
- ISSNs:
- 0956-5698
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 7325.792500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22206.xml