Heritage institutions are places in which works of art, historical records, and other objects of cultural or scientific interest are sheltered and made accessible to the public. Their digital equivalent is already taking shape through the digitization and sharing of born-digital or digitized objects on online platforms. In this article we shed light on how the issue of structured data about heritage institutions is being tackled by Wikipedia and its sister project Wikidata through their “Sum of All GLAM” project.
Access to these objects, and information about them, is provided and mediated both through platforms maintained by the heritage sector itself and through more general-purpose platforms, which often serve as a first point of entry for the wider public. These platforms include Google, Facebook, YouTube, and Wikipedia, which also happen to be among the most visited websites on the Web. In this emerging data and platform ecosystem, Wikipedia and related Wikimedia projects play a special role as they are community-driven, non-profit endeavours. Moreover, these projects are working hard to make data and information available in a free, connected and structured manner, for anybody to re-use.
There are various layers of information about heritage institutions, ranging from descriptions of institutions themselves and descriptions of their collections, to descriptions of individual items. There may be digital representations of these items, and in some cases even searchable content within the items. Figure 1 illustrates how the top four layers of data and information are currently addressed in Wikipedia, with Wikidata and Wikimedia Commons increasingly focussing on providing structured and linked data alongside the unstructured or semi-structured encyclopaedic information contained in Wikipedia articles.
Figure 1: Heritage data and content in the context of Wikipedia and its sister projects
Structured data about institutions and collections, as well as some item-level data, are maintained on Wikidata, which serves as Wikipedia’s repository for structured data. Wikimedia Commons serves as Wikipedia’s repository for media files and is currently being prepared for the linked data era through the “Structured Data on Wikimedia Commons” project. This project accompanies the transition of Wikimedia Commons to linked open data, foresees the provision of item-level metadata as linked data, and monitors the implementation of the IIIF standard to allow easier cross-platform manipulation and sharing of media files. While similar efforts are under way at all the different levels of information, we will focus the remainder of the article on a project that is dedicated to improving the quality and completeness of the data in the top layer, i.e. the data about the heritage institutions themselves. This project lays the foundations for an International Knowledge Base for Heritage Institutions. It is currently managed by Wiki Movement Brazil in cooperation with OpenGLAM Switzerland and will be expanded to further countries in the near future. It is also being coordinated with “FindingGLAMs”, a project run by Wikimedia Sweden, UNESCO, and the Wikimedia Foundation, which pursues similar goals but addresses different layers of information, including aspects related to structured data on Wikimedia Commons.
To succeed, the International Knowledge Base for Heritage Institutions needs to address all stages of the linked data value chain, from data provision to data use (figure 2):
Figure 2: Core processes of linked data publication (source: eCH-0205 – Linked Open Data)
Parts of the data have already been ingested into Wikidata, and the relevant elements of the ontology have already been implemented, so most of the effort is currently going into data maintenance. The goal is to provide the data in a coherent way that makes it fit for use in Wikipedia infoboxes (see figure 3 for an example). Before the data is ready for that, however, several issues need to be addressed, such as data quality, correct data modelling, and data completeness.
Figure 3: Wikipedia article with infobox containing structured data
The goal of the “Sum of All GLAM” project is to complete the entries for all the heritage institutions of a given country with all the data that is required for the infoboxes. To monitor progress towards this goal, we are currently putting in place various instruments that community members can use to focus their efforts on improving existing data entries (see figure 4 for an example). While the issues related to data modelling will be addressed by members of the Wikidata community at an international level, the project team is planning to involve members of the heritage community in the various countries to help improve the completeness of the data and to make sure that all the data entries are properly sourced. Existing Wikidata community members are expected to work on this hand in hand with members of the heritage community, but the project will rely heavily on the heritage sector to help keep the information about its institutions up to date in the longer run. In fact, in the case of medium-sized and larger institutions, regularly updating their existing Wikidata entry should eventually become part of the tasks carried out by the institution’s communication department. For smaller institutions, other solutions need to be found, possibly via the intermediation of umbrella organizations or specialized institutions which take care of coordination at a national level.
Figure 4: Table indicating the completeness of data about museums for different countries
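The kind of completeness table shown in figure 4 can be approximated with a few lines of code. The following is only a minimal sketch, not the project's actual tooling: it assumes the institution entries have already been exported from Wikidata into plain Python dictionaries, and the list of properties treated as "required for the infobox" (`REQUIRED_PROPERTIES`) is an illustrative assumption, as are the function name and the sample data.

```python
from collections import defaultdict

# Assumption for illustration only: Wikidata properties an infobox might need.
# P131 = located in the administrative territorial entity, P625 = coordinate
# location, P856 = official website, P571 = inception, P18 = image.
REQUIRED_PROPERTIES = ("P131", "P625", "P856", "P571", "P18")


def completeness_by_country(entries, required=REQUIRED_PROPERTIES):
    """Return {country: {property: share of entries that have a value}}."""
    totals = defaultdict(int)
    filled = defaultdict(lambda: defaultdict(int))
    for entry in entries:
        country = entry["country"]
        totals[country] += 1
        for prop in required:
            # A property counts as filled if it has at least one value.
            if entry["claims"].get(prop):
                filled[country][prop] += 1
    return {
        country: {prop: filled[country][prop] / totals[country] for prop in required}
        for country in totals
    }


# Invented sample data; real entries would come from a Wikidata export.
SAMPLE_ENTRIES = [
    {"country": "Brazil", "claims": {"P131": ["a"], "P625": ["b"], "P856": ["c"]}},
    {"country": "Brazil", "claims": {"P131": ["d"]}},
    {"country": "Switzerland", "claims": {p: ["x"] for p in REQUIRED_PROPERTIES}},
]

if __name__ == "__main__":
    for country, shares in completeness_by_country(SAMPLE_ENTRIES).items():
        row = ", ".join(f"{prop}: {share:.0%}" for prop, share in shares.items())
        print(f"{country}: {row}")
```

A per-property breakdown of this kind lets community members see at a glance which gaps are most common in a given country and direct their editing efforts accordingly.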
To involve members of the heritage sector in various countries, internationalization is being pursued early in the project. To this end, a well-documented model project is currently being implemented in Brazil, which can in turn be replicated in other countries. To make sure that the project documentation is fit for its purpose, international partners ready to implement parts of the project in their own countries are currently being recruited. And to facilitate the tailoring of the project to local needs, the model project will be broken down into various modules that can be implemented separately or in combination with other modules, as the local partners see fit. In the second part of this article we will describe some of these modules.
Part 2 of this article is published here.
In the project’s working title, GLAM stands for “Galleries, Libraries, Archives, Museums”; the acronym is commonly used to refer to heritage institutions.