The central role of base registers and authority data in breaking data silos

Base registers and authority data are central components of a Linked Data ecosystem. Together with jointly used data models or ontologies, they ensure that data sets can be linked across organisational boundaries; without them, Linked Data would not be possible. This article explains what base registers and authority data are and why it makes sense to prioritise their publication as Linked Open Data (LOD) within an effective Open Government Data strategy. As part of an ongoing project commissioned by eGovernment Switzerland, a catalogue of prioritisation criteria was developed for this purpose; it serves as a basis for screening existing data holdings and is intended to enable a targeted data publication strategy.

Linked Data is the technology of choice for breaking through organisational data silos and providing data sets in such a way that they can be linked and shared as easily as possible with data sets from other organisations. Preparing data for Linked Data always involves some effort, typically on the part of the data holders. It is an investment in interoperability for the benefit of future data users, who often include the data holders themselves. Ensuring interoperability is not a one-way street; it usually requires concerted action by various actors. In the context of Linked Data, interoperability is ensured by three elements (Estermann 2018):

  • At the level of technical infrastructure, standard web technologies such as HTTP, RDF and URIs are used to describe and provide data. They are the basic prerequisite for semantic queries across servers. The data is typically stored in triple stores, which can be queried via a SPARQL endpoint.
  • The semantic infrastructure consists of data models or ontologies. These describe the concepts contained in the data and map the relationship between these concepts. Ontologies exist in different forms, which differ primarily in terms of their complexity. Ontologies of a lower degree of complexity are sometimes referred to as catalogues, glossaries, thesauri or taxonomies, which are also commonly summarised under the generic term “controlled vocabularies”. If different data sets are described by means of the same ontologies, they are interoperable at the semantic level.
  • Registers of proper names, so-called “named entities”, serve to uniquely identify the different instances of a class. By defining persistent identifiers for these instances, they make it possible to make statements about the same person, the same organisation, the same administrative unit, etc. across different data sets. Statistical offices and other government agencies often speak of “base registers”. These registers typically claim to list all existing instances of a class within a given administrative unit, and a public authority usually has a statutory mandate to maintain them. The definition in the European Interoperability Framework (European Commission 2017) also mentions this official duty to collect, update and preserve the data, and emphasises the intended reuse of the registers by third parties. Base registers can thus be seen as the “master data” of public administration and the provision of public services. A similar function is performed by “authority files” in the library world: they serve, for example, to uniquely identify persons or works within a library catalogue. Since Linked Data was created to link data across organisational and domain boundaries, base registers and authority files are today often used beyond their originally intended domain. Where different base registers or authority files describe the same instances, concordance databases are used to link them; a prominent example from the library world is the Virtual International Authority File (VIAF), which links the authority files of the national libraries of various countries. Another prominent example of such a central data hub for “entities” of various classes is Wikidata (Allison-Cassin & Scott 2018).
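To make the role of shared identifiers and concordances more concrete, here is a minimal sketch in plain Python that deliberately avoids any RDF library so the mechanism stays visible. In practice this would be done with RDF tooling and SPARQL queries against a triple store. Two hypothetical data sets from different authorities are modelled as sets of subject-predicate-object triples; one identifies municipalities via register A, the other via register B. A concordance, assumed here to be expressed as `owl:sameAs` links, maps B's identifiers onto A's so the two data sets can be joined. All `example.org` URIs, predicate names and values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical identifiers: none of these URIs belong to a real register.
REG = "https://register.example.org/municipality/"  # base register A
ALT = "https://other.example.org/place/"             # register B
EX = "https://data.example.org/def/"                 # made-up predicates
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

Triple = tuple[str, str, str]

# Data set 1 (one authority) identifies municipalities via register A.
population: set[Triple] = {
    (REG + "1", EX + "population", "12000"),
    (REG + "2", EX + "population", "8500"),
}

# Data set 2 (another authority) uses register B's identifiers.
schools: set[Triple] = {
    (ALT + "a", EX + "schoolCount", "4"),
    (ALT + "b", EX + "schoolCount", "2"),
}

# Concordance: states that register B's entities are the same as register A's.
concordance: set[Triple] = {
    (ALT + "a", SAME_AS, REG + "1"),
    (ALT + "b", SAME_AS, REG + "2"),
}


def canonical_map(links: set[Triple]) -> dict[str, str]:
    """Map each owl:sameAs subject to the identifier it is declared equal to."""
    return {s: o for s, p, o in links if p == SAME_AS}


def normalise(triples: set[Triple], mapping: dict[str, str]) -> set[Triple]:
    """Rewrite subjects to the canonical (register A) identifier where known."""
    return {(mapping.get(s, s), p, o) for s, p, o in triples}


def join_on_subject(*datasets: set[Triple]) -> dict[str, dict[str, str]]:
    """Group property values from all data sets by their shared subject URI."""
    merged: dict[str, dict[str, str]] = defaultdict(dict)
    for triples in datasets:
        for s, p, o in triples:
            merged[s][p.rsplit("/", 1)[-1]] = o  # local name of the predicate
    return dict(merged)


mapping = canonical_map(concordance)
linked = join_on_subject(population, normalise(schools, mapping))
for entity, props in sorted(linked.items()):
    print(entity, props)
```

Without the concordance, the join would produce four unrelated entities instead of two, which is exactly the silo effect that shared identifiers are meant to avoid.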

Since base registers and authority files are often provided by public authorities and play a key role in establishing a well-functioning Linked Data ecosystem, their systematic provision is a central aspect of an effective Open Government Data strategy. The study by the Bern University of Applied Sciences on the further development of the Swiss data standard for open data platforms (Haller et al. 2018) reached the same conclusion; it also pointed out the important role of Linked Data in sustainably improving the findability, evaluability and usability of open data holdings for third parties. To implement two of the study’s key recommendations, eGovernment Switzerland commissioned a project in spring 2019, at the request of the Swiss Federal Archives and as part of the Innovation Promotion Programme, with the aim of systematically addressing the publication of base registers and controlled vocabularies as Linked Open Data. The project follows an action research approach: on the one hand, several base registers and central vocabularies are to be published on the federal Linked Data platform; on the other hand, the foundations are to be laid for a data publication strategy that is as effective and efficient as possible. The following questions are in the foreground:

  • What are the relevant base registers and central vocabularies with regard to the publication of LOD by Swiss authorities? What are valid criteria for prioritisation?
  • To what extent are the relevant base registers and vocabularies already available as LOD? Why are they not yet available as LOD?
  • What are the practical challenges in preparing base registers and central vocabularies for LOD?

The focus of the project is on the publication of Linked Open Data by Swiss authorities. Of primary interest are therefore base registers and controlled vocabularies that can be used in connection with Swiss government data. By “base registers” we mean registers of all types of “named entities”, regardless of their official designation; among controlled vocabularies, taxonomies (e.g. the names and hierarchical structure of Swiss administrative units) and sets of permitted values for certain attributes (e.g. gender) are of particular interest. To identify relevant data, the project team is conducting an initial screening of potentially interesting data sets and is seeking exchange with various Swiss authorities that are among the “first movers” in Linked Data publication. In a further step, starting in summer 2019, the Swiss LOD community will be consulted and asked for input on the identification and prioritisation of base registers and controlled vocabularies.

Prioritisation criteria

A set of criteria has been developed for prioritising base registers and controlled vocabularies, taking into account three dimensions: (i) the potential for use, (ii) the (technical and legal) feasibility of data publication, and (iii) the willingness of the data holder. The focus is on the following aspects:

  • Potential for use:
  1. To what extent can the data be linked to data sets from Swiss authorities that have already been published as Linked Data or whose publication is planned for the period 2019-2020?
  2. How high is the usage potential in the area of open government data? – Evaluation based on concrete usage scenarios in connection with open data.
  3. How high is the potential for use within the public administration or within individual organisations or organisational networks? – Evaluation based on concrete usage scenarios in connection with non-public data.
  • Feasibility:
  1. Quality of existing data: What effort would have to be made to provide the data in sufficient quality?
  2. Completeness of existing data: What effort would have to be made to provide the data in sufficient completeness?
  3. Scope and complexity of the data: What effort is involved in actually preparing the data for Linked Data?
  4. Legal situation of the data: May the data be released according to the current legal situation? (Data protection, confidentiality, fee regulations, etc.)
  5. What effort is required to keep transforming the data into Linked Data at reasonable intervals? Note: the data should not be published just once but kept up to date on an ongoing basis. However, data holdings vary greatly in their update frequency, and so do the technical arrangements required to ensure timely transformation of the data in the long term.
  • Readiness of the data holder:
  1. To what extent is the data owner willing to actively support the transformation of the data or even to take responsibility for it?
  2. To what extent is the data owner able to support the transformation of the data or to carry it out itself? Within what time frame? What support services would be needed?
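The catalogue above does not prescribe how the criteria are combined into a ranking. Purely as an illustration of how it might support a screening of data holdings, the following Python sketch assumes a hypothetical 1-to-5 rating per criterion (for feasibility, a high rating means low effort or no legal obstacle), averages the ratings within each of the three dimensions, and combines the dimensions with hypothetical equal weights. It also assumes that a rating of 1 on the legal criterion acts as a veto. The candidate names and ratings are invented; this is not the project's actual method.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical rating scale: 1 (low) to 5 (high) per criterion.
# For feasibility criteria, a high rating means LOW effort / no legal obstacle.


@dataclass
class Candidate:
    name: str
    potential: list[int]    # potential-for-use criteria 1-3, in list order
    feasibility: list[int]  # feasibility criteria 1-5, in list order
    readiness: list[int]    # data-holder readiness criteria 1-2, in list order


# Hypothetical equal weights for the three dimensions.
WEIGHTS = {"potential": 1.0, "feasibility": 1.0, "readiness": 1.0}


def score(c: Candidate) -> float:
    """Weighted mean of per-dimension averages; 0 if release is legally blocked."""
    # Feasibility criterion 4 (legal situation): a rating of 1 is treated as a veto.
    if c.feasibility[3] == 1:
        return 0.0
    dims = {
        "potential": mean(c.potential),
        "feasibility": mean(c.feasibility),
        "readiness": mean(c.readiness),
    }
    return sum(WEIGHTS[d] * v for d, v in dims.items()) / sum(WEIGHTS.values())


# Invented example candidates for a screening run.
candidates = [
    Candidate("register A", [5, 4, 5], [4, 4, 3, 5, 3], [5, 4]),
    Candidate("vocabulary B", [3, 3, 4], [5, 5, 5, 5, 5], [3, 3]),
    Candidate("register C", [5, 5, 5], [3, 3, 2, 1, 2], [4, 4]),
]

for c in sorted(candidates, key=score, reverse=True):
    print(f"{c.name}: {score(c):.2f}")
```

Keeping the three dimensions separate before combining them makes it easy to report them individually, for example to show that a candidate with high potential for use is held back mainly by the data holder's limited capacity rather than by the data itself.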

*This text is the first part of an article; the second part will appear shortly.


References

Creative Commons Licence

AUTHOR: Beat Estermann

Beat Estermann is deputy head of the Institute Public Sector Transformation at BFH Wirtschaft, where he coordinates the specialist group "Data & Infrastructure". He has been dealing with Linked Open Data issues for several years in the context of research projects and consulting mandates on behalf of public authorities, memory and cultural institutions.
