Tag Archive for: Linked Open Data

Basic registers and standards data as precursors for Linked Data

Base registers are central components of a Linked Data ecosystem. Together with commonly used data models or ontologies, they ensure that data sets can be linked with each other even across organisational boundaries. Without them, Linked Data would not be possible. Based on an ongoing project that aims to advance the publication of Linked Open Data by Swiss authorities, we describe the status quo and the planned measures to systematically promote the publication of relevant basic registers and vocabularies.

As described in an earlier article (Estermann 2019), a project commissioned by eGovernment Switzerland will identify those data sets that can serve as basic registers or central vocabularies in connection with the publication of Linked Open Data (LOD) by Swiss authorities. Their timely publication as Linked Open Data would promote the linking of public authority data. The importance of this topic in Switzerland was also apparent at the Opendata.ch/2019 unconference held at the beginning of July: participants rated the question of which basic registers and vocabularies Swiss authorities should publish as LOD as one of the most important, and it was dealt with in a workshop.

In order to identify those basic registers and vocabularies that have the greatest potential for use in the context of Swiss government data, the Bern University of Applied Sciences conducted an initial screening of data sets as part of an eGovernment Switzerland project. Two approaches were pursued in parallel:

  • Screening of existing databases of Swiss authorities with regard to their suitability as base registers or vocabularies.
  • Screening of Wikidata for suitability as a base register or vocabulary in connection with data publication by Swiss authorities.

The screening was supplemented by a survey of Swiss authorities that already publish data as Linked Data or plan to do so in the near future. In the process, additional data from the field of memory institutions and digital humanities was identified, especially in the area of archives and libraries. In the following, the advantages and disadvantages of these different types of data sources are briefly discussed and initial shortlists are presented, which will then be commented on and supplemented by the Swiss LOD community in an open process.

Data holdings of Swiss authorities

Most of the data holdings of the Swiss authorities are created and maintained on the basis of a legal mandate. It can therefore generally be assumed not only that the data are of high quality, but also that the continuity of data publication is guaranteed, i.e. that the data will continue to be maintained and made available in the future. However, the mere fact that data are provided by public authorities is no guarantee of data quality. Data quality must be thought of as a process and only becomes tangible in connection with concrete applications. Diverse and frequent use of data generally increases data quality, since errors and deficiencies are often only discovered when the data are used.

In the case of some official data (e.g. commercial register, municipal directory), it can be assumed that they are used regularly and in different contexts; in the case of others (e.g. cantonal monument lists), the previous context and frequency of use remain largely unknown. Unfortunately, only a few public administration data sets are published as Linked Open Data today, and the feasibility of such publication and the willingness of the various data holders generally still need to be clarified.

Based on the screening and the outcome of the above-mentioned workshop, we have drawn up an initial shortlist of data sets from Swiss public authorities that could serve as basic registers or controlled vocabularies in connection with the publication of Swiss public authority data as Linked Open Data:

Name | Responsible authority | Short description
UID register | FSO | All companies operating in Switzerland are listed in the UID register. The information on the companies is accessible to the administration (UID offices), to the company itself and partly to the public.
Commercial register | Cantonal commercial register offices | In Switzerland, the commercial registers are organised in a decentralised manner and are kept by the cantons. The commercial registers are public and serve to constitute and identify companies. Their purpose is to record and disclose facts relevant to commercial and corporate law, thereby helping to ensure legal certainty and protect third parties.
TERMDAT | Federal Chancellery (FC) | TERMDAT is the multilingual terminology database of the Swiss federal administration and contains, among other things, the official names of all federal offices. A partial implementation as Linked Data has already been realised as a prototype.
Nomenclatures | FSO | The FSO nomenclatures include in particular the municipal directory, the historicised municipal directory and the postcode directory. In addition, versioned matching between postcodes and FSO municipality numbers would be desirable.
Official list of municipalities | swisstopo | Official list of localities with postcode and perimeter.
Federal Register of Buildings and Dwellings (GWR) | FSO | Records the most important basic data on buildings and dwellings in Switzerland for statistical and administrative purposes.
NOGA | FSO | The “general classification of economic activities” (Nomenclature générale des activités économiques) is used for the consistent use of sector names in statistical evaluations.
ISCO | FSO | International Standard Classification of Occupations for the consistent use of occupational names in statistical evaluations.

This list should be understood as a suggestion of which existing datasets should be published as Linked Open Data with the highest priority from a usage perspective.
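To illustrate why a shared register matters for linking, the following minimal Python sketch joins two data sets from different authorities on a common company identifier. All records, identifiers and field names are hypothetical and chosen for illustration only; they are not taken from the UID register or any other source.

```python
# Hypothetical extract of a base register, keyed by a shared company identifier.
commercial_register = {
    "CHE-000.000.001": {"name": "Example AG", "legal_form": "AG"},
    "CHE-000.000.002": {"name": "Sample GmbH", "legal_form": "GmbH"},
}

# Hypothetical records from another authority that reference the same identifiers.
statistics = [
    {"uid": "CHE-000.000.001", "employees": 12},
    {"uid": "CHE-000.000.003", "employees": 40},
]


def link(records, register):
    """Enrich records with register entries; collect identifiers without a match."""
    linked, unmatched = [], []
    for rec in records:
        entry = register.get(rec["uid"])
        if entry is None:
            unmatched.append(rec["uid"])
        else:
            linked.append({**rec, **entry})
    return linked, unmatched


linked, unmatched = link(statistics, commercial_register)
```

Records that reference an identifier missing from the register end up in `unmatched`, which is one way frequent use can surface errors and gaps in the underlying data.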

Wikidata

Data sets in Wikidata have the advantage of very good coverage thanks to the crowdsourcing approach, and missing data can easily be created or added. In addition, data from Wikidata can be integrated immediately with a worldwide Linked Data cloud, since reconciliation with other data sets takes place during data entry, and not only after data publication, as is often the case with other data sets.

However, the crowdsourcing approach also leads to certain problems, especially with regard to data quality. Quality can only be ensured with additional effort, e.g. by restricting the data to statements backed by reliable sources. Furthermore, there is a considerable need for data cleansing and for harmonisation of modelling practices in various areas.

Here too, based on the screening, we have drawn up an initial shortlist of data sets in Wikidata that could serve as basic registers or controlled vocabularies in connection with the LOD publication of Swiss government data:

Name | Wikidata query | No. of entries (June 2019)
Administrative units of Switzerland | https://w.wiki/53U | 5139
Swiss organisations | https://w.wiki/53x | 12596
Swiss memory institutions | https://w.wiki/5Gm | 2169
People born in Switzerland | https://w.wiki/53V | 24537
People who died in Switzerland | https://w.wiki/53X | 13396
People with Swiss nationality | https://w.wiki/53Z | 31006
People with a connection to Switzerland (citizenship, place of birth or death, place of work or residence) | https://w.wiki/53c | 40549
Buildings in Switzerland | https://w.wiki/53f | 20147
Swiss Cultural Property of National or Regional Importance (KGS Inventory) | https://w.wiki/53j | 13121
Languages | https://w.wiki/53m | 12987
Taxa | https://w.wiki/53o | 2549556
Water bodies in Switzerland | https://w.wiki/53q | 2942
Mountains in Switzerland | https://w.wiki/53r | 7965
Chemical compounds | https://w.wiki/53$ | 162545
Human sex or gender (vocabulary) | https://w.wiki/546 | 10+
Fabrics from which objects are made (vocabulary) | https://w.wiki/548 | 3318
Colours used to identify objects (vocabulary) | https://w.wiki/54D | 61
Colours | https://w.wiki/54C | 191
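Counts of this kind can be obtained with a SPARQL query against the Wikidata Query Service. The following Python sketch is not the query behind any of the short links above; it only shows, as an assumption, what a count query for "people with Swiss nationality" might look like, using the Wikidata identifiers Q5 (human), P27 (country of citizenship) and Q39 (Switzerland). The helper function names are hypothetical, and the code only builds the query and the request URL without sending a request.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_count_query(class_qid: str, property_pid: str, value_qid: str) -> str:
    """Build a query counting items of a class that carry a given statement.

    The wd: and wdt: prefixes are predefined by the query service.
    """
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} ;\n"
        f"        wdt:{property_pid} wd:{value_qid} .\n"
        "}"
    )


def build_request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Humans (Q5) whose country of citizenship (P27) is Switzerland (Q39).
query = build_count_query("Q5", "P27", "Q39")
url = build_request_url(query)
```

Because Wikidata changes continuously, a query run today will return different numbers from the June 2019 counts in the table.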

It could also be interesting to publish official authority data directly in Wikidata. This would have the advantage of directly opening up a high potential for use in an international context, since the data can be combined more easily with data from other countries. Such an approach is particularly useful for topics that are also to be covered in Wikipedia articles. In order to ensure the semantic interoperability of data across national borders, appropriate coordination between the data publishing bodies is required. If this is not already being done elsewhere, this coordination can take place directly within the Wikidata community.

Data from the field of memory institutions and digital humanities

The National Library and the two archives interviewed also pointed out the importance of international authority files and vocabularies. These include, for example, the Gemeinsame Normdatei (GND), which is maintained cooperatively by the German National Library and the German-language library networks, as well as the Virtual International Authority File (VIAF) and the Dewey Decimal Classification, both of which are operated by the US-based Online Computer Library Center (OCLC). With regard to the networking of Swiss holdings, other authority files and directories that relate specifically to Switzerland also play a role:

Name | Operator | Brief description
Integrated Authority File (Gemeinsame Normdatei, GND) | German National Library | Authority file for persons, corporate bodies, congresses, geographic entities, subject headings and titles of works. It is mainly used for the cataloguing of literature in libraries, but is also increasingly used by archives, museums, projects and in web applications.
Virtual International Authority File (VIAF) | OCLC | Virtual international authority file linking 25 national authority files via a concordance file.
Dewey Decimal Classification | OCLC | The most widely used international classification for indexing library holdings. It is mainly used in the Anglo-American language area.
Photography Metadata | Photo CH | Metadata on Swiss photographers and photography holdings (photographers, places of work, institutions, holdings, exhibitions).
Inventory of research libraries in Switzerland | Swissbib/UB Basel | Data on the approximately 900 Swiss research libraries connected to the library metacatalogue of Swissbib.
Authority files on Swiss history | histHub | Named entities (persons, places), typologies (professions, place types) and vocabularies (first names, concepts) relevant to historical holdings on Switzerland. Some of these are still under construction.
Metadata of the Historical Dictionary of Switzerland | HLS | Metadata on entries in the Historical Dictionary of Switzerland (coordinates, persons, organisations, links to GND and VIAF).
Metagrid | SAGW / Dodis | Concordance file for historical reference data with reference to Switzerland.

Historicised databases as a major challenge

The availability and use of historicised data holdings poses a particular challenge. This topic comes up again and again in discussions about the publication of Open Government Data as Linked Data, including at the workshop mentioned above. The issue is not only availability, which is still incomplete today (for example, for municipal perimeters), but also how different historicised data sets can be linked. This is often difficult today, because the various data sets have been historicised using different approaches.
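One common way to historicise data is to attach a validity interval to each record, so that the state on any reference date can be reconstructed. The following Python sketch shows such a point-in-time lookup for a versioned mapping, for instance between postcodes and municipality identifiers. It is only one possible approach, not the one used by any particular authority; the class, the function and all values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class VersionedRecord:
    key: str                         # e.g. a postcode (hypothetical values below)
    value: str                       # e.g. a municipality identifier
    valid_from: date                 # first day on which the mapping applies
    valid_to: Optional[date] = None  # last day (inclusive); None means still valid


def lookup(records: List[VersionedRecord], key: str, on: date) -> List[str]:
    """Return all values mapped to `key` on the reference date `on`."""
    return [
        r.value
        for r in records
        if r.key == key
        and r.valid_from <= on
        and (r.valid_to is None or on <= r.valid_to)
    ]


# Hypothetical history: the postcode is reassigned at the start of 2016.
RECORDS = [
    VersionedRecord("9991", "municipality-A", date(2000, 1, 1), date(2015, 12, 31)),
    VersionedRecord("9991", "municipality-B", date(2016, 1, 1)),
]

before = lookup(RECORDS, "9991", date(2010, 6, 1))
after = lookup(RECORDS, "9991", date(2020, 6, 1))
```

Two data sets that both follow this pattern can be joined for a given reference date. If one of them instead publishes, say, yearly snapshots, validity intervals first have to be derived from the snapshots, which is where much of the linking effort arises.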

Use scenarios

As can be seen from the survey of Swiss authorities that already publish data as Linked Data or plan to do so in the near future, the additional effort put into preparing the data and linking it with other holdings is motivated by the expectation that:

  1. improved search in the holdings can be offered in the future (e.g. multilingual search in historical holdings of the Federal Archives; geolocalised search in holdings of the State Archives of Basel-City);
  2. new insights can be generated (e.g. linking of FOEN data holdings or information from the commercial register with statistical key figures from the FSO; integration of semantically enriched archive catalogues in research environments); and
  3. transparency can be increased (e.g. tariffs of Swiss electricity suppliers; data from electricity market monitoring).

Next steps

The tables above reflect the current status regarding the basic registers and vocabularies that should be made available as Linked Data with the highest priority from a user perspective. In the coming weeks, we will be seeking further input from the Swiss LOD community to add to the tables and the list of possible usage scenarios, so that we end up with a broadly supported and prioritised list of basic registers and vocabularies. In a next step, we will work through this list in dialogue with the data holders in order to take into account not only the dimension of the potential for use but also the evaluation criteria of “feasibility” and “willingness of the data holder” (see Estermann 2019). The result of this next step will be several data sets prepared for LOD, as well as an analysis of the challenges and hurdles with regard to the conversion of further data sets to Linked Data. Based on this analysis, recommendations for further action will then be formulated. The first part of the article has already been published.


Bibliography


Data as an innovation driver of the smart city

Data has already been called the “oil of the 21st century”. But data will also form the basis for many processes in the intelligent city of the future – the smart city. This will require a platform in which data from a wide variety of sources – sensors and the Internet of Things, open data, government data, data from social media and other third-party providers – can be processed, linked and analysed to extract valuable information and make it available as Linked Open Data. Based on this, both cities and private providers can offer new value-added applications and services; the platform thus becomes a location factor and an innovation driver.

With increasing digitalisation, society is facing new challenges. At the same time, increased urbanisation is taking place. The societal challenges thus manifest themselves most clearly in the city: densification, public transport, efficient use of resources such as energy and water, security, and – central to the city dweller – improvement of the quality of life. It is therefore worthwhile to address the societal challenges in the urban environment first; a recent OECD study (OECD, 2015) also refers to “cities as hubs for data-driven innovation”.

A research project coordinated by the BFH called “City Platform as a Service – Integrated and Open”, or CPaaS.io for short, was launched in July 2016. The project is a collaboration between partners from Europe and Japan and is funded under Horizon 2020 and by the Japanese NICT. It aims to build a cloud-based platform for cities and urban regions that will provide the basis for urban data infrastructure and innovation. The need for such a platform is supported, for example, by a study (Vega-Gorgojo et al., 2015), which emphasises that “the city will need platforms that support digitalisation and the use of data, culminating in Big Data”, and that “the smart city must work with platforms on which data can be analysed and shared with other sources”.

The goal of an innovation platform is ambitious. It is not just about realising a technical platform, or connecting complementary technologies such as the Internet of Things, Big Data and Cloud. Other projects do that too. Smart City Innovation means that the platform, or new applications and services based on the platform, provide real added value for society and for the actors in the city – residents, visitors, private companies and the public administration. To achieve this, the platform must be open, both in terms of the integration of other data sources and the access of third parties to the data (keyword: open data), naturally in compliance with data protection.

In the urban environment, the integration of open public authority data is of particular interest. The project benefits from the fact that more and more authorities are following this trend and publishing their data on open data portals: in Switzerland, for example, on opendata.swiss, and the city of Zurich is also one of the pioneers in this field. CPaaS.io will go one step further here and also make the relevant data available as Linked Data. This means that the data is semantically annotated and also provided with metadata, e.g. on the provenance and quality of the data. Only this enables simplified machine integration and use of the data in further applications. This can be used during large events, for example: In which direction do streams of visitors move? How has public transport been adapted to the current situation? How is the system reacting to dangerous situations, accidents, weather conditions, etc.?
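What "semantically annotated and provided with metadata" can mean in practice is sketched below as a JSON-LD description of a data set, built in Python. This is an illustration under stated assumptions, not the CPaaS.io data model: it uses the DCAT, Dublin Core Terms and PROV vocabularies, shows provenance only (no quality metadata), and all identifiers under example.org, the title and the date are hypothetical.

```python
import json

# Hypothetical data set description; vocabulary prefixes are declared in @context.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
        "prov": "http://www.w3.org/ns/prov#",
    },
    "@id": "https://example.org/dataset/visitor-flows",
    "@type": "dcat:Dataset",
    "dct:title": "Visitor flows (hypothetical example)",
    # Who publishes the data set.
    "dct:publisher": {"@id": "https://example.org/org/city"},
    # When it was last modified (hypothetical date).
    "dct:modified": "2016-07-01",
    # Provenance: the source the data set was derived from.
    "prov:wasDerivedFrom": {"@id": "https://example.org/sensors/raw-counts"},
}

# Serialise to JSON-LD text that other applications can consume.
serialized = json.dumps(dataset, indent=2)
```

Because publisher and source are identified by URIs rather than free text, an application can follow them to further descriptions, which is what makes machine integration simpler.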
In order to identify beneficial applications for society, to implement them in the project, and thus to be able to validate the benefits of the platform, the involvement of cities is of central importance. To this end, the project has been able to initiate cooperation with several cities that already have experience in the areas of Open Data or Smart City. In Europe, these are Amsterdam, Murcia and Zurich, and in Japan Sapporo, Yokosuka and Tokyo. Field trials are planned in several of these cities.

We are convinced that, as more and more data becomes available, data will become ever more important for mastering social and economic challenges. Based on data infrastructures like the one CPaaS.io will deliver, new applications and services will be offered and transparency will be increased. For cities, this will become an important location factor, because innovative companies will prefer to settle where such platforms are available for providing their services.


Project details

Duration: 30 months.

Partners: Bern University of Applied Sciences, AGT, NEC, Odin Solutions, The Things Network, University of Surrey, YRP Ubiquitous Networking Laboratory, ACCESS Co, Microsoft Japan, Ubiquitous Computing Technology Corporation, University of Tokyo.

Acknowledgements

The project is funded by the European Union’s Horizon 2020 Research and Innovation Programme (Grant Agreement n° 723076) and NICT in Japan (Management Number 18302).


Sources

  • OECD (2015). Data-Driven Innovation: Big Data for Growth and Well-Being. Paris: OECD Publishing, p. 379ff.
  • Vega-Gorgojo, G., et al. (2015). Case study reports on positive and negative externalities. EU FP7 Project BYTE, pp. 141 & 138.