From Dataset to Knowledge Graph: The “Chronology of Events 1940-1944” at the Academy of Athens (poster)
The paper deals with the curation of a collection created by the Modern Greek History Research Centre of the Academy of Athens in Greece. The collection is based on an archival series of the British Foreign Office covering the period between the outbreak of the Greek-Italian War (October 1940) and the end of the Nazi occupation of Greece (October 1944). From this material, the researchers compiled a detailed chronology, which records specific events as well as exchanges of information, proposals, plans and reports by Greeks, British, Allies and resistance organizations. The Chronology was published in two volumes.[1] The entries were also captured in a database that allows multiple kinds of search and the extraction of data beyond the indexes contained in the publication. Yet this digitized collection remained a closed corpus of information, without any interconnection with relevant collections of other institutions.
Three years ago, the Chronology was incorporated into the action “The 1940s in Greece” of the APOLLONIS project. Within this action, the project partners used the metadata of various collections pertaining to that period to create common indexes and knowledge bases, with the aim of increasing the collections’ accessibility and interoperability and of familiarizing researchers with the possibilities of dataset curation.
One of the outcomes of this action was a linked data-based knowledge graph representation of the collections to support expressive semantic queries. The first step in the construction of the knowledge graph was the production of linked data representations of the collections using a mapping tool.[2] The mappings used both standard and custom vocabularies to cover collection-specific modeling needs. The second step was the automatic enrichment of the items in the knowledge graph with annotations, which were produced for three semantic dimensions (place, time, person/organization) by applying named entity recognition and disambiguation (NERD) tools to the relevant fields. For the Academy of Athens Chronology, these tools produced GeoNames, TimeLine[3] and Wikidata annotations. The resulting knowledge graph was further extended with relevant vocabularies (e.g. DBpedia, Greek Historical Periods[4]) by including cross-vocabulary equivalence alignments for identical terms and containment alignments for temporal terms.
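The enrichment and alignment steps can be pictured with a minimal sketch in plain Python. All identifiers below (the item ids, the `geonames:`, `wd:`, `dbp:`, `tl:` and `ghp:` term names, and the helper functions) are hypothetical placeholders, not the project's actual data or the API of the mapping tool. The sketch only illustrates how equivalence alignments let one annotation stand for terms of several vocabularies, and how containment alignments relate narrower temporal terms to broader ones.

```python
from collections import defaultdict

# Hypothetical annotations: item -> set of (dimension, term) pairs.
annotations = {
    "chronology:entry-1": {("place", "geonames:Athens"), ("time", "tl:1941-04")},
    "chronology:entry-2": {("person", "wd:PersonA"), ("time", "tl:1943-09")},
}

# Hypothetical cross-vocabulary equivalence alignments (identical terms).
equivalences = [("wd:PersonA", "dbp:PersonA"), ("geonames:Athens", "wd:Athens")]

# Hypothetical containment alignments: narrower temporal term -> broader term.
contained_in = {
    "tl:1941-04": "tl:1941",
    "tl:1943-09": "tl:1943",
    "tl:1941": "ghp:Occupation",
    "tl:1943": "ghp:Occupation",
}


def equivalence_classes(pairs):
    """Group terms connected by equivalence pairs (symmetric, transitive)."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    classes = {}
    for start in neighbours:
        if start in classes:
            continue
        group, stack = set(), [start]
        while stack:
            term = stack.pop()
            if term not in group:
                group.add(term)
                stack.extend(neighbours[term] - group)
        for term in group:
            classes[term] = group
    return classes


def expand_terms(term, classes, contained_in):
    """Return the term, its equivalents, and every broader containing term."""
    result = set()
    current = term
    while current is not None and current not in result:
        result |= classes.get(current, {current})
        current = contained_in.get(current)
    return result


classes = equivalence_classes(equivalences)

# Enriched index: each annotation is expanded through the alignments.
enriched = {
    item: {
        (dim, expanded)
        for dim, term in pairs
        for expanded in expand_terms(term, classes, contained_in)
    }
    for item, pairs in annotations.items()
}
```

In this toy setting, an entry annotated only with a GeoNames place also becomes reachable through the aligned Wikidata term, and an entry dated to a specific month becomes reachable through the broader year and historical period.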
Through these annotations, the knowledge graph allowed unified multi-vocabulary searches over all collections, with queries expressed in any of the supported vocabularies. To exploit the knowledge graph, a search application was built that supported combined cross-collection queries over the three dimensions. Through limited reasoning support, the application could also answer semantic queries by exploiting the vocabulary hierarchies and other term categorizations.
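A combined query over the three dimensions, with a simple form of hierarchy-based reasoning, might look like the following sketch. It is an illustration under stated assumptions rather than the search application itself: the items, the term names, the `narrower` hierarchy and the `search` function are all hypothetical, and the only "reasoning" shown is the expansion of a query term to the terms below it.

```python
# Hypothetical pre-annotated items from two collections.
items = {
    "academy:entry-1": {"place": {"Athens"}, "time": {"1941-04"}, "agent": {"OrgA"}},
    "academy:entry-2": {"place": {"Crete"}, "time": {"1941-05"}, "agent": {"OrgB"}},
    "other:doc-7": {"place": {"Athens"}, "time": {"1943-09"}, "agent": {"OrgA"}},
}

# Hypothetical vocabulary hierarchy: term -> directly narrower terms.
narrower = {
    "Greece": {"Athens", "Crete"},
    "1941": {"1941-04", "1941-05"},
    "1943": {"1943-09"},
    "Occupation": {"1941", "1943"},
}


def descendants(term):
    """Return the term together with all terms below it in the hierarchy."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(narrower.get(current, ()))
    return seen


def search(**criteria):
    """Return the sorted ids of items that match every given dimension.

    An item matches a dimension when one of its annotations is the query
    term itself or any narrower term reachable through the hierarchy.
    """
    hits = []
    for item_id, dims in items.items():
        if all(
            dims.get(dim, set()) & descendants(term)
            for dim, term in criteria.items()
        ):
            hits.append(item_id)
    return sorted(hits)


# Abstract query terms are resolved through the hierarchy.
print(search(place="Greece", time="1941"))
print(search(place="Athens", time="Occupation", agent="OrgA"))
```

Expanding query terms at search time keeps the stored annotations small, at the price of more work per query, which is consistent with the observation that query complexity affects performance.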
Practical evaluation showed that, although the quality of the results depended on the quality of the automatically produced annotations, and the complexity of the queries affected performance, knowledge graph technologies can enable researchers to locate relevant material through expressive queries that exploit the levels of abstraction provided by the underlying knowledge. For the Academy of Athens, the outcome was particularly fruitful: firstly, its collection became part of a rich knowledge graph that facilitated research on the specific period; secondly, the interconnection highlighted documentation inconsistencies that must be corrected to improve the quality of semantic queries.
[1] M. Spiliotopoulou & P. Papastratis (eds), Chronology of Events 1940-1944. From the Documents of the British Foreign Office, two vols, Athens 2002 and 2004 (in Greek).
[2] A. Chortaras & G. Stamou, D2RML: Integrating Heterogeneous Data and Web Services into Custom RDF Graphs, LDOW@WWW 2018.
[3] TimeLine vocabulary, http://sw.islab.ntua.gr/timeline/
[4] Greek Historical Periods vocabulary, Greek National Documentation Center, https://www.semantics.gr/authorities/vocabularies/historical-periods/vocabulary-entries