2009-07-13

ArticleRead (17): Geographical Linked Data: The Administrative Geography of Great Britain on the Semantic Web

Geographical Linked Data: The Administrative Geography of Great Britain on the Semantic Web, By John Goodwin, Catherine Dolbear and Glen Hart, in Transactions in GIS, Vol. 12, Issue suppl.1, 2008. Pages: 19–30

The article reviewed here is drawn from the UK Ordnance Survey (OS) paper and its presentation at the Terra Cognita Workshop, held in connection with the 7th International Semantic Web Conference (ISWC2008). Inevitably, some of the “future work” mentioned in the article, such as using OWL for the domain ontology and ontology modules, can now be found on their outlets.

Nonetheless, what I found of greatest interest is that they began their participation in the Semantic Web with Place Names, encoding spatial information in RDF and using RDF Schema to create the ontology. In addition, the OS approach solves some important spatial data problems within the Semantic Web for end users: it supports the query “find me things of type X in [or next to] area y”, and it does so without the need for geometric computation (Section 3.3). They also offer alternative strategies to tackle the confusion caused by the owl:sameAs construct, which the W3C has suggested for identity linking. Instead, they consider using rdfs:seeAlso or coref:duplicate, which bundle together URIs that are known to be related in some way, as alternatives for linking RDF nodes from different graphs (see p. 27). Their plan to build a “settlement gazetteer” in the “future” also deserves close attention.
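
To make the “things of type X in [or next to] area y” idea concrete, here is a minimal sketch of how such a query can be answered purely from explicit topological triples, with no geometry involved. The triple store is a plain Python set, and every ex: name and every data value is a hypothetical stand-in for illustration, not the actual OS vocabulary or dataset.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# All ex: names and the data itself are hypothetical, not the OS vocabulary.
TRIPLES = {
    ("ex:Hampshire", "rdf:type", "ex:County"),
    ("ex:Southampton", "rdf:type", "ex:UnitaryAuthority"),
    ("ex:Winchester", "rdf:type", "ex:District"),
    ("ex:Eastleigh", "rdf:type", "ex:District"),
    # Topological relations are stored as plain statements,
    # so no boundary polygons are needed at query time.
    ("ex:Hampshire", "ex:contains", "ex:Winchester"),
    ("ex:Hampshire", "ex:contains", "ex:Eastleigh"),
    ("ex:Eastleigh", "ex:borders", "ex:Southampton"),
    ("ex:Southampton", "ex:borders", "ex:Eastleigh"),
    ("ex:Eastleigh", "ex:borders", "ex:Winchester"),
    ("ex:Winchester", "ex:borders", "ex:Eastleigh"),
}


def things_of_type(rdf_type, relation, area, triples=TRIPLES):
    """Find resources of rdf_type that stand in `relation` to `area`."""
    # Step 1: everything the area is related to via the given predicate.
    related = {o for s, p, o in triples if s == area and p == relation}
    # Step 2: keep only those declared to be of the requested type.
    return sorted(o for o in related if (o, "rdf:type", rdf_type) in triples)


if __name__ == "__main__":
    print(things_of_type("ex:District", "ex:contains", "ex:Hampshire"))
    print(things_of_type("ex:UnitaryAuthority", "ex:borders", "ex:Eastleigh"))
```

The lookup is only set membership and filtering. That is the trade-off of this approach: queries are cheap, but only relations that were asserted in advance can be answered.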

In general, this article is divided into four parts:

(1) introduction and motivation,
(2) the confusion between the administrative geography of Great Britain and unofficial sources (e.g. GeoNames),
(3) the creation of RDF datasets, and
(4) adding their geo data to the Web of Linked Data.

One reason the OS developed this Place Name RDF prototype was to investigate the technical challenges and limitations of creating RDF-based geo-resources. The RDF approach may offer the potential to solve traditional problems in integrating different relational database schemas or the syntaxes of different file formats, and the chance to provide geospatial data to end users in a more flexible form over the Web.

On the other hand, RDF does not support any form of spatial indexing, buffering or containment within a user-defined area. For a geo data provider, the question of how to modularise the RDF/XML file representation into manageable and coherent chunks remains unsolved.

One prominent suggestion for making published data discoverable on the Web is to describe it with the Vocabulary of Interlinked Datasets (voiD). voiD is an RDF-based schema for describing linked datasets. With voiD, linked datasets can be discovered and used both effectively and efficiently. At the heart of voiD are two classes: a dataset (void:Dataset) is a collection of data, and the interlinking is modelled by a linkset (void:Linkset), a subclass of dataset that stores the triples expressing the links between datasets (see the png).
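
As a rough illustration of the two voiD classes, the sketch below builds a small voiD description as plain triples and serialises it as N-Triples. The three example.org dataset URIs are hypothetical, and the choice of rdfs:seeAlso as the link predicate is an assumption made for the example. The terms void:Dataset, void:Linkset, void:target and void:linkPredicate come from the voiD vocabulary.

```python
VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical URIs naming two datasets and the linkset between them.
OS_DATA = "http://example.org/void/os-admin-geography"
OTHER = "http://example.org/void/other-gazetteer"
LINKS = "http://example.org/void/os-to-other-links"

description = [
    (OS_DATA, RDF_TYPE, VOID + "Dataset"),
    (OTHER, RDF_TYPE, VOID + "Dataset"),
    # A linkset is itself a dataset: the set of triples that connect two datasets.
    (LINKS, RDF_TYPE, VOID + "Linkset"),
    (LINKS, VOID + "target", OS_DATA),
    (LINKS, VOID + "target", OTHER),
    # Assumption for this example: the links are rdfs:seeAlso statements.
    (LINKS, VOID + "linkPredicate", RDFS_SEEALSO),
]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def linked_datasets(linkset, triples):
    """Return the datasets that a linkset declares as its targets."""
    return sorted(o for s, p, o in triples if s == linkset and p == VOID + "target")


if __name__ == "__main__":
    print(to_ntriples(description))
    print(linked_datasets(LINKS, description))
```

A consumer that reads such a description can tell which datasets are connected, and by which predicate, before fetching any of the data itself.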

The discussion in Section 4.2 of the OS experience in linking their data to the Web is refreshing. They raise four issues: identity, modularisation, provenance and authorisation. On identity and the semantic accuracy of links, they argue that owl:sameAs, as defined in OWL-DL, should not be used, because:

[there is no single common entity or “non information resource” that everyone is mapping back to; instead there are multiple different representations of what may be similar or overlapping concepts. For example, there are many different ways of describing London's spatial extent – official boundaries of Greater London, or a vaguer extent denoted by estate agents or local people. Although in specific contexts, it may be sufficient to state that these are the same thing, in the general case, it is not.] (p.27)
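
The concern in the quotation can be illustrated with a small sketch. owl:sameAs is reflexive, symmetric and transitive, so chaining such links merges every linked URI into one individual, and every statement about one then holds for all. The prefixes, URIs and extent values below are hypothetical, chosen only to mirror the London example. The code is a simplified illustration of sameAs semantics, not a reasoner.

```python
def same_as_closure(pairs):
    """Group URIs into equivalence classes implied by owl:sameAs links."""
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


def merged_statements(pairs, statements):
    """Under owl:sameAs, a statement about one URI applies to its whole class."""
    result = set(statements)
    for group in same_as_closure(pairs):
        for s, p, o in statements:
            if s in group:
                result.update((member, p, o) for member in group)
    return result


# Hypothetical links and statements about three notions of "London".
links = [("os:GreaterLondon", "geonames:London"), ("geonames:London", "agent:London")]
statements = {
    ("os:GreaterLondon", "ex:extent", "official boundary"),
    ("agent:London", "ex:extent", "vague local extent"),
}

if __name__ == "__main__":
    for triple in sorted(merged_statements(links, statements)):
        print(triple)
```

If the same links were published as rdfs:seeAlso, no such merging would be implied: the statement set would stay exactly as given, and the official boundary would not inherit the estate agents' vaguer extent.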

Taking a critical view of modularisation, the OS is aware of the practice of creating small RDF documents, each containing a description of an individual resource and possibly details of other, closely related resources, which is known as “slicing”. However, they also note that, for a geo data provider, the question of how to modularise the RDF/XML file representation into manageable and coherent chunks, to help users who need to manipulate large triple sets, remains unsolved. The existing solutions include:

(1) repeating the URI in both graphs, although this is not recommended by the W3C; or
(2) assigning one graph as the primary dataset, then using rdfs:isDefinedBy to link to the newly minted URIs in the secondary dataset.

Unfortunately, neither solution is ideal.
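
To show why the same URI ends up in more than one graph, here is a minimal sketch of “slicing” as described above: one small document per resource, holding its own description plus the labels of the resources it points to. The slicing rule, the ex: names and the data are all assumptions made for illustration, not the OS implementation.

```python
RDFS_LABEL = "rdfs:label"

# Hypothetical toy graph; ex: names are illustrative only.
GRAPH = {
    ("ex:Hampshire", "rdfs:label", "Hampshire"),
    ("ex:Hampshire", "ex:contains", "ex:Winchester"),
    ("ex:Winchester", "rdfs:label", "Winchester"),
    ("ex:Winchester", "ex:within", "ex:Hampshire"),
}


def slice_for(resource, triples):
    """Return the triples for one resource's document.

    The slice holds the resource's own description plus the labels of
    the resources it refers to (an assumed, simplified slicing rule).
    """
    own = {t for t in triples if t[0] == resource}
    neighbours = {o for _, _, o in own}
    labels = {t for t in triples if t[0] in neighbours and t[1] == RDFS_LABEL}
    return own | labels


if __name__ == "__main__":
    for name in ("ex:Hampshire", "ex:Winchester"):
        print(name, sorted(slice_for(name, GRAPH)))
```

Each slice stays small, but ex:Winchester now appears as a subject in both its own document and the Hampshire document, which is exactly the repetition that option (1) above relies on.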

Provenance and authorisation of the data have always been a problem for data quality. There are several other points in this article that readers may find stimulating and novel, and a few that seem more akin to technical design (such as Named Graphs or the OAuth protocol, rather than CC licensing, in the RDF dataset and the SPARQL endpoint of the OS service).