2008-12-22

ArticleRead (14): DBpedia: a nucleus for a web of open data

DBpedia: a nucleus for a web of open data, By S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak and Z. Ives, The 6th International Semantic Web Conference (ISWC 2007) Busan, Korea, November 2007, in LNCS 4825, pp.722-735


Over the last 30 years, computer scientists have made a number of attempts at information integration, and this research has proceeded alongside Semantic Web efforts and the associated technology developments. However, the current Web is still challenged by these tasks. In this article, Auer et al. (2007) attempt to integrate information from various web systems and to turn Wikipedia content into a machine-readable representation, both as structured formats and as semantic data sets.

The authors provide a relatively comprehensive overview of existing problems and challenges, such as:
(1) Web information is not yet fully accessible to a general audience
(2) the inconsistency, ambiguity, uncertainty, and data provenance of grass-roots data
(3) the need for collaborative approaches to sharing dynamic data, so that the Semantic Web can be built in a grass-roots style
(4) the need for a new model of structured information representation and management


Extending concepts and approaches from the W3C Linking Open Data community project and extracting structured information from Wikipedia, the authors argue in favor of the triple model of the Resource Description Framework (RDF), which provides a flexible data model for representing and publishing information on the Web. RDF serves as a basic foundation for describing resources, including assigning one or more types to a resource, as a set of triples: (subject, predicate, object) or, equivalently, (subject, property, property value). In the DBpedia model, the RDF triples extracted from the data sets are the basic components that can be shared, exchanged, and queried in a variety of Semantic Web applications.
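
To make the triple model concrete, here is a minimal Python sketch of how (subject, predicate, object) statements can be stored and queried by pattern. It is only an illustration and not the DBpedia implementation: the `Triple` type, the `match` helper, and the `ex:` identifiers are hypothetical names invented for this example, and the data is placeholder content rather than actual DBpedia output.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


# Placeholder data for illustration only; the "ex:" names are made up
# and do not come from the DBpedia data sets.
TRIPLES: List[Triple] = [
    Triple("ex:Busan", "rdf:type", "ex:City"),
    Triple("ex:Busan", "ex:country", "ex:South_Korea"),
    Triple("ex:Seoul", "rdf:type", "ex:City"),
    Triple("ex:Seoul", "ex:country", "ex:South_Korea"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Which resources have been given the type ex:City?
cities = [t.subject for t in match(TRIPLES, predicate="rdf:type", obj="ex:City")]
print(cities)
```

Because every statement has the same three-part shape, a type assignment and an ordinary property value are queried in exactly the same way, which is part of what makes the model flexible.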

Several of the most valuable datasets are provided for download as a set of RDF files that are identified by their own URI references. These include articles described with concepts, Infoboxes (data attributes for concepts), article categories expressed using SKOS, Yago Types (instances classified using YAGO), internal page links, and RDF links.