2008-12-22

ArticleRead (14): DBpedia: a nucleus for a web of open data

DBpedia: a nucleus for a web of open data, By S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak and Z. Ives, The 6th International Semantic Web Conference (ISWC 2007) Busan, Korea, November 2007, in LNCS 4825, pp.722-735


Over the last 30 years, computer scientists interested in information integration have made a number of attempts at the problem, alongside efforts in the Semantic Web and its associated technologies. However, the current Web is still challenged by these tasks. In this article, Auer et al. (2007) attempt to integrate information from across various web systems and to turn Wikipedia content into a machine-readable representation, both as structured formats and as semantic data sets.

The authors provide a relatively comprehensive overview of existing problems and challenges such as:
(1) Web information has not been fully accessible to a general audience
(2) inconsistency, ambiguity, uncertainty, and data provenance of grass-roots data
(3) the need to build the Semantic Web in a grass-roots style through collaborative sharing of dynamic data
(4) the need for a new model of structured information representation and management


Extending concepts and approaches from the W3C Linking Open Data community project and extracting structured information from Wikipedia, the authors argue in favor of the triple model of the Resource Description Framework (RDF), which provides a flexible data model for representing and publishing information on the Web. RDF describes a resource as a set of triples, (subject, predicate, object) or (subject, property, property value), and can give a resource one or more types. In the DBpedia model, the RDF triples extracted from the data sets are basic components that can be shared, exchanged, and queried in a variety of Semantic Web applications.
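
As a rough illustration of the triple model, the following Python sketch stores a few (subject, predicate, object) statements and answers a simple pattern query. The resource names, the `ex:` prefix and the `match` helper are hypothetical, invented for this example; they are not taken from the DBpedia datasets or from any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical statements in the spirit of DBpedia, not copied from it.
TRIPLES = [
    Triple("ex:Busan", "rdf:type", "ex:City"),
    Triple("ex:Busan", "ex:country", "ex:South_Korea"),
    Triple("ex:Seoul", "rdf:type", "ex:City"),
    Triple("ex:Seoul", "ex:country", "ex:South_Korea"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Yield the triples that agree with every non-None field (None is a wildcard)."""
    for t in triples:
        if subject is not None and t.subject != subject:
            continue
        if predicate is not None and t.predicate != predicate:
            continue
        if obj is not None and t.obj != obj:
            continue
        yield t


# "Which resources are typed as a city?"
cities = sorted(t.subject for t in match(TRIPLES, predicate="rdf:type", obj="ex:City"))
```

Because every statement has the same three-part shape, the same wildcard query works for types, properties, and links alike, which is the flexibility the authors point to.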

Several of the most valuable datasets, including articles described with concepts, Infoboxes (data attributes for concepts), categories (article categories using SKOS), Yago Types (instances using the YAGO classification), internal page links, and RDF links, are provided for download as a set of RDF files, each identified by its own URI reference.

2008-12-17

ArticleRead (13): User Experience at Google: focus on the user and all else will follow

User Experience at Google: focus on the user and all else will follow. by Au, I., et al. (2008) In CHI 2008 Proceedings Extended Abstracts, ACM Press (2008), pp 3681-3686

Which research approaches ensure that user experiences are interpreted in a way that reflects the underlying norms of online users, and help designers better predict user behavior during system design? The Google case in this article demonstrates a multi-method approach to user experience based on its corporate philosophy: “Focus on the user and all else will follow”.

On the one hand, Google has traditionally taken a data-driven approach, applying quantitative web analytics to reflect what is happening. On the other hand, building on a qualitative approach, Google interprets the contextual factors behind why users interact with its designs through field research, diary studies, and face-to-face interviews. The Google user experience (UX) team applied this combination when exploring user behavior in Google Maps for mobile. They followed a method called Mediated Data Collection, in which participants and mobile technologies are assumed to mediate data collection about use in natural settings. Accordingly, methods such as prior research on log analysis, recorded usage, focus group studies, field trials, telephone interviews, and lab debriefs are combined to investigate user behaviour.

This article stresses the bottom-up company culture as the key for designers and project managers to understand the essence of user experience. Three techniques are employed: (1) injecting user experience into the corporate DNA by educating and training engineers and PMs (e.g. the ‘Life of a User’ training program and ‘Field Fridays’); (2) scaling the UX team to support hundreds of projects; (3) helping projects focus on user needs through the UX team or a user research knowledge base.

Unlike traditional desktop software, which is updated on an annual basis, the Google UX team practices agile techniques to respond to rapid web release cycles. Examples include guerrilla usability testing, prototyping on the fly, online experimentation, and enabling a live instant messaging dialogue between observers and the moderator during lab-based testing.

The above three approaches are also combined with a global product perspective of designing for multiple countries. In sum, these four elements of the Google case offer an alternative analytic framework and bring together the methodologies for studying online user experience in practice.



2008-12-09

ArticleRead (12): The credibility of volunteered geographic information


The credibility of volunteered geographic information, By Andrew J. Flanagin and Miriam J. Metzger, in GeoJournal (2008) 72:137–148


Flanagin and Metzger (2008) examine the issues of information and source credibility in the context of volunteered geographic information (VGI).

VGI, along with similar concepts such as GIS/2, neogeography, or ‘‘geography without geographers’’, has been regarded as an extension to the general public of public participation geographic information systems (PPGIS), collaborative GIS, participatory GIS, and Community Integrated GIS (CIGIS). As the advance of social computing has had parallel effects on the production and availability of user-generated geo data, the authors propose that the traditional definitions of information and source credibility need to be re-conceptualized.

The authors ground the credibility of VGI in two concepts: Goodchild's (2007) "humans as sensors" and the social science perspective in which credibility is "a subjective perception on the part of the information receiver". In contrast to this notion of "credibility-as-perception", the credibility of VGI can also function as a relatively objective property of the information, rather than "a subjective perception", when compared with traditional geo information shaped by the perceptions of a few individual authorities.

The overall recommendations for credibility judgments of VGI are listed as eight points in the figure below. Suggested research directions include: user motivations; "credibility transfer" phenomena (geo data being perceived as more objective than other forms of user-generated data); market implications; measurement issues (e.g. the provenance of VGI); and the effects of VGI on social, educational, and political contexts.


2008-11-04

ArticleRead (11): Assertion and authority: the science of user-generated geographic content

Assertion and authority: the science of user-generated geographic content. By M.F. Goodchild (2008) Proceedings of the Colloquium for Andrew U. Frank's 60th Birthday. GeoInfo 39. Department of Geoinformation and Cartography, Vienna University of Technology.

Goodchild's article examines how user-generated geo information (volunteered geographic information, VGI) differs from traditional, professional geo information. Goodchild's main assumption is that the individual is akin to an expert in the geography of his or her own activity space, based on the concept of individual geographic familiarity. The geo production contributed by the general public (neo-explorers, who mainly take an inductive role) sits at the levels of raw data observations and information for specific use, while geo knowledge is produced by professionals (taking both the inductive and deductive roles of empiricism) through theories, models, and formal procedures that provide analytic capabilities and functions.

Whereas traditional data quality is controlled by the authority of mapping agencies through formal specifications, production mechanisms, and program or project control, Goodchild suggests two mechanisms for VGI quality control. First, he notes the value of local expertise in the sense of community mapping, which national mapping agencies ignore in the mapping and editing process. Second, he offers a structure for the data editing process: a hierarchy of editor levels, based on local expertise, to examine data quality. Since semiotic analysis can provide a more systematic framework for this issue, we close our review with a table that uses semiotics to draw out the potential and implications of this article.


2008-10-02

ArticleRead (10): Patterns of Information Search and Access on the World Wide Web


Patterns of Information Search and Access on the World Wide Web: Democratizing Expertise or Creating New Hierarchies? By Alexandre Caldas, Ralph Schroeder, Gustavo S. Mesch, William H. Dutton, Journal of Computer-Mediated Communication 13 (2008) 769–793



The patterns of information search and access are found not to follow a power law distribution at the scale of the Web, but this result does not hold for small clusters of expertise at a narrower scale.
There are three competing propositions about the patterns of access to information over the Web: Power Law (winner-take-all); Non Power Law (the egalitarian effect of search engines); and Benkler’s more complex interpretation (The Wealth of Networks).

To investigate the extent to which the use of alternative search technologies decentralizes access to scientific knowledge, Caldas et al. (2008) focus on quantitative methods for analysing the Web in terms of the structure of hyperlinks among Web resources. Together with qualitative interviews, they test two hypotheses: (1) different search engines result in different outcomes; and (2) centrality, connectivity and subgroup structure can be used to identify regularities and patterns of the Web.


Both hypotheses were supported at the Web scale by comparing six main search engines (Google, Yahoo, MSNSearch, AskJeeves, Gigablast, and ScholarGoogle) using a set of three keywords for each of six global topics (climate change, poverty, HIV/AIDS, terrorism, trade reform, and Internet and society). Regularities in the resulting Web networks show that the average distance among reachable pairs is small and that the density within link blocks is relatively low. The authors therefore view the fragmented nature of these Web networks as evidence of the “democratization” of the Web. Moreover, the significant clustering process, in which a clique is defined as a subset of the network with at least 3 interlinked nodes, is the main argument for “reinforcement (clique) effects” rather than a power law effect.
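
To make the three network measures mentioned above more concrete, here is a minimal Python sketch on a toy link network. The edge list is hypothetical and links are treated as undirected for simplicity; the study's own data and software are not reproduced here. A triangle is used as the smallest case of the "at least 3 interlinked nodes" clique definition.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy link network (treated as undirected), not data from the study.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")]


def adjacency(edges):
    """Build a node -> set-of-neighbours map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Share of possible undirected links that are actually present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1))


def triangles(adj):
    """All 3-node subsets in which every pair is linked (the smallest cliques)."""
    return [trio for trio in combinations(sorted(adj), 3)
            if all(b in adj[a] for a, b in combinations(trio, 2))]


def mean_reachable_distance(adj):
    """Average shortest-path length over pairs that can reach each other (BFS)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


ADJ = adjacency(EDGES)
```

On this toy network the density is low (one third of possible links), there is a single triangle, and the pair `e`, `f` is cut off from the rest, a small-scale picture of a fragmented network with local cliques.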


2008-04-30

ArticleRead (9): Recent Contributions to the Mathematical Theory of Communication

Recent Contributions to the Mathematical Theory of Communication, By Warren Weaver, In Claude Shannon, A Mathematical Theory of Communication, IL: The University of Illinois Press, 1949

What concepts does the theory develop?
  1. It deals with the statistical character of a whole ensemble of messages
  2. In statistical terms, the two words information and uncertainty are partners
  3. “missing information” == the entropy == the language of arithmetic == the language of language
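
The link between information and uncertainty in the list above can be sketched with Shannon's entropy, H = -Σ p·log2(p), measured in bits. The Python below is only an illustration; the function names and example messages are our own and do not come from Weaver's text.

```python
import math
from collections import Counter


def entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def empirical_entropy(message):
    """Entropy of the symbol frequencies observed in one message."""
    counts = Counter(message)
    total = len(message)
    return entropy_bits(count / total for count in counts.values())


fair_coin = entropy_bits([0.5, 0.5])      # two equally likely outcomes
four_way = entropy_bits([0.25] * 4)       # four equally likely outcomes
```

The more evenly the probability is spread over possible messages, the greater the uncertainty and the more information a received message carries; a source that always sends the same symbol has zero entropy.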
Identify three levels of communication problems, as table 1 shows:


Schematic diagram of a general communication system and three main categories of communication systems classification as figure 1 shows:


Schematic diagram of a general communication system: minor additions for Level B as figure 2 shows:


Note: This review was mainly completed as homework for the Humanity Informatics class taught by Professor Ching-Chun Hsieh in Dec. 2006.

ArticleRead (8): Information as sign: semiotics and information science

Information as sign: semiotics and information science, By Douglas Raber & John M. Budd, Journal of Documentation, 2003, 59, 5, pp.507-522

Drawing on Ferdinand de Saussure’s (1959) definition of the linguistic sign, Raber and Budd (2003) try to define information in two parts, “Text” and “Content”, in parallel with Saussure’s sign (Signifier and Signified). Table 1 shows how Raber and Budd treat information as Saussure’s linguistic sign. Table 2 illustrates why semiotics can be applied in information science.

Table 1: Information as Saussure’s Sign

Table 2: Information Science and Semiotics

Note: This review was mainly completed as homework for the Humanity Informatics class taught by Professor Ching-Chun Hsieh in Jan. 2007.

ArticleRead (7): Four Ethical Issues of the Information Age.

Four Ethical Issues of the Information Age. By Richard O. Mason., MIS Quarterly, Vol.10 No.1,pp.5-12. Mar., 1986

Ever since Mason (1986) proposed Privacy, Accuracy, Property and Accessibility (PAPA) as the guiding principles for ethical issues in the information age, PAPA has been widely used in fields such as human behavior and information technology, information management, and organization science, as well as in the foundations of information security system design. PAPA was ahead of its time, and the three basic questions raised in the article still prompt profound reflection today:
  1. Is the kind of society being created the one we want?
  2. Should we pay special attention to the PAPA issues, since we are at the forefront of creating this new society?
  3. Although information forms intellectual capital, the weakness of building intellectual capital is that people's intellectual capital is diminished:
  • whenever they lose their personal information without being compensated for it,
  • when they are precluded access to information which is of value to them,
  • when they have revealed information they hold intimate, or
  • when they find out that the information upon which their living depends is in error.

Problems and issues are identified in Table 1; the final “should” and “should not”, with some case analyses for the PAPA guiding principles, are proposed in Table 2. To this day, the progress of information technology has been phenomenal. We have also been challenged by crises of information ethics that we never faced before. Whether a balance between human sentiment, issues of law and justice, and moral or ethical concerns can emerge within the PAPA of the information age is something we are, indeed, still wondering about.

Table 1: Issues and Problems of PAPA


Table 2: A question of Should or Should Not

Note: This review was mainly completed as homework for the Humanity Informatics class taught by Professor Ching-Chun Hsieh in May 2007.

2008-03-31

ArticleRead (6): The folksonomy tag cloud: when is it useful?

The folksonomy tag cloud: when is it useful? By James Sinclair and Michael Cardew-Hall, Journal of Information Science 2008 34: 15-29

Assuming that folksonomy systems affect user perceptions and patterns, it is interesting to see what can be found empirically, from a user interface standpoint, about how tag clouds affect users in information exploration. In this paper, Sinclair and Cardew-Hall clearly state their findings in a small-scale enterprise context, which support the arguments of Mathes (2004) and Brooks & Montanez (2006): the tag cloud is useful as a social navigation aid when users undertake broad, general, or vague information exploration. Increasingly, evidence from empirical surveys supports the function of tagging for broad categorization (e.g. Noll and Meinel, 2007).

A proper appreciation of this research lies in its assertion of the need to evaluate the usability of the tag cloud, both in its visual summary design and in its ability to serve non-specific information discovery. These results also lend weight to a substantial literature review of the many pro-and-con characteristics of tag clouds. Here, we try to summarize both sides, in terms of usability and sociability, in the table below.


A very interesting detail in this article is that only 2 of the 89 participants in the experiment, all with a high level of computer background, were familiar with the tagging mechanism. This proportion is surprisingly low compared with the reported figure that almost one third of online American users have used tagging mechanisms. What makes us most curious is that, since this is a study of user patterns and perception, one data analysis is missing. While the study concludes that the cost of a query is reduced in the tag cloud scenario (compared with the greater typing effort of a search box), should the mean number of tags each participant assigned per article be considered as one of the factors in the cost analysis? Indeed, this remains a question to explore.


2008-03-25

ArticleRead (5): Clustering versus Faceted Categories for Information Exploration

Clustering versus faceted categories for information exploration, By MA Hearst, in Communications of the ACM, Volume 49, Issue 4 (April 2006)

From a usability perspective, this paper reveals the complexity of two grouping mechanisms: clustering and faceted classification.

Traditional top-down, predefined methods such as clustering benefit from their algorithms and automation, whereas bottom-up, user-oriented methods, in particular the hierarchical faceted categories (HFC) that the author proposes, favour locating user interests through manually defined category hierarchies associated with multiple facets.


This paper first discusses some advantages and disadvantages of clustering. Two main reasons lead designers to take the clustering approach: the algorithms are simple for designers, and clustering clarifies vague queries for users by returning the dominant themes as results. However, empirical evidence does not support these usability claims. Second, the author explains why clustering is not a useful and effective tool for information exploration and proposes the hierarchical faceted categories (HFC) approach, with an introduction to their prototype: the Flamenco open source faceted classification project.

Table 1 shows the comparison of clustering and faceted classification.

2008-03-06

ArticleRead (4): Collaborative Tagging and Semiotic Dynamics

Collaborative Tagging and Semiotic Dynamics, By C Cattuto, V Loreto, L Pietronero, Arxiv preprint cs.CY/0605015, 2006
From the Cover: Semiotic dynamics and collaborative tagging, Proceedings of the National Academy of Sciences, 2007 - National Acad Sciences


In “Collaborative Tagging and Semiotic Dynamics”, Cattuto, Loreto and Pietronero set out what user patterns look like in a social tagging system through an empirical statistical analysis of tag co-occurrence. The Yule-Simon model from probability and statistics is used to investigate the long-term memory of users’ tag-vocabulary activity in one social tagging system, del.icio.us. A semiotic conceptual model is proposed: a tri-partite graph that structures a post as (user, resource, {tag}). The tri-partite concept, which originates in the semiotic dynamics literature, is therefore highlighted in the title.
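
The post structure (user, resource, {tag}) and the idea of tag co-occurrence can be sketched in a few lines of Python. The posts below are hypothetical and the `cooccurrence` helper is our own simplification; it only counts how often two tags appear in the same post and does not reproduce the authors' actual procedure.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, NamedTuple


class Post(NamedTuple):
    user: str
    resource: str
    tags: FrozenSet[str]


# Hypothetical posts, not taken from del.icio.us.
POSTS = [
    Post("u1", "r1", frozenset({"web", "semantic", "rdf"})),
    Post("u2", "r1", frozenset({"web", "rdf"})),
    Post("u1", "r2", frozenset({"web", "design"})),
]


def cooccurrence(posts):
    """Count how many posts contain each unordered pair of tags."""
    counts = Counter()
    for post in posts:
        for pair in combinations(sorted(post.tags), 2):
            counts[pair] += 1
    return counts


PAIRS = cooccurrence(POSTS)
```

Note that the count for a pair such as (`rdf`, `web`) merges posts from different users, so the user and resource dimensions of the tri-partite structure are already flattened at this step.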

To cope with the complexity of the experimental data, the analysis takes a tag-centric view of the del.icio.us system. By factoring out the two parameters (user, resource) and adding the time parameter from each post, the co-occurrence of tagging activity is shown to be consistent with theoretical calculations based on the Power Law and Zipf’s Law. These two statistical laws are well-recognized tools for analysing phenomena such as natural language, self-organization and human activity, access patterns, and the memory kernel of cognitive psychology. Combining the experiment with these well-established theories offers an alternative method for exploring the social tagging phenomenon, and gives our review a basis for adding value to the authors' ideas and their research attention to user behaviors and the semiotic concept.
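
A tag-centric view of this kind can be sketched as follows: user and resource are dropped, posts are ordered by time, and the tags that co-occur with one chosen tag are ranked by frequency. Zipf's law states that frequency falls off roughly as 1/rank^α, so the exponent can be estimated from the slope of log(frequency) against log(rank). The posts, the function names and the least-squares fit below are our own illustrative assumptions, not the authors' code or data.

```python
import math
from collections import Counter

# Hypothetical posts as (timestamp, user, resource, tags).
POSTS = [
    (3, "u2", "r2", {"web", "design"}),
    (1, "u1", "r1", {"web", "rdf"}),
    (2, "u2", "r1", {"web", "rdf", "semantic"}),
    (4, "u3", "r3", {"python"}),
]


def cooccurring_stream(posts, focus):
    """Time-ordered tags appearing alongside `focus`, ignoring user and resource."""
    stream = []
    for _, _, _, tags in sorted(posts, key=lambda post: post[0]):
        if focus in tags:
            stream.extend(sorted(tags - {focus}))
    return stream


def rank_frequency(tags):
    """Frequencies sorted from most to least common (rank 1 first)."""
    return [count for _, count in Counter(tags).most_common()]


def zipf_exponent(frequencies):
    """Negated least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope


STREAM = cooccurring_stream(POSTS, "web")
```

An ideal Zipf distribution such as frequencies 120, 60, 40, 30 (that is, 120/rank) gives an exponent of about 1; real tagging data would need far more posts than this toy sample before such a fit means anything.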

Controversially, however, the research method is open to criticism from a semiotic point of view.

First of all, two semiotic schools are confused. The authors develop the tri-partite graph from the semiotic dynamics concept, which closely follows Charles Sanders Peirce’s (1839-1914) sign theory in its basic triadic relation within a sign, namely (Representamen, Object, Interpretant). The authors attempt to adopt these triadic elements and rephrase them from the (forms {words}, referents {objects}, meanings {categories}) of Steels and Kaplan (1999) to (user, resource, {tag}) in the social tagging context.

However, the authors’ reference for semiotic dynamics is the work of Ke et al. (2002), who adopt the Ferdinand de Saussure (1857–1913) school of semiotics, which takes a sign to be constructed within a dual relation (signifier, signified). Since these two semiotic schools have been in debate for decades, the authors' adoption of both papers as their definition of semiotic dynamics may lead to confusion.
Picture 1: Steels and Kaplan (1999)'s Semiotic Dynamic

Secondly, their proposed tag-centric calculation method contradicts their own arguments favoring semantic context. In such a context, semantic meaning is supposed to deal with the same Object (the same resource or bookmark in this research) in order to investigate the relation between different users and their tags through co-occurrence. Picture 1 shows the original method for the co-occurrence of items and their semantic meaning in the semiotic dynamics of Steels and Kaplan (1999), which the authors cite. Picture 2 shows the authors' method for calculating the co-occurrence of tags. Different objects (the resources or bookmarks referred to) and different users (interpreters) are ignored in this case. The main focus is on the relations between different tags, namely frequency and co-occurrence, especially for low- and high-rank tags. Note that our review does not argue that the authors’ work cannot reveal patterns of user activity, since the Power Law and Zipf’s law are well established in this domain. To be specific, if the authors’ work were not placed in the semiotic dynamics domain, the contradiction would not be so striking.



Picture 2: the authors' semiotic dynamic?

2008-02-22

ArticleRead (3): A Definition of Information

A Definition of Information, By A.D. Madden, Aslib Proceedings vol. 52, No.9, p.343-, 2000.10.

In his article, A.D. Madden draws attention to the interpretation of information with respect to context.

After reviewing the literature that defines information as a representation of knowledge, as data in the environment, as part of the communication process, or as a resource or commodity, the author attempts to define “information” further from the perspective of “informing contexts”.

Three major elements are defined in his Information-in-Context Model: the “authorial context”, in which a message originates; the “readership context”, in which a message is received and interpreted; and “the message”, which is the information being transmitted.

The idea of placing information reception and interpretation within personal and community paradigms in social-cultural contexts is valuable for understanding the definition of information. However, in the conclusion the author rephrases the definition of information for the context-reliant model of information reception without clearly explaining “stimulus”, “system” and “system relationship”. This rephrasing makes the definition of information more blurred in the end.

The general idea of Madden’s definition of information can be summarized as shown in the figure.

Note: This review was mainly completed as homework for the Humanity Informatics class taught by Professor Ching-Chun Hsieh in December 2006.


2008-02-21

ArticleRead (2): Semiotics and Programming Languages

Semiotics and Programming Languages, By H. Zemanek, In Communications of the ACM, vol. 9, no. 3, Mar. 1966, pp. 139-143.

Almost half a century ago, at the 1965 ACM Programming Languages and Pragmatics Conference, Heinz Zemanek presented an article highlighting the issue of semiotics, particularly its pragmatic aspect, as a relation between programming languages (PL) and their application fields.

Taking the sign theory of the logic school from two Charleses, C.S. Peirce and C.W. Morris, Zemanek applies semiotic concepts and terminology to programming languages, i.e. syntax, semantics, and pragmatics.

According to Morris’s definition of pragmatics, which differs from Peirce’s, as the study of the relation of signs to interpreters, Zemanek argues: “There is always pragmatics because there is always an observer and because no language makes sense without interpretation.” Pragmatics is thus justified by the existence of interpreters, interpretation, and their relations to the PL.

Two types of pragmatics are further identified, “the mechanical pragmatics” and “the human pragmatics”, since in Zemanek’s definition the purpose of a PL is the communication of programs between computers, from man to computers, from man to man, as well as from man to himself.

One issue in Zemanek’s prediction about “the central application of pragmatics around the computer” deserves to be stressed: making it possible for the computer to “speak more and more and to restrict the human user in the practical situation to point at YES or NO, or some more equally simple choices, while the computer talks.” The study of how semiotics can advance computers and programming languages toward this goal reveals a compelling vision to be developed further.

Outline:


2008-02-14

ArticleRead (1): The pragmatic web: a manifesto

The pragmatic web: a manifesto, By Mareike Schoop, Aldo de Moor, and Jan L.G. Dietz, Communications of the ACM, May 2006 Vol.49, No.5

This article is composed of 7 paragraphs. From a context-driven perspective, the authors support some preliminary thoughts on the Pragmatic Web by Munindar Singh (2002). A similar sense of pragmatics, centred on the essential issues of context, community, and collaboration structure, informed two International Conferences on the Pragmatic Web, in 2006 and 2007, alongside this Manifesto, and is expected to be advanced further at the 3rd Conference in Uppsala, Sweden, 2008.

The article starts with existing Semantic Web problems, such as the complex formats of RDF and ontologies and the inadequate context-free assumption, which may not satisfy Web functions in communication, consensus building, and the cooperative modification of ontologies. It then shifts towards the crucial challenge of how to build a socio-technical infrastructure that lifts the Web from Semantics to Pragmatics.

In particular, the authors are devoted to the concept that ontologies co-evolve with their communities of use, and within conversations between communities of practice. Thus, the aim of the Pragmatic Web is to increase human collaboration effectively through appropriate technologies. Some proposals for implementing this Pragmatic Vision are put forward, for instance, building systems:
(1) for ontology negotiation
(2) for ontology-based business interaction
(3) for pragmatic ontology building efforts in communities of practice, in other words, for goal-oriented discourses in communities.
This is perhaps the strongest part of the article, and it later became the most cited Pragmatic Vision among interested scholars.

It is, however, somewhat unclear why the conclusion introduces the language-action perspective as the theoretical foundation and analytic approach for the Pragmatic Web. By contrast, Singh (2002) did go back to the original pragmatic scholarship, grounding it in the theory of signs of Charles Morris. Still, it is not possible in a single sentence or paragraph to do justice to the question of whether sign theory (semiotics) should be on the Pragmatic Web research agenda. Instead, pragmatic research should draw attention to questions such as how the semiotic school can find a place in Web Science, and how the Web Science community can re-negotiate and collaborate with the semiotic communities to work out the definition of the Pragmatic Web in practice.