Measuring Article Quality in Wikipedia: Models and Evaluation

Posted on February 18, 2008. Filed under: Data Quality Measurement


Authors
- Meiqun Hu, Ee-Peng Lim, Aixin Sun, Hady W. Lauw, and Ba-Quy Vuong

Year – 2007

Published in - ACM Conference on Information and Knowledge Management (CIKM’07)

Link - http://www.hadylauw.com/cikm07.pdf

Importance to my Research - Very High

MY REVIEW
The article by Hu et al. (2007) is very interesting, and I couldn’t stop reading it once I started. It addresses the highly important issue of assessing the quality of Wikipedia articles and proposes a few practical solutions that can be used to evaluate article quality. I came across this article on the website of one of the authors. I also found some other interesting articles, which I am currently reading, but for now I will write a review of this paper.

The paper proposes three algorithms: Basic, PeerReview and ProbReview. All these algorithms basically rely on assessing the “quality of the article” and the “authority of the contributor”, i.e. if a person who is regarded as an authority in a subject area writes an article on Wikipedia, it is considered to be a good article. The authors provide equations to calculate both, and since I am not that good at mathematics, all these scary equations freak me out. I would thank the authors for also explaining those equations in layman’s language, which definitely helped me understand the idea much better than the mathematical jargon. But then, a research paper without these equations wouldn’t look like a scientific publication.
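
To make the mutual dependency between article quality and author authority concrete, here is a minimal sketch of how such a pair of scores could be computed iteratively. It is not the paper's exact formulation: the use of per-author word counts as weights, the sum-to-one normalization, and the names `basic_model`, `normalize` and `word_counts` are all assumptions made for illustration.

```python
def normalize(scores):
    """Scale scores so they sum to 1 (an assumed normalization, for stability)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else dict(scores)


def basic_model(word_counts, iterations=50):
    """Iterate the quality <-> authority dependency.

    word_counts maps (article, author) to the number of words that author
    contributed to that article (hypothetical input format).
    """
    articles = {article for article, _ in word_counts}
    authors = {author for _, author in word_counts}
    authority = {author: 1.0 / len(authors) for author in authors}
    quality = {article: 0.0 for article in articles}

    for _ in range(iterations):
        # An article is good if authoritative authors wrote much of it.
        quality = {article: 0.0 for article in articles}
        for (article, author), count in word_counts.items():
            quality[article] += count * authority[author]
        quality = normalize(quality)

        # An author is authoritative if they wrote much of good articles.
        new_authority = {author: 0.0 for author in authors}
        for (article, author), count in word_counts.items():
            new_authority[author] += count * quality[article]
        authority = normalize(new_authority)

    return quality, authority


# Toy data: alice wrote most of article A, bob wrote a little of A and all of B.
word_counts = {("A", "alice"): 100, ("A", "bob"): 10, ("B", "bob"): 20}
quality, authority = basic_model(word_counts)
print(quality, authority)
```

The two update steps feed each other in the same way hub and authority scores do in Kleinberg's HITS algorithm, which appears in the reference list further down.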

The PeerReview model includes the expertise of the reviewers, who can change the article, make corrections and add more information. The additions made by reviewers improve the quality and also show consensus amongst the authors. Hence this model treats a review from an authority as valuable, and the quality of the article is analysed in the light of this parameter too.
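
One way to picture the review idea is to score each word by everyone who stands behind it: the author who wrote it plus the reviewers who kept it. The sketch below follows that reading under stated assumptions and is not the paper's exact equations. The input format (one tuple per word), the sum-to-one normalization, and the names `peer_review_model` and `normalize` are all hypothetical.

```python
from collections import defaultdict


def normalize(scores):
    """Scale scores so they sum to 1 (an assumed normalization, for stability)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total else dict(scores)


def peer_review_model(words, iterations=50):
    """words is a list of (article, author, reviewers) tuples, one per word.

    'reviewers' is the set of users assumed to have kept the word when they
    edited the article after it was written (hypothetical input format).
    """
    users = {u for _, author, reviewers in words for u in {author, *reviewers}}
    authority = {u: 1.0 / len(users) for u in users}
    quality = {}

    for _ in range(iterations):
        new_authority = dict.fromkeys(users, 0.0)
        article_quality = defaultdict(float)
        for article, author, reviewers in words:
            endorsers = {author, *reviewers}
            # A word is as good as the combined authority of its endorsers.
            word_quality = sum(authority[u] for u in endorsers)
            article_quality[article] += word_quality
            # Each endorser gains authority from the words they stand behind.
            for u in endorsers:
                new_authority[u] += word_quality
        quality = normalize(article_quality)
        authority = normalize(new_authority)

    return quality, authority


# Toy data: two words in A reviewed by others, one unreviewed word in B.
words = [
    ("A", "alice", {"bob", "carol"}),
    ("A", "alice", {"bob"}),
    ("B", "dave", set()),
]
quality, authority = peer_review_model(words)
print(quality, authority)
```

In this toy run the unreviewed article B ends up far below A, which matches the intuition that content nobody else has endorsed carries less weight.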

The final algorithm, termed ProbReview, goes a step further and argues that one reviewer will not always review the entire article. Hence the PeerReview model is improved by finding the probability of something (I am not sure what), but hopefully the authors will reply to this post and leave some comments.

Based on this understanding I have a few questions:-

  1. What happens if authority contributors write an article in a different domain, i.e. a domain in which they are not experts? Does the authority still exist, or have the authors not considered domain expertise?
  2. Is fixing grammatical errors, like adding commas, semi-colons etc., considered adding quality, and if yes, are such fixes considered equivalent to words?
  3. Are all words considered in assessing quality, or are some words like [a, or, in, the etc.] not included? If some are excluded, which words are and which aren’t included, and why?
  4. I would suggest that analyzing concepts would be more useful than just words, i.e. if a reviewer adds new concepts then that would be more useful than simply relying on words that are added or deleted. This may be difficult and more at the semantic level, but I think this should be the next logical step in this research.

Cite this article as
Review on “Measuring Article Quality in Wikipedia: Models and Evaluation” by V. Potdar, 18th Feb 2008, Available Online http://drvidy.wordpress.com/2008/02/18/measuring-article-quality-in-wikipedia-models-and-evaluation/

ABSTRACT
Wikipedia has grown to be the world largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of articles in collaborative authoring of Wikipedia. We propose three article quality measurement models that make use of the interaction data between articles and their contributors derived from the article edit history. Our Basic model is designed based on the mutual dependency between article quality and their author authority. The PeerReview model introduces the review behavior into measuring article quality. Finally, our ProbReview models extend PeerReview with partial reviewership of contributors as they edit various portions of the articles. We conduct experiments on a set of well-labeled Wikipedia articles to evaluate the effectiveness of our quality measurement models in resembling human judgement.

Important Terms

  • Content Driven Reputation
  • Transaction Driven Reputation
  • Content Survival in Revision History (Text & Edit Survival)
  • NDCG Metric
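
For readers new to the last term, NDCG (normalized discounted cumulative gain) scores a ranked list by how close it comes to the ideal ordering. The sketch below shows one common formulation with linear gains and a log2 position discount. Whether the paper uses exactly this variant is an assumption, and the function names `dcg` and `ndcg` are illustrative only.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain of the first k items of a ranked list."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg(gains, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal else 0.0


# gains[i] is the human-assigned quality grade of the article ranked at position i.
print(ndcg([3, 2, 1], k=3))  # ranking agrees with the ideal order
print(ndcg([1, 2, 3], k=3))  # best article ranked last
```

A ranking that places the highest-graded articles first scores 1.0, and the score drops as good articles slip down the list.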

Related Useful References
Content Quality
D. Anthony, S. Smith, and T. Williamson. Explaining quality in Internet collective goods: Zealots and good samaritans in the case of Wikipedia, 2005. Retrieved online: http://web.mit.edu/iandeseminar/Papers/Fall2005/anthony.pdf

T. Cross. Puppy smoothies: Improving the reliability of open, collaborative wikis, 2006. Retrieved online: http://www.firstmonday.org/issues/issue11_9/cross/index.html

J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999.

A. Lih. Wikipedia as participatory journalism: Reliable sources? Metrics for evaluating collaborative media as a news resource. In Proc. of the 5th International Symposium on Online Journalism, April 2004.

E.-P. Lim, B.-Q. Vuong, H. W. Lauw, and A. Sun. Measuring qualities of articles contributed by online communities. In Proc. of WI’06, pages 81–87, December 2006.

Content Semantics & Evaluation
J. Goldstein, M. Kantrowitz, V. Mittal, and J. Carbonell. Summarizing text documents: Sentence selection and evaluation metrics. In Proc. of SIGIR’99, pages 121–128, 1999.

Spam & Security
Z. Gyöngyi, P. Berkhin, H. Garcia-Molina, and J. Pedersen. Link spam detection based on mass estimation. In Proc. of VLDB’06, pages 439–450, 2006.

Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen. Combating Web spam with TrustRank. In Proc. of VLDB’04, pages 576–587, 2004.

Semantics
P. Schönhofen. Identifying document topics using the Wikipedia category network. In Proc. of WI’06, pages 456–462, 2006.

M. Strube and S. P. Ponzetto. Wikirelate! computing semantic relatedness using Wikipedia. In Proc. of AAAI’06, pages 1419–1424, 2006.

Measurement
J. Voss. Measuring Wikipedia. In Proc. of the 10th International Conference of the International Society for Scientometrics and Informatics, pages 221–231, July 2005.


3 Responses to “Measuring Article Quality in Wikipedia: Models and Evaluation”


Hi Vidy,

We are glad to hear your useful comments on our work. With regard to the questions posted above, let me answer them one after another.

Our current approach computes authority and quality for a given collection of articles only. If the article collection represents a domain, the authority derived will therefore be domain specific. Such authority will not be directly applicable to another article collection or domain.

All punctuation is removed in our content comparison.

We remove all stopwords (such as a, an, the, in, etc.) as listed in http://snowball.tartarus.org/algorithms/english/stop.txt, a list which is commonly adopted in NLP research.

We appreciate your pointing out a future direction for this research. We will look into the idea of concept modeling further.

Regards,

Hi Meiqun,

Your paper and research are very interesting. I was wondering whether your research team has considered the possibility of evaluating contributor authorities within a domain and its related domains. If so, what are your thoughts about doing this?

For example, information systems (IS) is often considered an interdisciplinary field of study. It borrows ideas and concepts from computer science (CS), software engineering, information technology, business and psychology (i.e. how to deal with people in project management, business analysis etc.) as well as having its own. A quality IS article contributor could arguably have some of their IS-derived authority transferred to the CS domain, and vice versa for a CS article contributor.

