Review Spam Detection

Posted on March 5, 2008. Filed under: Spam

Authors – Nitin Jindal and Bing Liu

Year – 2007

Published in Proceedings of the International World Wide Web Conference Committee (IW3C2).

Link - http://www.www2007.org/htmlposters/poster930/

Importance to my Research - High

MY REVIEW
In this paper, the authors study a new category of spam, namely review spam, and offer insights into an approach for detecting it. Review spam is divided into two categories:

  1. Duplicate Reviews – Duplicate or nearly similar reviews about the same product or multiple products.
  2. Spam Reviews – Fake reviews or reviews which are falsified or non-trustworthy.

The authors propose using the shingle method to detect duplicate reviews. Detecting duplicate or near-duplicate content has become a major concern with the flourishing of blogs and forums. Authors who spend real time and effort compiling a good article and posting it on the Internet are at risk of copyright infringement.
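
A minimal sketch of how shingle-based duplicate detection can work, assuming word-level shingles compared with Jaccard similarity. The shingle width `w`, the `threshold` value, and the function names are illustrative assumptions, not settings taken from the paper.

```python
def shingles(text, w=2):
    """Return the set of w-word shingles (contiguous word n-grams) in text."""
    words = text.lower().split()
    if len(words) < w:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(review_a, review_b, w=2, threshold=0.9):
    """Flag two reviews as near-duplicates when their shingle sets overlap heavily."""
    return jaccard(shingles(review_a, w), shingles(review_b, w)) >= threshold
```

Two reviews that share almost all of their shingles score close to 1.0, while unrelated reviews score close to 0.0, so a single threshold separates the two cases.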

So far, copyright infringement has been a major concern mainly for the music and film industries, but it is becoming more and more prevalent in the text domain as well. Digital watermarking algorithms for text need to be developed to address this issue. However, the success of this technology would be quite limited, because digital watermarking relies on hiding a copyright signal in noise, and text offers far less noise than audio or video files. So the shingle method should do for now, although it can only detect duplication, not prevent it.

I liked the authors' idea of treating duplicate reviews as positive training examples of spam and using them to model the features of non-duplicate reviews. I guess this approach made their job a bit easier, because finding duplicate reviews is easy with the shingle method. Assuming that duplicate reviews correspond to spam is an educated guess that should hold on most occasions, since writing fresh reviews takes time and would not be cost-effective for spammers.
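
The labelling idea can be sketched roughly as follows: any review that is a near-duplicate of another review is marked as a positive (spam) example, and everything else as negative. This is only an illustration under my own assumptions (word bigrams, Jaccard similarity, a 0.9 threshold, and a pairwise comparison that would be too slow for a large corpus); it is not the authors' pipeline.

```python
from itertools import combinations


def word_bigrams(text):
    """Set of adjacent word pairs in a review (lower-cased)."""
    words = text.lower().split()
    return {tuple(words[i:i + 2]) for i in range(len(words) - 1)}


def similarity(a, b):
    """Jaccard similarity between the bigram sets of two reviews."""
    sa, sb = word_bigrams(a), word_bigrams(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def label_by_duplication(reviews, threshold=0.9):
    """Return 1 for reviews that near-duplicate another review, else 0."""
    labels = [0] * len(reviews)
    for i, j in combinations(range(len(reviews)), 2):
        if similarity(reviews[i], reviews[j]) >= threshold:
            labels[i] = labels[j] = 1
    return labels


reviews = [
    "best product ever buy it now",
    "best product ever buy it now",
    "the lens is sharp but autofocus is slow",
]
labels = label_by_duplication(reviews)
```

The resulting labels can then serve as training targets for a classifier, which is what makes the duplicate assumption so convenient.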

More details coming soon.

Cite this article as
Critical Review on “Review Spam Detection” by V. Potdar, 29th Feb, 2008. Available Online – http://drvidy.wordpress.com/2008/03/05/review-spam-detection/

Abstract
It is now a common practice for e-commerce Web sites to enable their customers to write reviews of products that they have purchased. Such reviews provide valuable sources of information on these products. They are used by potential customers to find opinions of existing users before deciding to purchase a product. They are also used by product manufacturers to identify problems of their products and to find competitive intelligence information about their competitors. Unfortunately, this importance of reviews also gives good incentive for spam, which contains false positive or malicious negative opinions. In this paper, we make an attempt to study review spam and spam detection. To the best of our knowledge, there is still no reported study on this problem.  

Important Terms

  1. Opinion Based Applications
  2. Review Spam
  3. Logistic Regression
  4. 2-Class Classification Problem
  5. Naive Bayes (Text Classification)

Reference Sheet
Positive opinions can result in significant financial gains and/or fames for organizations and individuals. This gives good incentives for review/opinion spam [8].

There are generally two types of spam reviews.

  1. The first type consists of those that deliberately mislead readers or automated opinion mining systems by giving undeserving positive opinions to some target products in order to promote them and/or by giving unjust or malicious negative reviews to some other products in order to damage their reputation.
  2. The second type consists of non-reviews (e.g., ads) which contain no opinions on the product. 

Review spam is related to but also different from Web or email spam.

  1. The objective of Web spam is to attract people to some target pages by manipulating the content of the pages and/or their link structures so that they will be ranked high by search engines. Spam emails are mainly ads.
  2. Spam reviews are very different as they give false opinions, which are much harder to detect even manually. Thus, most existing methods for detecting web spam and email spam [3, 7, 9, 11] are unsuitable for review spam.

We discovered that spam activities are widespread. For example,

  1. We found a large number of duplicate and near-duplicate reviews written by the same reviewers on different products.
  2. We also found such reviews written by different reviewers (possibly different userids of the same persons) on the same products or different products.

We propose to perform spam detection based on duplicate finding and classification. For classification, we regard spam detection as a 2-class classification problem, spam and non-spam.
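
To make the 2-class formulation concrete, here is a small sketch of logistic regression trained by stochastic gradient descent on toy data. The two features per review and all numeric values are hypothetical placeholders chosen for illustration; the real feature set and training setup are described in the paper, not here.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Estimated probability that feature vector x belongs to the spam class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by stochastic gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. the score
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


# Hypothetical toy data: two made-up features per review, label 1 = spam, 0 = non-spam.
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
```

After training, `predict_proba(w, b, x)` returns a spam probability for a new feature vector, and a cut-off such as 0.5 turns it into a spam or non-spam decision.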

To build a classification model, we need labeled training examples of spam reviews and non-spam reviews. Recognizing whether a review is a spam review or not is extremely difficult by manually reading the reviews because one can carefully craft a spam review which is just like any other innocent review and the number of spam reviews is also small.

We tried to read a large number of reviews and were unable to identify reliable spam reviews except finding a few obvious advertisements, which are irrelevant to the products being reviewed and contain no opinions. Thus, other ways have to be used to find training examples. 

Useful Links
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html 
