Hello folks!

Today we are going to see how to rank reviews by their entropy. For my earlier work on this dataset, please check the previous posts.
Now, the main need was to design an algorithm that outputs a single metric that can be used to rank the reviews. What metric captures the information content of a review? It’s nothing but entropy (∝ amount of information in a review).

Shannon’s information entropy concept is used to measure the amount of information in reviews. For the online review classification problem, the entropy is computed as follows:

Let 𝑆 = {𝑠1, 𝑠2, …, 𝑠q} be the set of categories in the review space. The expected information needed to classify a review ‘i’ is given by:
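In the standard Shannon form, and using the symbols defined in this post, this expected information would be written as follows (the base-2 logarithm is an assumption; any base works up to a constant factor):

```latex
H(S) = -\sum_{i=1}^{q} P(s_i)\,\log_2 P(s_i)
```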


The average amount of information contributed by a term ‘t’ in a class ‘𝑠i’:


Information Gain (derived from entropy) is the expected entropy reduction by knowing the existence of a term ‘t’:
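Assuming the usual information gain definition from term selection in text classification, which matches the four probabilities listed below, the gain would read (with P(t̄) = 1 − P(t)):

```latex
IG(t) = -\sum_{i=1}^{q} P(s_i)\,\log_2 P(s_i)
      + P(t)\sum_{i=1}^{q} P(s_i \mid t)\,\log_2 P(s_i \mid t)
      + P(\bar{t})\sum_{i=1}^{q} P(s_i \mid \bar{t})\,\log_2 P(s_i \mid \bar{t})
```

The first term is the entropy before looking at the term; the other two subtract the entropy that remains once we know whether the term occurs.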



  • 𝑃(𝑠i) = Probability of reviews in category ‘𝑠i’ among all reviews.
  • 𝑃(𝑡) = Probability of reviews which contain term ‘𝑡’ among all reviews.
  • 𝑃(𝑠i | 𝑡) = Probability of reviews in category ‘𝑠i’ among all reviews which contain term ‘𝑡’.
  • 𝑃(𝑠i | 𝑡̄) = Probability of reviews in category ‘𝑠i’ among all reviews which do not contain term ‘𝑡’.

The above mathematical model calculates the reduction of entropy obtained by knowing whether a specified term occurs. It considers not only the term’s occurrence, but also its non-occurrence. This value indicates the term’s contribution and predictive ability: a word with a higher gain contributes more to classification. For a binary classification, this value can be used to measure the contribution of term ‘t’ to a class si.
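To make the model above concrete, here is a minimal Python sketch of information gain for a single term. It assumes each review is already reduced to a set of terms and that labels are plain strings; the function name `information_gain` and the toy data are hypothetical and only for illustration.

```python
import math


def information_gain(docs, labels, term):
    """Entropy reduction from knowing whether `term` occurs in a review.

    docs   -- list of sets of terms, one set per review (assumed input shape)
    labels -- list of category labels, aligned with docs
    """
    categories = set(labels)
    with_t = [lab for d, lab in zip(docs, labels) if term in d]
    without_t = [lab for d, lab in zip(docs, labels) if term not in d]

    def plogp_sum(subset):
        # Sum over categories of P(s_i | subset) * log2 P(s_i | subset).
        # An empty subset, or a category with probability 0, contributes 0.
        total = 0.0
        for c in categories:
            if subset:
                p = subset.count(c) / len(subset)
                if p > 0:
                    total += p * math.log2(p)
        return total

    p_t = len(with_t) / len(docs)  # P(t)
    return (-plogp_sum(labels)
            + p_t * plogp_sum(with_t)
            + (1 - p_t) * plogp_sum(without_t))


# Toy data: "battery" separates the two classes perfectly, "good" does not.
docs = [{"battery", "good"}, {"battery"}, {"bad"}, {"bad", "good"}]
labels = ["helpful", "helpful", "unhelpful", "unhelpful"]

print(information_gain(docs, labels, "battery"))
print(information_gain(docs, labels, "good"))
```

On this toy data the term that appears only in helpful reviews gets the maximum gain for a balanced two-class problem, while the term spread evenly across both classes gets none.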

In this case, only two categories, “Helpful” and “Unhelpful”, are considered. Let s1 be “Unhelpful” and s2 be “Helpful”. To capture the difference in predictive ability between the two categories, we formulate a helpfulness gain, which represents how much a term contributes to the class of “Helpful” reviews. The helpfulness gain of a term tj is calculated as:


This equation gives the helpfulness gain of term tj, which represents the word’s importance and predictive ability.

From the discussion in the previous section, the helpfulness gain represents a word’s ability to correctly predict a document’s allocation to the category of “Helpful” or “Unhelpful” reviews. So, the sum of the helpfulness gains of all words in a review indicates the review’s helpfulness. In this approach, the review’s content (words) is analyzed and the helpfulness gain is calculated for each word in the product reviews. In order to predict the helpfulness of a review ‘di’, the helpfulness score function is as follows:


  • 𝑊 = Number of stemmed words in review ‘di’.


This equation can be seen as the total helpful information delivered by a review document, and it is used to model the helpfulness of reviews. The raw value may be greater than 1, so a normalization factor is introduced to keep the score in the range [0, 1]. The algorithm returns tuples of {‘di’, score(di)}, and the product reviews are then ranked by their score(di) values. Reviews with higher scores are taken to be more helpful than others.

With a set T of training reviews and a set T’ of test reviews, the helpfulness prediction process is shown as follows:

  • Find the gain values for every non-stop word from T.
  • Calculate the helpfulness score for every review of T’.
  • Normalize the helpfulness score.
  • Sort T’ in descending order based on their helpfulness score.
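The four steps above can be sketched end to end in Python. This is only an illustration under several assumptions that are mine, not the original method: the tiny stop-word list is made up, there is no stemming, the per-word gain is plain information gain signed by whether the word leans towards “Helpful”, the review score is the sum of its words’ gains, and normalization is min-max. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "this", "to", "i", "of"}


def tokens(text):
    """Lowercase word tokens without stop words (no stemming in this sketch)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def entropy(pos, total):
    """Binary entropy in bits of a group with `pos` helpful reviews out of `total`."""
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, total - pos):
        p = count / total
        if p > 0:
            h -= p * math.log2(p)
    return h


def learn_gains(train):
    """Step 1. train: list of (text, is_helpful) pairs. Returns {term: gain}."""
    n = len(train)
    n_helpful = sum(1 for _, y in train if y)
    df = Counter()          # number of reviews containing the term
    df_helpful = Counter()  # number of helpful reviews containing the term
    for text, y in train:
        for t in set(tokens(text)):
            df[t] += 1
            if y:
                df_helpful[t] += 1
    base = entropy(n_helpful, n)
    gains = {}
    for t, n_t in df.items():
        h_with = entropy(df_helpful[t], n_t)
        h_without = entropy(n_helpful - df_helpful[t], n - n_t)
        ig = base - (n_t / n) * h_with - ((n - n_t) / n) * h_without
        # Assumption: make the gain negative when the term leans "Unhelpful".
        leans_helpful = df_helpful[t] / n_t >= n_helpful / n
        gains[t] = ig if leans_helpful else -ig
    return gains


def rank(test, gains):
    """Steps 2-4: score each review, min-max normalize, sort descending."""
    if not test:
        return []
    raw = [sum(gains.get(t, 0.0) for t in tokens(text)) for text in test]
    lo, hi = min(raw), max(raw)
    span = hi - lo
    norm = [(r - lo) / span if span else 0.0 for r in raw]
    return sorted(zip(test, norm), key=lambda pair: pair[1], reverse=True)


train = [
    ("battery lasts long and sound quality is good", True),
    ("range is about twenty feet and battery life is good", True),
    ("bad", False),
    ("did not like", False),
]
test = ["bad product", "battery and sound quality are good but range is short"]

for review, score in rank(test, learn_gains(train)):
    print(f"{score:.2f}  {review}")
```

Because the score is a sum over words, a review with many informative words tends to score higher than a short one, whatever its sentiment.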


To evaluate the proposed ranking algorithm, consider the product ‘Motorola Bluetooth Smart Controller for Android – Bluetooth Headset – Retail Packaging’. The proposed algorithm produces the following ranking of its reviews:

  1. I love this headset. It’s light, has good voice quality, louder earpiece volume than the phone itself (Mot V710) and the battery outlasts the phone.  A previous reviewer mentioned having to carry 2 chargers.  Motorola makes a Y adapter so you can charge both your phone and headset from 1 charger.  I think I paid under $10 for it but don’t quote me.  I love the headset but when I’m in an airport, it picks up the background noise as loudly as my voice.  If it was noise canceling it would have gotten 5 stars.  One last note: My phone doesn’t ring when connected to the headset so you must have the headset on to know you have an incoming call.  This is a complaint about the phone though not the headset. Update 03-05-2005I just bought an HS-810 because I keep getting complaints of background noise being louder than my voice.  I like it better but the 820 does have better incoming sound quality.  But all in all, I like the 810 much better.  The flip open design is especially nice.  Search for the HS810 to see my complete review. Update 07-21-2005My Motorola V710 went in the dunk a few months ago so I had to buy a new one.  The new software allows the phone to ring even when the headset is connected.  Much better.
  2. and i’m fairly disappointed! i bought this little gem at a cingular kiosk in the mall for $80 and i am thinking it’s going back before my 30 days are up! in summary, considering bluetooth still has some improvements to make, this is just NOT worth $80, period. even if this device worked flawlessly, it might be worth about $30 at best.here are my gripes:* it says it has a range of about 30 feet. hmmm…not really. i am even using this headset with a motorola phone (V505) and i can get about 20 feet away from it at best before the audio quality goes south. not the biggest deal in the world. even when i am close to the phone however, there can be some interferance (“crackling”) and distortion. usually mild though. it seems like my microwave interferes with it (when it’s cooking)!* doesn’t fit snugly on my ear. rather, it kind of “hangs” there. it doesn’t have any kind of mechanism for adjustment, so here’s hoping it fits your ear. using this headset while lying down is hard to do; the device doesn’t seem to want to stay put.* drops the bluetooth connection every now and again, even when i am at close range.the “good”:* looks cool! ;)* battery life seems to be pretty good.* small, lightweighti am going to look into some other options. this simply was a waste of money.


  1. The sounds works fine on most calls. it is not very good on call overseas. people don’t hear very well. there are many BETTER headsets out there.

It’s seen that the first review is positive whereas the second is negative, so this algorithm ranks without taking the polarity of the review into account. That is good, as we want to read both positive and negative reviews before buying the product. Also, an interesting observation is that longer reviews appear at the top, which is a bonus!

These are all the things that I have explored on this dataset. Hope you all go and find some more interesting insights in ranking those reviews. Get back to me if you do!