Thursday, May 10, 2012

Entropy in Information Retrieval

Information Theory


•The value (content) of a message is measured by how much it reduces the receiver’s uncertainty (entropy)
•The predictability of the message determines the impact of its content
 –Very predictable – low uncertainty – low entropy
  •“Hello, good day, how are you?” “Fine.”
 –Unpredictable – high uncertainty – high entropy
  •“Move your car.” “Leave the building.”


Information Content

The function H defines the information content:   H(p) = -log2 p
(the log is base 2, so H is measured in bits)
p is the a priori probability that the receiver could predict the message
So, if a receiver can fully predict a message,
with p = 1, then H(1) = 0
If the message cannot be predicted at all,
then p approaches 0 and H(p) grows without bound (H(0) itself is undefined)

So the smaller p is, the larger H(p) is;
in other words, the less predictable a message is, the more information it contains
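
To make the shape of H(p) concrete, here is a minimal sketch in Python (the function name information_content is mine, not from the original) that evaluates H(p) = -log2 p for a few probabilities:

import math

def information_content(p: float) -> float:
    """Self-information H(p) = -log2(p), in bits, for a message
    the receiver would predict with a priori probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

# A fully predictable message carries no information...
print(information_content(1.0))       # 0.0 bits
# ...and information grows without bound as p shrinks.
for p in (0.5, 0.25, 0.01):
    print(p, information_content(p))  # 1.0, 2.0, ~6.64 bits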


Calculation of Entropy

Example:
Receive one letter of the alphabet
H = -log2(1/26) = log2 26, or about 4.7 bits, if all 26 letters are equally likely
About 4.14 bits given the known (unequal) distribution of letter frequencies in English
Given n messages with probabilities p_1, ..., p_n, the average information content (bits) of any one of those messages is
H = -Σ p_i log2 p_i   (summed over i = 1, ..., n)




Average Entropy is maximized when all messages are equally likely
When would this occur?
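
As a rough numeric check (a sketch in Python; the helper name entropy is mine), the code below evaluates H = -Σ p_i log2 p_i for a uniform distribution over 26 letters and for a skewed one. The uniform case reproduces the 4.7 bits above and is the maximum; any unequal distribution comes out lower, which is why real English letter frequencies give only about 4.14 bits (the exact figure depends on which frequency table is used).

import math

def entropy(probs):
    """Average information content H = -sum(p_i * log2(p_i)), in bits.
    Zero-probability messages contribute nothing to the average."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [1 / 26] * 26
print(entropy(uniform))   # ~4.70 bits = log2(26), the maximum

# Any skew lowers the average below log2(26).
skewed = [0.5] + [0.5 / 25] * 25
print(entropy(skewed))    # ~3.32 bits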


Using Entropy

Information content is additive for independent messages:
H(p1 · p2) = H(p1) + H(p2), since -log(p1 p2) = -log p1 - log p2
So what?
Google queries:
some query terms have more information value than others
some retrieval messages (results) have more information value than others
So a rare term or an unexpected result reduces the user’s uncertainty far more than a common one, as the sketch below illustrates.
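
A minimal sketch of additivity in Python (the term probabilities are made-up illustrative values, not real query statistics): a rare query term contributes far more bits than a common one, and for independent terms the information of the combined query is just the sum.

import math

def information_content(p):
    """Self-information H(p) = -log2(p), in bits."""
    return -math.log2(p)

# Hypothetical a priori probabilities of seeing each query term.
p_common = 0.10    # e.g. a very frequent word
p_rare = 0.001     # e.g. a rare, highly specific term

h_common = information_content(p_common)  # ~3.32 bits
h_rare = information_content(p_rare)      # ~9.97 bits

# For independent terms, H(p1 * p2) = H(p1) + H(p2),
# since -log(p1 * p2) = -log(p1) - log(p2).
print(information_content(p_common * p_rare))  # ~13.29 bits
print(h_common + h_rare)                       # ~13.29 bits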
