Thursday, May 10, 2012

Entropy in Information Retrieval

Information Theory

•Value or content of a message is based on how much the receiver’s uncertainty (entropy) is reduced
•Predictability of the message (impact of content)
 –Very predictable – low uncertainty – low entropy
  •Hello, good day, how are you? Fine.
 –Unpredictable – high uncertainty – high entropy
  •Move your car. Leave the building.

Information Content

Function H defines the Information Content:   H(p) = -log p
p is the a priori probability that a message could be predicted
So, if a receiver can fully predict a message,
with p = 1, then H(1) = 0
If the receiver cannot predict the message at all,
then p = 0 and H(0) is undefined (H(p) grows without bound as p approaches 0)

So the smaller p is, the larger H(p) is
In other words, the less predictable a message is, the more information it contains
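The definition above can be sketched in a few lines of Python (the function name is my own; the base-2 log gives H in bits):

```python
import math

def information_content(p):
    """Information content H(p) = -log2(p), in bits."""
    if p <= 0:
        raise ValueError("H(p) is undefined for p <= 0")
    return -math.log2(p)

# A fully predictable message (p = 1) carries no information:
print(information_content(1.0))    # 0.0
# A 1-in-8 message carries 3 bits:
print(information_content(1 / 8))  # 3.0
```

Note how halving p adds exactly one bit, which is the additivity the logarithm buys us.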

Calculation of Entropy

Receive one letter of the alphabet
H = -log(1/26) = log 26, or about 4.7 bits, if all letters are equally likely
About 4.14 bits given the known letter frequencies of English
Given n messages with probabilities p_1, ..., p_n, the average information content (bits) of any one of those messages is
H = -SUM_i p_i log p_i

Average Entropy is maximized when all messages are equally likely
When would this occur?
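A minimal sketch of the average-entropy formula, assuming a list of probabilities that sums to 1 (the function name and example distributions are my own):

```python
import math

def entropy(probs):
    """Average information content H = -sum(p * log2 p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform over 26 letters: log2(26), roughly 4.7 bits
uniform = [1 / 26] * 26
print(round(entropy(uniform), 2))  # 4.7

# Any skewed distribution over the same 26 outcomes has lower entropy:
skewed = [0.5] + [0.5 / 25] * 25
print(entropy(skewed) < entropy(uniform))  # True
```

This illustrates the claim above: entropy is maximized when all messages are equally likely, and any skew toward predictability lowers it.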

Using Entropy

Information Content is additive
For independent messages, H(p1 * p2) = H(p1) + H(p2)
So what??
Google Queries
Some query terms carry more information value than others
Likewise, some retrieved messages carry more information value than others
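The additivity property is easy to verify numerically; here is a small check (the probabilities p1 and p2 are made up for illustration, and independence of the two terms is assumed):

```python
import math

def H(p):
    """Information content in bits."""
    return -math.log2(p)

# Two independent query terms with occurrence probabilities p1 and p2.
p1, p2 = 1 / 4, 1 / 8
# The joint probability of both is p1 * p2, and the information
# content of the pair is the sum of the individual contents:
print(H(p1 * p2))     # 5.0
print(H(p1) + H(p2))  # 5.0
```

This is the "so what": a rare (low-p) term contributes more bits to a query than a common one, which is one way to motivate weighting schemes that favor rare terms.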
