Saturday, 31 January 2015

Information entropy as an anthropomorphic concept

After this post I wrote a paper on this subject, find it in arxiv.org

One of the most interesting opinions that I have heard about entropy has been formed by E.T. Jaynes and E.P. Wigner (link). According to them, entropy is an anthropomorphic concept in the sense that in a physical system there can be found many thermodynamic systems. The physical system can be examined from many points of view examining different variables each time and calculating entropy differently. In this post and in the paper in arxiv.org that followed it (link), I discuss how I think that we may apply this concept in information entropy, how Shannon’s definition of entropy can fit in Jayne’s and Bank’s opinion. I also present a case in which I have used this combination of ideas in the past in an Android application.
Information entropy has been defined by Claude Shannon (link). It quantifies the amount of information that is transmitted by a channel or more generally the information content that there is in a message M. Message M consists of a set of n symbols, let P(i) be the probability of appearance of any symbol i in M then for all n symbols stands that entropy H = -ΣP(i)logP(i). Probability P(i) equals to the frequency of appearance of i in M.
Let’s generalize this idea. Let S be a set of objects that we may describe their properties using v variables for each one. Let each variable take values from a range [a, b] of discrete values. Now we have a lot of information about S and we can use entropy in order to analyze it. Exactly as Jaynes describes the examination of physical systems we may choose any subset v’ of v in order to examine the information content of S. The choice of v’ depends on how we care to examine S and this provides the anthropomorphic characteristic to our analysis. Using information entropy we can examine the distribution of the properties described by v’ in the objects of S.
For simplicity reasons let us examine S considering two properties for its objects described by variables X and Y. For variable X and for probability P(x) in the appearance of any value x entropy is H(X)=-ΣP(x)logP(x). Same for Y, it stands that H(Y)=-ΣP(y)logP(y). In order to combine these two entropies we have to consider if they are dependent.
As it is known for independent X and Y stands that H(X,Y) = H(X) + H(Y).
For joint probability P(x, y) stands H(X,Y) = -ΣΣ P(x, y)log P(x, y).
For conditional entropy H(X | Y) which stands for the probability P(y, x) when X depends on Y entropy is H(X | Y) = -ΣP(y, x)log(P(y)/P(y, x)).
These considerations about entropy variables are different from the classical definitions of Shannon. Shannon defines conditional probability P(x, y) as the probability of appearance of symbol x which depends on the appearance of symbol y. While the above probabilities consider properties x and y of one discrete object.
Unconsciously I had used in the past all the above for the development of an android application; you may find it here or here. The goal of the application was to rate the strength of passwords for handheld devices. A strong password M must be complex but as we focus in handheld devices the frequency of appearance of each character in M is not the only property we may consider. The user of each device has to change the keyboard appearance of the device in order to write upper case characters, lower case, symbols or numbers. This makes the input of M more complex than in desktop computers and we may consider this property when rating the complexity of M.
The rating of password strength still regards characters of a simple string M but let us look into them as discrete objects of more than one property. Variable X is defined on the set of characters available to the user and P(x) is the frequency of appearance of each character x in M. Variable Y is the property of each symbol of being an upper case character, lower case, symbol or a number. Then P(y) is the probability of y being upper case character, lower case, symbol or number. The two variables are independent; X does not depend on Y. As a result H(X,Y)=H(X) + H(Y). Calculating entropy using the last equation provides a more accurate analysis of the information content of M.
For a more detailed presentation see my paper currently published in arxiv.org.