*[Figure: the entropy of the set S plotted against the proportion of positive examples]*

On the horizontal axis we can see the proportion of positive examples in the set S (i.e., 1 – the proportion of negative examples), and on the vertical axis we can see the entropy of S. It seems that this function indeed has a maximum, and it hits 0 in ONLY 2 cases.

In a binary classification problem, when the entropy hits 0 it means we have NO entropy and S is a pure set. So, the members of S are either ALL positive or ALL negative. This is where we have absolute certainty, and if you remember from our last post, the leaves of a decision tree are in exactly this state of purity and certainty.

The entropy function for a binary classification has a maximum value of 1. This is the state of utter confusion, highest disorder, and highest entropy. It happens when our set S is EQUALLY divided into positive and negative examples, meaning that P(+) = 0.5, which automatically implies that P(-) = 0.5. Finally, if P(+) and P(-) are NOT equal, for example P(+) = 0.7 and P(-) = 0.3, the entropy is ALWAYS strictly between 0 and 1.

I can actually prove to you that the entropy for a binary classification problem has a maximum of 1 if and only if P(+) = P(-) = 0.5. In order to do this, we have to take the partial derivative of the entropy with respect to one of the proportions (say P(+)) and set it to 0. Then, once we find the critical value for P(+), we can easily find P(-), as we know that P(-) = 1 – P(+). Finally, by plugging these critical values for P(+) and P(-) back into the entropy function, we find the maximum value of the entropy for the binary classification problem. (The full derivation is sketched at the end of this post.)

An observant individual might have noticed that when either P(+) or P(-) is 0, we run into an interesting situation in computing the entropy: log(x) when x = 0 is not defined! Why? Let's not forget that log(x) with base a spits out the power to which you need to raise a to get x. For example, log(8) with base 2 outputs 3, because 2 to the power of 3 gives you 8. Now, with this in mind, what do you think of log(0) with base 2 (or with any base, for that matter)? What real number do you need, so that if you raise 2 to that real number, you get 0? There is no such real number! We can never get 0 by raising 2 (or any positive base) to any real power. Instead, as x gets closer and closer to 0, log(x) just gets smaller and smaller, heading towards -infinity. Below, you can see all the math involved:
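The image with the worked math did not survive in this copy of the post, so here is a sketch of the argument it describes. The limits are standard; the convention in the last step (treating the 0 · log 0 term in the entropy as 0) is the usual one, though the visible text doesn't state it explicitly:

```latex
% No real y satisfies 2^y = 0, so \log_2 0 is undefined; instead,
\lim_{x \to 0^{+}} \log_2 x = -\infty .
% In the entropy, though, the troublesome term is p \log_2 p, not \log_2 p alone,
% and that product has a finite limit (L'Hopital's rule on \ln p / ((\ln 2)/p)):
\lim_{p \to 0^{+}} p \log_2 p
  = \lim_{p \to 0^{+}} \frac{1/p}{-(\ln 2)/p^{2}}
  = \lim_{p \to 0^{+}} \frac{-p}{\ln 2}
  = 0 .
% Taking 0 \cdot \log_2 0 := 0 therefore gives a pure set an entropy of exactly 0.
```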
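And here is a sketch of the maximum-entropy derivation described a few paragraphs up, reconstructed from the steps the text lists (take the derivative, set it to 0, plug the critical values back in); the post's own worked image isn't available:

```latex
% Entropy of a binary split, writing p for P(+), so that P(-) = 1 - p:
H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)
% Differentiating (the two 1/\ln 2 terms from the product rule cancel):
\frac{dH}{dp} = -\log_2 p + \log_2 (1 - p) = \log_2 \frac{1 - p}{p}
% Setting the derivative to 0:
\log_2 \frac{1 - p}{p} = 0 \iff \frac{1 - p}{p} = 1 \iff p = \tfrac{1}{2}
% The second derivative is negative on (0, 1), so this is indeed a maximum:
\frac{d^{2}H}{dp^{2}} = -\frac{1}{p(1 - p)\ln 2} < 0
% Plugging P(+) = P(-) = 1/2 back into H gives the maximum value:
H\!\left(\tfrac{1}{2}\right)
  = -\tfrac{1}{2}\log_2 \tfrac{1}{2} - \tfrac{1}{2}\log_2 \tfrac{1}{2}
  = \tfrac{1}{2} + \tfrac{1}{2} = 1 .
```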
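If you'd rather check the numbers than the calculus, here is a minimal Python sketch (the function name `binary_entropy` is mine, not from the post). It applies the 0 · log₂(0) = 0 convention above, so the pure-set cases don't blow up:

```python
import math

def binary_entropy(p_pos: float) -> float:
    """Entropy of a binary split with proportion p_pos of positive examples."""
    p_neg = 1.0 - p_pos
    h = 0.0
    for p in (p_pos, p_neg):
        if p > 0.0:            # convention: the 0 * log2(0) term contributes 0
            h -= p * math.log2(p)
    return h

print(binary_entropy(0.5))  # 1.0    -> equal split: maximum entropy
print(binary_entropy(0.7))  # ~0.881 -> unequal split: strictly between 0 and 1
print(binary_entropy(1.0))  # 0.0    -> pure set: no entropy
```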