
Decision Tree in Machine Learning

A decision tree is a supervised machine learning algorithm that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems.

A decision tree is, at its core, nothing but a nested if-else statement.
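For example, a tiny hand-written tree for the classic play-tennis data is literally nested if-else statements. This is only an illustration; the feature values and the `predict_play` function are made up for this sketch:

```python
# A hypothetical hand-coded "decision tree": every internal node
# is just an if-else test on a single feature.
def predict_play(outlook, humidity, windy):
    if outlook == "overcast":      # root split on Outlook
        return "yes"
    elif outlook == "sunny":       # second split on Humidity
        return "yes" if humidity == "normal" else "no"
    else:                          # rain: split on Windy
        return "no" if windy else "yes"

print(predict_play("sunny", "high", False))  # → no
```

A learned decision tree does exactly this; the algorithm's job is to choose which feature to test at each node.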

image source: Suraj Gusain (the author)

Terminologies in Decision Tree:

image source: javatpoint

Step 1: Begin with your dataset, which should have some feature variables and a classification or regression output.

Step 2: Determine the best feature in the dataset to split the data.

Step 3: Split the data into subsets according to the values of this best feature. This splitting defines a node on the tree, i.e., a splitting point based on a certain feature from our data.

Step 4: Recursively generate new tree nodes using the subsets of data created in step 3.
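The four steps above can be sketched in Python. This is a minimal toy implementation for categorical features only; the helper names (`best_feature`, `build_tree`) are made up for illustration, not taken from any library:

```python
from collections import Counter
import math

def entropy(labels):
    # Entropy of a list of class labels: -sum(p * log2(p))
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_feature(rows, labels, features):
    # Step 2: pick the feature whose split gives the lowest
    # weighted child entropy (i.e., the highest information gain).
    def weighted_entropy(f):
        total = 0.0
        for value in set(r[f] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[f] == value]
            total += len(subset) / len(labels) * entropy(subset)
        return total
    return min(features, key=weighted_entropy)

def build_tree(rows, labels, features):
    # Step 4: recurse until the node is pure or no features remain,
    # then return the majority class as a leaf.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    f = best_feature(rows, labels, features)       # Step 2
    node = {}
    for value in set(r[f] for r in rows):          # Step 3
        idx = [i for i, r in enumerate(rows) if r[f] == value]
        node[(f, value)] = build_tree([rows[i] for i in idx],
                                      [labels[i] for i in idx],
                                      [g for g in features if g != f])
    return node

tree = build_tree([{"outlook": "sunny"}, {"outlook": "overcast"}],
                  ["no", "yes"], ["outlook"])
print(tree)
```

Real implementations add stopping criteria (maximum depth, minimum samples per leaf) and handle numeric features by searching over threshold splits.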

Conclusion:

Programmatically speaking, decision trees are nothing but a giant structure of nested if-else statements.

Mathematically speaking, a decision tree uses hyperplanes that run parallel to one of the axes to cut the coordinate system into hypercuboids.

Entropy:

Entropy is nothing but a measure of disorder; you can also call it a measure of purity/impurity.

In simple terms: more knowledge, less entropy.

The mathematical formula for entropy:

E(S) = −Σ pᵢ · log₂(pᵢ)

where pᵢ is the proportion of samples in S belonging to class i.

Some Important Notes:

1. For a 2-class problem, the minimum entropy is 0 and the maximum entropy is 1.

2. For more than 2 classes, the minimum entropy is still 0, but the maximum can be greater than 1 (it is log₂ of the number of classes).

image source: Towards Data Science

From the above diagram, we can see that when the target variable contains only one class (all Yes or all No), the entropy is 0. When the two classes (Yes/No) are equally balanced, the entropy is 1.
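Both notes can be checked numerically. A small sketch, assuming the class probabilities are given directly:

```python
import math

def entropy(probs):
    # E = -sum(p * log2(p)), skipping zero probabilities
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0]))            # pure node (one class): entropy 0
print(entropy([0.5, 0.5]))       # balanced 2-class split: entropy 1
print(entropy([1/3, 1/3, 1/3]))  # 3 classes: log2(3) ≈ 1.585, above 1
```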

Information Gain:

Information gain (I.G.) is a metric used to train decision trees; it measures the quality of a split. Information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain.

image source: Chegg

Step 1: Calculate the entropy of the parent node:

E(parent) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) = 0.94

Outlook:

Sunny: 2 (Yes) / 3 (No)

Overcast: 4 (Yes)

Rain: 3 (Yes) / 2 (No)

Step 2: Calculate the entropy of each child:

E(sunny)=0.97

E(Overcast)=0

E(Rain)=0.97

Step 3: Calculate the weighted average entropy of the children, using the weights 5/14 (Sunny), 4/14 (Overcast) and 5/14 (Rain):

E(children) = (5/14)·0.97 + (4/14)·0 + (5/14)·0.97 = 0.69

Step 4: Calculate the information gain = E(parent) − E(children):

0.94 − 0.69 = 0.25
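The whole calculation can be reproduced in a few lines of Python, with the class counts taken from the weather table above:

```python
import math

def entropy(counts):
    # Entropy from raw class counts, e.g. [9, 5] for 9 Yes / 5 No
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

parent = entropy([9, 5])                            # parent node: 9 Yes, 5 No
children = [([2, 3], 5), ([4, 0], 4), ([3, 2], 5)]  # Sunny, Overcast, Rain
weighted = sum(w / 14 * entropy(c) for c, w in children)
gain = parent - weighted

print(round(parent, 2), round(weighted, 2), round(gain, 2))  # 0.94 0.69 0.25
```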

Step 5: Whichever column has the highest information gain, the algorithm selects that column to split the data.

Step 6: Find Information Gain Recursively.

The decision tree algorithm applies this recursive greedy search in a top-down fashion, finding the attribute with the highest information gain at each level of the tree.

Once a leaf node is reached (entropy = 0), no more splitting is done.

Gini Impurity:

Gini impurity is a function that measures how good a decision tree split is. Basically, it helps us determine which split is best so that we can build a pure decision tree. It is computed as Gini = 1 − Σ pᵢ², and for a 2-class problem it ranges from 0 to 0.5.

If you are working with a large dataset, use Gini impurity, because it is computationally faster than entropy (it avoids the logarithm).

Entropy tends to give more balanced splits.

Advantages:

1. Minimal data preparation is required.

2. The cost of using the tree for inference is logarithmic in the number of data points used to train the tree.

Disadvantages:

1. Prone to errors with imbalanced datasets.

That’s all for our overview of the decision tree algorithm in machine learning! Keep an eye out for more blogs coming soon that will go into more depth on specific topics.

If you enjoy my work and want to keep up to date with the latest publications or would like to get in touch, I can be found on Medium at SURAJ GUSAIN — Thanks!

If you like this post, some claps 👏 would be a tad of extra motivation. I am always open to your questions and suggestions. You can share this on Facebook, Twitter, and LinkedIn, so someone in need might stumble upon it.

You can reach me at:

Happy Learning:)
