Decision Tree Algorithm

Data in itself is merely a number, and so is age, yet we make a big fuss about it. No matter how old you are, you are just one step away from victory.

Before we look at what a decision tree algorithm is, it helps to refresh the basics of machine learning first, specifically the types of problems it handles.

There are three basic categories of problems that you will encounter in machine learning: classification, regression and clustering.

Classification problems: Here you have to choose between discrete outcomes, such as “yes” or “no”, “good” or “bad”, “accepted” or “rejected”, “true” or “false”.

Regression problems: You know the stock value of a product today, and you want to predict its stock value two weeks from now. In other words, the value being predicted is continuous in nature.

Clustering problems: Here the data has to be grouped into clusters. Examples include the most visited discount shelves in a supermarket, or an online shop noticing that people who buy ties also buy cuff links and tie pins. The items are grouped according to a specific pattern, such as purchase orders or customer preferences.

The four main techniques used to solve classification problems in machine learning are:

  1. Naive Bayes
  2. Logistic Regression
  3. Decision Tree
  4. Random Forest

The first two are typically employed for simpler data sets, while the other two are generally better suited to more complex data sets.

In this post we shall focus primarily on the decision tree algorithm, starting with its definition.

What is a Decision Tree Algorithm?

A decision tree is a tree-shaped diagram used to help decide on a course of action. Each branch of the tree represents a possible decision or outcome.

Types of Problems that Decision Tree can solve

Decision Trees can solve Classification problems and Regression problems.

The classification decision tree determines an outcome through if-then conditions and ends in a category. For example: if you work hard, then you will pass your exam. Another example is determining the best race car based on 1 km race timings.

The regression decision tree is used when the target variable is continuous or numerical in nature.
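The difference between the two kinds of tree can be sketched as plain if-then rules. This is only an illustration: the feature names, thresholds and returned values below are hypothetical and are not learned from any data.

```python
# Hypothetical thresholds and values, chosen only for illustration.

def classify_exam(hours_studied: float, attended_classes: bool) -> str:
    """Classification tree: every path ends in a category label."""
    if hours_studied >= 10:      # first test, at the top of the tree
        return "pass"
    if attended_classes:         # second test, reached only when hours < 10
        return "pass"
    return "fail"


def predict_lap_time(engine_power_hp: float, car_weight_kg: float) -> float:
    """Regression tree: every path ends in a number (seconds for 1 km)."""
    if engine_power_hp >= 300:
        if car_weight_kg < 1200:
            return 21.5
        return 23.0
    return 27.5


print(classify_exam(12, False))     # pass
print(classify_exam(3, False))      # fail
print(predict_lap_time(350, 1100))  # 21.5
```

In both functions the structure is the same; only the kind of value stored at the end of each path differs.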

Below is a simple example of a decision tree. You intend to start a business, and you have two proposals: one is to sell ladies' handbags, and the other is to sell ladies' shoes. If you were to sell handbags, the amount of money made on this model would be $1000. If, on the other hand, you were to sell shoes, the amount of money made would be $900.

Which of these models would you choose? Obviously, selling handbags. Why? Because the returns are higher. But is this the right decision?

The above figure illustrates only the basics. What if selling handbags has a 50% chance of success and a 50% chance of failure, and selling shoes likewise has a 50% chance of success and a 50% chance of failure? What would the decision tree look like then?

Which one of these businesses would you choose now?

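The decision is based on the following formula, applied to each business:

Expected value = (probability of success × payoff on success) + (probability of failure × payoff on failure)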

Now, which of the business models would you decide to pursue?

It would be selling shoes, because it has the higher expected value.

The value that this formula produces is called the expected value.
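The expected value of an option is the sum of each outcome's payoff weighted by its probability. A minimal sketch of that calculation follows. The success payoffs ($1000 and $900) and the 50/50 odds come from the example, but the failure payoffs are not given in the text, so the ones below are assumptions: -100 for shoes is the value consistent with the $400 figure used in this post, and -300 for handbags is purely hypothetical.

```python
def expected_value(outcomes):
    """Return the sum of probability * payoff over (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# ASSUMED failure payoffs (not stated in the post): -300 and -100.
hand_bags = [(0.5, 1000), (0.5, -300)]
shoes = [(0.5, 900), (0.5, -100)]

ev_hand_bags = expected_value(hand_bags)  # 350.0
ev_shoes = expected_value(shoes)          # 400.0

# Pick the option with the highest expected value.
options = {"hand bags": ev_hand_bags, "shoes": ev_shoes}
best = max(options, key=options.get)
print(best, options[best])                # shoes 400.0
```

Under these assumed numbers, the option with the larger headline payoff is no longer the better choice once the cost of failure is taken into account.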

Interpretation of the Expected Value

The expected value does not mean that you will make a profit of $400 every time in the shoe-selling business. It only means that if you ran the identical shoe-selling business very many times, your average earnings would probably be about $400 per run. Note the word “probably”.
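This interpretation can be checked with a small simulation. The sketch below assumes that a successful run of the shoe business earns $900 and a failed run loses $100 (the loss is an assumption chosen to match the $400 expected value, since the post does not state it), each with probability 0.5.

```python
import random


def simulate_average(trials: int, seed: int = 0) -> float:
    """Average earnings over `trials` independent runs of the shoe business."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0
    for _ in range(trials):
        # ASSUMED payoffs: +900 on success, -100 on failure, 50/50 odds.
        total += 900 if rng.random() < 0.5 else -100
    return total / trials


for n in (10, 1000, 100000):
    print(n, simulate_average(n))
```

No single run ever earns exactly $400; each run is either +900 or -100. Only the average over many runs tends to settle near $400, and with few runs it can be quite far off.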

Pros of using Decision Tree Algorithm

They are simple to use.

They make complex routines easy to understand.

The model is visual, so it appeals to both the learner and the implementer.

They do not require complex data preparation.

Categorical data and numerical data are both handled with ease.

Even when the data does not fit a simple relationship, a tree can still be used to make predictions.

Cons of using Decision Tree Algorithm

Overfitting

The tree focuses on the particular situations in its training data instead of arriving at a generalized solution.

High Variance

The decision tree model can become unstable due to small changes in the data. The balance is lost, and this in turn affects the decision arrived at.

Low Bias

A fully grown tree fits its training data very closely. Combined with high variance, this can impair the decision tree's ability to work with new incoming data.

Terms required for Decision Tree

Entropy

This is the measure that defines the unpredictability in the data set.

Information gain

This is the measure that defines the decrease in unpredictability after the data set is split.
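Both measures can be computed directly from class labels. The sketch below uses the usual Shannon entropy in bits, and defines information gain as the parent's entropy minus the size-weighted entropy of the groups produced by a split. The labels and the split itself are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical data: 10 samples, evenly mixed before the split.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(entropy(parent))                          # 1.0, the maximum for two classes
print(information_gain(parent, [left, right]))  # roughly 0.278
```

A split that separated the two classes perfectly would leave both groups with zero entropy and give an information gain of 1.0, the whole of the parent's entropy.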

Leaf Node

A leaf node carries the final decision: the predicted class or value.

Root Node

The topmost decision node is known as the root node.
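One way to picture root and leaf nodes is as a small data structure. In this sketch, the `Node` class, the `predict` function, the feature name and the threshold are all hypothetical names invented for illustration; they are not taken from any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Fields used by decision nodes: test whether sample[feature] < threshold.
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # followed when the test is true
    right: Optional["Node"] = None   # followed when the test is false
    # Field used by leaf nodes: the decision they carry.
    prediction: Optional[str] = None

    def is_leaf(self) -> bool:
        return self.prediction is not None


def predict(node: Node, sample: dict) -> str:
    """Walk from the given node down to a leaf and return its decision."""
    while not node.is_leaf():
        if sample[node.feature] < node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# The root node is the topmost decision node; both of its children are leaves.
root = Node(
    feature="hours_studied",
    threshold=10,
    left=Node(prediction="fail"),
    right=Node(prediction="pass"),
)

print(predict(root, {"hours_studied": 12}))  # pass
print(predict(root, {"hours_studied": 3}))   # fail
```

Every prediction starts at the root and ends at exactly one leaf, so the depth of the tree bounds the number of tests made per sample.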
