Skip to main content

Random Forest Algorithm

Random Forest is an ensemble machine learning algorithm that follows the bagging technique. The base estimators in the random forest are decision trees. Random forest randomly selects a set of features that are used to decide the best split at each node of the decision tree.

Looking at it step-by-step, this is what a random forest model does:

1. Random subsets are created from the original dataset (bootstrapping).

2. At each node in the decision tree, only a random set of features are considered to decide the best split.

3. A decision tree model is fitted on each of the subsets.

4. The final prediction is calculated by averaging the predictions from all decision trees.

To sum up, the Random forest randomly selects data points and features and builds multiple trees (Forest).

Random Forest is used for feature importance selection. The attribute (.feature_importances_) is used to find feature importance.

Some Important Parameters:-

1. n_estimators:- It defines the number of decision trees to be created in a random forest.

2. criterion:- "Gini" or "Entropy."

3. min_samples_split:- Used to define the minimum number of samples required in a leaf node before a split is attempted

4. max_features: -It defines the maximum number of features allowed for the split in each decision tree.

5. n_jobs:- The number of jobs to run in parallel for both fit and predict. Always keep (-1) to use all the cores for parallel processing.

Comments

Popular posts from this blog

What is P Value ?

In Data Science interviews, one of the frequently asked questions is ‘What is P-Value?”. According to American Statistical Association, “A p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.”  That’s hard to grasp, yes? Alright, lets understand what really is p value in small meaningful pieces to make it very clear. When and how is p-value used? To understand p-value, you need to understand some background and context behind it. So, let’s start with the basics. p-values are often reported whenever you perform a statistical significance test (like t-test, chi-square test etc). These tests typically return a computed test statistic and the associated p-value. This reported value is used to establish the statistical significance of the relationships being tested. So, whenever you see a p-valu...

Introduction to Datascience

Data Science has become one of the most demanded jobs of the 21st century. What is Data Science? “Data Science is about extraction, preparation, analysis, visualization, and maintenance of information. It is a cross-disciplinary field which uses scientific methods and processes to draw insights from data. ” As a data scientist, you take a complex business problem, compile research from it, creating it into data, then use that data to solve the problem. A Data Scientist, specializing in Data Science, not only analyzes the data but also uses machine learning algorithms to predict future occurrences of an event. Therefore, we can understand Data Science as a field that deals with data processing, analysis, and extraction of insights from the data using various statistical methods and computer algorithms. It is a multidisciplinary field that combines mathematics, statistics, and computer science. Why Data Science? So, after knowing what exactly Data Science is, you must explore ...

Data Science Skills

Below are some of the data science skills that every data scientist must know: 1. Change is the only constant It’s not about “Learning Data Science”, it’s about “improving your Data Science skills! The subjects you are learning currently in Grad School are important because no learning go waste but, the real world practicality is totally different from the theory of the books which is taught for decades. Don’t cramp the information, rather understand the big picture. A report states that 50% of things that you learn today regarding IT will be outdated in 4 years. Technology can become obsolete but, learning can’t be. You should have the attitude of learning, updating your knowledge and focusing on your skills(Get your Basics clear) and not on the information you learn! This will help you to survive in this tough and competitive world (I am not scaring you, I am just asking you to prepare your best! You should start focusing on the below skills for becoming a data scientist –...