Skip to main content

Random Forest Algorithm

Random Forest is an ensemble machine learning algorithm that follows the bagging technique. The base estimators in the random forest are decision trees. Random forest randomly selects a set of features that are used to decide the best split at each node of the decision tree.

Looking at it step-by-step, this is what a random forest model does:

1. Random subsets are created from the original dataset (bootstrapping).

2. At each node in the decision tree, only a random set of features are considered to decide the best split.

3. A decision tree model is fitted on each of the subsets.

4. The final prediction is calculated by averaging the predictions from all decision trees.

To sum up, the Random forest randomly selects data points and features and builds multiple trees (Forest).

Random Forest is used for feature importance selection. The attribute (.feature_importances_) is used to find feature importance.

Some Important Parameters:-

1. n_estimators:- It defines the number of decision trees to be created in a random forest.

2. criterion:- "Gini" or "Entropy."

3. min_samples_split:- Used to define the minimum number of samples required in a leaf node before a split is attempted

4. max_features: -It defines the maximum number of features allowed for the split in each decision tree.

5. n_jobs:- The number of jobs to run in parallel for both fit and predict. Always keep (-1) to use all the cores for parallel processing.

Comments

Popular posts from this blog

Daily Task performed by Data Scientist at Work place - Life of a Data Scientist

Data Science is a multidimensional field that uses scientific methods, tools, and algorithms to extract knowledge and insights from structured and unstructured data.But in reality, he does so much more than just studying the data. I agree that all his work is related to data but it involves a number of other processes based on data.Data Science is a multidisciplinary field. It involves the systematic blend of scientific and statistical methods, processes, algorithm development and technologies to extract meaningful information from data. The average Data Scientist’s work week as follows: Typical work weeks devour around 50 hours. The Data Scientists generally maintain internal records of daily results. The Data Scientists also keep extensive notes on their modeling projects for repeatable processes. The good Data Scientists can begin their career with a $80k salary, and the high-end experts can hope to make $400K. The industry attrition rate for DS is high as organizations fre...

Differentiate between univariate, bivariate and multivariate analysis.

Univariate analysis are descriptive statistical analysis techniques which can be differentiated based on one variable involved at a given point of time. For example, the pie charts of sales based on territory involve only one variable and can the analysis can be referred to as univariate analysis. The bivariate analysis attempts to understand the difference between two variables at a time as in a scatterplot. For example, analyzing the volume of sale and spending can be considered as an example of bivariate analysis. Multivariate analysis deals with the study of more than two variables to understand the effect of variables on the responses.

Data Science Skills

Below are some of the data science skills that every data scientist must know: 1. Change is the only constant It’s not about “Learning Data Science”, it’s about “improving your Data Science skills! The subjects you are learning currently in Grad School are important because no learning go waste but, the real world practicality is totally different from the theory of the books which is taught for decades. Don’t cramp the information, rather understand the big picture. A report states that 50% of things that you learn today regarding IT will be outdated in 4 years. Technology can become obsolete but, learning can’t be. You should have the attitude of learning, updating your knowledge and focusing on your skills(Get your Basics clear) and not on the information you learn! This will help you to survive in this tough and competitive world (I am not scaring you, I am just asking you to prepare your best! You should start focusing on the below skills for becoming a data scientist –...