Blog | Data Science and Technology


Showing posts from November, 2020

Differentiate between univariate, bivariate and multivariate analysis.

Univariate analysis covers descriptive statistical techniques that examine only one variable at a time. For example, a pie chart of sales by territory involves only one variable (sales), so that analysis can be referred to as univariate analysis. Bivariate analysis examines the relationship between two variables at a time, as in a scatterplot. For example, analyzing sales volume against spending is an example of bivariate analysis. Multivariate analysis studies more than two variables at once to understand how the variables affect a response.
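To make the distinction concrete, here is a minimal Python sketch using only the standard library. It computes a summary of one variable (univariate), the Pearson correlation between two variables (bivariate), and a multiple linear regression of sales on two predictors at once (one common multivariate technique among many). The toy numbers and the function names (univariate_summary, pearson, solve, multiple_regression) are hypothetical and for illustration only.

```python
import statistics

# Hypothetical toy data (illustrative only): one value per sales territory
sales = [120.0, 150.0, 90.0, 200.0, 170.0]
spending = [10.0, 14.0, 7.0, 20.0, 16.0]
price = [5.0, 4.5, 6.0, 4.0, 4.2]


def univariate_summary(xs):
    """Univariate: describe a single variable on its own."""
    return {"mean": statistics.mean(xs), "stdev": statistics.stdev(xs),
            "min": min(xs), "max": max(xs)}


def pearson(xs, ys):
    """Bivariate: strength of the linear relationship between two variables."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [v] for row, v in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def multiple_regression(y, *predictors):
    """Multivariate: fit y = b0 + b1*x1 + ... + bk*xk by least squares."""
    cols = [[1.0] * len(y)] + [list(p) for p in predictors]
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]  # X^T X
    rhs = [sum(a * v for a, v in zip(ci, y)) for ci in cols]                 # X^T y
    return solve(A, rhs)  # [intercept, coefficient per predictor]


if __name__ == "__main__":
    print(univariate_summary(sales))                   # one variable
    print(pearson(spending, sales))                    # two variables
    print(multiple_regression(sales, spending, price)) # sales vs. two predictors
```

The regression solves the normal equations directly to stay dependency-free; in practice a numerical library would normally be used for this step.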

Random Forest Algorithm

Random Forest is an ensemble machine learning algorithm based on the bagging technique, and its base estimators are decision trees. In addition to sampling the data, a random forest randomly selects a subset of features to consider when choosing the best split at each node of each tree. Step by step, a random forest model does the following:

1. Random subsets are drawn from the original dataset with replacement (bootstrapping).
2. At each node of a decision tree, only a random subset of the features is considered when choosing the best split.
3. A decision tree is fitted on each of the bootstrap subsets, applying step 2 at every node.
4. The final prediction is calculated by averaging the predictions of all the trees (for regression) or by a majority vote across the trees (for classification).

To sum up, a random forest randomly samples both data points and features and builds multiple trees (the forest).

Random forests can also be used to estimate feature importance. In scikit-learn, a fitted model exposes these scores through the .feature_importances_ attribute.

Some important parameters:
1. n_estimators: It defines the number of decision tree...
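The four steps of the algorithm can be sketched in pure Python for regression. This is a simplified illustration under stated assumptions, not scikit-learn's implementation: trees split on reduction of the sum of squared errors, the max_depth and min_samples limits are arbitrary choices, and feature importance is computed as each feature's total impurity decrease, normalized per tree and then averaged. That is similar in spirit to the impurity-based .feature_importances_ but is not guaranteed to match it. All function names are hypothetical; n_estimators and max_features borrow scikit-learn's parameter names only for readability.

```python
import random


def sse(ys):
    """Node impurity for regression: sum of squared deviations from the mean."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, max_features, rng, imp, depth=0, max_depth=5, min_samples=2):
    """Grow a regression tree; leaves are floats, internal nodes are tuples."""
    if len(y) < min_samples or depth >= max_depth or len(set(y)) == 1:
        return sum(y) / len(y)  # leaf: mean target of the samples that reached it
    # Step 2: consider only a random subset of features at this node
    feats = rng.sample(range(len(X[0])), max_features)
    parent, best = sse(y), None
    for f in feats:
        # Candidate thresholds: every distinct value except the largest,
        # so both children are always non-empty
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            gain = parent - sse([y[i] for i in left]) - sse([y[i] for i in right])
            if best is None or gain > best[0]:
                best = (gain, f, t, left, right)
    if best is None or best[0] <= 0:
        return sum(y) / len(y)
    gain, f, t, left, right = best
    imp[f] += gain  # credit the impurity decrease to the chosen feature

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          max_features, rng, imp, depth + 1, max_depth, min_samples)

    return (f, t, grow(left), grow(right))


def predict_tree(node, row):
    """Follow splits (feature, threshold, left, right) down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_estimators=10, max_features=1, seed=0):
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    trees, total = [], [0.0] * n_features
    for _ in range(n_estimators):
        # Step 1: bootstrap sample (draw n rows with replacement)
        idx = [rng.randrange(n) for _ in range(n)]
        imp = [0.0] * n_features
        # Step 3: fit one tree on the bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                max_features, rng, imp))
        s = sum(imp)
        if s > 0:
            total = [a + b / s for a, b in zip(total, imp)]
    s = sum(total)
    importances = [v / s for v in total] if s > 0 else total
    return trees, importances


def predict_forest(trees, row):
    # Step 4: average the predictions of all trees (regression)
    return sum(predict_tree(t, row) for t in trees) / len(trees)


if __name__ == "__main__":
    # Hypothetical toy data: the target depends only on feature 0
    X = [[i, (i * 7) % 5] for i in range(20)]
    y = [10.0 if i >= 10 else 0.0 for i in range(20)]
    trees, importances = fit_forest(X, y, n_estimators=20, max_features=1)
    print(predict_forest(trees, [15, 0]), importances)
```

Setting max_features equal to the total number of features removes the per-node feature sampling, which reduces the method to plain bagging of decision trees; max_features must not exceed the number of features, or random.sample raises a ValueError.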