Skip to main content

Goals of ML Problem ?

The goal of any machine learning problem is to find a single model that will best predict our wanted outcome. Rather than making one model and hoping this model is the best/most accurate predictor we can make, ensemble methods take a myriad of models into account, and average those models to produce one final model. It is important to note that Decision Trees are not the only form of ensemble methods, just the most popular and relevant in DataScience today.



Comments

Popular posts from this blog

Differentiate between univariate, bivariate and multivariate analysis.

Univariate analysis are descriptive statistical analysis techniques which can be differentiated based on one variable involved at a given point of time. For example, the pie charts of sales based on territory involve only one variable and can the analysis can be referred to as univariate analysis. The bivariate analysis attempts to understand the difference between two variables at a time as in a scatterplot. For example, analyzing the volume of sale and spending can be considered as an example of bivariate analysis. Multivariate analysis deals with the study of more than two variables to understand the effect of variables on the responses.

DataScience Mandatory skills for 2020

The standard job description for a Data Scientist has long highlighted skills in R, Python, SQL, and Machine Learning. With the field evolving, these core competencies are no longer enough to stay competitive in the job market . Data Science is a competitive field, and people are quickly building more and more skills and experience. This has given rise to the booming job description of Machine Learning Engineer, and therefore, my advice for 2020 is that all Data Scientists need to be developers as well. To stay competitive, make sure to prepare yourself for new ways of working that come with new tools. 1. Agile Agile is a method of organizing work that is already much used by dev teams. Data Science roles are filled more and more by people who’s original skillset is pure software development, and this gives rise to the role of Machine Learning Engineer.More and more, Data Scientists/Machine Learning Engineers are managed as developers: continuously making improvements to Mac...

Statistics Interview Questions Part-1

Q1. What is the difference between “long” and “wide” format data? In the  wide-format , a subject’s repeated responses will be in a single row, and each response is in a separate column. In the  long-format , each row is a one-time point per subject. You can recognize data in wide format by the fact that columns generally represent groups. Q2. What do you understand by the term Normal Distribution? Data is usually distributed in different ways with a bias to the left or to the right or it can all be jumbled up. However, there are chances that data is distributed around a central value without any bias to the left or right and reaches normal distribution in the form of a bell-shaped curve. Figure:   Normal distribution in a bell curve The random variables are distributed in the form of a symmetrical, bell-shaped curve. Properties of Normal Distribution are as follows; Unimodal -one mode Symmetrical -left and right halves are mirror image...