
Goals of an ML Problem

The goal of any machine learning problem is to find a model that best predicts the outcome we care about. Rather than building one model and hoping it is the most accurate predictor we can make, ensemble methods train many models and combine their predictions, for example by averaging or voting, to produce one final prediction. It is important to note that decision trees are not themselves an ensemble method: they are the most popular base model for ensembles in data science today (as in random forests and gradient boosting), but ensembles can be built from other kinds of models as well.
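To make the idea of combining models concrete, here is a minimal sketch of prediction averaging. The three "models" are hypothetical stand-ins (simple functions that each approximate y = 2x with a different error), not trained models, and `average_ensemble` is a name chosen only for this illustration.

```python
from statistics import mean


def average_ensemble(models, x):
    """Return the mean of every model's prediction for the input x."""
    return mean(model(x) for model in models)


# Hypothetical regressors: each one approximates y = 2x imperfectly.
models = [
    lambda x: 2.0 * x + 0.3,  # overestimates slightly
    lambda x: 2.0 * x - 0.2,  # underestimates slightly
    lambda x: 1.9 * x + 0.1,  # slope is slightly too small
]

# Combine the three individual predictions into one final prediction.
prediction = average_ensemble(models, 10.0)
print(round(prediction, 3))
```

In practice the individual models would be trained on data (for instance, many decision trees fit on different resamples of the training set), and classification ensembles often use a majority vote rather than a mean.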


