Skip to main content

Math Skills required for Data Science Aspirants

The knowledge of this essential math is particularly important for newcomers arriving at data science from other professions, Specially whosoever wanted to transit their career in to Data Science field (Aspirant). Because mathematics is backbone of Data science , you must have knowledge to deal with data, behind any algorithm mathematics plays an important role. Here am going to iclude some of the topics which is Important if you dont have maths background. 

1. Statistics and Probability
2. Calculus (Multivariable)
3. Linear Algebra
4.  Methods for Optimization
5. Numerical Analysis

1. Statistics and Probability

Statistics and Probability is used for visualization of features, data preprocessing, feature transformation, data imputation, dimensionality reduction, feature engineering, model evaluation, etc.

Here are the topics you need to be familiar with: Mean, Median, Mode, Standard deviation/variance, Correlation coefficient and the covariance matrix, Probability distributions (Binomial, Poisson, Normal), p-value, Baye’s Theorem (Precision, Recall, Positive Predictive Value, Negative Predictive Value, Confusion Matrix, ROC Curve), Central Limit Theorem, R_2 score, Mean Square Error (MSE), A/B Testing, Monte Carlo Simulation

2. Multivariable Calculus

Most machine learning models are built with a dataset having several features or predictors. Hence, familiarity with multivariable calculus is extremely important for building a machine learning model.

Here are the topics you need to be familiar with: Functions of several variables; Derivatives and gradients; Step function, Sigmoid function, Logit function, ReLU (Rectified Linear Unit) function; Cost function; Plotting of functions; Minimum and Maximum values of a function

3. Linear Algebra

Linear algebra is the most important math skill in machine learning. A data set is represented as a matrix. Linear algebra is used in data preprocessing, data transformation, dimensionality reduction, and model evaluation.

Here are the topics you need to be familiar with: Vectors; Norm of a vector; Matrices; Transpose of a matrix; The inverse of a matrix; The determinant of a matrix; Trace of a Matrix; Dot product; Eigenvalues; Eigenvectors

4. Optimization Methods

Most machine learning algorithms perform predictive modeling by minimizing an objective function, thereby learning the weights that must be applied to the testing data in order to obtain the predicted labels.

Here are the topics you need to be familiar with: Cost function/Objective function; Likelihood function; Error function; Gradient Descent Algorithm and its variants (e.g. Stochastic Gradient Descent Algorithm)

5. Numerical Analysis

Its very good to have numerical analysis knowledge like time series analysis , forecasting

Best Youtube channel to learn:


Best Blog to read :

https://towardsdatascience.com/

 

Comments

Popular posts from this blog

Why Central Limit Theorem is Important for evey Data Scientist?

The Central Limit Theorem is at the core of what every data scientist does daily: make statistical inferences about data. The theorem gives us the ability to quantify the likelihood that our sample will deviate from the population without having to take any new sample to compare it with. We don’t need the characteristics about the whole population to understand the likelihood of our sample being representative of it. The concepts of confidence interval and hypothesis testing are based on the CLT. By knowing that our sample mean will fit somewhere in a normal distribution, we know that 68 percent of the observations lie within one standard deviation from the population mean, 95 percent will lie within two standard deviations and so on. In other words we can say " It all has to do with the distribution of our population. This theorem allows you to simplify problems in statistics by allowing you to work with a distribution that is approximately normal."  The CLT is...

Most Used Algorithm by DataScientist

We will discuss mostly machine learning algorithms that are important for data scientists and classify them based on supervised and unsupervised roles. I will provide you an outline for all the important algorithms that you can deploy for improving your data science operations. Here is the list of top Data Science Algorithms that you must know to become a data scientist. Let’s start with the first one – 1. Linear Regression Linear Regression is a method of  measuring the relationship between two continuous variables . The two variables are – Independent Variable – “x” Dependent Variable – “y” In the case of a simple linear regression, the independent value is the predictor value and it is only one. The relationship between x and y can be described as: y = mx + c Where m is the slope and c is the intercept. Based on the predicted output and the actual output, we perform the calculation 2. Logistic Regression Logistic Regression is used for binary classificat...