
Myths about Data Science - A must-know for all Data Science enthusiasts


1. Only coders/programmers can become data scientists

No, that is not correct. Anyone who has basic programming skills in Python or R, or who is at least willing to learn basic programming, can enter this field. My suggestion: people with an engineering or software background can choose Python, while people moving into data science from a non-engineering background such as arts, commerce, or science may prefer R. I am not saying that people from a non-technical background cannot learn Python. The basics and the algorithms may be a bit harder to grasp at first, but anyone who is ready to learn can pick either Python or R. I have explained in another article which of the two I prefer (Python); you can refer to that article for a better understanding.

2. Data scientists are masters of all technologies

No. The fact is that you need a working knowledge of the basic data science skills; you are not required to be an expert in every one of them. As the famous phrase goes, be a "jack of all trades": have a basic idea of the terminology, of the algorithms and how to apply them to data, and of basic statistics. No one can master everything in this era, especially when technology keeps growing. So you must be familiar with the terminology, the logic, and their uses, and above all with how to apply them to your own data.

3. Data Science is all about tools and technology.

No, it is not. Data science is not just about tools and technology. Writing a few lines of code, running an algorithm, and getting good accuracy is not data science by itself. You should understand the algorithm, know how to interpret its output, choose the algorithm that best suits the particular problem, and be able to judge whether the result is actually correct.
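To make the point about accuracy concrete, here is a minimal sketch in plain Python. The labels are invented purely for illustration (95 negative cases and 5 positive cases), and the "model" is a hypothetical baseline that always predicts the majority class. It is not taken from any real project; it only shows why a good accuracy number still needs interpretation.

```python
# Hypothetical labels, invented for illustration: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positive cases that the model found."""
    actual_positives = sum(t == positive for t in y_true)
    true_positives = sum(
        t == positive and p == positive for t, p in zip(y_true, y_pred)
    )
    return true_positives / actual_positives if actual_positives else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}")  # accuracy = 0.95
print(f"recall = {rec:.2f}")    # recall = 0.00
```

The baseline scores 95% accuracy without finding a single positive case. Someone who only runs the code sees a good number; someone who understands the problem notices that the model is useless for the cases that matter.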

4. Data analysts and data scientists do the same work:


“A data scientist is someone who can predict the future based on past patterns whereas a data analyst is someone who merely curates meaningful insights from data.”

“A data scientist job roles involves estimating the unknown whilst a data analyst job roles involves looking at the known from new perspectives.”

“A data scientist is expected to generate their own questions while a data analyst finds answers to a given set of questions from data.”

“A data analyst addresses business problems but a data scientist not just addresses business problems but picks up those problems that will have the most business value once solved.”

“Data analysts are the one who do the day-to-day analysis stuff but data scientists have the what ifs.”

Here is what Abraham Cabangbang, Senior Data Scientist at LinkedIn, said about the difference between a data analyst and a data scientist:

“It’s definitely a gray area. At my previous company I did both analyst and Data scientist jobs and as an analyst we were more customer facing; the tasks we did were directly related to the tangible business needs—what the customers wanted/requested. It was very directed. The scientist role is a little more free form. The first thing I did as a data scientist is work on building out internal dashboards, basically surfacing information that we were tracking on the back end, but weren’t being used by the data analysts for any reasons; for example, we might have lacked the infrastructure to display it, or the data was just not very well processed. It really wasn’t anything tailored out from a customer need, but came from what I noticed the analyst team needed in order to do their job.”

