What Data Scientists Spend the Most Time Doing


We generally think of data scientists as building algorithms, exploring data, and doing predictive analysis. That is not actually what they spend most of their time doing, however. As the graph shows, most of a data scientist's time goes into data cleaning, because real-world data usually arrives messy. A machine learning model will not work well on messy data, so the data can only be fed to a model after it has been cleaned. That makes data cleaning very important, and it is the task that data analysts and data scientists are mostly involved in.

60 percent: Cleaning and organising data

According to a study that surveyed 16,000 data professionals across the world, dirty data is the biggest roadblock for a data scientist. Data scientists often spend considerable time formatting, cleaning, and sometimes sampling the data, which consumes the majority of their time. Hence, as a data scientist, ensuring that you have access to clean and structured data can save you a lot of time and help you get the work done quickly.
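
The formatting and cleaning work described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a recommended pipeline: the records, the field names (`name`, `age`, `city`), the list of missing-value markers, and the rule of dropping rows without a name are all assumptions made for the example.

```python
# Hypothetical raw records; field names and values are illustrative only.
raw_rows = [
    {"name": "  Alice ", "age": "34", "city": "Pune"},
    {"name": "Bob", "age": "N/A", "city": "mumbai "},
    {"name": "  Alice ", "age": "34", "city": "Pune"},  # exact duplicate
    {"name": "", "age": "29", "city": "Delhi"},          # missing name
]

# Assumed set of strings that should be treated as "no value".
MISSING = {"", "n/a", "na", "null", "none"}


def clean_value(value):
    """Trim whitespace and map common missing-value markers to None."""
    text = value.strip()
    return None if text.lower() in MISSING else text


def clean_rows(rows):
    """Return de-duplicated rows with normalised text and integer ages."""
    seen = set()
    cleaned = []
    for row in rows:
        name = clean_value(row["name"])
        age = clean_value(row["age"])
        city = clean_value(row["city"])
        if name is None:
            continue  # assumed rule: a row without a name is unusable
        record = (
            name,
            int(age) if age is not None else None,
            city.title() if city is not None else None,
        )
        if record in seen:
            continue  # skip rows that are duplicates after normalisation
        seen.add(record)
        cleaned.append({"name": record[0], "age": record[1], "city": record[2]})
    return cleaned


cleaned = clean_rows(raw_rows)
for row in cleaned:
    print(row)
```

Even in this tiny example, four raw rows shrink to two usable ones, which hints at why cleaning takes up so much of the working day.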

19 percent: Collecting data

One of the major challenges that data science professionals face is finding relevant data sets to work with. Many times, an organisation's data lake is nothing but a dumping ground that mixes relevant and irrelevant data sets.

9 percent: Modelling/machine learning

Once the first two tasks have been sorted, the data scientist is left with the task of suggesting machine learning and predictive modelling approaches as per business requirements.

It is said that one of the hardest parts of being a data scientist is not exactly developing a solution; rather, it is defining a given problem and finding means to measure the solution. This is even more pertinent when the clients do not have a clear idea of what they want. If your models do not deliver outcomes in line with the business requirement, you are left with the daunting task of explaining the discrepancies and understanding what went wrong and where.

“Often, analysts are given vague goals by the business. ‘Help me improve my bottom line by 15%’ or ‘Identify the biggest problems our customers are facing’ are not precise enough problem statements for the analysts. Enough time needs to be spent on understanding the exact business problem and then converting this business problem into an analytics problem that can be solved with data,” Gaurav Vohra, co-founder & CEO of Jigsaw Academy, notes.

5 percent: Other

Since data science is a mix of business use cases, mathematics, statistics, programming, and communication skills, data scientists are not tasked with data handling alone. As one Quora user sums up, a data scientist is also required to perform a number of other tasks, which include:

Conduct undirected research and frame open-ended industry questions
Explore and examine data from a variety of angles to determine hidden weaknesses, trends and/or opportunities
Communicate predictions and findings to management and IT departments through effective data visualizations and reports
Recommend cost-effective changes to existing procedures and strategies

4 percent: Refining algorithms

This process might take months before the necessary changes are made. It can be achieved in a number of ways, often leaving the data scientist with the perplexing question of which way is the right one.

3 percent: Building training sets

Data sets are the essential component, the building blocks, upon which a data scientist builds a project. At times, the data scientist will have to perform scaling, decomposition, or aggregation transformations on the data before training a model.
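
Two of the transformations named above, aggregation and scaling, can be illustrated with a minimal sketch using only the Python standard library. The transaction records, the field names, and the choice of min-max scaling (one of several common scaling methods) are assumptions made for the example; decomposition is left out to keep it short.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transactions; the field names are illustrative only.
transactions = [
    {"customer": "A", "amount": 20.0},
    {"customer": "A", "amount": 40.0},
    {"customer": "B", "amount": 100.0},
    {"customer": "C", "amount": 60.0},
]


def aggregate_mean(rows, key, value):
    """Aggregation: average `value` for each distinct `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: mean(v) for k, v in groups.items()}


def min_max_scale(values):
    """Scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


# One row per customer, then scale so features share a common range.
avg_spend = aggregate_mean(transactions, "customer", "amount")
scaled = dict(zip(avg_spend, min_max_scale(list(avg_spend.values()))))

print(avg_spend)
print(scaled)
```

Here the raw transactions are first collapsed to one average per customer, and the averages are then rescaled so that the smallest becomes 0 and the largest becomes 1, a form many models handle more easily than raw amounts.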
