
Data Science Skills

Below are some of the data science skills that every data scientist must know:

1. Change is the only constant

It’s not about “learning Data Science”, it’s about “improving your Data Science skills”!

The subjects you are learning in grad school are important, because no learning goes to waste, but real-world practice is very different from the textbook theory that has been taught for decades. Don’t cram the information; understand the big picture instead.

One report states that 50% of what you learn about IT today will be outdated in four years. Technology can become obsolete, but learning cannot. Cultivate the habit of learning and updating your knowledge, and focus on your skills (get your basics clear) rather than on the information you memorize!

This will help you survive in this tough and competitive world (I am not scaring you, I am just asking you to prepare your best!). To become a data scientist, start focusing on the skills below –

Business skills
Practical skills such as math and statistics
Coding skills (as I said, learn and don’t cram)
Soft skills such as people skills, social skills, data visualisation, presentation, and communication (emotional intelligence; I feel this is the most important one)

2. Essential Data Science Ingredients – Tools for Data Science

Companies employ data scientists to help them gain insights about the market and improve their products. Data science calls for several tools and skills: big data technologies, UNIX, machine learning, Python, R, SQL, and more. You can start with the last three (Python, R, SQL) right now, and it will pay off in the future. These three are the most widely used at present.
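To see how two of these tools fit together, here is a minimal sketch that runs a SQL query from Python using the standard library's sqlite3 module. The sales table, its columns, and the rows are made up purely for illustration.

```python
import sqlite3

# Hypothetical toy data: (product, region, units sold).
rows = [
    ("widget", "north", 10),
    ("widget", "south", 15),
    ("gadget", "north", 7),
]


def units_per_product(records):
    """Load records into an in-memory SQLite table and total units per product."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (product TEXT, region TEXT, units INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
        cursor = conn.execute(
            "SELECT product, SUM(units) FROM sales GROUP BY product ORDER BY product"
        )
        # Each result row is a (product, total) pair, so it maps directly to a dict.
        return dict(cursor.fetchall())
    finally:
        conn.close()


totals = units_per_product(rows)
print(totals)
```

The grouping and summing is done by SQL, while Python handles loading the data and using the result. That split is a common pattern, though real projects would usually query an existing database rather than an in-memory one.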

3. Become a jack of all trades and master of none, but only for the points below

a. Develop the mind, discover new things and enhance your imagination: in short, start reading.
These benefits are reason enough to start reading NOW! Also, make sure you read outside your discipline. It will expose you to a wide range of techniques and problems and help you get comfortable jumping feet-first into a new topic.

b. Try to analyze new types of data
Data can come in many forms. Text and images are relatively easy to interpret, so also try interpreting data from video or audio. Data can also come as a pre-trained model, a relational database, or a time series. These last three might be tough for you right now, but you can start with video and audio.
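If you want a first taste of time series data, a small sketch like the one below can help. It smooths a series of daily readings with a rolling mean using only the Python standard library. The dates and values are invented for illustration, and the helper name rolling_mean is my own, not a library function.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical daily readings (for example, visits per day) starting 1 Jan 2024.
start = date(2024, 1, 1)
values = [10, 12, 11, 15, 14, 18, 20]
series = [(start + timedelta(days=i), v) for i, v in enumerate(values)]


def rolling_mean(points, window):
    """Return (date, mean of the last `window` values) for every full window."""
    result = []
    for i in range(window - 1, len(points)):
        chunk = [v for _, v in points[i - window + 1 : i + 1]]
        result.append((points[i][0], mean(chunk)))
    return result


smoothed = rolling_mean(series, window=3)
for day, avg in smoothed:
    print(day.isoformat(), round(avg, 2))
```

What makes a time series different from a plain list of numbers is that the order and spacing of the points matter, which is why each value stays paired with its date.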

c. Get Inspired with New Ideas and Be Impacted: Talk to new people
Build a network of people you can learn from. If someone further along the road mentors you well, it can give direction to your career. Talk to people with a technical background who are outside your field. You will pick up new information, ideas, and gaps where you can create value.

Talking to people with a non-technical background will also help you improve your soft skills. You will get a chance to explain the technical details of your own field to them.

d. You can now cry over spilt milk with Version Control
Imagine having such control over your actions that, with a single click, any of them could be undone. Wouldn’t life be perfect then?

I don’t know about real life, but you can have that control over your code. If a mistake is made, you can turn back the clock and compare earlier versions of the code to help fix it while minimizing disruption to all team members.

Version control lets you see exactly what you changed and what you broke. It is useful even for individual projects, and practicing it regularly will help you master it.
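To get a feel for what "comparing earlier versions" looks like, here is a toy sketch using Python's standard difflib module. It is not a version control system (in practice you would use a tool such as Git), and the two versions of the function and the file names are hypothetical. It only shows the kind of line-by-line comparison such tools give you.

```python
import difflib

# Two hypothetical versions of the same script, as a version control
# system might store them.
old_version = """def average(values):
    return sum(values) / len(values)
""".splitlines(keepends=True)

new_version = """def average(values):
    if not values:
        return 0.0
    return sum(values) / len(values)
""".splitlines(keepends=True)

# unified_diff yields header lines, then lines prefixed with "+", "-" or " ".
diff = list(
    difflib.unified_diff(
        old_version, new_version, fromfile="v1/stats.py", tofile="v2/stats.py"
    )
)
print("".join(diff))

# Keep only the lines that were added, skipping the "+++" file header.
added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]
removed = [line for line in diff if line.startswith("-") and not line.startswith("---")]
```

Lines starting with "+" were added in the newer version and lines starting with "-" were removed, which is the same convention you will see when a version control tool shows you what changed.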

4. Never settle

I am not talking about the beautifully designed phone with premium build quality that brings the best technology to users around the world. I am asking you not to stop at “good enough”.

This means that if a model is not accurate enough and needs additional tuning, you should not leave it at the “good enough” stage. This will be a major factor that sets you apart from other data scientists. Aim for excellence in each task and make sure you answer every question you can with the data. Above all, try to add value. If someone else finds your work valuable, I bet it will be the most valuable work for you too.
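As a rough sketch of what "additional tuning" can mean, the example below compares a default decision threshold with one found by a simple search. The scores, labels, and candidate thresholds are all invented for illustration, and accuracy is just one of many possible metrics.

```python
# Hypothetical validation data: (model score, true label).
validation = [
    (0.10, 0), (0.25, 0), (0.35, 1), (0.40, 0),
    (0.45, 1), (0.55, 1), (0.60, 1), (0.80, 1),
]


def accuracy(threshold, data):
    """Fraction of examples where (score >= threshold) matches the true label."""
    correct = sum(1 for score, label in data if (score >= threshold) == bool(label))
    return correct / len(data)


# "Good enough": accept the default threshold without checking alternatives.
baseline = accuracy(0.5, validation)

# Tuning: try a grid of thresholds and keep the one with the best accuracy.
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
best_threshold = max(candidates, key=lambda t: accuracy(t, validation))
best = accuracy(best_threshold, validation)

print(f"baseline accuracy: {baseline:.3f}")
print(f"tuned accuracy:    {best:.3f} at threshold {best_threshold:.2f}")
```

On real projects, a setting tuned on validation data should still be confirmed on separate test data, since a gain on such a small sample may not hold up.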

