
20 Must-Know Data Science Interview Questions by KDnuggets

The most important questions generally asked by the technical panel:

1. Explain what regularization is and why it is useful.
2. Which data scientists do you admire most? Which startups?
3. How would you validate a predictive model of a quantitative outcome variable that you built using multiple regression?
4. Explain what precision and recall are. How do they relate to the ROC curve?
5. How can you prove that one improvement you've brought to an algorithm is really an improvement over not doing anything?
6. What is root cause analysis?
7. Are you familiar with pricing optimization, price elasticity, inventory management, competitive intelligence? Give examples.
8. What is statistical power?
9. Explain what resampling methods are and why they are useful. Also explain their limitations.
10. Is it better to have too many false positives, or too many false negatives? Explain.
11. What is selection bias, why is it important and how can you avoid it?
12. Give an example of how you would use experimental design to answer a question about user behavior.
13. What is the difference between "long" and "wide" format data?
14. What method do you use to determine whether the statistics published in an article (e.g. newspaper) are either wrong or presented to support the author's point of view, rather than correct, comprehensive factual information on a specific subject?
15. Explain Edward Tufte's concept of "chart junk."
16. How would you screen for outliers and what should you do if you find one?
17. How would you use either the extreme value theory, Monte Carlo simulations or mathematical statistics (or anything else) to correctly estimate the chance of a very rare event?
18. What is a recommendation engine? How does it work?
19. Explain what a false positive and a false negative are. Why is it important to differentiate these from each other?
20. Which tools do you use for visualization? What do you think of Tableau? R? SAS? (for graphs). How would you efficiently represent five dimensions in a chart (or in a video)?

Answers from KDnuggets: https://www.kdnuggets.com/2016/02/21-data-science-interview-questions-answers.html

Happy Learning...!!


Popular posts from this blog

Daily Tasks Performed by a Data Scientist at the Workplace - Life of a Data Scientist

Data Science is a multidimensional field that uses scientific methods, tools, and algorithms to extract knowledge and insights from structured and unstructured data. But in reality, a data scientist does much more than just study the data. Granted, all of this work is related to data, but it involves a number of other data-driven processes. Data Science is a multidisciplinary field: it involves the systematic blend of scientific and statistical methods, processes, algorithm development, and technologies to extract meaningful information from data. The average data scientist's work week looks like this: a typical work week runs to around 50 hours. Data scientists generally maintain internal records of daily results, and they keep extensive notes on their modeling projects so that their processes are repeatable. Good data scientists can begin their careers at an $80K salary, and high-end experts can hope to make $400K. The industry attrition rate for DS is high as organizations fre...

Why Is the Central Limit Theorem Important for Every Data Scientist?

The Central Limit Theorem is at the core of what every data scientist does daily: making statistical inferences about data. The theorem lets us quantify how likely our sample is to deviate from the population without having to take a new sample to compare it with. We don't need to know the characteristics of the whole population to understand how likely our sample is to be representative of it. The concepts of confidence intervals and hypothesis testing are based on the CLT. Because the sample mean is approximately normally distributed around the population mean, we know that about 68 percent of sample means lie within one standard error (the population standard deviation divided by the square root of the sample size) of the population mean, about 95 percent lie within two standard errors, and so on. This holds approximately, for a large enough sample, whatever the shape of the population distribution, as long as its variance is finite. In other words, we can say: "It all has to do with the distribution of our population. This theorem allows you to simplify problems in statistics by allowing you to work with a distribution that is approximately normal." The CLT is...
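To see the 68/95 figures for sample means in action, here is a minimal simulation sketch using only the Python standard library. It assumes a deliberately skewed Exponential(1) population (mean 1, standard deviation 1), samples of size n = 50, and 10,000 repeated samples; these settings and the helper names `sample_means` and `coverage` are illustrative choices, not something taken from the original post.

```python
import math
import random
import statistics


def sample_means(n, num_samples, rng):
    """Draw num_samples samples of size n from an Exponential(1) population
    (skewed, mean 1, standard deviation 1) and return each sample's mean."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


def coverage(means, mu, se, k):
    """Fraction of sample means lying within k standard errors of mu."""
    return sum(abs(m - mu) <= k * se for m in means) / len(means)


rng = random.Random(42)   # fixed seed so reruns give the same numbers
n = 50                    # assumed sample size
mu, sigma = 1.0, 1.0      # mean and standard deviation of Exponential(1)
se = sigma / math.sqrt(n) # standard error of the sample mean

means = sample_means(n, 10_000, rng)
print(f"within 1 SE: {coverage(means, mu, se, 1):.3f}")
print(f"within 2 SE: {coverage(means, mu, se, 2):.3f}")
```

The two printed fractions should come out close to 0.68 and 0.95 even though the exponential population itself is far from normal, which is exactly the CLT's point. Rerunning with a much smaller `n` is an easy way to see how the approximation depends on sample size.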

Data Science Skills

Below are some of the data science skills that every data scientist must know: 1. Change is the only constant. It's not about "learning data science", it's about "improving your data science skills"! The subjects you are currently learning in grad school are important, because no learning goes to waste, but real-world practice is very different from the textbook theory that has been taught for decades. Don't cram information; rather, understand the big picture. A report states that 50% of what you learn today about IT will be outdated in 4 years. Technology can become obsolete, but learning cannot. You should have the attitude of learning, updating your knowledge, and focusing on your skills (get your basics clear) rather than on the information you learn! This will help you survive in this tough and competitive world (I am not scaring you, I am just asking you to prepare your best!). You should start focusing on the skills below to become a data scientist:...