Skip to main content

20 Must know Data Science Interview Questions by kdnuggets

The Most important questions which is generally asked by the technical panel :

1. Explain what regularization is and why it is useful.
2. Which data scientists do you admire most? which startups?
3. How would you validate a model you created to generate a predictive model of a quantitative outcome variable using multiple regression.
4. Explain what precision and recall are. How do they relate to the ROC curve?
5. How can you prove that one improvement you've brought to an algorithm is really an improvement over not doing anything?
6. What is root cause analysis?
7. Are you familiar with pricing optimization, price elasticity, inventory management, competitive intelligence? Give examples.
8. What is statistical power?
9. Explain what resampling methods are and why they are useful. Also explain their limitations.
10. Is it better to have too many false positives, or too many false negatives? Explain.
11. What is selection bias, why is it important and how can you avoid it?
12. Give an example of how you would use experimental design to answer a question about user behavior.
13. What is the difference between "long" and "wide" format data?
14. What method do you use to determine whether the statistics published in an article (e.g. newspaper) are either wrong or presented to support the author's point of view, rather than correct, comprehensive factual information on a specific subject?
15. Explain Edward Tufte's concept of "chart junk."
16. How would you screen for outliers and what should you do if you find one?
17. How would you use either the extreme value theory, Monte Carlo simulations or mathematical statistics (or anything else) to correctly estimate the chance of a very rare event?
18. What is a recommendation engine? How does it work?
19. Explain what a false positive and a false negative are. Why is it important to differentiate these from each other?
20. Which tools do you use for visualization? What do you think of Tableau? R? SAS? (for graphs). How to efficiently represent 5 dimension in a chart (or in a video)?

Answers from kdnuggets : https://www.kdnuggets.com/2016/02/21-data-science-interview-questions-answers.html

Happy Learning...!!

Comments

Popular posts from this blog

Statistics Interview Questions Part-1

Q1. What is the difference between “long” and “wide” format data? In the  wide-format , a subject’s repeated responses will be in a single row, and each response is in a separate column. In the  long-format , each row is a one-time point per subject. You can recognize data in wide format by the fact that columns generally represent groups. Q2. What do you understand by the term Normal Distribution? Data is usually distributed in different ways with a bias to the left or to the right or it can all be jumbled up. However, there are chances that data is distributed around a central value without any bias to the left or right and reaches normal distribution in the form of a bell-shaped curve. Figure:   Normal distribution in a bell curve The random variables are distributed in the form of a symmetrical, bell-shaped curve. Properties of Normal Distribution are as follows; Unimodal -one mode Symmetrical -left and right halves are mirror image...

CondaValueError: Value error: invalid package specification

Recently I was trying to create Conda Environment and wanted to install Tensorflow but i have faced some issue , so i have done some research and done trouble shooting related to that . Here am going to share how to trouble shoot if you are getting Conda Value error while creating Conda environment and install tensorflow . Open Anaconda Prompt (as administrator if it was installed for all users) Run  conda update conda Run the installer again Make sure all pkg are updated: Launch the console from Anaconda Navigator and conda create -n mypython python=3.6.8 After Installing Conda environment please active the conda now :  conda activate mypython once conda environment has been activated kindly install tensorflow 2.0 by using this command pip install tensorflow==2.0.0 once Tensorflow has been successfully install kindly run the command :  pip show tensorflow Try to Run Comman PIP Install Jupyter lab and after ins...

Differentiate between univariate, bivariate and multivariate analysis.

Univariate analysis are descriptive statistical analysis techniques which can be differentiated based on one variable involved at a given point of time. For example, the pie charts of sales based on territory involve only one variable and can the analysis can be referred to as univariate analysis. The bivariate analysis attempts to understand the difference between two variables at a time as in a scatterplot. For example, analyzing the volume of sale and spending can be considered as an example of bivariate analysis. Multivariate analysis deals with the study of more than two variables to understand the effect of variables on the responses.