
15 Common Machine Learning Questions

 1. What is logistic regression?

Logistic regression is a machine learning algorithm for classification. In this algorithm, the probabilities describing the possible outcomes of a single trial are modelled using a logistic function.
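As a minimal sketch of the logistic function mentioned above: the weights, bias and the helper name predict_probability are made up for illustration and are not fitted to any data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Probability of the positive class for one sample (hypothetical helper)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hand-picked parameters for illustration only (not learned from data).
weights = [0.8, -0.4]
bias = 0.1
p = predict_probability([2.0, 1.0], weights, bias)
print(round(p, 3))
```

Training a logistic regression amounts to choosing the weights and bias so that these probabilities match the observed labels as closely as possible.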

2. What is the syntax for logistic regression?

Library: sklearn.linear_model.LogisticRegression

Define model: lr = LogisticRegression()

Fit model: model = lr.fit(x, y)

Predictions (class labels): pred = model.predict(test)

Predicted probabilities: proba = model.predict_proba(test)
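Put together, the steps above might look like the following sketch. The synthetic data from make_classification is an assumption standing in for real x and y.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data, used only as a stand-in for real x and y.
x, y = make_classification(n_samples=200, n_features=4, random_state=0)

lr = LogisticRegression()
model = lr.fit(x, y)

labels = model.predict(x[:5])        # hard class labels, shape (5,)
probs = model.predict_proba(x[:5])   # one probability per class, shape (5, 2)
print(labels)
print(probs.sum(axis=1))             # each row of probabilities sums to 1
```

Note that predict returns class labels, while predict_proba returns one probability per class for each row.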

3. How do you split the data in train / test?

Library: sklearn.model_selection.train_test_split

Syntax: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
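A small sketch of the split on placeholder data follows. The stratify=y argument is an addition not in the syntax above; it keeps the class proportions similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples with 2 features; the values are placeholders.
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split reproducible, so repeated runs produce the same train and test rows.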

4. What is decision tree?

Given a dataset of attributes together with their class labels, a decision tree produces a sequence of rules that can be used to classify the data.

5. What is the syntax for decision tree classifier?

Library: sklearn.tree.DecisionTreeClassifier
Define model: dtc = DecisionTreeClassifier()
Fit model: model = dtc.fit(x, y)
Predictions (class labels): pred = model.predict(test)
Predicted probabilities: proba = model.predict_proba(test)
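To see the "sequence of rules" a fitted tree encodes, scikit-learn's export_text can print them. This is a sketch on the bundled iris dataset; max_depth=2 is chosen here only to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
dtc = DecisionTreeClassifier(max_depth=2, random_state=0)
model = dtc.fit(iris.data, iris.target)

# Print the learned tree as nested threshold rules on the features.
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```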

6. What is random forest?

A random forest classifier is a meta-estimator that fits a number of decision trees on various sub-samples of the dataset and averages their predictions to improve predictive accuracy and control over-fitting. With the default bootstrap=True, each sub-sample is drawn with replacement and has the same size as the original input sample unless max_samples is set.

7. What is the syntax for random forest classifier?

Library: sklearn.ensemble.RandomForestClassifier
Define model: rfc = RandomForestClassifier()
Fit model: model = rfc.fit(x, y)
Predictions (class labels): pred = model.predict(test)
Predicted probabilities: proba = model.predict_proba(test)
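The averaging described in the random forest definition can be checked by hand. The sketch below uses synthetic data as an assumption and compares the mean of the per-tree probabilities with what the forest reports.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

x, y = make_classification(n_samples=300, n_features=6, random_state=0)

rfc = RandomForestClassifier(n_estimators=25, random_state=0)
model = rfc.fit(x, y)

# Average the per-tree class probabilities by hand ...
tree_probs = np.mean(
    [tree.predict_proba(x[:5]) for tree in model.estimators_], axis=0
)
# ... and compare with the probabilities reported by the forest itself.
forest_probs = model.predict_proba(x[:5])
print(np.allclose(tree_probs, forest_probs))
```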

8. What is gradient boosting?

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

9. What is the syntax for gradient boosting classifier?

Library: sklearn.ensemble.GradientBoostingClassifier
Define model: gbc = GradientBoostingClassifier()
Fit model: model = gbc.fit(x, y)
Predictions (class labels): pred = model.predict(test)
Predicted probabilities: proba = model.predict_proba(test)
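The stage-wise nature of boosting can be observed with staged_predict, which yields predictions after each added tree. This is a sketch on synthetic data; the n_estimators and learning_rate values are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

x, y = make_classification(n_samples=400, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

gbc = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1, random_state=0)
model = gbc.fit(x_train, y_train)

# staged_predict yields the ensemble's predictions after each boosting stage.
stage_accuracy = [(pred == y_test).mean() for pred in model.staged_predict(x_test)]
print(len(stage_accuracy), stage_accuracy[0], stage_accuracy[-1])
```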

10. What is SVM?

A support vector machine represents the training data as points in space, separated into categories by a gap (margin) that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.
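A minimal sketch with scikit-learn's SVC and a linear kernel is shown below. The six training points are made up and deliberately well separated.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated toy clusters (made-up points).
X = np.array([[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear", C=1.0).fit(X, y)

new_points = np.array([[0.5, 0.5], [4.5, 4.5]])
# The sign of decision_function tells which side of the boundary a point is on.
scores = svm.decision_function(new_points)
print(svm.predict(new_points), scores)
print(svm.support_vectors_)  # the training points that define the margin
```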

11. What is the difference between KNN and KMeans?

KNN:

Supervised classification algorithm
Classifies a new data point according to the majority class among its k closest labelled data points

KMeans:

Unsupervised clustering algorithm

Groups data into k number of clusters.
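The contrast can be sketched on the same toy points (made-up values): KNN needs the labels y, while KMeans only sees X.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])  # labels are used by KNN only

# Supervised: vote among the 3 nearest labelled points.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[1, 1]]))

# Unsupervised: group the points into 2 clusters without using y.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)
```

The cluster ids from KMeans are arbitrary, so they need not match the label values used by KNN.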

12. How do you treat missing values?

Drop rows having missing values

DataFrame.dropna(axis=0, how='any', inplace=True)

Drop columns

DataFrame.dropna(axis=1, how='any', inplace=True)

Replace missing values with zero / mean

df['income'] = df['income'].fillna(0)
df['income'] = df['income'].fillna(df['income'].mean())
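The options above can be tried on a tiny, made-up DataFrame. This sketch uses the non-inplace forms so that each result can be inspected separately.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [100.0, np.nan, 300.0],
    "age": [25, 30, 35],
})

rows_dropped = df.dropna(axis=0, how="any")  # drops the row containing NaN
cols_dropped = df.dropna(axis=1, how="any")  # drops the 'income' column
zero_filled = df["income"].fillna(0)
mean_filled = df["income"].fillna(df["income"].mean())  # mean() skips NaN

print(df.isna().sum())       # missing values per column
print(mean_filled.tolist())
```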

13. How do you treat outliers?

The interquartile range (IQR) is used to identify outliers: values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] are dropped.
Q1 = df['income'].quantile(0.25)
Q3 = df['income'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['income'] >= (Q1 - 1.5 * IQR)) & (df['income'] <= (Q3 + 1.5 * IQR))]
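A small worked sketch of the IQR rule follows. The income values are invented, with one obviously extreme entry.

```python
import pandas as pd

df = pd.DataFrame({"income": [30, 32, 35, 36, 38, 40, 41, 500]})

Q1 = df["income"].quantile(0.25)
Q3 = df["income"].quantile(0.75)
IQR = Q3 - Q1
lower, upper = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR

# Flag values outside the [lower, upper] fences.
is_outlier = (df["income"] < lower) | (df["income"] > upper)
print(df[is_outlier])        # rows flagged as outliers
filtered = df[~is_outlier]   # rows that are kept
```

Inspecting the flagged rows before dropping them helps to distinguish data errors from genuine extreme values.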

14. What is bias / variance trade off?

Definition

The Bias-Variance Trade off is relevant for supervised machine learning, specifically for predictive modelling. It’s a way to diagnose the performance of an algorithm by breaking down its prediction error.

Error from Bias

Bias is the difference between your model’s expected predictions and the true values.

High bias leads to under-fitting: the model is too simple to capture the underlying pattern.

It does not improve with collecting more data points.

Error from Variance

Variance refers to your algorithm’s sensitivity to specific sets of training data.
High variance leads to over-fitting: the model fits noise in the training set.
It usually improves with collecting more data points.
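A rough way to see both error sources is to fit polynomials of increasing degree to noisy data. Everything in this sketch is an assumption chosen for illustration: the sine target, the noise level and the degrees.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    """Assumed ground-truth function for the illustration."""
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0, 1, 20)
y_train = true_fn(x_train) + rng.normal(0, 0.2, size=x_train.shape)
x_test = np.linspace(0.01, 0.99, 200)
y_test = true_fn(x_test)

def train_test_mse(degree):
    """Fit a polynomial of the given degree and return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for degree in (1, 3, 9):
    print(degree, train_test_mse(degree))
```

The straight line (degree 1) cannot follow the sine curve, which is the high-bias case. Training error can only go down as the degree grows, so a gap between a low training error and a higher test error at high degrees is the usual sign of variance.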

15. How do you treat categorical variables?

Replace each category with the average of the target for that category (target encoding).

Alternatively, apply one-hot encoding, which creates one indicator column per category.
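Both treatments can be sketched with pandas. The column names and values below are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "B"],
    "target": [1, 0, 0, 1, 1],
})

# Target (mean) encoding: replace each category with its average target.
df["city_target_enc"] = df.groupby("city")["target"].transform("mean")

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["city"], prefix="city")

print(df)
print(one_hot)
```

In practice the per-category averages are usually computed on the training split only, so that target information from the test rows does not leak into the features.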
