730+ Machine Learning (ML) Solved MCQs

Machine learning is a subset of artificial intelligence that involves the use of algorithms and statistical models to enable a system to improve its performance on a specific task over time. In other words, machine learning algorithms are designed to allow a computer to learn from data, without being explicitly programmed.

These multiple-choice questions (MCQs) are designed to enhance your knowledge and understanding in the following area: Computer Science Engineering (CSE).

101.	MLE estimates are often undesirable because
A.	they are biased
B.	they have high variance
C.	they are not consistent estimators
D.	none of the above
Answer» B. they have high variance

102.	The difference between the actual Y value and the predicted Y value found using a regression equation is called the
A.	slope
B.	residual
C.	outlier
D.	scatter plot
Answer» B. residual
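
As a minimal sketch of the definition in question 102, the residual for one observation is the actual Y value minus the Y value predicted by the regression line. The coefficients and the data point below are hypothetical, chosen only for illustration.

```python
def predict(x, intercept, slope):
    """Predicted Y from a simple regression line."""
    return intercept + slope * x


def residual(x, y_actual, intercept, slope):
    """Residual = actual Y minus predicted Y."""
    return y_actual - predict(x, intercept, slope)


# Hypothetical line y = 2 + 3x and a hypothetical observation (x=4, y=15).
r = residual(4, 15, intercept=2, slope=3)
print(r)  # 15 - (2 + 3*4) = 1
```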

103.	Neural networks
A.	optimize a convex cost function
B.	always output values between 0 and 1
C.	can be used for regression as well as classification
D.	all of the above
Answer» C. can be used for regression as well as classification

104.	Linear Regression is a _______ machine learning algorithm.
A.	supervised
B.	unsupervised
C.	semi-supervised
D.	can't say
Answer» A. supervised

105.	Which of the following method/methods do we use to find the best fit line for data in Linear Regression?
A.	least square error
B.	maximum likelihood
C.	logarithmic loss
D.	both a and b
Answer» A. least square error

106.	Which of the following methods do we use to best fit the data in Logistic Regression?
A.	least square error
B.	maximum likelihood
C.	jaccard distance
D.	both a and b
Answer» B. maximum likelihood

107.	Lasso can be interpreted as least-squares linear regression where
A.	weights are regularized with the l1 norm
B.	the weights have a gaussian prior
C.	weights are regularized with the l2 norm
D.	the solution algorithm is simpler
Answer» A. weights are regularized with the l1 norm
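
For reference, the Lasso objective from question 107 can be written as follows. The symbols are assumptions not used in the source: $X$ is the design matrix, $y$ the target vector, $w$ the weights, and $\lambda \ge 0$ the regularization strength. Ridge regression is shown alongside for contrast; its squared $\ell_2$ penalty is the one that corresponds to a Gaussian prior on the weights.

```latex
\hat{w}_{\text{lasso}} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1
\qquad
\hat{w}_{\text{ridge}} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2
```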

108.	Which of the following evaluation metrics can be used to evaluate a model while modeling a continuous output variable?
A.	auc-roc
B.	accuracy
C.	logloss
D.	mean-squared-error
Answer» D. mean-squared-error
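
A short sketch of how mean squared error is computed for a continuous target, as in question 108. The true and predicted values are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions: squared errors are 1, 0 and 4.
mse = mean_squared_error([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(mse)  # (1 + 0 + 4) / 3
```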

109.	Simple regression assumes a __________ relationship between the input attribute and output attribute.
A.	quadratic
B.	inverse
C.	linear
D.	reciprocal
Answer» C. linear

110.	In the regression equation Y = 75.65 + 0.50X, the intercept is
A.	0.5
B.	75.65
C.	1
D.	indeterminable
Answer» B. 75.65
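
To illustrate question 110, the intercept is the predicted Y when X is zero. The sketch below uses the equation from the question; the function name is an assumption.

```python
INTERCEPT = 75.65  # constant term of Y = 75.65 + 0.50X
SLOPE = 0.50       # coefficient of X


def predict(x):
    """Evaluate the regression equation from the question."""
    return INTERCEPT + SLOPE * x


y_at_zero = predict(0)
print(y_at_zero)  # equals the intercept, 75.65
```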

111.	The selling price of a house depends on many factors. For example, it depends on the number of bedrooms, number of kitchens, number of bathrooms, the year the house was built, and the square footage of the lot. Given these factors, predicting the selling price of the house is an example of a ____________ task.
A.	binary classification
B.	multilabel classification
C.	simple linear regression
D.	multiple linear regression
Answer» D. multiple linear regression

112.	Suppose you find that your linear regression model is underfitting the data. In such a situation, which of the following options would you consider?
A.	you will add more features
B.	you will remove some features
C.	all of the above
D.	none of the above
Answer» A. you will add more features

113.	We have been given a dataset with n records in which we have input attribute x and output attribute y. Suppose we use a linear regression method to model this data. To test our linear regressor, we randomly split the data into a training set and a test set. Now we increase the training set size gradually. As the training set size increases, what do you expect will happen to the mean training error?
A.	increase
B.	decrease
C.	remain constant
D.	can’t say
Answer» D. can’t say

114.	We have been given a dataset with n records in which we have input attribute x and output attribute y. Suppose we use a linear regression method to model this data. To test our linear regressor, we randomly split the data into a training set and a test set. What do you expect will happen to bias and variance as you increase the size of the training data?
A.	bias increases and variance increases
B.	bias decreases and variance increases
C.	bias decreases and variance decreases
D.	bias increases and variance decreases
Answer» D. bias increases and variance decreases

115.	Regarding bias and variance, which of the following statements are true? (Here ‘high’ and ‘low’ are relative to the ideal model.) (i) Models which overfit are more likely to have high bias (ii) Models which overfit are more likely to have low bias (iii) Models which overfit are more likely to have high variance (iv) Models which overfit are more likely to have low variance
A.	(i) and (ii)
B.	(ii) and (iii)
C.	(iii) and (iv)
D.	none of these
Answer» B. (ii) and (iii)

116.	Which of the following indicates the fundamental of least squares?
A.	arithmetic mean should be maximized
B.	arithmetic mean should be zero
C.	arithmetic mean should be neutralized
D.	arithmetic mean should be minimized
Answer» D. arithmetic mean should be minimized

117.	Suppose that we have N independent variables (X1, X2, … XN) and the dependent variable is Y. Now imagine that you are applying linear regression by fitting the best fit line using least square error on this data. You found that the correlation coefficient of one of its variables (say X1) with Y is 0.95. Which of the following is true for X1?
A.	relation between the x1 and y is weak
B.	relation between the x1 and y is strong
C.	relation between the x1 and y is neutral
D.	correlation can’t judge the relationship
Answer» B. relation between the x1 and y is strong

118.	In terms of bias and variance, which of the following is true when you fit a degree-2 polynomial?
A.	bias will be high, variance will be high
B.	bias will be low, variance will be high
C.	bias will be high, variance will be low
D.	bias will be low, variance will be low
Answer» C. bias will be high, variance will be low

119.	Which of the following statements are true for a design matrix $X \in \mathbb{R}^{n \times d}$ with d > n? (The rows are n sample points and the columns represent d features.)
A.	least-squares linear regression computes the weights $w = (X^\top X)^{-1} X^\top y$
B.	the sample points are linearly separable
C.	x has exactly d − n eigenvectors with eigenvalue zero
D.	at least one principal component direction is orthogonal to a hyperplane that contains all the sample points
Answer» D. at least one principal component direction is orthogonal to a hyperplane that contains all the sample points
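
A small sketch of why the closed-form weights in option A are not available when d > n: the d×d matrix formed by multiplying the transpose of X with X has rank at most n, so it is singular and cannot be inverted. The 1×2 design matrix and the helper names below are hypothetical.

```python
def gram_matrix(X):
    """Return the d x d product of X transposed and X, for X given as a list of rows."""
    d = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(d)]
            for i in range(d)]


def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# Hypothetical design matrix with n = 1 sample and d = 2 features.
X = [[1.0, 2.0]]
G = gram_matrix(X)
det = det2(G)
print(G, det)  # a zero determinant means G has no inverse
```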

120.	Point out the wrong statement.
A.	regression through the origin yields an equivalent slope if you center the data first
B.	normalizing variables results in the slope being the correlation
C.	least squares is not an estimation tool
D.	none of the mentioned
Answer» C. least squares is not an estimation tool

121.	Suppose you find that your linear regression model is underfitting the data. In such a situation, which of the following options would you consider?
A.	you will add more features
B.	you will remove some features
C.	all of the above
D.	none of the above
Answer» A. you will add more features

122.	If X and Y in a regression model are totally unrelated,
A.	the correlation coefficient would be -1
B.	the coefficient of determination would be 0
C.	the coefficient of determination would be 1
D.	the sse would be 0
Answer» B. the coefficient of determination would be 0
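
As a sketch of question 122, the coefficient of determination of a least-squares line is zero when X and Y have no linear relationship, because the fitted line collapses to the mean of Y. The data below are hypothetical and were chosen to have exactly zero linear correlation.

```python
def r_squared(xs, ys):
    """R^2 of the simple least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared errors around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares around the mean of Y.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst


# Hypothetical data with zero linear correlation between x and y.
r2_unrelated = r_squared([1, 2, 3, 4], [2, 4, 4, 2])
# Hypothetical data lying exactly on the line y = 2x + 1.
r2_perfect = r_squared([1, 2, 3, 4], [3, 5, 7, 9])
print(r2_unrelated, r2_perfect)
```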

123.	Regarding bias and variance, which of the following statements are true? (Here ‘high’ and ‘low’ are relative to the ideal model.) (i) Models which overfit are more likely to have high bias (ii) Models which overfit are more likely to have low bias (iii) Models which overfit are more likely to have high variance (iv) Models which overfit are more likely to have low variance
A.	(i) and (ii)
B.	(ii) and (iii)
C.	(iii) and (iv)
D.	none of these
Answer» B. (ii) and (iii)

124.	Which of the following statements are true for a design matrix $X \in \mathbb{R}^{n \times d}$ with d > n? (The rows are n sample points and the columns represent d features.)
A.	least-squares linear regression computes the weights $w = (X^\top X)^{-1} X^\top y$
B.	the sample points are linearly separable
C.	x has exactly d − n eigenvectors with eigenvalue zero
D.	at least one principal component direction is orthogonal to a hyperplane that contains all the sample points
Answer» D. at least one principal component direction is orthogonal to a hyperplane that contains all the sample points

125.	Which problem can arise in multiple regression?
A.	multicollinearity
B.	overfitting
C.	both multicollinearity & overfitting
D.	underfitting
Answer» C. both multicollinearity & overfitting

126.	How can we best represent ‘support’ for the following association rule: “If X and Y, then Z”.
A.	{x,y}/(total number of transactions)
B.	{z}/(total number of transactions)
C.	{z}/{x,y}
D.	{x,y,z}/(total number of transactions)
Answer» D. {x,y,z}/(total number of transactions)

127.	Choose the correct statement with respect to ‘confidence’ metric in association rules
A.	it is the conditional probability that a randomly selected transaction will include all the items in the consequent given that the transaction includes all the items in the antecedent.
B.	a high value of confidence suggests a weak association rule
C.	it is the probability that a randomly selected transaction will include all the items in the consequent as well as all the items in the antecedent.
D.	confidence is not measured in terms of (estimated) conditional probability.
Answer» A. it is the conditional probability that a randomly selected transaction will include all the items in the consequent given that the transaction includes all the items in the antecedent.
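
The two association-rule metrics from questions 126 and 127 can be sketched as follows, using the standard definitions: support of a rule is the fraction of transactions that contain every item in the rule, and confidence is the support of the whole rule divided by the support of the antecedent. The transaction list and function names are hypothetical.

```python
# Hypothetical transaction database; each transaction is a set of items.
transactions = [
    {"x", "y", "z"},
    {"x", "y"},
    {"x", "z"},
    {"x", "y", "z"},
    {"y"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


# Rule "if x and y, then z": 2 of 5 transactions contain x, y and z,
# and 3 of 5 contain both x and y.
rule_support = support({"x", "y", "z"}, transactions)
rule_confidence = confidence({"x", "y"}, {"z"}, transactions)
print(rule_support, rule_confidence)
```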

128.	What are tree based classifiers?
A.	classifiers which form a tree with each attribute at one level
B.	classifiers which perform series of condition checking with one attribute at a time
C.	both of the above
D.	none of the options
Answer» C. both of the above

129.	What is the Gini index?
A.	it is a type of index structure
B.	it is a measure of purity
C.	both of the above
D.	none of the options
Answer» B. it is a measure of purity
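
As a sketch for question 129, the Gini index of a node is commonly computed as one minus the sum of squared class proportions. Under that common definition (an assumption here, since the source gives no formula), a value of 0 means the node is perfectly pure and larger values mean more mixing. The labels below are hypothetical.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a collection of class labels: 1 - sum(p_i ** 2)."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


pure = gini_index(["a", "a", "a", "a"])   # a single class
mixed = gini_index(["a", "a", "b", "b"])  # two classes, evenly split
print(pure, mixed)
```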

130.	Which of the following sentences are correct in reference to Information gain? a. It is biased towards single-valued attributes b. It is biased towards multi-valued attributes c. ID3 makes use of information gain d. The approach used by ID3 is greedy
A.	a and b
B.	a and d
C.	b, c and d
D.	all of the above
Answer» C. b, c and d

131.	Multivariate split is where the partitioning of tuples is based on a combination of attributes rather than on a single attribute.
A.	true
B.	false
Answer» A. true

132.	Gain ratio tends to prefer unbalanced splits in which one partition is much smaller than the other
A.	true
B.	false
Answer» A. true

133.	The gini index is not biased towards multivalued attributes.
A.	true
B.	false
Answer» B. false

134.	Gini index does not favour equal sized partitions.
A.	true
B.	false
Answer» B. false

135.	When the number of classes is large, the Gini index is not a good choice.
A.	true
B.	false
Answer» A. true