730+ Machine Learning (ML) Solved MCQs

Machine learning is a subset of artificial intelligence that involves the use of algorithms and statistical models to enable a system to improve its performance on a specific task over time. In other words, machine learning algorithms are designed to allow a computer to learn from data, without being explicitly programmed.

These multiple-choice questions (MCQs) are designed to enhance your knowledge and understanding of Computer Science Engineering (CSE).

201.	This clustering algorithm terminates when the mean values computed for the current iteration are identical to the mean values computed for the previous iteration.
A.	k-means clustering
B.	conceptual clustering
C.	expectation maximization
D.	agglomerative clustering
Answer» A. k-means clustering
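
The stopping rule described in question 201 can be sketched as follows. This is a minimal 1-D illustration, not a reference implementation: the function name `kmeans_1d`, the sample points, and the starting means are all assumptions made for the example.

```python
def kmeans_1d(points, means, max_iter=100):
    """Lloyd-style k-means on 1-D data; stops when the means stop changing."""
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest current mean.
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda i: abs(p - means[i]))
            clusters[nearest].append(p)
        # Update step: recompute each mean (keep the old mean if a cluster is empty).
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:  # termination test: means identical to previous iteration
            break
        means = new_means
    return means, clusters


# Hypothetical data and initial means, chosen only for illustration.
means, clusters = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [1.0, 2.0])
print(means)     # [2.0, 11.0]
print(clusters)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```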

202.	Which one of the following is the main reason for pruning a Decision Tree?
A.	to save computing time during testing
B.	to save space for storing the decision tree
C.	to make the training set error smaller
D.	to avoid overfitting the training set
Answer» D. to avoid overfitting the training set

203.	You've just finished training a decision tree for spam classification, and it is getting abnormally bad performance on both your training and test sets. You know that your implementation has no bugs, so what could be causing the problem?
A.	your decision trees are too shallow.
B.	you need to increase the learning rate.
C.	you are overfitting.
D.	incorrect data
Answer» A. your decision trees are too shallow.

204.	The K-means algorithm:
A.	requires the dimension of the feature space to be no bigger than the number of samples
B.	has the smallest value of the objective function when k = 1
C.	minimizes the within class variance for a given number of clusters
D.	converges to the global optimum if and only if the initial means are chosen as some of the samples themselves
Answer» C. minimizes the within class variance for a given number of clusters

205.	Which of the following metrics can be used to find the dissimilarity between two clusters in hierarchical clustering? 1. Single-link 2. Complete-link 3. Average-link
A.	1 and 2
B.	1 and 3
C.	2 and 3
D.	1, 2 and 3
Answer» D. 1, 2 and 3
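
The three linkage criteria in question 205 differ only in how they summarize the pairwise distances between two clusters. A minimal sketch, assuming Euclidean distance and two small made-up clusters (the function name `linkage_distances` is illustrative):

```python
from itertools import product
from math import dist


def linkage_distances(cluster_a, cluster_b):
    """Return single-, complete-, and average-link dissimilarity of two clusters."""
    pairwise = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    return {
        "single": min(pairwise),                    # closest pair of points
        "complete": max(pairwise),                  # farthest pair of points
        "average": sum(pairwise) / len(pairwise),   # mean over all pairs
    }


# Hypothetical clusters, for illustration only.
d = linkage_distances([(0, 0), (0, 1)], [(3, 0), (4, 0)])
print(d["single"])  # 3.0
```

For any two clusters, the single-link value is never larger than the average-link value, which in turn is never larger than the complete-link value.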

206.	In which of the following cases will K-Means clustering fail to give good results? 1. Data points with outliers 2. Data points with different densities 3. Data points with round shapes 4. Data points with non-convex shapes
A.	1 and 2
B.	2 and 3
C.	2 and 4
D.	1, 2 and 4
Answer» D. 1, 2 and 4

207.	Hierarchical clustering is slower than non-hierarchical clustering?
A.	true
B.	false
C.	depends on data
D.	cannot say
Answer» A. true

208.	High entropy means that the partitions in classification are
A.	pure
B.	not pure
C.	useful
D.	useless
Answer» B. not pure
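
To make the link between entropy and purity in question 208 concrete, here is a minimal sketch of Shannon entropy over class labels. The label values are made up for the example.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())


pure = entropy(["spam", "spam", "spam", "spam"])   # one class only
mixed = entropy(["spam", "ham", "spam", "ham"])    # evenly split
print(mixed)  # 1.0
```

A partition containing a single class has entropy 0, while an even two-class split reaches the two-class maximum of 1 bit.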

209.	Suppose we would like to perform clustering on spatial data such as the geometrical locations of houses. We wish to produce clusters of many different sizes and shapes. Which of the following methods is the most appropriate?
A.	decision trees
B.	density-based clustering
C.	model-based clustering
D.	k-means clustering
Answer» B. density-based clustering

210.	The main disadvantage of maximum likelihood methods is that they are _____
A.	mathematically less folded
B.	mathematically less complex
C.	mathematically less complex
D.	computationally intense
Answer» D. computationally intense

211.	The maximum likelihood method can be used to explore relationships among more diverse sequences, conditions that are not well handled by maximum parsimony methods.
A.	true
B.	false
C.	-
D.	-
Answer» A. true

212.	Which statement is not true?
A.	k-means clustering is a linear clustering algorithm.
B.	k-means clustering aims to partition n observations into k clusters
C.	k-nearest neighbor is the same as k-means
D.	k-means is sensitive to outliers
Answer» C. k-nearest neighbor is same as k-means

213.	Why is feature scaling done before applying the K-means algorithm?
A.	in distance calculation it will give the same weights for all features
B.	you always get the same clusters whether or not you use feature scaling
C.	in Manhattan distance it is an important step but in Euclidean it is not
D.	none of these
Answer» A. in distance calculation it will give the same weights for all features
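
The effect described in question 213 can be seen with a small sketch. The data (income in dollars, age in years) and the helper `min_max_scale` are assumptions for illustration; min-max scaling is only one of several possible scaling choices.

```python
from math import dist

# Hypothetical rows: (income in dollars, age in years).
a, b, c = (50_000, 25), (50_500, 60), (51_000, 26)


def min_max_scale(rows):
    """Rescale every feature (column) to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(col) for col in cols]
    hi = [max(col) for col in cols]
    return [tuple((v - l) / (h - l) for v, l, h in zip(row, lo, hi)) for row in rows]


# Unscaled: income dominates the Euclidean distance, so a looks closer to b.
raw_closer_to_b = dist(a, b) < dist(a, c)

# Scaled: both features carry comparable weight, so a is now closer to c.
sa, sb, sc = min_max_scale([a, b, c])
scaled_closer_to_b = dist(sa, sb) < dist(sa, sc)

print(raw_closer_to_b, scaled_closer_to_b)  # True False
```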

214.	With Bayes' theorem, the probability of hypothesis H, specified by P(H), is referred to as
A.	a conditional probability
B.	an a priori probability
C.	a bidirectional probability
D.	a posterior probability
Answer» B. an a priori probability

215.	The probability that a person owns a sports car given that they subscribe to an automotive magazine is 40%. We also know that 3% of the adult population subscribes to an automotive magazine. The probability of a person owning a sports car given that they don’t subscribe to an automotive magazine is 30%. Use this information to compute the probability that a person subscribes to an automotive magazine given that they own a sports car.
A.	0.0398
B.	0.0389
C.	0.0368
D.	0.0396
Answer» D. 0.0396
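
The figure in question 215 follows from Bayes' theorem combined with the law of total probability. A short sketch of the arithmetic, using only the numbers given in the question (the variable names are illustrative):

```python
p_mag = 0.03                # P(subscribes)
p_car_given_mag = 0.40      # P(sports car | subscribes)
p_car_given_no_mag = 0.30   # P(sports car | does not subscribe)

# Law of total probability: P(sports car).
p_car = p_car_given_mag * p_mag + p_car_given_no_mag * (1 - p_mag)

# Bayes' theorem: P(subscribes | sports car).
p_mag_given_car = p_car_given_mag * p_mag / p_car

print(round(p_mag_given_car, 4))  # 0.0396
```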

216.	What is the naïve assumption in a Naïve Bayes Classifier.
A.	all the classes are independent of each other
B.	all the features of a class are independent of each other
C.	the most probable feature for a class is the most important feature to be considered for classification
D.	all the features of a class are conditionally dependent on each other
Answer» B. all the features of a class are independent of each other

217.	Based on a survey, it was found that the probability that a person likes to watch serials is 0.25 and the probability that a person likes to watch Netflix series is 0.43. Also, the probability that a person likes to watch both serials and Netflix series is 0.12. What is the probability that a person doesn't like to watch either?
A.	0.32
B.	0.2
C.	0.44
D.	0.56
Answer» C. 0.44
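
The calculation for question 217 uses inclusion-exclusion followed by the complement rule. A brief sketch with the numbers from the question (variable names are illustrative):

```python
p_serials = 0.25
p_netflix = 0.43
p_both = 0.12

# Inclusion-exclusion: P(serials or Netflix).
p_either = p_serials + p_netflix - p_both

# Complement: P(neither).
p_neither = 1 - p_either

print(round(p_either, 2), round(p_neither, 2))  # 0.56 0.44
```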

218.	What is the actual number of independent parameters which need to be estimated in P dimensional Gaussian distribution model?
A.	p
B.	2p
C.	p(p+1)/2
D.	p(p+3)/2
Answer» D. p(p+3)/2
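
The count in question 218 comes from adding the p entries of the mean vector to the p(p+1)/2 distinct entries of the symmetric covariance matrix. A small sketch that checks the two forms agree (the function name is illustrative):

```python
def gaussian_param_count(p):
    """Independent parameters of a p-dimensional Gaussian with full covariance."""
    mean_params = p                      # one per dimension
    cov_params = p * (p + 1) // 2        # upper triangle of a symmetric p x p matrix
    return mean_params + cov_params


print([gaussian_param_count(p) for p in (1, 2, 3)])  # [2, 5, 9]
```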

219.	Give the correct answer for the following statements. 1. It is important to perform feature normalization before using the Gaussian kernel. 2. The maximum value of the Gaussian kernel is 1.
A.	1 is true, 2 is false
B.	1 is false, 2 is true
C.	1 is true, 2 is true
D.	1 is false, 2 is false
Answer» C. 1 is true, 2 is true
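
Both statements in question 219 can be read off the usual form of the Gaussian (RBF) kernel, exp(-||x - y||^2 / (2 sigma^2)). A minimal sketch, assuming that form and a hypothetical bandwidth `sigma`:

```python
from math import exp


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-squared_distance / (2 * sigma ** 2))


same = gaussian_kernel((1.0, 2.0), (1.0, 2.0))   # identical points
apart = gaussian_kernel((0.0, 0.0), (3.0, 4.0))  # distinct points
print(same)  # 1.0
```

The kernel reaches its maximum of 1 only when the two points coincide. Because it depends on squared distance, a feature with a much larger numeric range would dominate the value, which is why normalization matters.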

220.	Which of the following quantities are minimized directly or indirectly during parameter estimation in Gaussian distribution Model?
A.	negative log-likelihood
B.	log-likelihood
C.	cross entropy
D.	residual sum of squares
Answer» A. negative log-likelihood

221.	Consider the following dataset. x,y,z are the features and T is a class(1/0). Classify the test data (0,0,1) as values of x,y,z respectively.
A.	0
B.	1
C.	0.1
D.	0.9
Answer» B. 1

222.	Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that
A.	y is false when x is known to be false.
B.	y is true when x is known to be true.
C.	x is true when y is known to be true
D.	x is false when y is known to be false.
Answer» B. y is true when x is known to be true.

223.	Which of the following statements about Naive Bayes is incorrect?
A.	attributes are equally important.
B.	attributes are statistically dependent on one another given the class value.
C.	attributes are statistically independent of one another given the class value.
D.	attributes can be nominal or numeric
Answer» B. attributes are statistically dependent on one another given the class value.

224.	How can the entries in the full joint probability distribution be calculated?
A.	using variables
B.	using information
C.	both using variables & information
D.	none of the mentioned
Answer» B. using information

225.	How many terms are required for building a Bayes model?
A.	1
B.	2
C.	3
D.	4
Answer» C. 3

226.	Skewness of Normal distribution is ___________
A.	negative
B.	positive
C.	0
D.	undefined
Answer» C. 0

227.	The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?
A.	the attributes are not linearly related.
B.	as the value of one attribute increases the value of the second attribute also increases
C.	as the value of one attribute decreases the value of the second attribute increases
D.	the attributes show a linear relationship
Answer» C. as the value of one attribute decreases the value of the second attribute increases

228.	8 observations are clustered into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2 and C3 have the following observations: C1: {(2,2), (4,4), (6,6)} C2: {(0,4), (4,0), (2,5)} C3: {(5,5), (9,9)} What will the cluster centroids be if you want to proceed to the second iteration?
A.	c1: (4,4), c2: (2,2), c3: (7,7)
B.	c1: (6,6), c2: (4,4), c3: (9,9)
C.	c1: (2,2), c2: (0,0), c3: (5,5)
D.	c1: (4,4), c2: (2,3), c3: (7,7)
Answer» D. c1: (4,4), c2: (2,3), c3: (7,7)
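
In the update step of question 228, each new centroid is the coordinate-wise mean of the points assigned to that cluster. A minimal sketch using the points exactly as listed in the question (the helper name `centroid` is illustrative):

```python
clusters = {
    "C1": [(2, 2), (4, 4), (6, 6)],
    "C2": [(0, 4), (4, 0), (2, 5)],
    "C3": [(5, 5), (9, 9)],
}


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


centroids = {name: centroid(points) for name, points in clusters.items()}
print(centroids)
# {'C1': (4.0, 4.0), 'C2': (2.0, 3.0), 'C3': (7.0, 7.0)}
```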

229.	In Naive Bayes equation P(C / X)= (P(X / C) *P(C) ) / P(X) which part considers "likelihood"?
A.	p(x/c)
B.	p(c/x)
C.	p(c)
D.	p(x)
Answer» A. p(x/c)

230.	Which of the following options is/are correct regarding the benefits of an ensemble model? 1. Better performance 2. Generalized models 3. Better interpretability
A.	1 and 3
B.	2 and 3
C.	1, 2 and 3
D.	1 and 2
Answer» D. 1 and 2

231.	What is back propagation?
A.	it is another name given to the curvy function in the perceptron
B.	it is the transmission of error back through the network to adjust the inputs
C.	it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn
D.	none of the mentioned
Answer» C. it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn

232.	Which of the following is an application of NN (Neural Network)?
A.	sales forecasting
B.	data validation
C.	risk management
D.	all of the mentioned
Answer» D. all of the mentioned

233.	Neural Networks are complex ______________ with many parameters.
A.	linear functions
B.	nonlinear functions
C.	discrete functions
D.	exponential functions
Answer» B. nonlinear functions

234.	Having multiple perceptrons can actually solve the XOR problem satisfactorily: this is because each perceptron can partition off a linear part of the space itself, and they can then combine their results.
A.	true – this works always, and these multiple perceptrons learn to classify even complex problems
B.	false – perceptrons are mathematically incapable of solving linearly inseparable functions, no matter what you do
C.	true – perceptrons can do this but are unable to learn to do it – they have to be explicitly hand-coded
D.	false – just having a single perceptron is enough
Answer» C. true – perceptrons can do this but are unable to learn to do it – they have to be explicitly hand-coded
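
The idea in question 234, that several perceptrons each carve out a linear region and then combine their results, can be illustrated with hand-set weights. This is a sketch under assumptions: the weights and biases below are picked by hand for the example, not learned, and the OR/NAND/AND decomposition is one of several that work.

```python
def perceptron(inputs, weights, bias):
    """Threshold unit: fires (1) when the weighted sum plus bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def xor(a, b):
    """XOR built from three perceptrons with hand-coded weights."""
    h_or = perceptron((a, b), (1, 1), -0.5)       # fires unless both inputs are 0
    h_nand = perceptron((a, b), (-1, -1), 1.5)    # fires unless both inputs are 1
    return perceptron((h_or, h_nand), (1, 1), -1.5)  # AND of the two hidden units


table = {(a, b): xor(a, b) for a in (0, 1) for b in (0, 1)}
print(table)  # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```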

235.	Which one of the following is not a major strength of the neural network approach?
A.	neural network learning algorithms are guaranteed to converge to an optimal solution
B.	neural networks work well with datasets containing noisy data
C.	neural networks can be used for both supervised learning and unsupervised clustering
D.	neural networks can be used for applications that require a time element to be included in the data
Answer» A. neural network learning algorithms are guaranteed to converge to an optimal solution