730+ Machine Learning (ML) Solved MCQs

Machine learning is a subset of artificial intelligence that involves the use of algorithms and statistical models to enable a system to improve its performance on a specific task over time. In other words, machine learning algorithms are designed to allow a computer to learn from data, without being explicitly programmed.

These multiple-choice questions (MCQs) are designed to enhance your knowledge and understanding in the following area: Computer Science Engineering (CSE).

351.	In reinforcement learning, if the feedback is negative, it is defined as a ______.
A.	penalty
B.	overlearning
C.	reward
D.	none of above
Answer» A. penalty

352.	According to ______, it's a key success factor for the survival and evolution of all species.
A.	claude shannon's theory
B.	gini index
C.	darwin's theory
D.	none of above
Answer» C. darwin's theory

353.	A supervised scenario is characterized by the concept of a ______.
A.	programmer
B.	teacher
C.	author
D.	farmer
Answer» B. teacher

354.	Overlearning is caused by an excessive ______.
A.	capacity
B.	regression
C.	reinforcement
D.	accuracy
Answer» A. capacity

355.	Which of the following is an example of a deterministic algorithm?
A.	pca
B.	k-means
C.	none of the above
Answer» A. pca
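
The difference can be checked with a minimal sketch, assuming scikit-learn and NumPy are installed. With the exact (full SVD) solver, PCA returns the same projection on every run, whereas k-means depends on how its centroids are randomly initialized unless a seed is fixed. The synthetic data and variable names here are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data, used only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Fit PCA twice on the same data with the exact (full SVD) solver.
first = PCA(n_components=2, svd_solver="full").fit_transform(X)
second = PCA(n_components=2, svd_solver="full").fit_transform(X)

# Both runs give the same projection because no randomness is involved.
same_projection = bool(np.allclose(first, second))
print(same_projection)
```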

356.	Which of the following models includes a backward elimination feature selection routine?
A.	mcv
B.	mars
C.	mcrs
D.	all above
Answer» B. mars

357.	Can we extract knowledge without applying feature selection?
A.	yes
B.	no
Answer» A. yes

358.	When using feature selection on the data, does the number of features decrease?
A.	no
B.	yes
Answer» B. yes

359.	Which of the following are several models for feature extraction?
A.	regression
B.	classification
C.	none of the above
Answer» C. none of the above

360.	______ provides some built-in datasets that can be used for testing purposes.
A.	scikit-learn
B.	classification
C.	regression
D.	none of the above
Answer» A. scikit-learn
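
As a short illustration, assuming scikit-learn is installed, one of the bundled datasets (Iris) can be loaded without any download:

```python
from sklearn.datasets import load_iris

# load_iris ships with scikit-learn, so no network access is needed.
iris = load_iris()
X, y = iris.data, iris.target

# 150 samples, 4 features, one integer class label per sample.
print(X.shape, y.shape)
```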

361.	While using ______, all labels are turned into sequential numbers.
A.	labelencoder class
B.	labelbinarizer class
C.	dictvectorizer
D.	featurehasher
Answer» A. labelencoder class
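
A minimal sketch of this behavior, assuming scikit-learn is installed; the label values are made up for illustration:

```python
from sklearn.preprocessing import LabelEncoder

labels = ["cat", "dog", "bird", "dog", "cat"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# Classes are sorted, so bird -> 0, cat -> 1, dog -> 2.
print(list(encoder.classes_))
print(list(encoded))
```

The original labels can be recovered with `encoder.inverse_transform(encoded)`.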

362.	______ produce sparse matrices of real numbers that can be fed into any machine learning model.
A.	dictvectorizer
B.	featurehasher
C.	both a & b
D.	none of the mentioned
Answer» C. both a & b
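
The following sketch, assuming scikit-learn and SciPy are installed, shows that both classes return SciPy sparse matrices. The token-count records and the choice of `n_features=8` are illustrative assumptions.

```python
from scipy import sparse
from sklearn.feature_extraction import DictVectorizer, FeatureHasher

# Hypothetical token counts, one dictionary per sample.
records = [{"apple": 2, "pear": 1}, {"apple": 1, "kiwi": 3}]

# DictVectorizer learns one column per key and returns a sparse matrix by default.
X_dict = DictVectorizer().fit_transform(records)

# FeatureHasher maps keys to a fixed number of columns without storing a vocabulary.
X_hash = FeatureHasher(n_features=8).transform(records)

print(sparse.issparse(X_dict), X_dict.shape)
print(sparse.issparse(X_hash), X_hash.shape)
```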

363.	scikit-learn offers the class ______, which is responsible for filling the holes using a strategy based on the mean, median, or frequency.
A.	labelencoder
B.	labelbinarizer
C.	dictvectorizer
D.	imputer
Answer» D. imputer
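
A minimal sketch of mean imputation. Note that recent scikit-learn releases no longer ship the `Imputer` class named in the answer; this example assumes a current version and uses its replacement, `sklearn.impute.SimpleImputer`. The small array is made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Two missing values (the "holes"), marked as NaN.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])

# Replace each hole with the mean of its column.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Other accepted strategies include `"median"` and `"most_frequent"`.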

364.	Which of the following scale data by removing elements that don't belong to a given range or by considering a maximum absolute value?
A.	minmaxscaler
B.	maxabsscaler
C.	both a & b
D.	none of the mentioned
Answer» C. both a & b
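
As an illustration, assuming scikit-learn is installed, the two scalers can be compared on a small made-up array:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler

X = np.array([[-2.0, 10.0],
              [0.0, 20.0],
              [4.0, 30.0]])

# MinMaxScaler maps each column to the range [0, 1] by default.
X_minmax = MinMaxScaler().fit_transform(X)

# MaxAbsScaler divides each column by its maximum absolute value.
X_maxabs = MaxAbsScaler().fit_transform(X)

print(X_minmax)
print(X_maxabs)
```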

365.	scikit-learn also provides a class for per-sample normalization, ______.
A.	normalizer
B.	imputer
C.	classifier
D.	all above
Answer» A. normalizer
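
A minimal sketch, assuming scikit-learn is installed. Unlike the column-wise scalers, `Normalizer` rescales each row (sample) to unit norm; the default norm is L2.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, 0.0]])

# Each row is divided by its own L2 norm.
X_normalized = Normalizer().fit_transform(X)

print(X_normalized)
```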

366.	A(n) ______ dataset with many features contains information proportional to the independence of all features and their variance.
A.	normalized
B.	unnormalized
C.	both a & b
D.	none of the mentioned
Answer» B. unnormalized

367.	In order to assess how much information is brought by each component, and the correlation among them, a useful tool is the ______.
A.	concurrent matrix
B.	convergence matrix
C.	supportive matrix
D.	covariance matrix
Answer» D. covariance matrix
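
A small NumPy sketch of this idea. The three synthetic features are assumptions made for illustration: the second is an exact multiple of the first, and the third is independent of both.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 2.0 * x1               # fully determined by x1
x3 = rng.normal(size=500)   # independent of x1 and x2

X = np.column_stack([x1, x2, x3])

# rowvar=False treats columns as variables and rows as observations.
cov = np.cov(X, rowvar=False)

# Diagonal entries are variances; off-diagonal entries show how features co-vary.
print(np.round(cov, 3))
```

The large off-diagonal entry between the first two features signals that they carry redundant information.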

368.	The ______ parameter can assume different values which determine how the data matrix is initially processed.
A.	run
B.	start
C.	init
D.	stop
Answer» C. init

369.	______ allows exploiting the natural sparsity of data while extracting principal components.
A.	sparsepca
B.	kernelpca
C.	svd
D.	init parameter
Answer» A. sparsepca

370.	Which of the following is true about Residuals ?
A.	lower is better
B.	higher is better
C.	a or b depend on the situation
D.	none of these
Answer» A. lower is better

371.	Overfitting is more likely when you have a huge amount of data to train on.
A.	true
B.	false
Answer» B. false

372.	Suppose you draw a scatter plot of the residuals against the predicted values in linear regression and find a relationship between them. Which of the following conclusions do you draw about this situation?
A.	since there is a relationship, our model is not good
B.	since there is a relationship, our model is good
C.	can't say
D.	none of these
Answer» A. since there is a relationship, our model is not good
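
A minimal sketch of such a situation, using NumPy only. The data are synthetic and deliberately quadratic, so a straight-line fit is misspecified. Ordinary least squares residuals are always linearly uncorrelated with the fitted values, so the relationship shows up as a curved pattern instead.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-3.0, 3.0, 200)
y = x**2 + x + rng.normal(scale=0.3, size=x.size)  # true relationship is quadratic

# Fit a straight line, which cannot capture the curvature.
slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept
residuals = y - predicted

# Linear correlation is (numerically) zero by construction of least squares.
linear_corr = np.corrcoef(predicted, residuals)[0, 1]

# Correlation with the squared, centered predictions exposes the curved pattern.
curved_corr = np.corrcoef((predicted - predicted.mean()) ** 2, residuals)[0, 1]

print(linear_corr, curved_corr)
```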

373.	Let's say a linear regression model perfectly fits the training data (train error is zero). Which of the following statements is true?
A.	you will always have test error zero
B.	you can not have test error zero
C.	none of the above
Answer» C. none of the above

374.	In a linear regression problem, we are using R-squared to measure goodness-of-fit. We add a feature to the linear regression model and retrain the same model. Which of the following options is true?
A.	if r squared increases, this variable is significant.
B.	if r squared decreases, this variable is not significant.
C.	individually r squared cannot tell about variable importance. we can't say anything about it right now.
D.	none of these.
Answer» C. individually r squared cannot tell about variable importance. we can't say anything about it right now.
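
One reason R-squared alone is uninformative here: on the training data it cannot decrease when a feature is added to an ordinary least squares model, even if that feature is pure noise. A minimal sketch, assuming scikit-learn is installed and using synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
x = rng.normal(size=(200, 1))
y = 3.0 * x[:, 0] + rng.normal(scale=0.5, size=200)

# A feature that is unrelated to y by construction.
noise_feature = rng.normal(size=(200, 1))
X_extended = np.hstack([x, noise_feature])

# score() returns R-squared, evaluated here on the training data.
r2_base = LinearRegression().fit(x, y).score(x, y)
r2_extended = LinearRegression().fit(X_extended, y).score(X_extended, y)

print(r2_base, r2_extended)
```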

375.	Which of the following is true about heteroskedasticity?
A.	linear regression with varying error terms
B.	linear regression with constant error terms
C.	linear regression with zero error terms
D.	none of these
Answer» A. linear regression with varying error terms
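
A minimal NumPy sketch of heteroskedastic data. The noise model (standard deviation growing in proportion to x) and the split point of 5.5 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(1.0, 10.0, 500)

# Error terms whose standard deviation grows with x.
y = 2.0 * x + rng.normal(scale=0.5 * x)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Compare the residual spread in the lower and upper halves of the x range.
spread_low = residuals[x < 5.5].std()
spread_high = residuals[x >= 5.5].std()

print(spread_low, spread_high)
```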

376.	Which of the following assumptions do we make while deriving linear regression parameters?
1. The true relationship between dependent y and predictor x is linear
2. The model errors are statistically independent
3. The errors are normally distributed with a 0 mean and constant standard deviation
4. The predictor x is non-stochastic and is measured error-free
A.	1,2 and 3.
B.	1,3 and 4.
C.	1 and 3.
D.	all of above.
Answer» D. all of above.

377.	To test the linear relationship between continuous variables y (dependent) and x (independent), which of the following plots is best suited?
A.	scatter plot
B.	barchart
C.	histograms
D.	none of these
Answer» A. scatter plot

378.	Which of the following steps / assumptions in regression modeling impacts the trade-off between under-fitting and over-fitting the most?
A.	the polynomial degree
B.	whether we learn the weights by matrix inversion or gradient descent
C.	the use of a constant-term
Answer» A. the polynomial degree
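
A short NumPy sketch of why the degree matters: a higher degree gives the model more capacity, so the training error can only go down, which pushes the fit from under-fitting toward over-fitting. The sine-shaped synthetic data and the degrees tried are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(3.0 * x) + rng.normal(scale=0.1, size=x.size)

# Training mean squared error for polynomials of increasing degree.
train_mse = {}
for degree in (1, 3, 6):
    coeffs = np.polyfit(x, y, degree)
    predictions = np.polyval(coeffs, x)
    train_mse[degree] = float(np.mean((y - predictions) ** 2))

print(train_mse)
```

A lower training error does not imply a lower test error; a held-out set is needed to judge that.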

379.	Can we calculate the skewness of variables based on mean and median?
A.	true
B.	false
Answer» B. false

380.	Which of the following is true about Ridge or Lasso regression methods in case of feature selection?
A.	ridge regression uses subset selection of features
B.	lasso regression uses subset selection of features
C.	both use subset selection of features
D.	none of above
Answer» B. lasso regression uses subset selection of features
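
A minimal sketch of the difference, assuming scikit-learn is installed. The synthetic data have ten features of which only the first two influence the target; the value `alpha=0.5` is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# The L1 penalty sets some coefficients exactly to zero (a subset of features is kept);
# the L2 penalty only shrinks coefficients toward zero.
lasso_zero_count = int(np.sum(lasso.coef_ == 0.0))
ridge_zero_count = int(np.sum(ridge.coef_ == 0.0))

print(lasso_zero_count, ridge_zero_count)
```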

381.	Which of the following statement(s) can be true after adding a variable to a linear regression model?
1. R-Squared and Adjusted R-squared both increase
2. R-Squared increases and Adjusted R-
A.	1 and 2
B.	1 and 3
C.	2 and 4
D.	none of the above
Answer» A. 1 and 2

382.	How many coefficients do you need to estimate in a simple linear regression model (One independent variable)?
A.	1
B.	2
C.	can't say
Answer» B. 2
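
The two coefficients are the slope and the intercept. A minimal NumPy sketch on made-up, noise-free data generated from y = 2x + 1:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# A degree-1 polynomial fit returns exactly two coefficients: [slope, intercept].
coefficients = np.polyfit(x, y, 1)
slope, intercept = coefficients

print(len(coefficients), slope, intercept)
```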

383.	Conditional probability is a measure of the probability of an event given that another event has already occurred.
A.	true
B.	false
Answer» A. true
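
As a worked example of the definition P(A|B) = P(A and B) / P(B), consider one roll of a fair die with A = "the result is even" and B = "the result is greater than 3". The events are chosen here purely for illustration.

```python
from fractions import Fraction

outcomes = range(1, 7)  # one roll of a fair six-sided die

event_a = {o for o in outcomes if o % 2 == 0}  # even result: {2, 4, 6}
event_b = {o for o in outcomes if o > 3}       # greater than 3: {4, 5, 6}

p_b = Fraction(len(event_b), 6)
p_a_and_b = Fraction(len(event_a & event_b), 6)

# P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(p_a_given_b)
```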

384.	What is/are true about kernels in SVM?
1. A kernel function maps low-dimensional data to a high-dimensional space
2. It's a similarity function
A.	1
B.	2
C.	1 and 2
D.	none of these
Answer» C. 1 and 2
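
Both statements can be illustrated with a homogeneous quadratic kernel on 2-D inputs, chosen here as an example: the kernel value equals an inner product in a 3-D feature space, computed without building that space explicitly. The function names `poly2_kernel` and `poly2_feature_map` are hypothetical helpers, not part of any library.

```python
import numpy as np

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

def poly2_feature_map(x):
    """Explicit map from 2-D input to the 3-D space induced by poly2_kernel."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Similarity computed directly in the original 2-D space.
kernel_value = poly2_kernel(x, z)

# The same number obtained as an inner product in the 3-D feature space.
explicit_value = float(np.dot(poly2_feature_map(x), poly2_feature_map(z)))

print(kernel_value, explicit_value)
```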

385.	Suppose you are building an SVM model on data X. The data X can be error-prone, which means that you should not trust any specific data point too much. Now suppose you want to build an SVM model with a quadratic kernel function (polynomial of degree 2) that uses the slack variable C as one of its hyperparameters. What would happen when you use a very small C (C~0)?
A.	misclassification would happen
B.	data will be correctly classified
C.	can't say
D.	none of these
Answer» A. misclassification would happen