McqMate
301. 
A feature F1 can take the values A, B, C, D, E, and F and represents the grades of students from a college. Which of the following statements is true in this case? 
A.  feature f1 is an example of nominal variable. 
B.  feature f1 is an example of ordinal variable. 
C.  it doesn't belong to any of the above category. 
D.  both of these 
Answer» B. feature f1 is an example of ordinal variable. 
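A minimal Python sketch of why grades behave as an ordinal variable: the labels carry a natural order, so they can be mapped to ranks and compared. The mapping `GRADE_RANK`, the helper `is_better`, and the assumption that A is the best grade are illustrative and not part of the question.

```python
# Hypothetical rank mapping; assumes A is the best grade and F the worst.
GRADE_RANK = {"A": 6, "B": 5, "C": 4, "D": 3, "E": 2, "F": 1}

def is_better(grade_a, grade_b):
    """Return True if grade_a ranks strictly above grade_b."""
    return GRADE_RANK[grade_a] > GRADE_RANK[grade_b]

grades = ["C", "A", "F", "B"]
# Sorting by rank is meaningful for ordinal data; it would be arbitrary for nominal data.
sorted_grades = sorted(grades, key=GRADE_RANK.get, reverse=True)
print(sorted_grades)  # ['A', 'B', 'C', 'F']
```

A nominal variable (for example, a student's home city) would offer no such ordering to sort or compare by.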
302. 
What would you do in PCA to get the same projection as SVD? 
A.  transform data to zero mean 
B.  transform data to zero median 
C.  not possible 
D.  none of these 
Answer» A. transform data to zero mean 
303. 
What are PCA, KPCA and ICA used for? 
A.  principal components analysis 
B.  kernel based principal component analysis 
C.  independent component analysis 
D.  all above 
Answer» D. all above 
304. 
Can a model trained for item based similarity also choose from a given set of items? 
A.  yes 
B.  no 
Answer» A. yes 
305. 
What are common feature selection methods in a regression task? 
A.  correlation coefficient 
B.  greedy algorithms 
C.  all above 
D.  none of these 
Answer» C. all above 
306. 
The parameter ______ allows specifying the percentage of elements to put into the test/training set. 
A.  test_size 
B.  training_size 
C.  all above 
D.  none of these 
Answer» C. all above 
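To make the role of such a parameter concrete, here is a small standard-library sketch of a shuffled split controlled by a `test_size` fraction. The function `split_dataset` and its `seed` argument are hypothetical names for illustration. In scikit-learn itself the corresponding function is `train_test_split`, which accepts `test_size` and a complementary parameter spelled `train_size`.

```python
import random

def split_dataset(samples, test_size=0.25, seed=0):
    """Shuffle samples and split them; test_size is the fraction put in the test set."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

train, test = split_dataset(range(20), test_size=0.25)
print(len(train), len(test))  # 15 5
```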
307. 
In many classification problems, the target ______ is made up of categorical labels which cannot immediately be processed by any algorithm. 
A.  random_state 
B.  dataset 
C.  test_size 
D.  all above 
Answer» B. dataset 
308. 
______ adopts a dictionary-oriented approach, associating each category label with a progressive integer number. 
A.  LabelEncoder class 
B.  LabelBinarizer class 
C.  DictVectorizer 
D.  FeatureHasher 
Answer» A. LabelEncoder class 
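The dictionary-oriented idea can be sketched in a few lines of plain Python. This is not the scikit-learn implementation; `fit_label_encoder` and `transform` are hypothetical helpers that assume labels are sorted before being numbered.

```python
def fit_label_encoder(labels):
    """Map each distinct label to a progressive integer (labels are sorted first)."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}

def transform(labels, mapping):
    """Replace every label with its integer code."""
    return [mapping[label] for label in labels]

y = ["Male", "Female", "Female", "Male", "Other"]
mapping = fit_label_encoder(y)
encoded = transform(y, mapping)
print(mapping)  # {'Female': 0, 'Male': 1, 'Other': 2}
print(encoded)  # [1, 0, 0, 1, 2]
```

The integer codes imply an order that the original labels may not have, which is one reason one-hot style encoders are often preferred for nominal targets.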
309. 
If a linear regression model fits the training data perfectly, i.e., train error is zero, then 
A.  test error is also always zero 
B.  test error is non zero 
C.  couldn't comment on test error 
D.  test error is equal to train error 
Answer» C. couldn't comment on test error 
310. 
Which of the following metrics can be used for evaluating regression models?

A.  ii and iv 
B.  i and ii 
C.  ii, iii and iv 
D.  i, ii, iii and iv 
Answer» D. i, ii, iii and iv 
311. 
In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, how much will the output variable change? 
A.  by 1 
B.  no change 
C.  by intercept 
D.  by its slope 
Answer» D. by its slope 
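A short worked example, assuming ordinary least squares and a made-up data set generated from y = 1 + 2x. The helper names `fit_line` and `predict` are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 1 + 2x
intercept, slope = fit_line(xs, ys)

def predict(x):
    return intercept + slope * x

# A 1-unit change in the input changes the prediction by exactly the slope.
change = predict(11.0) - predict(10.0)
print(slope, change)
```

The same sketch shows that a simple linear regression estimates two coefficients: the intercept and the slope.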
312. 
Function used for linear regression in R is 
A.  lm(formula, data) 
B.  lr(formula, data) 
C.  lrm(formula, data) 
D.  regression.linear(formula, data) 
Answer» A. lm(formula, data) 
313. 
In syntax of linear model lm(formula,data,..), data refers to 
A.  matrix 
B.  vector 
C.  array 
D.  list 
Answer» D. list 
314. 
In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to 
A.  (x-intercept, slope) 
B.  (slope, x-intercept) 
C.  (y-intercept, slope) 
D.  (slope, y-intercept) 
Answer» C. (y-intercept, slope) 
315. 
Linear Regression is a supervised machine learning algorithm. 
A.  true 
B.  false 
Answer» A. true 
316. 
Is it possible to design a linear regression algorithm using a neural network? 
A.  true 
B.  false 
Answer» A. true 
317. 
Overfitting is more likely when you have a huge amount of data to train on. 
A.  true 
B.  false 
Answer» B. false 
318. 
Which of the following statements is true about outliers in linear regression? 
A.  linear regression is sensitive to outliers 
B.  linear regression is not sensitive to outliers 
C.  can't say 
D.  none of these 
Answer» A. linear regression is sensitive to outliers 
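The sensitivity can be seen on a toy data set: corrupting a single target value moves the least-squares slope substantially. The data and the helper `ols_slope` are made up for illustration.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
clean_ys = [2.0, 4.0, 6.0, 8.0, 10.0]    # exactly y = 2x
outlier_ys = [2.0, 4.0, 6.0, 8.0, 50.0]  # last point replaced by an outlier

clean_slope = ols_slope(xs, clean_ys)
outlier_slope = ols_slope(xs, outlier_ys)
print(clean_slope, outlier_slope)
```

Because the squared-error loss grows quadratically with the residual, one extreme point can dominate the fit.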
319. 
Suppose you plotted a scatter plot of the residuals against the predicted values in linear regression and found that there is a relationship between them. Which of the following conclusions do you draw about this situation? 
A.  since there is a relationship, our model is not good 
B.  since there is a relationship, our model is good 
C.  can't say 
D.  none of these 
Answer» A. since there is a relationship, our model is not good 
320. 
Naive Bayes classifiers are a collection of ______ algorithms. 
A.  classification 
B.  clustering 
C.  regression 
D.  all 
Answer» A. classification 
321. 
Naive Bayes classification is ______ learning. 
A.  supervised 
B.  unsupervised 
C.  both 
D.  none 
Answer» A. supervised 
322. 
The features being classified are assumed to be independent of each other in a Naive Bayes classifier. 
A.  false 
B.  true 
Answer» B. true 
323. 
The features being classified are assumed to be ______ of each other in a Naive Bayes classifier. 
A.  independent 
B.  dependent 
C.  partial dependent 
D.  none 
Answer» A. independent 
324. 
Bayes' theorem is given by P(H|E) = P(E|H) · P(H) / P(E), where 1. P(H) is the probability of hypothesis H being true.

A.  true 
B.  false 
Answer» A. true 
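As a worked instance of Bayes' theorem, P(H|E) = P(E|H) · P(H) / P(E), the sketch below computes a posterior from a prior and two likelihoods. The numbers are invented, and the function name `posterior` is illustrative. P(E) is expanded with the law of total probability.

```python
def posterior(prior_h, likelihood_e_given_h, likelihood_e_given_not_h):
    """P(H|E) via Bayes' theorem, with P(E) expanded by total probability."""
    evidence = (likelihood_e_given_h * prior_h
                + likelihood_e_given_not_h * (1.0 - prior_h))
    return likelihood_e_given_h * prior_h / evidence

# Hypothetical numbers: a rare hypothesis (1% prior) and fairly reliable evidence.
p = posterior(prior_h=0.01, likelihood_e_given_h=0.9, likelihood_e_given_not_h=0.1)
print(round(p, 4))  # 0.0833
```

Here P(H) = 0.01 is the prior and the returned value is the posterior P(H|E). The evidence raises the probability of H, but the low prior keeps it well below one half.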
325. 
In the given image, P(H|E) is the ______ probability. 
A.  posterior 
B.  prior 
Answer» A. posterior 
326. 
In the given image, P(H) is the ______ probability. 
A.  posterior 
B.  prior 
Answer» B. prior 
327. 
Conditional probability is a measure of the probability of an event given that another event has already occurred. 
A.  true 
B.  false 
Answer» A. true 
328. 
Bayes theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. 
A.  true 
B.  false 
Answer» A. true 
329. 
The Bernoulli Naive Bayes classifier assumes a ______ distribution. 
A.  continuous 
B.  discrete 
C.  binary 
Answer» C. binary 
330. 
The Multinomial Naive Bayes classifier assumes a ______ distribution. 
A.  continuous 
B.  discrete 
C.  binary 
Answer» B. discrete 
331. 
The Gaussian Naive Bayes classifier assumes a ______ distribution. 
A.  continuous 
B.  discrete 
C.  binary 
Answer» A. continuous 
332. 
The binarize parameter of scikit-learn's BernoulliNB sets the threshold for binarizing sample features. 
A.  true 
B.  false 
Answer» A. true 
333. 
A Gaussian distribution, when plotted, gives a bell-shaped curve which is symmetric about the ______ of the feature values. 
A.  mean 
B.  variance 
C.  discrete 
D.  random 
Answer» A. mean 
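The symmetry can be checked numerically with the Gaussian density formula. This is a small sketch using only the standard library; the mean and standard deviation are arbitrary example values.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

mean, std = 5.0, 2.0  # arbitrary example parameters
left = gaussian_pdf(mean - 1.5, mean, std)
right = gaussian_pdf(mean + 1.5, mean, std)
peak = gaussian_pdf(mean, mean, std)
print(left, right, peak)
```

Points at equal distance on either side of the mean have the same density, and the density is highest at the mean itself.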
334. 
SVMs directly give us the posterior probabilities P(y = 1 | x) and P(y = −1 | x). 
A.  true 
B.  false 
Answer» B. false 
335. 
Any linear combination of the components of a multivariate Gaussian is a univariate Gaussian. 
A.  true 
B.  false 
Answer» A. true 
336. 
Solving a nonlinear separation problem with a hard-margin kernelized SVM (Gaussian RBF kernel) might lead to overfitting. 
A.  true 
B.  false 
Answer» A. true 
337. 
SVM is a ______ algorithm. 
A.  classification 
B.  clustering 
C.  regression 
D.  all 
Answer» A. classification 
338. 
SVM is a ______ learning algorithm. 
A.  supervised 
B.  unsupervised 
C.  both 
D.  none 
Answer» A. supervised 
339. 
The linear SVM classifier works by drawing a straight line between two classes. 
A.  true 
B.  false 
Answer» A. true 
340. 
Which of the following functions provides unsupervised prediction? 
A.  cl_forecast 
B.  cl_nowcast 
C.  cl_precast 
D.  none of the mentioned 
Answer» D. none of the mentioned 
341. 
Which of the following is a characteristic of the best machine learning method? 
A.  fast 
B.  accuracy 
C.  scalable 
D.  all above 
Answer» D. all above 
342. 
What are the different Algorithm techniques in Machine Learning? 
A.  supervised learning and semi-supervised learning 
B.  unsupervised learning and transduction 
C.  both a & b 
D.  none of the mentioned 
Answer» C. both a & b 
343. 
What is the standard approach to supervised learning? 
A.  split the set of examples into the training set and the test set 
B.  group the set of examples into the training set and the test set 
C.  a set of observed instances tries to induce a general rule 
D.  learns programs from data 
Answer» A. split the set of examples into the training set and the test set 
344. 
Which of the following is not Machine Learning? 
A.  artificial intelligence 
B.  rule based inference 
C.  both a & b 
D.  none of the mentioned 
Answer» B. rule based inference 
345. 
What is Model Selection in Machine Learning? 
A.  the process of selecting models among different mathematical models, which are used to describe the same data set 
B.  when a statistical model describes random error or noise instead of underlying relationship 
C.  find interesting directions in data and find novel observations/ database cleaning 
D.  all above 
Answer» A. the process of selecting models among different mathematical models, which are used to describe the same data set 
346. 
Which are two techniques of machine learning? 
A.  genetic programming and inductive learning 
B.  speech recognition and regression 
C.  both a & b 
D.  none of the mentioned 
Answer» A. genetic programming and inductive learning 
347. 
Even if there are no actual supervisors, ______ learning is also based on feedback provided by the environment. 
A.  supervised 
B.  reinforcement 
C.  unsupervised 
D.  none of the above 
Answer» B. reinforcement 
348. 
What does learning exactly mean? 
A.  robots are programmed so that they can perform the task based on data they gather from sensors. 
B.  a set of data is used to discover the potentially predictive relationship. 
C.  learning is the ability to change according to external stimuli and remembering most of all previous experiences. 
D.  it is a set of data that is used to discover the potentially predictive relationship. 
Answer» C. learning is the ability to change according to external stimuli and remembering most of all previous experiences. 
349. 
It is necessary to allow the model to develop a generalization ability and avoid a common problem called ______. 
A.  overfitting 
B.  overlearning 
C.  classification 
D.  regression 
Answer» A. overfitting 
350. 
Techniques that involve the use of both labeled and unlabeled data are called ______. 
A.  supervised 
B.  semi-supervised 
C.  unsupervised 
D.  none of the above 
Answer» B. semi-supervised 
351. 
In reinforcement learning, if the feedback is negative, it is defined as a ______. 
A.  penalty 
B.  overlearning 
C.  reward 
D.  none of above 
Answer» A. penalty 
352. 
According to ______, it's a key success factor for the survival and evolution of all species. 
A.  claude shannon's theory 
B.  gini index 
C.  darwin's theory 
D.  none of above 
Answer» C. darwin's theory 
353. 
A supervised scenario is characterized by the concept of a ______. 
A.  programmer 
B.  teacher 
C.  author 
D.  farmer 
Answer» B. teacher 
354. 
Overlearning is caused by an excessive ______. 
A.  capacity 
B.  regression 
C.  reinforcement 
D.  accuracy 
Answer» A. capacity 
355. 
Which of the following is an example of a deterministic algorithm? 
A.  PCA 
B.  k-means 
C.  none of the above 
Answer» A. PCA 
356. 
Which of the following models includes a backward elimination feature selection routine? 
A.  mcv 
B.  mars 
C.  mcrs 
D.  all above 
Answer» B. mars 
357. 
Can we extract knowledge without applying feature selection? 
A.  yes 
B.  no 
Answer» A. yes 
358. 
When using feature selection on the data, does the number of features decrease? 
A.  no 
B.  yes 
Answer» B. yes 
359. 
Which of the following are several models for feature extraction? 
A.  regression 
B.  classification 
C.  none of the above 
Answer» C. none of the above 
360. 
______ provides some built-in datasets that can be used for testing purposes. 
A.  scikit-learn 
B.  classification 
C.  regression 
D.  none of the above 
Answer» A. scikit-learn 
361. 
While using ______, all labels are turned into sequential numbers. 
A.  LabelEncoder class 
B.  LabelBinarizer class 
C.  DictVectorizer 
D.  FeatureHasher 
Answer» A. LabelEncoder class 
362. 
______ produce sparse matrices of real numbers that can be fed into any machine learning model. 
A.  DictVectorizer 
B.  FeatureHasher 
C.  both a & b 
D.  none of the mentioned 
Answer» C. both a & b 
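To illustrate the dictionary-to-matrix step, here is a plain-Python sketch in the spirit of a dictionary vectorizer. It builds dense rows for readability, whereas the scikit-learn classes return sparse matrices. The function `dict_vectorize` and the sample records are hypothetical.

```python
def dict_vectorize(records):
    """Turn a list of {feature_name: value} dicts into (feature_names, dense rows)."""
    feature_names = sorted({key for record in records for key in record})
    index = {name: i for i, name in enumerate(feature_names)}
    rows = []
    for record in records:
        row = [0.0] * len(feature_names)  # features missing from a record stay at 0
        for key, value in record.items():
            row[index[key]] = float(value)
        rows.append(row)
    return feature_names, rows

records = [{"age": 30, "height": 1.8}, {"age": 25, "weight": 70}]
feature_names, rows = dict_vectorize(records)
print(feature_names)  # ['age', 'height', 'weight']
print(rows)           # [[30.0, 1.8, 0.0], [25.0, 0.0, 70.0]]
```

Most entries are zero when records share few keys, which is why a sparse representation is the natural output format.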
363. 
scikit-learn offers the class ______, which is responsible for filling the holes using a strategy based on the mean, median, or frequency. 
A.  LabelEncoder 
B.  LabelBinarizer 
C.  DictVectorizer 
D.  Imputer 
Answer» D. Imputer 
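The three strategies can be sketched with the standard library alone. The helper `impute` is hypothetical, and `None` marks a missing value. In recent scikit-learn releases this role is played by `sklearn.impute.SimpleImputer`, since the older `Imputer` class was removed.

```python
from statistics import mean, median, mode

def impute(column, strategy="mean"):
    """Replace None entries with the mean, median, or most frequent observed value."""
    observed = [v for v in column if v is not None]
    fill = {"mean": mean, "median": median, "most_frequent": mode}[strategy](observed)
    return [fill if v is None else v for v in column]

column = [1.0, None, 2.0, 2.0, None, 7.0]
print(impute(column, "mean"))           # [1.0, 3.0, 2.0, 2.0, 3.0, 7.0]
print(impute(column, "median"))         # [1.0, 2.0, 2.0, 2.0, 2.0, 7.0]
print(impute(column, "most_frequent"))  # [1.0, 2.0, 2.0, 2.0, 2.0, 7.0]
```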
364. 
Which of the following scale data by removing elements that don't belong to a given range or by considering a maximum absolute value? 
A.  MinMaxScaler 
B.  MaxAbsScaler 
C.  both a & b 
D.  none of the mentioned 
Answer» C. both a & b 
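The two scaling rules can be written out directly. This is a sketch of the underlying formulas for a single feature column, not the library code; the function names are illustrative, and the min-max version assumes the column is not constant.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly map values so that min -> range start and max -> range end."""
    lo, hi = min(values), max(values)  # assumes hi != lo
    a, b = feature_range
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]

def max_abs_scale(values):
    """Divide by the largest absolute value so results lie in [-1, 1]."""
    max_abs = max(abs(v) for v in values)
    return [v / max_abs for v in values]

column = [-4.0, 0.0, 2.0, 4.0]
print(min_max_scale(column))  # [0.0, 0.5, 0.75, 1.0]
print(max_abs_scale(column))  # [-1.0, 0.0, 0.5, 1.0]
```

Max-abs scaling keeps zeros at zero, so it preserves sparsity, while min-max scaling generally shifts them.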
365. 
scikit-learn also provides a class for per-sample normalization, ______. 
A.  Normalizer 
B.  Imputer 
C.  classifier 
D.  all above 
Answer» A. Normalizer 
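Per-sample normalization rescales each row (sample) independently, in contrast to scalers that work column by column. A minimal sketch assuming the L2 norm; `normalize_sample` is a hypothetical helper.

```python
import math

def normalize_sample(row):
    """Scale one sample to unit L2 norm; an all-zero row is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row] if norm > 0 else list(row)

rows = [[3.0, 4.0], [1.0, 0.0]]
normalized = [normalize_sample(r) for r in rows]
print(normalized)
```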
366. 
A ______ dataset with many features contains information proportional to the independence of all features and their variance. 
A.  normalized 
B.  unnormalized 
C.  both a & b 
D.  none of the mentioned 
Answer» B. unnormalized 
367. 
In order to assess how much information is brought by each component, and the correlation among them, a useful tool is the ______. 
A.  concurrent matrix 
B.  convergence matrix 
C.  supportive matrix 
D.  covariance matrix 
Answer» D. covariance matrix 
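A small sketch of how such a matrix is computed, using the sample covariance (dividing by n − 1) on a made-up two-feature data set. `covariance_matrix` is an illustrative helper name.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1) of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # second feature is twice the first
cov = covariance_matrix(data)
print(cov)  # [[1.0, 2.0], [2.0, 4.0]]
```

The diagonal holds each feature's variance, and the off-diagonal entries show how strongly the features vary together. Here the second feature carries no independent information.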
368. 
The ______ parameter can assume different values which determine how the data matrix is initially processed. 
A.  run 
B.  start 
C.  init 
D.  stop 
Answer» C. init 
369. 
______ allows exploiting the natural sparsity of data while extracting principal components. 
A.  SparsePCA 
B.  KernelPCA 
C.  SVD 
D.  init parameter 
Answer» A. SparsePCA 
370. 
Which of the following is true about Residuals ? 
A.  lower is better 
B.  higher is better 
C.  a or b depend on the situation 
D.  none of these 
Answer» A. lower is better 
371. 
Overfitting is more likely when you have a huge amount of data to train on. 
A.  true 
B.  false 
Answer» B. false 
372. 
Suppose you plotted a scatter plot of the residuals against the predicted values in linear regression and found that there is a relationship between them. Which of the following conclusions do you draw about this situation? 
A.  since there is a relationship, our model is not good 
B.  since there is a relationship, our model is good 
C.  can't say 
D.  none of these 
Answer» A. since there is a relationship, our model is not good 
373. 
Let's say a linear regression model perfectly fits the training data (train error is zero). Now, which of the following statements is true? 
A.  you will always have test error zero 
B.  you can not have test error zero 
C.  none of the above 
Answer» C. none of the above 
374. 
In a linear regression problem, we are using R-squared to measure goodness-of-fit. We add a feature to the linear regression model and retrain the same model. Which of the following options is true? 
A.  if r squared increases, this variable is significant. 
B.  if r squared decreases, this variable is not significant. 
C.  individually r squared cannot tell about variable importance. we can't say anything about it right now. 
D.  none of these. 
Answer» C. individually r squared cannot tell about variable importance. we can't say anything about it right now. 
375. 
Which of the following is true about heteroskedasticity? 
A.  linear regression with varying error terms 
B.  linear regression with constant error terms 
C.  linear regression with zero error terms 
D.  none of these 
Answer» A. linear regression with varying error terms 
376. 
Which of the following assumptions do we make while deriving linear regression parameters?
1. The true relationship between dependent y and predictor x is linear
2. The model errors are statistically independent
3. The errors are normally distributed with a 0 mean and constant standard deviation
4. The predictor x is non-stochastic and is measured error-free 
A.  1,2 and 3. 
B.  1,3 and 4. 
C.  1 and 3. 
D.  all of above. 
Answer» D. all of above. 
377. 
To test for a linear relationship between continuous variables y (dependent) and x (independent), which of the following plots is best suited? 
A.  scatter plot 
B.  bar chart 
C.  histograms 
D.  none of these 
Answer» A. scatter plot 
378. 
Which of the following steps/assumptions in regression modeling impacts the trade-off between underfitting and overfitting the most? 
A.  the polynomial degree 
B.  whether we learn the weights by matrix inversion or gradient descent 
C.  the use of a constant term 
Answer» A. the polynomial degree 
379. 
Can we calculate the skewness of variables based on mean and median? 
A.  true 
B.  false 
Answer» B. false 
380. 
Which of the following is true about Ridge or Lasso regression methods in case of feature selection? 
A.  ridge regression uses subset selection of features 
B.  lasso regression uses subset selection of features 
C.  both use subset selection of features 
D.  none of above 
Answer» B. lasso regression uses subset selection of features 
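Why Lasso selects features and Ridge does not can be seen in the special case of an orthonormal design matrix, where both estimators have closed forms in terms of the ordinary least-squares coefficients. The sketch assumes the objectives (1/2)‖y − Xβ‖² + λ‖β‖₁ for Lasso and (1/2)‖y − Xβ‖² + (λ/2)‖β‖² for Ridge. The coefficient values are invented.

```python
def soft_threshold(value, lam):
    """Lasso solution for one coefficient under an orthonormal design."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0  # small coefficients are set exactly to zero

ols_coefs = [3.0, -0.4, 0.2, -2.5]  # hypothetical least-squares coefficients
lam = 0.5

lasso_coefs = [soft_threshold(b, lam) for b in ols_coefs]
ridge_coefs = [b / (1.0 + lam) for b in ols_coefs]  # shrinks, never exactly zero

print(lasso_coefs)  # [2.5, 0.0, 0.0, -2.0]
print(ridge_coefs)
```

Lasso drops the two weak features entirely, which amounts to subset selection. Ridge only shrinks all four coefficients toward zero.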
381. 
Which of the following statement(s) can be true after adding a variable to a linear regression model?
1. R-squared and adjusted R-squared both increase
2. R-squared increases and Adjusted R 
A.  1 and 2 
B.  1 and 3 
C.  2 and 4 
D.  none of the above 
Answer» A. 1 and 2 
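The standard adjusted R-squared formula, 1 − (1 − R²)(n − 1)/(n − p − 1), shows how a small gain in R-squared can still lower the adjusted value when a feature is added. The R-squared values and sample size below are made up for illustration.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R-squared for a model with n_features predictors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Hypothetical fit: adding a 4th feature nudges R-squared from 0.800 to 0.801.
before = adjusted_r2(0.800, n_samples=30, n_features=3)
after = adjusted_r2(0.801, n_samples=30, n_features=4)
print(round(before, 4), round(after, 4))
```

R-squared rose, yet the adjusted value fell because the gain did not offset the penalty for the extra parameter. A larger gain would raise both.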
382. 
How many coefficients do you need to estimate in a simple linear regression model (One independent variable)? 
A.  1 
B.  2 
C.  can't say 
Answer» B. 2 
383. 
Conditional probability is a measure of the probability of an event given that another event has already occurred. 
A.  true 
B.  false 
Answer» A. true 
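A concrete check of the definition P(A|B) = P(A and B) / P(B), enumerating the 36 outcomes of two fair dice. The events and helper names are invented for the example.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all rolls of two fair dice

def probability(event):
    """Probability of an event given as a predicate over outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def conditional(event_a, event_b):
    """P(A | B) = P(A and B) / P(B)."""
    return probability(lambda o: event_a(o) and event_b(o)) / probability(event_b)

def sum_is_8(o):
    return o[0] + o[1] == 8

def first_is_even(o):
    return o[0] % 2 == 0

print(probability(sum_is_8))                 # 5/36
print(conditional(sum_is_8, first_is_even))  # 1/6
```

Knowing that the first die is even raises the probability of a total of 8 from 5/36 to 1/6.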
384. 
What is/are true about kernels in SVM?
1. A kernel function maps low-dimensional data to a high-dimensional space
2. It's a similarity function 
A.  1 
B.  2 
C.  1 and 2 
D.  none of these 
Answer» C. 1 and 2 
385. 
Suppose you are building an SVM model on data X. The data X can be error-prone, which means that you should not trust any specific data point too much. Now suppose you want to build an SVM model with a quadratic kernel function (polynomial degree 2) that uses the slack variable C as one of its hyperparameters. What would happen when you use a very small C (C~0)? 
A.  misclassification would happen 
B.  data will be correctly classified 
C.  can't say 
D.  none of these 
Answer» A. misclassification would happen 
386. 
The cost parameter in the SVM means: 
A.  the number of cross validations to be made 
B.  the kernel to be used 
C.  the tradeoff between misclassification and simplicity of the model 
D.  none of the above 
Answer» C. the tradeoff between misclassification and simplicity of the model 
387. 
If you remove the non-red circled points from the data, the decision boundary will change. 
A.  true 
B.  false 
Answer» B. false 
388. 
How do you handle missing or corrupted data in a dataset? 
A.  drop missing rows or columns 
B.  replace missing values with mean/median/mode 
C.  assign a unique category to missing values 
D.  all of the above 
Answer» D. all of the above 
389. 
The SVMs are less effective when: 
A.  the data is linearly separable 
B.  the data is clean and ready to use 
C.  the data is noisy and contains overlapping points 
Answer» C. the data is noisy and contains overlapping points 
390. 
If there is only a discrete number of possible outcomes, they are called ______. 
A.  model-free 
B.  categories 
C.  prediction 
D.  none of above 
Answer» B. categories 
391. 
Some people use the term ______ instead of prediction only to avoid the weird idea that machine learning is a sort of modern magic. 
A.  inference 
B.  interference 
C.  accuracy 
D.  none of above 
Answer» A. inference 
392. 
The term ______ can be freely used, but with the same meaning adopted in physics or system theory. 
A.  accuracy 
B.  cluster 
C.  regression 
D.  prediction 
Answer» D. prediction 
393. 
Common deep learning applications/problems can also be solved using ______. 
A.  real-time visual object identification 
B.  classic approaches 
C.  automatic labeling 
D.  bio-inspired adaptive systems 
Answer» B. classic approaches 
394. 
Identify the various approaches for machine learning. 
A.  concept vs classification learning 
B.  symbolic vs statistical learning 
C.  inductive vs analytical learning 
D.  all above 
Answer» D. all above 
395. 
What is the function of unsupervised learning? 
A.  find clusters of the data and find low-dimensional representations of the data 
B.  find interesting directions in data and find novel observations/ database cleaning 
C.  interesting coordinates and correlations 
D.  all 
Answer» D. all 
396. 
What are the two methods used for the calibration in Supervised Learning? 
A.  platt calibration and isotonic regression 
B.  statistics and informal retrieval 
Answer» A. platt calibration and isotonic regression 
397. 
Which of the following are several models for feature extraction 
A.  regression 
B.  classification 
C.  none of the above 
Answer» C. none of the above 
398. 
Let's say a linear regression model perfectly fits the training data (train error is zero). Now, which of the following statements is true? 
A.  you will always have test error zero 
B.  you can not have test error zero 
C.  none of the above 
Answer» C. none of the above 
399. 
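Which of the following assumptions do we make while deriving linear regression parameters?
1. The true relationship between dependent y and predictor x is linear
2. The model errors are statistically independent
3. The errors are normally distributed with a 0 mean and constant standard deviation
4. The predictor x is non-stochastic and is measured error-free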

A.  1,2 and 3. 
B.  1,3 and 4. 
C.  1 and 3. 
D.  all of above. 
Answer» D. all of above. 
400. 
Suppose we fit Lasso regression to a data set which has 100 features (X1, X2, ..., X100). Now, we rescale one of these features by multiplying it by 10 (say that feature is X1), and then refit Lasso regression with the same regularization parameter. Which of the following options will be correct? 
A.  it is more likely for x1 to be excluded from the model 
B.  it is more likely for x1 to be included in the model 
C.  can't say 
D.  none of these 
Answer» B. it is more likely for x1 to be included in the model 
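The effect of rescaling can be illustrated in the simplest setting, a single feature with no intercept, where the Lasso objective (1/2)Σ(y − βx)² + λ|β| has a closed-form solution. This is a sketch under that simplifying assumption, not the full 100-feature problem. The data and the helper `lasso_single_feature` are made up.

```python
def lasso_single_feature(xs, ys, lam):
    """Closed-form Lasso coefficient for one feature, no intercept."""
    xy = sum(x * y for x, y in zip(xs, ys))
    xx = sum(x * x for x in xs)
    if xy > lam:
        return (xy - lam) / xx
    if xy < -lam:
        return (xy + lam) / xx
    return 0.0  # feature excluded from the model

xs = [0.1, -0.2, 0.3, -0.1]
ys = [1.0, -1.5, 2.0, -0.5]
lam = 2.0

original = lasso_single_feature(xs, ys, lam)
rescaled = lasso_single_feature([10 * x for x in xs], ys, lam)
print(original, rescaled)
```

With the original scale the correlation term stays below λ and the coefficient is exactly zero. After multiplying the feature by 10, the same fit needs a coefficient ten times smaller, so the L1 penalty bites less and the feature enters the model.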