McqMate
1. 
Application of machine learning methods to large databases is called 
A.  data mining. 
B.  artificial intelligence 
C.  big data computing 
D.  internet of things 
Answer» A. data mining. 
2. 
If a machine learning model's output involves a target variable, the model is called a
A.  descriptive model 
B.  predictive model 
C.  reinforcement learning 
D.  all of the above 
Answer» B. predictive model 
3. 
In which type of learning is labelled training data used?
A.  unsupervised learning 
B.  supervised learning 
C.  reinforcement learning 
D.  active learning 
Answer» B. supervised learning 
4. 
In which of the following feature selection methods do we start with an empty feature set?
A.  forward feature selection
B.  backward feature selection
C.  both a and b
D.  none of the above 
Answer» A. forward feature selection 
5. 
In PCA, the number of input dimensions is equal to the number of principal components
A.  true 
B.  false 
Answer» A. true 
6. 
PCA can be used for projecting and visualizing data in lower dimensions. 
A.  true 
B.  false 
Answer» A. true 
7. 
Which of the following characterizes the best machine learning method?
A.  scalable
B.  accurate
C.  fast 
D.  all of the above 
Answer» D. all of the above 
8. 
What characterizes unlabeled examples in machine learning?
A.  there is no prior knowledge 
B.  there is no confusing knowledge 
C.  there is prior knowledge 
D.  there is plenty of confusing knowledge 
Answer» D. there is plenty of confusing knowledge 
9. 
What does dimensionality reduction reduce? 
A.  stochastics 
B.  collinearity
C.  performance
D.  entropy
Answer» B. collinearity
10. 
Data used to build a data mining model. 
A.  training data 
B.  validation data 
C.  test data 
D.  hidden data 
Answer» A. training data 
11. 
The problem of finding hidden structure in unlabeled data is called… 
A.  supervised learning 
B.  unsupervised learning 
C.  reinforcement learning 
D.  none of the above 
Answer» B. unsupervised learning 
12. 
Of the following examples, which would you address using a supervised learning algorithm?
A.  given email labeled as spam or not spam, learn a spam filter 
B.  given a set of news articles found on the web, group them into set of articles about the same story. 
C.  given a database of customer data, automatically discover market segments and group customers into different market segments. 
D.  find the patterns in market basket analysis 
Answer» A. given email labeled as spam or not spam, learn a spam filter 
13. 
Dimensionality Reduction Algorithms are one of the possible ways to reduce the computation time required to build a model 
A.  true 
B.  false 
Answer» A. true 
14. 
You are given reviews of a few Netflix series marked as positive, negative, and neutral. Classifying reviews of a new Netflix series is an example of
A.  supervised learning 
B.  unsupervised learning 
C.  semisupervised learning 
D.  reinforcement learning 
Answer» A. supervised learning 
15. 
Which of the following is a good test dataset characteristic? 
A.  large enough to yield meaningful results 
B.  is representative of the dataset as a whole 
C.  both a and b 
D.  none of the above 
Answer» C. both a and b 
16. 
The following are types of supervised learning
A.  classification 
B.  regression 
C.  subgroup discovery 
D.  all of the above 
Answer» D. all of the above 
17. 
Matrix decomposition is which type of model?
A.  descriptive model 
B.  predictive model 
C.  logical model 
D.  none of the above 
Answer» A. descriptive model 
18. 
Which of the following are powerful distance metrics used by geometric models?
A.  euclidean distance
B.  manhattan distance
C.  both a and b
D.  square distance
Answer» C. both a and b
19. 
The output of the training process in machine learning is
A.  machine learning model 
B.  machine learning algorithm 
C.  null 
D.  accuracy 
Answer» A. machine learning model 
20. 
A feature F1 can take the values A, B, C, D, E, and F and represents the grades of students from a college. Here the feature type is

A.  nominal 
B.  ordinal 
C.  categorical 
D.  boolean 
Answer» B. ordinal 
21. 
PCA is 
A.  forward feature selection 
B.  backward feature selection
C.  feature extraction 
D.  all of the above 
Answer» C. feature extraction 
22. 
Dimensionality reduction algorithms are one of the possible ways to reduce the computation time required to build a model. 
A.  true 
B.  false 
Answer» A. true 
23. 
Which of the following techniques would perform better for reducing dimensions of a data set? 
A.  removing columns which have too many missing values 
B.  removing columns which have high variance in data 
C.  removing columns with dissimilar data trends 
D.  none of these 
Answer» A. removing columns which have too many missing values 
24. 
Supervised learning and unsupervised clustering both require at least one
A.  output attribute. 
B.  hidden attribute. 
C.  input attribute. 
D.  categorical attribute 
Answer» C. input attribute. 
25. 
What characterizes a hyperplane in a geometric model of machine learning?
A.  a plane with 1 dimension fewer than the number of input attributes
B.  a plane with 2 dimensions fewer than the number of input attributes
C.  a plane with 1 dimension more than the number of input attributes
D.  a plane with 2 dimensions more than the number of input attributes
Answer» A. a plane with 1 dimension fewer than the number of input attributes
26. 
Like the probabilistic view, the ________ view allows us to associate a probability of membership with each classification. 
A.  exemplar
B.  deductive
C.  classical
D.  inductive
Answer» A. exemplar
27. 
Database query is used to uncover this type of knowledge. 
A.  deep 
B.  hidden 
C.  shallow 
D.  multidimensional 
Answer» C. shallow
28. 
A person trained to interact with a human expert in order to capture their knowledge. 
A.  knowledge programmer 
B.  knowledge developer
C.  knowledge engineer
D.  knowledge extractor
Answer» C. knowledge engineer
29. 
A telecommunication company wants to segment its customers into distinct groups. This is an example of
A.  supervised learning 
B.  reinforcement learning 
C.  unsupervised learning 
D.  data extraction 
Answer» C. unsupervised learning 
30. 
In the example of predicting the number of babies based on the stork population, the number of babies is
A.  outcome 
B.  feature 
C.  observation 
D.  attribute 
Answer» A. outcome 
31. 
Which type of learning requires self-assessment to identify patterns within data?
A.  unsupervised learning 
B.  supervised learning 
C.  semisupervised learning 
D.  reinforced learning 
Answer» A. unsupervised learning 
32. 
Select the correct answers for following statements.

A.  both are true 
B.  1 is true and 2 is false 
C.  both are false 
D.  1 is false and 2 is true 
Answer» B. 1 is true and 2 is false 
33. 
The "curse of dimensionality" refers to
A.  all the problems that arise when working with data in the higher dimensions, that did not exist in the lower dimensions. 
B.  all the problems that arise when working with data in the lower dimensions, that did not exist in the higher dimensions. 
C.  all the problems that arise when working with data in the lower dimensions, that did not exist in the lower dimensions. 
D.  all the problems that arise when working with data in the higher dimensions, that did not exist in the higher dimensions. 
Answer» A. all the problems that arise when working with data in the higher dimensions, that did not exist in the lower dimensions. 
34. 
In simple terms, machine learning is
A.  training based on historical data
B.  prediction to answer a query
C.  both a and b
D.  automation of complex tasks
Answer» C. both a and b
35. 
If a machine learning model's output does not involve a target variable, the model is called a
A.  descriptive model 
B.  predictive model 
C.  reinforcement learning 
D.  all of the above 
Answer» A. descriptive model 
36. 
The following are descriptive models
A.  clustering 
B.  classification 
C.  association rule 
D.  both a and c 
Answer» D. both a and c 
37. 
Which of the following is not a learning method?
A.  memorization 
B.  analogy 
C.  deduction 
D.  introduction 
Answer» D. introduction 
38. 
A measurable property or parameter of the dataset is 
A.  training data 
B.  feature 
C.  test data 
D.  validation data 
Answer» B. feature 
39. 
A feature can be used as a
A.  binary split
B.  predictor
C.  both a and b
D.  none of the above
Answer» C. both a and b
40. 
It is not necessary to have a target variable for applying dimensionality reduction algorithms 
A.  true 
B.  false 
Answer» A. true 
41. 
The most popularly used dimensionality reduction algorithm is Principal Component Analysis (PCA). Which of the following is/are true about PCA? 1. PCA is an unsupervised method. 2. It searches for the directions in which the data have the largest variance. 3. Maximum number of principal components <= number of features. 4. All principal components are orthogonal to each other.
A.  1 & 2 
B.  2 & 3 
C.  3 & 4 
D.  all of the above 
Answer» D. all of the above 
42. 
Which of the following is a reasonable way to select the number of principal components "k"? 
A.  choose k to be the smallest value so that at least 99% of the variance is retained.
B.  choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
C.  choose k to be the largest value so that 99% of the variance is retained.
D.  use the elbow method
Answer» A. choose k to be the smallest value so that at least 99% of the variance is retained.
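The rule in option A can be sketched in a few lines. This is a minimal illustration, assuming the per-component variances (eigenvalues of the covariance matrix) are already available; the function name `smallest_k` and the sample eigenvalues are hypothetical and not part of the quiz.

```python
def smallest_k(eigenvalues, retain=0.99):
    """Return the smallest k whose top-k eigenvalues keep at least `retain` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= retain:
            return k
    return len(ordered)


# Hypothetical eigenvalues for a 5-feature dataset (they sum to 10).
variances = [6.0, 3.0, 0.95, 0.03, 0.02]
k = smallest_k(variances)
print(k)  # the first three components retain 99.5% of the variance
```

With these made-up values, two components keep only 90% of the variance, so three is the smallest k that meets the 99% target.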
43. 
Which of the following is an example of feature extraction?
A.  constructing a bag of words from an email
B.  applying pca to project high dimensional data 
C.  removing stop words 
D.  forward selection 
Answer» B. applying pca to project high dimensional data 
44. 
Prediction is 
A.  the result of application of specific theory or rule in a specific case 
B.  discipline in statistics used to find projections in multidimensional data 
C.  value entered in database by expert 
D.  independent of data 
Answer» A. the result of application of specific theory or rule in a specific case 
45. 
You are given seismic data and you want to predict the next earthquake. This is an example of
A.  supervised learning 
B.  reinforcement learning 
C.  unsupervised learning 
D.  dimensionality reduction 
Answer» A. supervised learning 
46. 
PCA works better if there is

A.  1 and 2 
B.  2 and 3 
C.  1 and 3 
D.  1,2 and 3 
Answer» C. 1 and 3 
47. 
A student's grade is a variable F1 which takes a value from A, B, C, and D. Which of the following is true in this case?
A.  variable f1 is an example of nominal variable 
B.  variable f1 is an example of ordinal variable 
C.  it doesn't belong to any of the mentioned categories
D.  it belongs to both ordinal and nominal category 
Answer» B. variable f1 is an example of ordinal variable 
48. 
What can be a major issue in Leave-One-Out Cross-Validation (LOOCV)?
A.  low variance
B.  high variance
C.  faster runtime compared to k-fold cross validation
D.  slower runtime compared to normal validation 
Answer» B. high variance 
49. 
Imagine a newborn starting to learn to walk. It will try to find a suitable policy for walking after repeatedly falling and getting up. What type of machine learning is best suited?
A.  classification 
B.  regression 
C.  kmeans algorithm 
D.  reinforcement learning 
Answer» D. reinforcement learning 
50. 
Support Vector Machine is 
A.  logical model 
B.  probabilistic model
C.  geometric model 
D.  none of the above 
Answer» C. geometric model 
51. 
In multiclass classification number of classes must be 
A.  less than two 
B.  equal to two
C.  greater than two 
D.  option 1 and option 2 
Answer» C. greater than two 
52. 
Which of the following can only be used when training data are linearly separable?
A.  linear hard-margin svm
B.  linear logistic regression
C.  linear soft margin svm
D.  the centroid method
Answer» A. linear hard-margin svm
53. 
What is the impact of high variance on the training set?
A.  overfitting
B.  underfitting
C.  both underfitting & overfitting
D.  depends upon the dataset
Answer» A. overfitting 
54. 
What do you mean by a hard margin? 
A.  the svm allows very low error in classification 
B.  the svm allows high amount of error in classification 
C.  both 1 & 2 
D.  none of the above 
Answer» A. the svm allows very low error in classification 
55. 
The effectiveness of an SVM depends upon: 
A.  selection of kernel 
B.  kernel parameters 
C.  soft margin parameter c 
D.  all of the above 
Answer» D. all of the above
56. 
What are support vectors? 
A.  all the examples that have a nonzero weight α_k in an svm
B.  the only examples necessary to compute f(x) in an svm. 
C.  all of the above 
D.  none of the above 
Answer» C. all of the above 
57. 
A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0. 
A.  true 
B.  false 
C.  sometimes – it can also output intermediate values as well 
D.  can’t say 
Answer» A. true 
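The thresholding behaviour described in this question can be sketched as follows. The function name `perceptron_output`, the weights, and the threshold are illustrative assumptions; here they are chosen so that the unit behaves like a logical AND.

```python
def perceptron_output(inputs, weights, threshold):
    """Sum the weighted inputs and output 1 if the sum exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


# Hypothetical weights and threshold that make the unit act as a logical AND.
weights = [1.0, 1.0]
threshold = 1.5
for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, perceptron_output(pair, weights, threshold))
```

Only the input (1, 1) produces a weighted sum above 1.5, so it is the only case that outputs 1.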
58. 
What is the purpose of the Kernel Trick? 
A.  to transform the data from nonlinearly separable to linearly separable 
B.  to transform the problem from regression to classification 
C.  to transform the problem from supervised to unsupervised learning. 
D.  all of the above 
Answer» A. to transform the data from nonlinearly separable to linearly separable 
59. 
Which of the following can only be used when training data are linearly separable?
A.  linear hard-margin svm
B.  linear logistic regression
C.  linear soft margin svm
D.  parzen windows
Answer» A. linear hard-margin svm
60. 
The firing rate of a neuron 
A.  determines how strongly the dendrites of the neuron stimulate axons of neighboring neurons 
B.  is more analogous to the output of a unit in a neural net than the output voltage of the neuron 
C.  only changes very slowly, taking a period of several seconds to make large adjustments 
D.  can sometimes exceed 30,000 action potentials per second 
Answer» B. is more analogous to the output of a unit in a neural net than the output voltage of the neuron 
61. 
Which of the following evaluation metrics cannot be applied to logistic regression output when comparing it with the target?
A.  auc-roc
B.  accuracy
C.  log loss
D.  mean squared error
Answer» D. mean squared error
62. 
The cost parameter in the SVM means: 
A.  the number of cross-validations to be made
B.  the kernel to be used 
C.  the tradeoff between misclassification and simplicity of the model 
D.  none of the above 
Answer» C. the tradeoff between misclassification and simplicity of the model 
63. 
The kernel trick 
A.  can be applied to every classification algorithm 
B.  is commonly used for dimensionality reduction 
C.  changes ridge regression so we solve a d × d linear system instead of an n × n system, given n sample points with d features
D.  exploits the fact that in many learning algorithms, the weights can be written as a linear combination of input points 
Answer» D. exploits the fact that in many learning algorithms, the weights can be written as a linear combination of input points 
64. 
How does the bias-variance decomposition of a ridge regression estimator compare with that of ordinary least squares regression?
A.  ridge has larger bias, larger variance 
B.  ridge has smaller bias, larger variance 
C.  ridge has larger bias, smaller variance 
D.  ridge has smaller bias, smaller variance 
Answer» C. ridge has larger bias, smaller variance 
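One way to see where the extra bias comes from is the one-feature, no-intercept case, where both estimators have a closed form. The sketch below is an illustration under that simplifying assumption; the function names, the toy data, and the penalty value `lam` are hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope for y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def ridge_slope(xs, ys, lam):
    """Ridge slope for y = w * x: the penalty lam is added to the denominator."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


# Toy data generated exactly by y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w_ols = ols_slope(xs, ys)
w_ridge = ridge_slope(xs, ys, lam=14.0)
print(w_ols, w_ridge)
```

The ridge estimate is pulled toward zero, away from the true slope of 2 (larger bias). The same shrinkage makes the estimate less sensitive to noise in any one training set (smaller variance).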
65. 
Which of the following are real world applications of the SVM? 
A.  text and hypertext categorization 
B.  image classification 
C.  clustering of news articles 
D.  all of the above 
Answer» D. all of the above 
66. 
How can SVM be classified? 
A.  it is a model trained using unsupervised learning. it can be used for classification and regression. 
B.  it is a model trained using unsupervised learning. it can be used for classification but not for regression. 
C.  it is a model trained using supervised learning. it can be used for classification and regression. 
D.  it is a model trained using unsupervised learning. it can be used for classification but not for regression.
Answer» C. it is a model trained using supervised learning. it can be used for classification and regression. 
67. 
Which of the following can help to reduce overfitting in an SVM classifier? 
A.  use of slack variables 
B.  high-degree polynomial features
C.  normalizing the data 
D.  setting a very low learning rate 
Answer» A. use of slack variables 
68. 
Suppose you have trained an SVM with a linear decision boundary and, after training, you correctly infer that the model is underfitting. Which of the following options would you most likely consider for the next iteration?
A.  you want to increase your data points 
B.  you want to decrease your data points 
C.  you will try to calculate more variables 
D.  you will try to reduce the features 
Answer» C. you will try to calculate more variables 
69. 
What is/are true about kernels in SVM? 1. A kernel function maps low dimensional data to a high dimensional space 2. It’s a similarity function
A.  1 
B.  2 
C.  1 and 2 
D.  none of these 
Answer» C. 1 and 2 
70. 
You trained a binary classifier model which gives very high accuracy on the training data, but much lower accuracy on validation data. Which of the following is false?
A.  this is an instance of overfitting 
B.  this is an instance of underfitting 
C.  the training was not well regularized 
D.  the training and testing examples are sampled from different distributions 
Answer» B. this is an instance of underfitting 
71. 
Suppose your model is demonstrating high variance across the different training sets. Which of the following is NOT valid way to try and reduce the variance? 
A.  increase the amount of training data in each training set
B.  improve the optimization algorithm being used for error minimization. 
C.  decrease the model complexity 
D.  reduce the noise in the training data 
Answer» B. improve the optimization algorithm being used for error minimization. 
72. 
Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify? 
A.  the model would consider even far away points from hyperplane for modeling 
B.  the model would consider only the points close to the hyperplane for modeling 
C.  the model would not be affected by distance of points from hyperplane for modeling 
D.  none of the above 
Answer» B. the model would consider only the points close to the hyperplane for modeling 
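The effect of gamma can be illustrated by evaluating the RBF kernel directly. The sketch below assumes the common form K(x, z) = exp(-gamma * ||x - z||^2); the function name `rbf_kernel` and the sample points are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


origin = (0.0, 0.0)
near_point = (0.1, 0.0)  # distance 0.1 from the origin
far_point = (3.0, 0.0)   # distance 3.0 from the origin

for gamma in (0.1, 10.0):
    print(gamma, rbf_kernel(origin, near_point, gamma), rbf_kernel(origin, far_point, gamma))
```

With a high gamma the kernel value for the distant point is practically zero, so only nearby points influence the decision function. With a low gamma, distant points still contribute noticeably.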
73. 
We usually use feature normalization before using the Gaussian kernel in SVM. What is true about feature normalization? 1. We do feature normalization so that new feature will dominate other

A.  1 
B.  1 and 2 
C.  1 and 3 
D.  2 and 3 
Answer» B. 1 and 2 
74. 
Wrapper methods are hyperparameter selection methods that 
A.  should be used whenever possible because they are computationally efficient 
B.  should be avoided unless there are no other options because they are always prone to overfitting. 
C.  are useful mainly when the learning machines are “black boxes” 
D.  should be avoided altogether. 
Answer» C. are useful mainly when the learning machines are “black boxes” 
75. 
Which of the following methods is not guaranteed to achieve zero training error on every linearly separable dataset?
A.  decision tree
B.  15-nearest neighbors
C.  hard-margin svm
D.  perceptron
Answer» B. 15-nearest neighbors
76. 
Suppose we train a hard-margin linear SVM on n > 100 data points in R^2, yielding a hyperplane with exactly 2 support vectors. If we add one more data point and retrain the classifier, what is the maximum possible number of support vectors for the new hyperplane (assuming the n + 1 points are linearly separable)?
A.  2 
B.  3 
C.  n 
D.  n+1 
Answer» D. n+1 
77. 
Let S1 and S2 be the set of support vectors and w1 and w2 be the learnt weight vectors for a linearly

A.  s1 ⊂ s2
B.  s1 may not be a subset of s2 
C.  w1 = w2 
D.  all of the above 
Answer» B. s1 may not be a subset of s2 
78. 
Which statement about outliers is true? 
A.  outliers should be part of the training dataset but should not be present in the test data 
B.  outliers should be identified and removed from a dataset 
C.  the nature of the problem determines how outliers are used 
D.  outliers should be part of the test dataset but should not be present in the training data 
Answer» C. the nature of the problem determines how outliers are used 
79. 
If TP=9 FP=6 FN=26 TN=70 then Error rate will be 
A.  45 percent
B.  99 percent
C.  28 percent
D.  20 percent
Answer» C. 28 percent
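The arithmetic behind this answer can be checked directly, using the usual definition of error rate as misclassified examples divided by all examples. The variable names below are illustrative.

```python
# Confusion-matrix counts from the question.
tp, fp, fn, tn = 9, 6, 26, 70

total = tp + fp + fn + tn          # 111 examples in all
misclassified = fp + fn            # 32 wrong predictions
error_rate = misclassified / total

print(f"{error_rate:.1%}")  # about 28.8%, which the options list as 28
```

Accuracy is the complement, (TP + TN) / total, so the two always sum to 1.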
80. 
Imagine you are solving a classification problem with highly imbalanced classes. The majority class is observed 99% of the time in the training data. Your model has 99% accuracy on the test data. Which of the following is true in such a case?

A.  1 and 3 
B.  1 and 4 
C.  2 and 3 
D.  2 and 4 
Answer» A. 1 and 3 
81. 
The minimum time complexity for training an SVM is O(n^2). According to this fact, what sizes of datasets are not best suited for SVMs?
A.  large datasets 
B.  small datasets 
C.  medium sized datasets 
D.  size does not matter 
Answer» A. large datasets 
82. 
Perceptron Classifier is 
A.  unsupervised learning algorithm 
B.  semisupervised learning algorithm 
C.  supervised learning algorithm 
D.  soft margin classifier 
Answer» C. supervised learning algorithm 
83. 
Type of dataset available in Supervised Learning is 
A.  unlabeled dataset 
B.  labeled dataset 
C.  csv file 
D.  excel file 
Answer» B. labeled dataset 
84. 
Which among the following is the most appropriate kernel that can be used with SVM to separate the classes?
A.  linear kernel 
B.  gaussian rbf kernel 
C.  polynomial kernel 
D.  option 1 and option 3 
Answer» B. gaussian rbf kernel 
85. 
The SVMs are less effective when 
A.  the data is linearly separable 
B.  the data is clean and ready to use 
C.  the data is noisy and contains overlapping points 
D.  option 1 and option 2 
Answer» C. the data is noisy and contains overlapping points 
86. 
Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify? 
A.  the model would consider even far away points from hyperplane for modeling 
B.  the model would consider only the points close to the hyperplane for modeling 
C.  the model would not be affected by distance of points from hyperplane for modeling 
D.  option 1 and option 2
Answer» B. the model would consider only the points close to the hyperplane for modeling 
87. 
What is the precision value for following confusion matrix of binary classification? 
A.  0.91 
B.  0.09 
C.  0.9 
D.  0.95 
Answer» B. 0.09 
88. 
Which of the following are components of generalization Error? 
A.  bias 
B.  variance
C.  both of them 
D.  none of them 
Answer» C. both of them 
89. 
Which of the following is not a kernel method in SVM? 
A.  linear kernel 
B.  polynomial kernel 
C.  rbf kernel 
D.  nonlinear kernel 
Answer» D. nonlinear kernel
90. 
During the treatment of cancer patients, the doctor needs to be very careful about which patients are given chemotherapy. Which metric should we use to decide which patients should be given chemotherapy?
A.  precision 
B.  recall 
C.  call 
D.  score 
Answer» A. precision 
91. 
Which one of the following is suitable? 1. When the hypothesis space is richer, overfitting is more likely. 2. When the feature space is larger, overfitting is more likely.
A.  true, false 
B.  false, true 
C.  true,true 
D.  false,false 
Answer» C. true,true 
92. 
Which of the following is a categorical data? 
A.  branch of bank 
B.  expenditure in rupees 
C.  price of house
D.  weight of a person 
Answer» A. branch of bank 
93. 
The soft margin SVM is preferred over the hard-margin SVM when
A.  the data is linearly separable
B.  the data is noisy and contains overlapping points
C.  the data is not noisy and linearly separable
D.  the data is noisy and linearly separable
Answer» B. the data is noisy and contains overlapping points 
94. 
Consider an SVM with a quadratic kernel (polynomial of degree 2) that has the slack penalty C as a hyperparameter. What would happen if we use a very large value for C?
A.  we can still classify the data correctly for given setting of hyper parameter c 
B.  we can not classify the data correctly for given setting of hyper parameter c 
C.  we can not classify the data at all 
D.  data can be classified correctly without any impact of c 
Answer» A. we can still classify the data correctly for given setting of hyper parameter c 
95. 
An SVM uses an RBF kernel with appropriate parameters to perform binary classification where the data is not linearly separable. In this scenario
A.  the decision boundary in the transformed feature space is nonlinear
B.  the decision boundary in the transformed feature space is linear
C.  the decision boundary in the original feature space is not considered
D.  the decision boundary in the original feature space is linear
Answer» B. the decision boundary in the transformed feature space is linear
96. 
Which of the following is true about SVM? 1. A kernel function maps low dimensional data to a high dimensional space. 2. It is a similarity function
A.  1 is true, 2 is false 
B.  1 is false, 2 is true 
C.  1 is true, 2 is true 
D.  1 is false, 2 is false 
Answer» C. 1 is true, 2 is true 
97. 
What is the Accuracy in percentage based on following confusion matrix of three class classification.

A.  0.75 
B.  0.97 
C.  0.95 
D.  0.85 
Answer» B. 0.97 
98. 
Which of the following method is used for multiclass classification? 
A.  one vs rest 
B.  loocv 
C.  all vs one 
D.  one vs another 
Answer» A. one vs rest 
99. 
Based on a survey, it was found that the probability that a person likes to watch serials is 0.25 and the probability that a person likes to watch Netflix series is 0.43. Also, the probability that a person likes to watch both serials and Netflix series is 0.12. What is the probability that a person doesn't like to watch either?
A.  0.32 
B.  0.2 
C.  0.44 
D.  0.56 
Answer» C. 0.44 
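This answer follows from inclusion-exclusion: P(either) = P(serials) + P(Netflix) - P(both), and P(neither) is its complement. A short check, with illustrative variable names:

```python
p_serials = 0.25
p_netflix = 0.43
p_both = 0.12

# Inclusion-exclusion: subtract the overlap so it is not counted twice.
p_either = p_serials + p_netflix - p_both
p_neither = 1 - p_either

print(round(p_either, 2), round(p_neither, 2))
```

The rounding only hides floating-point noise; the exact values are 0.56 and 0.44.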
100. 
A machine learning problem involves four attributes plus a class. The attributes have 3, 2, 2, and 2 possible values each. The class has 3 possible values. How many maximum possible different examples are there? 
A.  12 
B.  24 
C.  48 
D.  72 
Answer» D. 72 
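The count is the product of the number of values each attribute and the class can take: 3 × 2 × 2 × 2 × 3 = 72. The sketch below computes the product and, as a sanity check, enumerates every combination; the variable names are illustrative.

```python
import math
from itertools import product

attribute_value_counts = [3, 2, 2, 2]  # possible values per attribute
class_value_count = 3                  # possible class labels

# Multiply the choices for each attribute, then for the class.
max_examples = math.prod(attribute_value_counts) * class_value_count

# Cross-check by enumerating every (attributes..., class) combination.
all_counts = attribute_value_counts + [class_value_count]
enumerated = sum(1 for _ in product(*(range(n) for n in all_counts)))

print(max_examples, enumerated)
```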