McqMate
201. |
This clustering algorithm terminates when the mean values computed for the current iteration are identical to the mean values computed for the previous iteration. |
A. | k-means clustering |
B. | conceptual clustering |
C. | expectation maximization |
D. | agglomerative clustering |
Answer» A. k-means clustering |
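The stopping rule described in this question can be illustrated with a minimal sketch. It assumes one-dimensional data and hand-picked starting means; the function name `kmeans_1d` and the sample points are hypothetical, not taken from any library.

```python
def kmeans_1d(points, means, max_iter=100):
    """Lloyd-style k-means on 1-D data; stops when the means stop changing."""
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest mean.
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda i: abs(p - means[i]))
            clusters[nearest].append(p)
        # Update step: recompute each mean (keep the old one if a cluster is empty).
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        # Termination test: means identical to the previous iteration.
        if new_means == means:
            return means, iteration
        means = new_means
    return means, max_iter


# Hypothetical data: two well-separated groups, deliberately poor starting means.
final_means, iterations = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [1.0, 2.0])
```

Practical implementations usually compare means within a small tolerance rather than testing exact equality.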
202. |
Which one of the following is the main reason for pruning a Decision Tree? |
A. | to save computing time during testing |
B. | to save space for storing the decision tree |
C. | to make the training set error smaller |
D. | to avoid overfitting the training set |
Answer» D. to avoid overfitting the training set |
203. |
You've just finished training a decision tree for spam classification, and it is getting abnormally bad performance on both your training and test sets. You know that your implementation has no bugs, so what could be causing the problem? |
A. | your decision trees are too shallow. |
B. | you need to increase the learning rate. |
C. | you are overfitting. |
D. | incorrect data |
Answer» A. your decision trees are too shallow. |
204. |
The K-means algorithm: |
A. | requires the dimension of the feature space to be no bigger than the number of samples |
B. | has the smallest value of the objective function when k = 1 |
C. | minimizes the within class variance for a given number of clusters |
D. | converges to the global optimum if and only if the initial means are chosen as some of the samples themselves |
Answer» C. minimizes the within class variance for a given number of clusters |
205. |
Which of the following metrics do we have for finding dissimilarity between two clusters in hierarchical clustering?
|
A. | 1 and 2 |
B. | 1 and 3 |
C. | 2 and 3 |
D. | 1, 2 and 3 |
Answer» D. 1, 2 and 3 |
206. |
In which of the following cases will K-Means clustering fail to give good results?
|
A. | 1 and 2 |
B. | 2 and 3 |
C. | 2 and 4 |
D. | 1, 2 and 4 |
Answer» D. 1, 2 and 4 |
207. |
Hierarchical clustering is slower than non-hierarchical clustering? |
A. | true |
B. | false |
C. | depends on data |
D. | cannot say |
Answer» A. true |
208. |
High entropy means that the partitions in classification are |
A. | pure |
B. | not pure |
C. | useful |
D. | useless |
Answer» B. not pure |
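As a rough illustration of why high entropy signals impure partitions, the sketch below computes Shannon entropy for a pure and a mixed set of labels. The helper `entropy` and the label values are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


# A partition holding a single class has zero entropy (pure).
pure = entropy(["spam"] * 4)
# A 50/50 split between two classes has the maximum entropy of 1 bit (not pure).
mixed = entropy(["spam", "spam", "ham", "ham"])
```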
209. |
Suppose we would like to perform clustering on spatial data such as the geometrical locations of houses. We wish to produce clusters of many different sizes and shapes. Which of the following methods is the most appropriate? |
A. | decision trees |
B. | density-based clustering |
C. | model-based clustering |
D. | k-means clustering |
Answer» B. density-based clustering |
210. |
The main disadvantage of maximum likelihood methods is that they are _____ |
A. | mathematically less folded |
B. | mathematically less complex |
C. | mathematically less complex |
D. | computationally intense |
Answer» D. computationally intense |
211. |
The maximum likelihood method can be used to explore relationships among more diverse sequences, conditions that are not well handled by maximum parsimony methods. |
A. | true |
B. | false |
C. | - |
D. | - |
Answer» A. true |
212. |
Which of the following statements is not true? |
A. | k-means clustering is a linear clustering algorithm. |
B. | k-means clustering aims to partition n observations into k clusters |
C. | k-nearest neighbor is same as k-means |
D. | k-means is sensitive to outlier |
Answer» C. k-nearest neighbor is same as k-means |
213. |
Why is feature scaling done before applying the K-means algorithm? |
A. | in distance calculation it will give the same weights for all features |
B. | you always get the same clusters whether or not you use feature scaling |
C. | in manhattan distance it is an important step but in euclidean it is not |
D. | none of these |
Answer» A. in distance calculation it will give the same weights for all features |
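The effect of scaling on distance calculations can be sketched as follows. The data (age in years, income in dollars) and the helper `min_max_scale` are hypothetical; min-max scaling is only one of several possible scaling choices.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


# Hypothetical samples: (age in years, income in dollars).
rows = [[25, 50_000], [60, 51_000], [26, 90_000]]

# Unscaled: the income column dominates the distance almost entirely.
raw_d01 = euclidean(rows[0], rows[1])
raw_d02 = euclidean(rows[0], rows[2])

# Scaled: a large age gap and a large income gap now weigh about the same.
scaled = min_max_scale(rows)
scaled_d01 = euclidean(scaled[0], scaled[1])
scaled_d02 = euclidean(scaled[0], scaled[2])
```

Before scaling, sample 1 looks far closer to sample 0 than sample 2 does, purely because income is measured in larger units.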
214. |
With Bayes' theorem, the probability of hypothesis H, specified by P(H), is referred to as |
A. | a conditional probability |
B. | an a priori probability |
C. | a bidirectional probability |
D. | a posterior probability |
Answer» B. an a priori probability |
215. |
The probability that a person owns a sports car given that they subscribe to automotive magazine is 40%.
|
A. | 0.0398 |
B. | 0.0389 |
C. | 0.0368 |
D. | 0.0396 |
Answer» D. 0.0396 |
216. |
What is the naïve assumption in a Naïve Bayes Classifier? |
A. | all the classes are independent of each other |
B. | all the features of a class are independent of each other |
C. | the most probable feature for a class is the most important feature to be considered for classification |
D. | all the features of a class are conditionally dependent on each other |
Answer» B. all the features of a class are independent of each other |
217. |
Based on a survey, it was found that the probability that a person likes to watch serials is 0.25 and the probability that a person likes to watch Netflix series is 0.43. Also, the probability that a person likes to watch both serials and Netflix series is 0.12. What is the probability that a person doesn't like to watch either? |
A. | 0.32 |
B. | 0.2 |
C. | 0.44 |
D. | 0.56 |
Answer» C. 0.44 |
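The arithmetic behind this question follows from inclusion-exclusion: P(either) = P(serials) + P(Netflix) - P(both), and P(neither) = 1 - P(either). A short check, with variable names chosen only for this example:

```python
p_serials = 0.25
p_netflix = 0.43
p_both = 0.12

# Inclusion-exclusion: probability of liking at least one of the two.
p_either = p_serials + p_netflix - p_both

# Complement: probability of liking neither.
p_neither = 1 - p_either
```

With the figures given in the question, `p_either` works out to 0.56 and `p_neither` to 0.44.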
218. |
What is the actual number of independent parameters that need to be estimated in a p-dimensional Gaussian distribution model? |
A. | p |
B. | 2p |
C. | p(p+1)/2 |
D. | p(p+3)/2 |
Answer» D. p(p+3)/2 |
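The count can be derived by adding the parameters of the mean vector to those of the covariance matrix, assuming a full (unconstrained) symmetric covariance:

```latex
\underbrace{p}_{\text{mean vector } \mu}
\;+\;
\underbrace{\frac{p(p+1)}{2}}_{\text{symmetric covariance } \Sigma}
\;=\;
\frac{2p + p^{2} + p}{2}
\;=\;
\frac{p(p+3)}{2}
```

For example, p = 2 gives 2 mean parameters plus 3 distinct covariance entries, 5 in total.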
219. |
Give the correct Answer for following statements.
|
A. | 1 is true, 2 is false |
B. | 1 is false, 2 is true |
C. | 1 is true, 2 is true |
D. | 1 is false, 2 is false |
Answer» C. 1 is true, 2 is true |
220. |
Which of the following quantities are minimized directly or indirectly during parameter estimation in Gaussian distribution Model? |
A. | negative log-likelihood |
B. | log-likelihood |
C. | cross entropy |
D. | residual sum of square |
Answer» A. negative log-likelihood |
221. |
Consider the following dataset. x,y,z are the features and T is a class(1/0). Classify the test data (0,0,1) as values of x,y,z respectively. |
A. | 0 |
B. | 1 |
C. | 0.1 |
D. | 0.9 |
Answer» B. 1 |
222. |
Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that |
A. | y is false when x is known to be false. |
B. | y is true when x is known to be true. |
C. | x is true when y is known to be true |
D. | x is false when y is known to be false. |
Answer» B. y is true when x is known to be true. |
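Rule confidence can be estimated from data as the share of records satisfying X that also satisfy Y. The sketch below uses a hypothetical market-basket dataset and a hypothetical helper named `confidence`.

```python
def confidence(transactions, x, y):
    """Estimate P(Y | X): fraction of transactions containing X that also contain Y."""
    with_x = [t for t in transactions if x <= t]  # x is a subset of t
    if not with_x:
        raise ValueError("antecedent X never occurs in the data")
    return sum(1 for t in with_x if y <= t) / len(with_x)


# Hypothetical baskets; the rule under test is IF bread THEN butter.
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread"}, {"milk"}]
conf = confidence(baskets, {"bread"}, {"butter"})
```

Three baskets contain bread and two of those also contain butter, so the estimated confidence is 2/3.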
223. |
Which of the following statements about Naive Bayes is incorrect? |
A. | attributes are equally important. |
B. | attributes are statistically dependent of one another given the class value. |
C. | attributes are statistically independent of one another given the class value. |
D. | attributes can be nominal or numeric |
Answer» B. attributes are statistically dependent of one another given the class value. |
224. |
How can the entries in the full joint probability distribution be calculated? |
A. | using variables |
B. | using information |
C. | both using variables & information |
D. | none of the mentioned |
Answer» B. using information |
225. |
How many terms are required for building a bayes model? |
A. | 1 |
B. | 2 |
C. | 3 |
D. | 4 |
Answer» C. 3 |
226. |
Skewness of Normal distribution is ___________ |
A. | negative |
B. | positive |
C. | 0 |
D. | undefined |
Answer» C. 0 |
227. |
The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you? |
A. | the attributes are not linearly related. |
B. | as the value of one attribute increases the value of the second attribute also increases |
C. | as the value of one attribute decreases the value of the second attribute increases |
D. | the attributes show a linear relationship |
Answer» C. as the value of one attribute decreases the value of the second attribute increases |
228. |
8 observations are clustered into 3 clusters using K-Means clustering algorithm. After first iteration clusters,
|
A. | c1: (4,4), c2: (2,2), c3: (7,7) |
B. | c1: (6,6), c2: (4,4), c3: (9,9) |
C. | c1: (2,2), c2: (0,0), c3: (5,5) |
D. | c1: (4,4), c2: (3,3), c3: (7,7) |
Answer» D. c1: (4,4), c2: (3,3), c3: (7,7) |
229. |
In Naive Bayes equation P(C / X)= (P(X / C) *P(C) ) / P(X) which part considers "likelihood"? |
A. | p(x/c) |
B. | p(c/x) |
C. | p(c) |
D. | p(x) |
Answer» A. p(x/c) |
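To show where the likelihood P(X / C) enters the calculation, here is a small sketch of Bayes' rule. The spam-filter probabilities are invented for illustration, and the function name `posterior` is an assumption.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' rule: P(C / X) = P(X / C) * P(C) / P(X)."""
    return likelihood * prior / evidence


# Hypothetical numbers: C = "message is spam", X = "message contains a given word".
p_spam = 0.2               # prior P(C)
p_word_given_spam = 0.6    # likelihood P(X / C)
p_word_given_ham = 0.05    # likelihood of the word under the other class

# Evidence P(X) by the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word)
```

With these assumed values the evidence is 0.16 and the posterior is 0.75.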
230. |
Which of the following options is/are correct regarding the benefits of an ensemble model? 1. Better performance
|
A. | 1 and 3 |
B. | 2 and 3 |
C. | 1, 2 and 3 |
D. | 1 and 2 |
Answer» D. 1 and 2 |
231. |
What is back propagation? |
A. | it is another name given to the curvy function in the perceptron |
B. | it is the transmission of error back through the network to adjust the inputs |
C. | it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn |
D. | none of the mentioned |
Answer» C. it is the transmission of error back through the network to allow weights to be adjusted so that the network can learn |
232. |
Which of the following is an application of NN (Neural Network)? |
A. | sales forecasting |
B. | data validation |
C. | risk management |
D. | all of the mentioned |
Answer» D. all of the mentioned |
233. |
Neural Networks are complex ______________ with many parameters. |
A. | linear functions |
B. | nonlinear functions |
C. | discrete functions |
D. | exponential functions |
Answer» B. nonlinear functions |
234. |
Having multiple perceptrons can actually solve the XOR problem satisfactorily: this is because each perceptron can partition off a linear part of the space itself, and they can then combine their results. |
A. | true – this works always, and these multiple perceptrons learn to classify even complex problems |
B. | false – perceptrons are mathematically incapable of solving linearly inseparable functions, no matter what you do |
C. | true – perceptrons can do this but are unable to learn to do it – they have to be explicitly hand-coded |
D. | false – just having a single perceptron is enough |
Answer» C. true – perceptrons can do this but are unable to learn to do it – they have to be explicitly hand-coded |
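The idea of combining hand-coded perceptrons can be sketched as below. The weights and biases are chosen by hand for this illustration (they are not learned), and the helper `perceptron` is hypothetical.

```python
def perceptron(weights, bias):
    """Return a threshold unit that outputs 1 when w.x + b > 0, else 0."""
    return lambda *x: int(sum(w * xi for w, xi in zip(weights, x)) + bias > 0)


# Each unit separates the input plane with a single straight line.
or_gate = perceptron([1, 1], -0.5)
nand_gate = perceptron([-1, -1], 1.5)
and_gate = perceptron([1, 1], -1.5)


def xor(a, b):
    """XOR as AND(OR(a, b), NAND(a, b)): two linear cuts combined by a third unit."""
    return and_gate(or_gate(a, b), nand_gate(a, b))


table = {(a, b): xor(a, b) for a in (0, 1) for b in (0, 1)}
```

No single unit of this kind can produce the XOR table on its own, because the two positive cases cannot be separated from the two negative cases by one line.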
235. |
Which one of the following is not a major strength of the neural network approach? |
A. | neural network learning algorithms are guaranteed to converge to an optimal solution |
B. | neural networks work well with datasets containing noisy data |
C. | neural networks can be used for both supervised learning and unsupervised clustering |
D. | neural networks can be used for applications that require a time element to be included in the data |
Answer» A. neural network learning algorithms are guaranteed to converge to an optimal solution |
236. |
The network that involves backward links from output to the input and hidden layers is called |
A. | self organizing maps |
B. | perceptrons |
C. | recurrent neural network |
D. | multi layered perceptron |
Answer» C. recurrent neural network |
237. |
Which of the following parameters can be tuned for finding good ensemble model in bagging based algorithms?
|
A. | 1 |
B. | 2 |
C. | 3&4 |
D. | 1,2,3&4 |
Answer» D. 1,2,3&4 |
238. |
What is back propagation?
|
A. | a |
B. | b |
C. | c |
D. | b&c |
Answer» C. c |
239. |
In an election for the head of college, N candidates are competing against each other and people are voting for either of the candidates. Voters don’t communicate with each other while casting their votes. Which of the following ensemble methods works similarly to the election procedure discussed? |
A. | bagging |
B. | boosting |
C. | stacking |
D. | randomization |
Answer» A. bagging |
240. |
What is the sequence of the following tasks in a perceptron?
|
A. | 1, 4, 3, 2 |
B. | 3, 1, 2, 4 |
C. | 4, 3, 2, 1 |
D. | 1, 2, 3, 4 |
Answer» A. 1, 4, 3, 2 |
241. |
In which neural net architecture does weight sharing occur?
|
A. | recurrent neural network |
B. | convolutional neural network |
C. | fully connected neural network |
D. | both a and b |
Answer» D. both a and b |
242. |
Which of the following are correct statement(s) about stacking?
|
A. | 1 and 2 |
B. | 2 and 3 |
C. | 1 and 3 |
D. | 1,2 and 3 |
Answer» C. 1 and 3 |
243. |
Given above is a description of a neural network. When does a neural network model become a deep learning model? |
A. | when you add more hidden layers and increase depth of neural network |
B. | when there is higher dimensionality of data |
C. | when the problem is an image recognition problem |
D. | when there is lower dimensionality of data |
Answer» A. when you add more hidden layers and increase depth of neural network |
244. |
What are the steps for using a gradient descent algorithm?
|
A. | 1, 2, 3, 4, 5 |
B. | 4, 3, 1, 5, 2 |
C. | 3, 2, 1, 5, 4 |
D. | 5, 4, 3, 2, 1 |
Answer» B. 4, 3, 1, 5, 2 |
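The numbered steps for this question are not reproduced here, so the following is only a generic sketch of a gradient descent loop: start from an initial value, evaluate the gradient of the error, adjust the parameter against it, and repeat until the gradient is negligible. The loss function, learning rate, and function name are assumptions for the example.

```python
def gradient_descent(grad, w, learning_rate=0.1, tolerance=1e-8, max_steps=10_000):
    """Minimise a 1-D function given its gradient, starting from w."""
    for _ in range(max_steps):
        g = grad(w)               # evaluate the gradient at the current weight
        if abs(g) < tolerance:    # stop once the slope is effectively zero
            break
        w -= learning_rate * g    # move against the gradient
    return w


# Hypothetical loss (w - 3)^2, whose gradient is 2 * (w - 3); its minimum is at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w=0.0)
```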
245. |
A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the constant of proportionality being equal to 2. The inputs are 4, 10, 10 and 30 respectively. What will be the output? |
A. | 238 |
B. | 76 |
C. | 248 |
D. | 348 |
Answer» D. 348 |
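The output can be checked directly: the neuron computes the weighted sum of its inputs and the linear transfer function multiplies it by the constant of proportionality.

```python
weights = [1, 2, 3, 4]
inputs = [4, 10, 10, 30]
k = 2  # constant of proportionality of the linear transfer function

# Weighted sum: 1*4 + 2*10 + 3*10 + 4*30
weighted_sum = sum(w * x for w, x in zip(weights, inputs))

# Linear transfer function: output = k * weighted_sum
output = k * weighted_sum
```

The weighted sum is 174, so the output is 348.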
246. |
Increase in size of a convolutional kernel would necessarily increase the performance of a convolutional network. |
A. | true |
B. | false |
Answer» B. false |
247. |
The F-test |
A. | an omnibus test |
B. | considers the reduction in error when moving from the complete model to the reduced model |
C. | considers the reduction in error when moving from the reduced model to the complete model |
D. | can only be conceptualized as a reduction in error |
Answer» C. considers the reduction in error when moving from the reduced model to the complete model |
248. |
What is true about an ensembled classifier?
|
A. | 1 and 2 |
B. | 1 and 3 |
C. | 2 and 3 |
D. | all of the above |
Answer» D. all of the above |
249. |
Which of the following options is/are correct regarding the benefits of an ensemble model?
|
A. | 1 and 3 |
B. | 2 and 3 |
C. | 1 and 2 |
D. | 1, 2 and 3 |
Answer» C. 1 and 2 |
250. |
Which of the following can be true for selecting base learners for an ensemble?
|
A. | 1 |
B. | 2 |
C. | 1 and 3 |
D. | 1, 2 and 3 |
Answer» D. 1, 2 and 3 |
251. |
True or False: Ensemble learning can only be applied to supervised learning methods. |
A. | true |
B. | false |
Answer» B. false |
252. |
True or False: Ensembles will yield bad results when there is significant diversity among the models. Note: All individual models have meaningful and good predictions. |
A. | true |
B. | false |
Answer» B. false |
253. |
Which of the following is / are true about weak learners used in ensemble model?
|
A. | 1 and 2 |
B. | 1 and 3 |
C. | 2 and 3 |
D. | none of these |
Answer» A. 1 and 2 |
254. |
True or False: An ensemble of classifiers may or may not be more accurate than any of its individual models. |
A. | true |
B. | false |
Answer» A. true |
255. |
If you use an ensemble of different base models, is it necessary to tune the hyper parameters of all base models to improve the ensemble performance? |
A. | yes |
B. | no |
C. | can’t say |
Answer» B. no |
256. |
Generally, an ensemble method works better if the individual base models have ____________? Note: Suppose each individual base model has accuracy greater than 50%. |
A. | less correlation among predictions |
B. | high correlation among predictions |
C. | correlation does not have any impact on ensemble output |
D. | none of the above |
Answer» A. less correlation among predictions |
257. |
In an election, N candidates are competing against each other and people are voting for either of the candidates. Voters don’t communicate with each other while casting their votes. Which of the following ensemble methods works similarly to the election procedure discussed above?
|
A. | bagging |
B. | boosting |
C. | a or b |
D. | none of these |
Answer» A. bagging |
258. |
Suppose there are 25 base classifiers. Each classifier has error rates of e = 0.35.
|
A. | 0.05 |
B. | 0.06 |
C. | 0.07 |
D. | 0.09 |
Answer» B. 0.06 |
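The rest of this question is cut off, so the sketch below rests on an assumption: that it asks for the error rate of a majority-vote ensemble whose 25 base classifiers make independent errors. Under that assumption the ensemble is wrong only when at least 13 classifiers are wrong, which is a binomial tail probability. The function name is hypothetical.

```python
from math import comb


def majority_vote_error(n, e):
    """P(more than half of n independent classifiers err), each with error rate e."""
    k_min = n // 2 + 1  # smallest number of wrong votes that forms a majority
    return sum(comb(n, k) * e**k * (1 - e) ** (n - k) for k in range(k_min, n + 1))


ensemble_error = majority_vote_error(25, 0.35)
```

Under these assumptions the result is roughly 0.06, well below the individual error rate of 0.35. The gain depends on the independence assumption; strongly correlated classifiers would not improve this much.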
259. |
In machine learning, an algorithm (or learning algorithm) is said to be unstable if a small change in the training data causes a large change in the learned classifiers. True or False: Bagging of unstable classifiers is a good idea. |
A. | true |
B. | false |
Answer» A. true |
260. |
Which of the following parameters can be tuned for finding good ensemble model in bagging based algorithms?
|
A. | 1 and 3 |
B. | 2 and 3 |
C. | 1 and 2 |
D. | all of above |
Answer» D. all of above |
261. |
How is the model capacity affected with dropout rate (where model capacity means the ability of a neural network to approximate complex functions)? |
A. | model capacity increases with an increase in dropout rate |
B. | model capacity decreases with an increase in dropout rate |
C. | model capacity is not affected by an increase in dropout rate |
D. | none of these |
Answer» B. model capacity decreases with an increase in dropout rate |
262. |
True or False: Dropout is a computationally expensive technique compared to bagging. |
A. | true |
B. | false |
Answer» B. false |
263. |
Suppose, you want to apply a stepwise forward selection method for choosing the best models for an ensemble model. Which of the following is the correct order of the steps?
|
A. | 1-2-3 |
B. | 1-3-4 |
C. | 2-1-3 |
D. | none of above |
Answer» D. none of above |
264. |
Suppose, you have 2000 different models with their predictions and want to ensemble predictions of best x models. Now, which of the following can be a possible method to select the best x models for an ensemble? |
A. | step wise forward selection |
B. | step wise backward elimination |
C. | both |
D. | none of above |
Answer» C. both |
265. |
Below are the two ensemble models:
|
A. | e1 |
B. | e2 |
C. | any of e1 and e2 |
D. | none of these |
Answer» B. e2 |
266. |
True or False: In boosting, individual base learners can be parallel. |
A. | true |
B. | false |
Answer» B. false |
267. |
Which of the following is true about bagging?
|
A. | 1 and 2 |
B. | 2 and 3 |
C. | 1 and 3 |
D. | all of these |
Answer» C. 1 and 3 |
268. |
Suppose you are using stacking with n different machine learning algorithms with k folds on data.
|
A. | you will have only k features after the first stage |
B. | you will have only n features after the first stage |
C. | you will have k+n features after the first stage |
D. | you will have k*n features after the first stage |
Answer» B. you will have only n features after the first stage |
269. |
Which of the following is the difference between stacking and blending? |
A. | stacking has less stable cv compared to blending |
B. | in blending, you create out of fold prediction |
C. | stacking is simpler than blending |
D. | none of these |
Answer» D. none of these |
270. |
Which of the following can be one of the steps in stacking?
|
A. | 1 and 2 |
B. | 2 and 3 |
C. | 1 and 3 |
D. | all of above |
Answer» A. 1 and 2 |
271. |
Which of the following are advantages of stacking?
|
A. | 1 and 2 |
B. | 2 and 3 |
C. | 1 and 3 |
D. | all of the above |
Answer» A. 1 and 2 |
272. |
Which of the following are correct statement(s) about stacking?
|
A. | 1 and 2 |
B. | 2 and 3 |
C. | 1 and 3 |
D. | all of above |
Answer» C. 1 and 3 |
273. |
Which of the following is true about weighted majority votes?
|
A. | 1 and 3 |
B. | 2 and 3 |
C. | 1 and 2 |
D. | 1, 2 and 3 |
Answer» D. 1, 2 and 3 |
274. |
Which of the following is true about averaging ensemble? |
A. | it can only be used in classification problem |
B. | it can only be used in regression problem |
C. | it can be used in both classification as well as regression |
D. | none of these |
Answer» C. it can be used in both classification as well as regression |
275. |
How can we assign the weights to output of different models in an ensemble?
|
A. | 1 and 2 |
B. | 1 and 3 |
C. | 2 and 3 |
D. | all of above |
Answer» D. all of above |
276. |
Suppose you are given ‘n’ predictions on test data by ‘n’ different models (M1, M2, …. Mn) respectively. Which of the following method(s) can be used to combine the predictions of these models?
|
A. | 1, 3 and 4 |
B. | 1,3 and 6 |
C. | 1,3, 4 and 6 |
D. | all of above |
Answer» D. all of above |
277. |
In an election, N candidates are competing against each other and people are voting for either of the candidates. Voters don’t communicate with each other while casting their votes. Which of the following ensemble methods works similarly to the election procedure discussed above? Hint: Persons are like base models of ensemble method. |
A. | bagging |
B. | boosting |
C. | a or b |
D. | none of these |
Answer» A. bagging |
278. |
If you use an ensemble of different base models, is it necessary to tune the hyper parameters of all base models to improve the ensemble performance? |
A. | yes |
B. | no |
C. | can’t say |
Answer» B. no |
279. |
Which of the following is NOT supervised learning? |
A. | pca |
B. | decision tree |
C. | linear regression |
D. | naive bayesian |
Answer» A. pca |
280. |
According to ______, it's a key success factor for the survival and evolution of all species. |
A. | claude shannon's theory |
B. | gini index |
C. | darwin's theory |
D. | none of above |
Answer» C. darwin's theory |
281. |
How can you avoid overfitting? |
A. | by using a lot of data |
B. | by using inductive machine learning |
C. | by using validation only |
D. | none of above |
Answer» A. by using a lot of data |
282. |
What are the popular algorithms of Machine Learning? |
A. | decision trees and neural networks (back propagation) |
B. | probabilistic networks and nearest neighbor |
C. | support vector machines |
D. | all |
Answer» D. all |
283. |
What is Training set? |
A. | training set is used to test the accuracy of the hypotheses generated by the learner. |
B. | a set of data used to discover the potentially predictive relationship. |
C. | both a & b |
D. | none of above |
Answer» B. a set of data used to discover the potentially predictive relationship. |
284. |
Common deep learning applications include |
A. | image classification, real-time visual tracking |
B. | autonomous car driving, logistic optimization |
C. | bioinformatics, speech recognition |
D. | all above |
Answer» D. all above |
285. |
What is the function of supervised learning? |
A. | classifications, predict time series, annotate strings |
B. | speech recognition, regression |
C. | both a & b |
D. | none of above |
Answer» C. both a & b |
286. |
Common unsupervised learning applications include |
A. | object segmentation |
B. | similarity detection |
C. | automatic labeling |
D. | all above |
Answer» D. all above |
287. |
Reinforcement learning is particularly efficient when ______. |
A. | the environment is not completely deterministic |
B. | it's often very dynamic |
C. | it's impossible to have a precise error measure |
D. | all above |
Answer» D. all above |
288. |
If there is only a discrete number of possible outcomes (called categories), the process becomes a ______. |
A. | regression |
B. | classification. |
C. | modelfree |
D. | categories |
Answer» B. classification. |
289. |
Which of the following are supervised learning applications |
A. | spam detection, pattern detection, natural language processing |
B. | image classification, real-time visual tracking |
C. | autonomous car driving, logistic optimization |
D. | bioinformatics, speech recognition |
Answer» A. spam detection, pattern detection, natural language processing |
290. |
During the last few years, many algorithms have been applied to deep neural networks to learn the best policy for playing Atari video games and to teach an agent how to associate the right action with an input representing the state. |
A. | logical |
B. | classical |
C. | classification |
D. | none of above |
Answer» D. none of above |
291. |
Which of the following sentences is correct? |
A. | machine learning relates with the study, design and |
B. | data mining can be defined as the process in which the |
C. | both a & b |
D. | none of the above |
Answer» C. both a & b |
292. |
What is Overfitting in Machine learning? |
A. | when a statistical model describes random error or noise instead of underlying relationship overfitting occurs. |
B. | robots are programmed so that they can perform the task based on data they gather from sensors. |
C. | while involving the process of learning overfitting occurs. |
D. | a set of data is used to discover the potentially predictive relationship |
Answer» A. when a statistical model describes random error or noise instead of underlying relationship overfitting occurs. |
293. |
What is Test set? |
A. | test set is used to test the accuracy of the hypotheses generated by the learner. |
B. | it is a set of data used to discover the potentially predictive relationship. |
C. | both a & b |
D. | none of above |
Answer» A. test set is used to test the accuracy of the hypotheses generated by the learner. |
294. |
______ is much more difficult because it's necessary to determine a supervised strategy to train a model for each feature and, finally, to predict their value. |
A. | removing the whole line |
B. | creating sub-model to predict those features |
C. | using an automatic strategy to input them according to the other known values |
D. | all above |
Answer» B. creating sub-model to predict those features |
295. |
It's possible to use a different placeholder through the parameter ______. |
A. | regression |
B. | classification |
C. | random_state |
D. | missing_values |
Answer» D. missing_values |
296. |
If you need a more powerful scaling feature, with a superior control on outliers and the possibility to select a quantile range, there's also the class ______. |
A. | robustscaler |
B. | dictvectorizer |
C. | labelbinarizer |
D. | featurehasher |
Answer» A. robustscaler |
297. |
scikit-learn also provides a class for per-sample normalization, Normalizer. It can apply ______ to each element of a dataset. |
A. | max, l0 and l1 norms |
B. | max, l1 and l2 norms |
C. | max, l2 and l3 norms |
D. | max, l3 and l4 norms |
Answer» B. max, l1 and l2 norms |
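To make the three norms concrete, the sketch below reimplements per-sample normalization in plain Python. It is an illustration of the idea rather than scikit-learn's own code; the function name `normalize_sample` is an assumption, while the option names `l1`, `l2` and `max` mirror the values accepted by the `norm` parameter of `Normalizer`.

```python
import math


def normalize_sample(sample, norm="l2"):
    """Divide one sample (row) by its own norm so that the norm becomes 1."""
    if norm == "l1":
        scale = sum(abs(v) for v in sample)          # sum of absolute values
    elif norm == "l2":
        scale = math.sqrt(sum(v * v for v in sample))  # Euclidean length
    elif norm == "max":
        scale = max(abs(v) for v in sample)          # largest absolute value
    else:
        raise ValueError("norm must be 'l1', 'l2' or 'max'")
    # Leave an all-zero sample unchanged instead of dividing by zero.
    return [v / scale for v in sample] if scale else list(sample)


row = [3.0, 4.0]
l1_row = normalize_sample(row, "l1")
l2_row = normalize_sample(row, "l2")
max_row = normalize_sample(row, "max")
```

Unlike column-wise scalers, this operates on each row independently, so no statistics need to be fitted on a training set.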
298. |
There are also many univariate methods that can be used in order to select the best features according to specific criteria based on ______. |
A. | f-tests and p-values |
B. | chi-square |
C. | anova |
D. | all above |
Answer» A. f-tests and p-values |
299. |
Which of the following selects only a subset of features belonging to a certain percentile? |
A. | selectpercentile |
B. | featurehasher |
C. | selectkbest |
D. | all above |
Answer» A. selectpercentile |
300. |
______ performs a PCA with non-linearly separable data sets. |
A. | sparsepca |
B. | kernelpca |
C. | svd |
D. | none of the mentioned |
Answer» B. kernelpca |