100+ Big Data Solved MCQs

These multiple-choice questions (MCQs) are designed to enhance your knowledge and understanding in the following area: Bachelor of Business Administration in Computer Applications (BBA [CA]).

51.	Which of the following can be used to impute data sets based only on information in the training set?
A.	postprocess
B.	preProcess
C.	process
D.	All of the Mentioned
Answer» B. preProcess
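
The answer refers to `preProcess`, a function in R's caret package. The underlying idea, learning the fill value from the training set and reusing it unchanged on new data, can be sketched in plain Python. The helper names `fit_mean_imputer` and `apply_imputer` are hypothetical and are not part of caret; mean imputation is just one possible choice.

```python
from statistics import mean


def fit_mean_imputer(train_column):
    """Learn the fill value from the training data only."""
    observed = [v for v in train_column if v is not None]
    return mean(observed)


def apply_imputer(column, fill_value):
    """Replace missing entries (None) with the value learned from training."""
    return [fill_value if v is None else v for v in column]


train = [2.0, None, 4.0, 6.0]
test = [None, 10.0]

fill = fit_mean_imputer(train)             # computed from 2.0, 4.0, 6.0 only
train_filled = apply_imputer(train, fill)
test_filled = apply_imputer(test, fill)    # the test set reuses the training mean
```

The test set never contributes to `fill`, which is the point of imputing "based only on information in the training set".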

52.	Which of the following models includes a backwards elimination feature selection routine?
A.	MCV
B.	MARS
C.	MCRS
D.	All of the Mentioned
Answer» B. MARS

53.	Which of the following is a performance metric for a categorical outcome?
A.	RMSE
B.	RSquared
C.	Accuracy
D.	All of the Mentioned
Answer» C. Accuracy

54.	Which of the following functions provides unsupervised prediction?
A.	cl_forecast
B.	cl_nowcast
C.	cl_precast
D.	None of the Mentioned
Answer» D. None of the Mentioned

55.	What is true about Machine Learning?
A.	Machine Learning (ML) is a field of computer science.
B.	ML is a type of artificial intelligence that extracts patterns out of raw data by using an algorithm or method.
C.	The main focus of ML is to allow computer systems to learn from experience without being explicitly programmed or requiring human intervention.
D.	All of the above
Answer» D. All of the above

56.	ML is a field of AI consisting of learning algorithms that?
A.	Improve their performance
B.	At executing some task
C.	Over time with experience
D.	All of the above
Answer» D. All of the above

57.	p → 0q is not a?
A.	hack clause
B.	horn clause
C.	structural clause
D.	system clause
Answer» B. horn clause

58.	The action _______ of a robot arm specifies placing block A on block B.
A.	STACK(A,B)
B.	LIST(A,B)
C.	QUEUE(A,B)
D.	ARRAY(A,B)
Answer» A. STACK(A,B)

59.	A __________ begins by hypothesizing a sentence (the symbol S) and successively predicting lower-level constituents until individual preterminal symbols are written.
A.	bottom-up parser
B.	top parser
C.	top-down parser
D.	bottom parser
Answer» C. top-down parser

60.	A model of language consists of categories that do not include ________.
A.	system units
B.	structural units
C.	data units
D.	empirical units
Answer» B. structural units

61.	Which of the following is not a learning method?
A.	Introduction
B.	Analogy
C.	Deduction
D.	Memorization
Answer» A. Introduction

62.	Training a model on all of the data in one single batch is known as?
A.	Batch learning
B.	Offline learning
C.	Both A and B
D.	None of the above
Answer» C. Both A and B

63.	Which of the following are ML methods?
A.	based on human supervision
B.	supervised Learning
C.	semi-reinforcement Learning
D.	All of the above
Answer» A. based on human supervision

64.	In model-based learning methods, an iterative process takes place on ML models that are built based on various model parameters, called?
A.	mini-batches
B.	optimized parameters
C.	hyperparameters
D.	superparameters
Answer» C. hyperparameters

65.	Which of the following is a widely used and effective machine learning algorithm based on the idea of bagging?
A.	Decision Tree
B.	Regression
C.	Classification
D.	Random Forest
Answer» D. Random Forest

66.	To find the minimum or the maximum of a function, we set the gradient to zero because:
A.	The value of the gradient at extrema of a function is always zero
B.	Depends on the type of problem
C.	Both A and B
D.	None of the above
Answer» A. The value of the gradient at extrema of a function is always zero
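
As a small numerical check of this idea, the sketch below minimizes a smooth one-variable function and confirms that the gradient is approximately zero at the minimum. The function `f`, the step size, and the iteration count are assumptions chosen for illustration; the statement applies to differentiable functions at interior extrema.

```python
def f(x):
    """A smooth function with a single minimum at x = 3."""
    return (x - 3.0) ** 2 + 1.0


def numerical_gradient(func, x, h=1e-6):
    """Central-difference approximation of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


def gradient_descent(func, x0, lr=0.1, steps=200):
    """Repeatedly step against the gradient until it (nearly) vanishes."""
    x = x0
    for _ in range(steps):
        x -= lr * numerical_gradient(func, x)
    return x


x_min = gradient_descent(f, x0=0.0)
gradient_at_min = numerical_gradient(f, x_min)  # close to zero at the minimum
```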

67.	Which of the following is a disadvantage of decision trees?
A.	Factor analysis
B.	Decision trees are robust to outliers
C.	Decision trees are prone to overfitting
D.	None of the above
Answer» C. Decision trees are prone to overfitting

68.	How do you handle missing or corrupted data in a dataset?
A.	Drop missing rows or columns
B.	Replace missing values with mean/median/mode
C.	Assign a unique category to missing values
D.	All of the above
Answer» D. All of the above
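
The three strategies listed in options A to C can be sketched on a tiny made-up dataset. The records, the column names, and the `"Unknown"` label are all hypothetical; real projects would usually rely on a data-frame library for this.

```python
from statistics import mean

rows = [
    {"age": 25, "city": "Pune"},
    {"age": None, "city": "Mumbai"},
    {"age": 35, "city": None},
]

# Option A: drop rows that contain any missing value.
dropped = [r for r in rows if None not in r.values()]

# Option B: replace a missing numeric value with the column mean.
age_mean = mean(r["age"] for r in rows if r["age"] is not None)
mean_filled = [
    {**r, "age": age_mean if r["age"] is None else r["age"]} for r in rows
]

# Option C: give missing categorical values their own category.
category_filled = [{**r, "city": r["city"] or "Unknown"} for r in rows]
```

Which strategy is appropriate depends on how much data is missing and why; the sketch only shows the mechanics.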

69.	When performing regression or classification, which of the following is the correct way to preprocess the data?
A.	Normalize the data -> PCA -> training
B.	PCA -> normalize PCA output -> training
C.	Normalize the data -> PCA -> normalize PCA output -> training
D.	None of the above
Answer» A. Normalize the data -> PCA -> training

70.	Which of the following statements about regularization is not correct?
A.	Using too large a value of lambda can cause your hypothesis to underfit the data.
B.	Using too large a value of lambda can cause your hypothesis to overfit the data
C.	Using a very large value of lambda cannot hurt the performance of your hypothesis.
D.	None of the above
Answer» D. None of the above
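
To make the role of lambda concrete, here is a minimal sketch assuming one-dimensional ridge regression without an intercept, where the slope has the closed form w = sum(x*y) / (sum(x*x) + lambda). The data are made up. A very large lambda shrinks the slope toward zero, so the fitted line underfits even perfectly linear data.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form slope for 1-D ridge regression without an intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(xs, ys, w):
    """Mean squared error of the line y = w * x on the given data."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]                   # exactly y = 2x

w_small = ridge_slope(xs, ys, lam=0.0)      # no penalty: recovers the true slope
w_large = ridge_slope(xs, ys, lam=1000.0)   # heavy penalty: slope shrinks toward 0

error_small = mse(xs, ys, w_small)
error_large = mse(xs, ys, w_large)          # larger training error: underfitting
```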

71.	Which of the following techniques cannot be used for normalization in text mining?
A.	Stemming
B.	Lemmatization
C.	Stop Word Removal
D.	None of the above
Answer» C. Stop Word Removal
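
The distinction behind this answer can be illustrated with a toy example: stemming maps word variants onto a common form, while stop word removal deletes tokens outright. The suffix stripper below is a deliberately crude stand-in, not a real stemming algorithm, and the stop word list is a tiny hypothetical one.

```python
STOP_WORDS = {"the", "is", "a", "of"}   # tiny illustrative list, not a standard one


def toy_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


tokens = ["the", "server", "is", "connecting", "connected", "connects"]

# Normalization: variants collapse to one form, and no token is dropped.
stemmed = [toy_stem(t) for t in tokens]

# Stop word removal: tokens are filtered out, and the rest are left unchanged.
filtered = [t for t in tokens if t not in STOP_WORDS]
```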

72.	In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	All of the above
Answer» D. All of the above
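
Case 1 (outliers) can be demonstrated with a minimal one-dimensional K-means sketch. The data, the starting centroids, and the helper name `kmeans_1d` are assumptions for illustration. Because each centroid is a mean, a single extreme point can capture a centroid for itself and force the two genuine groups to merge.

```python
def kmeans_1d(points, centroids, iterations=20):
    """Plain Lloyd iterations on 1-D data with fixed starting centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


clean = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
with_outlier = clean + [100.0]

clean_centroids, _ = kmeans_1d(clean, centroids=[1.0, 12.0])
noisy_centroids, noisy_clusters = kmeans_1d(with_outlier, centroids=[1.0, 12.0])
```

On the clean data the centroids settle on the two groups; with the outlier added, one centroid ends up on the outlier alone and the other sits between the two real groups.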

73.	Which of the following is a reasonable way to select the number of principal components "k"?
A.	Choose k to be the smallest value so that at least 99% of the variance is retained.
B.	Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
C.	Choose k to be the largest value so that 99% of the variance is retained.
D.	Use the elbow method.
Answer» A. Choose k to be the smallest value so that at least 99% of the variance is retained.
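
The rule in option A can be sketched as a cumulative sum over the per-component variances (the eigenvalues of the covariance matrix). The variance values below are made up, and `smallest_k` is a hypothetical helper name.

```python
def smallest_k(eigenvalues, threshold=0.99):
    """Return the smallest k whose top-k components retain >= threshold of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


variances = [50.0, 30.0, 15.0, 4.5, 0.5]   # made-up per-component variances
k = smallest_k(variances)                  # cumulative shares: 50%, 80%, 95%, 99.5%, 100%
```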

74.	What is a sentence parser typically used for?
A.	It is used to parse sentences to check if they are utf-8 compliant.
B.	It is used to parse sentences to derive their most likely syntax tree structures.
C.	It is used to parse sentences to assign POS tags to all tokens.
D.	It is used to check if sentences can be parsed into meaningful tokens.
Answer» B. It is used to parse sentences to derive their most likely syntax tree structures.

75.	Data Analysis is a process of?
A.	inspecting data
B.	cleaning data
C.	transforming data
D.	All of the above
Answer» D. All of the above

76.	Which of the following is not a major data analysis approach?
A.	Data Mining
B.	Predictive Intelligence
C.	Business Intelligence
D.	Text Analytics
Answer» B. Predictive Intelligence

77.	How many main statistical methodologies are used in data analysis?
A.	2
B.	3
C.	4
D.	5
Answer» A. 2

78.	In descriptive statistics, data from the entire population or a sample is summarized with?
A.	integer descriptors
B.	floating descriptors
C.	numerical descriptors
D.	decimal descriptors
Answer» C. numerical descriptors
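
A few common numerical descriptors can be computed with Python's standard `statistics` module. The sample values are made up for illustration.

```python
import statistics

sample = [4, 8, 15, 16, 23, 42]   # made-up observations

summary = {
    "mean": statistics.mean(sample),       # central tendency
    "median": statistics.median(sample),   # middle value, robust to extremes
    "stdev": statistics.stdev(sample),     # sample standard deviation (spread)
    "min": min(sample),
    "max": max(sample),
}
```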

79.	Data analysis was defined by which statistician?
A.	William S.
B.	Hans Peter Luhn
C.	Gregory Piatetsky-Shapiro
D.	John Tukey
Answer» D. John Tukey

80.	Which of the following is true about hypothesis testing?
A.	answering yes/no questions about the data
B.	estimating numerical characteristics of the data
C.	describing associations within the data
D.	modeling relationships within the data
Answer» A. answering yes/no questions about the data

81.	The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
A.	TRUE
B.	FALSE
C.	Can be true or false
D.	Cannot say
Answer» A. TRUE

82.	The branch of statistics which deals with development of particular statistical methods is classified as
A.	industry statistics
B.	economic statistics
C.	applied statistics
D.	applied statistics
Answer» D. applied statistics

83.	Which of the following is true about regression analysis?
A.	answering yes/no questions about the data
B.	estimating numerical characteristics of the data
C.	modeling relationships within the data
D.	describing associations within the data
Answer» C. modeling relationships within the data
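
As a minimal illustration of "modeling relationships within the data", the sketch below fits a straight line by ordinary least squares. The data are made up, and simple linear regression is only one of many regression techniques.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]   # made-up data: score = 50 + 2 * hours

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6.0        # estimate for an unseen input
```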

84.	Text Analytics is also referred to as Text Mining.
A.	TRUE
B.	FALSE
C.	Can be true or false
D.	Cannot say
Answer» A. TRUE

85.	What is true about Data Visualization?
A.	Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts.
B.	Data Visualization helps users in analyzing a large amount of data in a simpler way.
C.	Data Visualization makes complex data more accessible, understandable, and usable.
D.	All of the above
Answer» D. All of the above

86.	Data can be visualized using?
A.	graphs
B.	charts
C.	maps
D.	All of the above
Answer» D. All of the above

87.	Data visualization is also an element of the broader _____________.
A.	deliver presentation architecture
B.	data presentation architecture
C.	dataset presentation architecture
D.	data process architecture
Answer» B. data presentation architecture

88.	Which method shows hierarchical data in a nested format?
A.	Treemaps
B.	Scatter plots
C.	Population pyramids
D.	Area charts
Answer» A. Treemaps

89.	Which function is used for inference on one proportion using the normal approximation?
A.	fisher.test()
B.	chisq.test()
C.	Lm.test()
D.	prop.test()
Answer» D. prop.test()
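
`prop.test()` is an R function. The normal-approximation idea behind it can be sketched in Python as a plain one-proportion z-test. This is not a reimplementation of `prop.test()`: R applies a continuity correction by default, so its numbers can differ from the uncorrected sketch below. The counts are made up.

```python
import math


def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - cdf)
    return z, p_value


# Made-up example: 60 successes in 100 trials, tested against p0 = 0.5.
z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
```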

90.	Which is used to find the factor congruence coefficients?
A.	factor.mosaicplot
B.	factor.xyplot
C.	factor.congruence
D.	factor.cumsum
Answer» C. factor.congruence

91.	Which of the following is a tool for checking normality?
A.	qqline()
B.	qline()
C.	anova()
D.	lm()
Answer» A. qqline()

92.	Which of the following is false?
A.	Data visualization includes the ability to absorb information quickly
B.	Data visualization is another form of visual art
C.	Data visualization decreases insight and leads to slower decisions
D.	None of the above
Answer» C. Data visualization decreases insight and leads to slower decisions

93.	Common use cases for data visualization include?
A.	Politics
B.	Sales and marketing
C.	Healthcare
D.	All of the above
Answer» D. All of the above

94.	Which of the following plots is often used for checking randomness in time series?
A.	Autocausation
B.	Autorank
C.	Autocorrelation
D.	None of the above
Answer» C. Autocorrelation
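
An autocorrelation plot shows the sample autocorrelation at each lag. For a random series these values stay near zero at every nonzero lag, while structured series produce large values. The sketch below computes the quantity for two made-up, clearly non-random series; `autocorrelation` is a hypothetical helper name.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


trend = [float(t) for t in range(20)]   # steadily increasing series
alternating = [1.0, -1.0] * 10          # flips sign at every step

r_trend = autocorrelation(trend, lag=1)              # strongly positive
r_alternating = autocorrelation(alternating, lag=1)  # strongly negative
```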

95.	To find the minimum or the maximum of a function, we set the gradient to zero because:
A.	The value of the gradient at extrema of a function is always zero
B.	Depends on the type of problem
C.	Both A and B
D.	None of the above
Answer» A. The value of the gradient at extrema of a function is always zero

96.	Which of the following techniques cannot be used for normalization in text mining?
A.	Stemming
B.	Lemmatization
C.	Stop Word Removal
D.	None of the above
Answer» C. Stop Word Removal

97.	In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes
A.	1 and 2
B.	2 and 3
C.	1 and 3
D.	All of the above
Answer» D. All of the above

98.	Which of the following is a reasonable way to select the number of principal components "k"?
A.	Choose k to be the smallest value so that at least 99% of the variance is retained.
B.	Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
C.	Choose k to be the largest value so that 99% of the variance is retained.
D.	Use the elbow method.
Answer» A. Choose k to be the smallest value so that at least 99% of the variance is retained.

99.	Which of the following is false?
A.	Subsetting can be used to select and exclude variables and observations
B.	Raw data should be processed only one time.
C.	Merging concerns combining datasets on the same observations to produce a result with more variables
D.	None of the above
Answer» B. Raw data should be processed only one time.

100.	According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?
A.	Big data management and data mining
B.	Data warehousing and business intelligence
C.	Management of Hadoop clusters
D.	Collecting and storing unstructured data
Answer» A. Big data management and data mining
