Chapter: Introduction to Big Data
1.

Data whose size is measured in ___________bytes is called Big Data.

A. Tera
B. Giga
C. Peta
D. Meta
Answer» C. Peta
2.

How many V's characterize Big Data?

A. 2
B. 3
C. 4
D. 5
Answer» D. 5
3.

Transaction data of the bank is?

A. structured data
B. unstructured data
C. Both A and B
D. None of the above
Answer» A. structured data
4.

In how many forms can Big Data be found?

A. 2
B. 3
C. 4
D. 5
Answer» B. 3
5.

Which of the following are Benefits of Big Data Processing?

A. Businesses can utilize outside intelligence while taking decisions
B. Improved customer service
C. Better operational efficiency
D. All of the above
Answer» D. All of the above
6.

Which of the following is not a Big Data technology?

A. Apache Hadoop
B. Apache Spark
C. Apache Kafka
D. Apache Pytarch
Answer» D. Apache Pytarch
7.

Approximately what percentage of the world’s total data is commonly estimated to have been created within the past two years?

A. 80%
B. 85%
C. 90%
D. 95%
Answer» C. 90%
8.

Apache Kafka is an open-source platform that was created by?

A. LinkedIn
B. Facebook
C. Google
D. IBM
Answer» A. LinkedIn
9.

What was Hadoop named after?

A. Creator Doug Cutting’s favorite circus act
B. Cutting’s high school rock band
C. The toy elephant of Cutting’s son
D. A sound Cutting’s laptop made during Hadoop development
Answer» C. The toy elephant of Cutting’s son
10.

What are the main components of Hadoop?

A. MapReduce
B. HDFS
C. YARN
D. All of the above
Answer» D. All of the above
11.

All of the following accurately describe Hadoop, EXCEPT ____________

A. Open-source
B. Real-time
C. Java-based
D. Distributed computing approach
Answer» B. Real-time
12.

__________ has the world’s largest Hadoop cluster.

A. Apple
B. Datamatics
C. Facebook
D. None of the above
Answer» C. Facebook
13.

Facebook Tackles Big Data With _______ based on Hadoop.

A. Project Prism
B. Prism
C. Project Big
D. Project Data
Answer» A. Project Prism
14.

___________ is a general-purpose computing model and runtime system for distributed data analytics.

A. MapReduce
B. Drill
C. Oozie
D. None of the above
Answer» A. MapReduce
15.

The examination of large amounts of data to see what patterns or other useful information can be found is known as

A. Data examination
B. Information analysis
C. Big data analytics
D. Data analysis
Answer» C. Big data analytics
16.

Big data analysis does the following except?

A. Collects data
B. Spreads data
C. Organizes data
D. Analyzes data
Answer» B. Spreads data
17.

What makes Big Data analysis difficult to optimize?

A. Big Data is not difficult to optimize
B. Both data and cost effective ways to mine data to make business sense out of it
C. The technology to mine data
D. None of the above
Answer» B. Both data and cost effective ways to mine data to make business sense out of it
18.

The new source of big data that will trigger a Big Data revolution in the years to come is?

A. Business transactions
B. Social media
C. Transactional data and sensor data
D. RDBMS
Answer» C. Transactional data and sensor data
19.

The unit of data that flows through a Flume agent is

A. Log
B. Row
C. Record
D. Event
Answer» D. Event
20.

All of the following are steps in deploying a Big Data solution, EXCEPT

A. Data Processing
B. Data dissemination
C. Data Storage
D. Data Ingestion
Answer» B. Data dissemination
21.

Who popularized the term "Big Data"?

A. John Deere
B. John Mashey
C. Johny Mashe
D. Jhon Mash
Answer» B. John Mashey
22.

Numbers, text, images, audio, and video data are examples of ____

A. Volume
B. Value
C. Varity
D. Variety
Answer» D. Variety
23.

Real time data is ______.

A. Field
B. Primary Key
C. unique
D. record
Answer» C. unique
24.

______ is the term used to describe data that is high volume, high velocity, and/or high variety.

A. Analytics
B. Big Data
C. Hadoop data
D. Big Data analytics
Answer» B. Big Data
25.

According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?

A. Big data management and data mining
B. Data warehousing and business intelligence
C. Management of Hadoop clusters
D. Collecting and storing unstructured data
Answer» A. Big data management and data mining
26.

Point out the wrong statement.

A. Hadoop’s processing capabilities are huge, and its real advantage lies in the ability to process terabytes and petabytes of data
B. Hadoop’s processing capabilities are huge, and its real advantage lies in the ability to process terabytes and petabytes of data
C. The programming model, MapReduce, used by Hadoop is difficult to write and test
D. All of these
Answer» C. The programming model, MapReduce, used by Hadoop is difficult to write and test
27.

__________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.

A. MapReduce
B. Mahout
C. Oozie
D. All of the mentioned
Answer» A. MapReduce
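To make the MapReduce programming model concrete, here is a minimal single-process sketch of a word count in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and only imitate the map, shuffle, and reduce stages that a real framework would distribute across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key before reduction."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values collected for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three stages in sequence on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

In a real cluster the map and reduce calls run in parallel on different nodes; the sketch keeps only the data flow between the stages.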
30.

Data science draws insight from diverse sets of data through which of the following?

A. organizing data
B. processing data
C. analysing data
D. All of the above
Answer» D. All of the above
31.

The modern conception of data science as an independent discipline is sometimes attributed to?

A. William S.
B. John McCarthy
C. Arthur Samuel
D. Satoshi Nakamoto
Answer» A. William S.
32.

Which of the following languages is used in data science?

A. C
B. C++
C. R
D. Ruby
Answer» C. R
33.

Which of the following is false?

A. Subsetting can be used to select and exclude variables and observations
B. Raw data should be processed only one time.
C. Merging concerns combining datasets on the same observations to produce a result with more variables
D. None of the above
Answer» B. Raw data should be processed only one time.
34.

What is the work of Data Architect?

A. utilize large data sets to gather information that meets their company's needs
B. work with businesses to determine the best usage of the information yielded from data
C. build data solutions that are optimized for performance and design applications
D. All of the above
Answer» C. build data solutions that are optimized for performance and design applications
35.

Which of the following are essential skills for a data scientist?

A. Probability & Statistics
B. Machine Learning / Deep Learning
C. Data Wrangling
D. All of the above
Answer» D. All of the above
36.

Which of the following are components of data science?

A. Data Engineering
B. Advanced Computing
C. Domain expertise
D. All of the above
Answer» D. All of the above
37.

Which of the following is not a part of the data science process?

A. Discovery
B. Model Planning
C. Communication Building
D. Operationalize
Answer» C. Communication Building
38.

Which of the following are the Data Sources in data science?

A. Structured
B. Unstructured
C. Both A and B
D. None of the above
Answer» C. Both A and B
39.

Which of the following is not an application of data science?

A. Recommendation Systems
B. Image & Speech Recognition
C. Online Price Comparison
D. Privacy Checker
Answer» D. Privacy Checker
40.

Point out the correct statement.

A. Raw data is the original source of data
B. Preprocessed data is the original source of data
C. Raw data is the data obtained after processing steps
D. None of the above
Answer» A. Raw data is the original source of data
41.

Which of the following is one of the key data science skills?

A. Statistics
B. Machine Learning
C. Data Visualization
D. All of the above
Answer» D. All of the above
42.

Which of the following is a key characteristic of a hacker?

A. Afraid to say they don't know the answer
B. Willing to find answers on their own
C. Not Willing to find answers on their own
D. All of the above
Answer» B. Willing to find answers on their own
43.

Raw data should be processed only one time.

A. True
B. False
C. Can be true or false
D. Can not say
Answer» B. False
44.

Which of the following is the common goal of statistical modelling?

A. Inference
B. Summarizing
C. Subsetting
D. None of the above
Answer» A. Inference
45.

Causal analysis is commonly applied to census data.

A. True
B. False
C. Can be true or false
D. Can not say
Answer» B. False
46.

Which of the following models is usually considered the gold standard for data analysis?

A. Inferential
B. Descriptive
C. Causal
D. All of the above
Answer» C. Causal
47.

Which of the following is a revision control system?

A. Git
B. Numpy
C. Scipy
D. Slidify
Answer» A. Git
48.

Which of the following steps does a data scientist perform after acquiring the data?

A. Data Cleaning
B. Data Integration
C. Data Replication
D. All of the above
Answer» A. Data Cleaning
49.

Which of the following focuses on the discovery of (previously) unknown properties in the data?

A. Data mining
B. BigData
C. Data wrangling
D. Machine Learning
Answer» A. Data mining
Chapter: Introduction to Data Science
50.

Which of the following can be used to create sub–samples using a maximum dissimilarity approach?

A. minDissim
B. maxDissim
C. inmaxDissim
D. All of the Mentioned
Answer» B. maxDissim
51.

Which of the following can be used to impute data sets based only on information in the training set?

A. postprocess
B. preProcess
C. process
D. All of the Mentioned
Answer» B. preProcess
52.

Which of the following models includes a backward elimination feature selection routine?

A. MCV
B. MARS
C. MCRS
D. All of the Mentioned
Answer» B. MARS
53.

Which of the following is a categorical outcome?

A. RMSE
B. RSquared
C. Accuracy
D. All of the Mentioned
Answer» C. Accuracy
54.

Which of the following functions provides unsupervised prediction?

A. cl_forecast
B. cl_nowcast
C. cl_precast
D. None of the Mentioned
Answer» D. None of the Mentioned
55.

What is true about Machine Learning?

A. Machine Learning (ML) is a field of computer science
B. ML is a type of artificial intelligence that extracts patterns from raw data by using an algorithm or method.
C. The main focus of ML is to allow computer systems to learn from experience without being explicitly programmed and without human intervention.
D. All of the above
Answer» D. All of the above
56.

ML is a field of AI consisting of learning algorithms that?

A. Improve their performance
B. At executing some task
C. Over time with experience
D. All of the above
Answer» D. All of the above
57.

p → ¬q is not a?

A. hack clause
B. horn clause
C. structural clause
D. system clause
Answer» B. horn clause
58.

The robot arm action _______ specifies placing block A on block B.

A. STACK(A,B)
B. LIST(A,B)
C. QUEUE(A,B)
D. ARRAY(A,B)
Answer» A. STACK(A,B)
59.

A __________ begins by hypothesizing a sentence (the symbol S) and successively predicting lower-level constituents until individual preterminal symbols are written.

A. bottom-up parser
B. top parser
C. top-down parser
D. bottom parser
Answer» C. top-down parser
60.

A model of language consists of categories that do not include ________.

A. System Unit
B. structural units.
C. data units
D. empirical units
Answer» B. structural units.
61.

Which of the following is not a learning method?

A. Introduction
B. Analogy
C. Deduction
D. Memorization
Answer» A. Introduction
62.

Training a model on all of the data in one single batch is known as?

A. Batch learning
B. Offline learning
C. Both A and B
D. None of the above
Answer» C. Both A and B
63.

Which of the following are ML methods?

A. based on human supervision
B. supervised Learning
C. semi-reinforcement Learning
D. All of the above
Answer» A. based on human supervision
64.

In model-based learning methods, an iterative process takes place on ML models that are built from various model parameters, called?

A. mini-batches
B. optimized parameters
C. hyperparameters
D. superparameters
Answer» C. hyperparameters
65.

Which of the following is a widely used and effective machine learning algorithm based on the idea of bagging?

A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Answer» D. Random Forest
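The bagging idea behind Random Forest can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the toy data set, the one-threshold "stump" learner, and the names `bootstrap_sample`, `train_stump`, and `bagged_predict` are all hypothetical, and a real Random Forest additionally grows full decision trees and randomizes the features considered at each split.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) points with replacement (one bootstrap replicate)."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Pick the threshold (taken from the sample's x values) with the fewest errors.

    The stump predicts label 1 when x > threshold and label 0 otherwise.
    """
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x > threshold) != bool(label) for x, label in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_predict(thresholds, x):
    """Aggregate the stumps by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


rng = random.Random(0)                        # fixed seed for repeatability
data = [(x, int(x > 5)) for x in range(11)]   # toy labels: 1 when x > 5
thresholds = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]

print(bagged_predict(thresholds, 0), bagged_predict(thresholds, 10))
```

Each stump sees a slightly different resampled data set, and the vote averages out their individual quirks, which is the variance reduction that bagging aims for.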
66.

To find the minimum or the maximum of a function, we set the gradient to zero because:

A. The value of the gradient at extrema of a function is always zero
B. Depends on the type of problem
C. Both A and B
D. None of the above
Answer» A. The value of the gradient at extrema of a function is always zero
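A small numerical check can illustrate the answer, with one caveat worth stating: for a differentiable function the gradient is zero at an interior extremum, but a zero gradient alone does not guarantee an extremum. The sketch below uses two example functions chosen for illustration (`bowl` and `cubic` are hypothetical names) and a central-difference estimate of the derivative.

```python
def derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def bowl(x):
    """Has a minimum at x = 3."""
    return (x - 3.0) ** 2 + 1.0


def cubic(x):
    """Has zero slope at x = 0, but no minimum or maximum there."""
    return x ** 3


slope_at_min = derivative(bowl, 3.0)       # close to 0 at the minimum
slope_elsewhere = derivative(bowl, 5.0)    # close to 4 away from it
slope_at_flat = derivative(cubic, 0.0)     # close to 0, yet not an extremum

print(slope_at_min, slope_elsewhere, slope_at_flat)
```

Setting the gradient to zero therefore yields candidate points; a second-derivative test or a comparison of nearby values is still needed to classify them.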
67.

Which of the following is a disadvantage of decision trees?

A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to overfitting
D. None of the above
Answer» C. Decision trees are prone to overfitting
68.

How do you handle missing or corrupted data in a dataset?

A. Drop missing rows or columns
B. Replace missing values with mean/median/mode
C. Assign a unique category to missing values
D. All of the above
Answer» D. All of the above
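The three options can each be sketched with plain Python dictionaries. The records, column names, and helper names below (`drop_missing`, `impute_numeric`, `flag_missing`) are made up for illustration, and `None` is assumed to mark a missing value; libraries such as pandas offer equivalent operations.

```python
from statistics import median


def drop_missing(rows):
    """Option A: discard any row that contains a missing (None) value."""
    return [row for row in rows if None not in row.values()]


def impute_numeric(rows, column, strategy=median):
    """Option B: replace missing values in a column with a summary statistic."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill_value = strategy(observed)
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]


def flag_missing(rows, column, label="UNKNOWN"):
    """Option C: assign a dedicated category to missing values."""
    return [
        {**row, column: label if row[column] is None else row[column]}
        for row in rows
    ]


rows = [
    {"age": 25, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 35, "city": None},
    {"age": 45, "city": "Pune"},
]

print(len(drop_missing(rows)))
print(impute_numeric(rows, "age")[1])
print(flag_missing(rows, "city")[2])
```

Which option fits depends on how much data is missing and why; dropping rows is simplest but discards information, while imputation keeps the rows at the cost of introducing estimated values.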
69.

When performing regression or classification, which of the following is the correct way to preprocess the data?

A. Normalize the data -> PCA -> training
B. PCA -> normalize PCA output -> training
C. Normalize the data -> PCA -> normalize PCA output -> training
D. None of the above
Answer» A. Normalize the data -> PCA -> training
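One reason normalization comes before PCA is that PCA looks for directions of greatest variance, so a feature measured on a large scale can dominate the components. The sketch below does not run a full PCA; it uses two made-up columns (`income` and `age`, both hypothetical) and compares each feature's share of the total variance before and after z-score standardization.

```python
from statistics import mean, pstdev, pvariance


def standardize(column):
    """Z-score a column: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


income = [30000.0, 45000.0, 60000.0, 75000.0]   # large numeric scale
age = [25.0, 35.0, 45.0, 55.0]                  # small numeric scale

# Share of total variance carried by income on the raw scale.
raw_share = pvariance(income) / (pvariance(income) + pvariance(age))

# Same share after both columns are standardized.
z_income, z_age = standardize(income), standardize(age)
scaled_share = pvariance(z_income) / (pvariance(z_income) + pvariance(z_age))

print(raw_share, scaled_share)
```

On the raw scale nearly all of the variance belongs to income, so the first principal component would essentially be the income axis; after standardization the two features contribute equally.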
70.

Which of the following statements about regularization is not correct?

A. Using too large a value of lambda can cause your hypothesis to underfit the data.
B. Using too large a value of lambda can cause your hypothesis to overfit the data
C. Using a very large value of lambda cannot hurt the performance of your hypothesis.
D. None of the above
Answer» D. None of the above
71.

Which of the following techniques cannot be used for normalization in text mining?

A. Stemming
B. Lemmatization
C. Stop Word Removal
D. None of the above
Answer» C. Stop Word Removal
72.

In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes

A. 1 and 2
B. 2 and 3
C. 1 and 3
D. All of the above
Answer» D. All of the above
73.

Which of the following is a reasonable way to select the number of principal components "k"?

A. Choose k to be the smallest value so that at least 99% of the variance is retained.
B. Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
C. Choose k to be the largest value so that 99% of the variance is retained.
D. Use the elbow method.
Answer» A. Choose k to be the smallest value so that at least 99% of the variance is retained.
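The rule in option A can be sketched as a cumulative sum over the per-component variances. The eigenvalues below are invented for illustration, and `smallest_k` is a hypothetical helper name; in practice the values would come from a PCA routine.

```python
def smallest_k(eigenvalues, retained=0.99):
    """Return the smallest k whose top-k components keep the requested variance share."""
    ordered = sorted(eigenvalues, reverse=True)   # largest variance first
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= retained:
            return k
    return len(ordered)


eigenvalues = [6.0, 3.0, 0.95, 0.04, 0.01]   # made-up variances per component
k = smallest_k(eigenvalues)
print(k)
```

With these numbers the first two components retain about 90% of the variance and the first three about 99.5%, so three components is the smallest choice that meets the 99% target.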
74.

What is a sentence parser typically used for?

A. It is used to parse sentences to check if they are utf-8 compliant.
B. It is used to parse sentences to derive their most likely syntax tree structures.
C. It is used to parse sentences to assign POS tags to all tokens.
D. It is used to check if sentences can be parsed into meaningful tokens.
Answer» B. It is used to parse sentences to derive their most likely syntax tree structures.
75.

Data Analysis is a process of?

A. inspecting data
B. cleaning data
C. transforming data
D. All of the above
Answer» D. All of the above
76.

Which of the following is not a major data analysis approach?

A. Data Mining
B. Predictive Intelligence
C. Business Intelligence
D. Text Analytics
Answer» B. Predictive Intelligence
77.

How many main statistical methodologies are used in data analysis?

A. 2
B. 3
C. 4
D. 5
Answer» A. 2
78.

In descriptive statistics, data from the entire population or a sample is summarized with ?

A. integer descriptors
B. floating descriptors
C. numerical descriptors
D. decimal descriptors
Answer» C. numerical descriptors
79.

Which statistician defined data analysis?

A. William S.
B. Hans Peter Luhn
C. Gregory Piatetsky-Shapiro
D. John Tukey
Answer» D. John Tukey
80.

Which of the following is true about hypothesis testing?

A. answering yes/no questions about the data
B. estimating numerical characteristics of the data
C. describing associations within the data
D. modeling relationships within the data
Answer» A. answering yes/no questions about the data
81.

The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.

A. TRUE
B. FALSE
C. Can be true or false
D. Can not say
Answer» A. TRUE
82.

The branch of statistics which deals with development of particular statistical methods is classified as

A. industry statistics
B. economic statistics
C. applied statistics
D. applied statistics
Answer» D. applied statistics
83.

Which of the following is true about regression analysis?

A. answering yes/no questions about the data
B. estimating numerical characteristics of the data
C. modeling relationships within the data
D. describing associations within the data
Answer» C. modeling relationships within the data
84.

Text analytics is also referred to as text mining.

A. TRUE
B. FALSE
C. Can be true or false
D. Can not say
Answer» A. TRUE
85.

What is true about Data Visualization?

A. Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts.
B. Data Visualization helps users in analyzing a large amount of data in a simpler way.
C. Data Visualization makes complex data more accessible, understandable, and usable.
D. All of the above
Answer» D. All of the above
86.

Data can be visualized using?

A. graphs
B. charts
C. maps
D. All of the above
Answer» D. All of the above
87.

Data visualization is also an element of the broader _____________.

A. deliver presentation architecture
B. data presentation architecture
C. dataset presentation architecture
D. data process architecture
Answer» B. data presentation architecture
88.

Which method shows hierarchical data in a nested format?

A. Treemaps
B. Scatter plots
C. Population pyramids
D. Area charts
Answer» A. Treemaps
89.

Which function is used for inference on one proportion using the normal approximation?

A. fisher.test()
B. chisq.test()
C. Lm.test()
D. prop.test()
Answer» D. prop.test()
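For readers who want to see the normal approximation itself, the following Python sketch computes a one-proportion z-test by hand. It is not a port of R's `prop.test()`: R reports a chi-squared statistic and applies a continuity correction by default, so its numbers will differ from this uncorrected version. The function name `one_proportion_z_test` and the sample counts are hypothetical.

```python
import math


def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) written with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(round(z, 3), round(p_value, 4))
```

The approximation is usually considered reasonable only when both n * p0 and n * (1 - p0) are large enough, commonly taken as at least 5 or 10.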
90.

Which is used to find the factor congruence coefficients?

A. factor.mosaicplot
B. factor.xyplot
C. factor.congruence
D. factor.cumsum
Answer» C. factor.congruence
91.

Which of the following is a tool for checking normality?

A. qqline()
B. qline()
C. anova()
D. lm()
Answer» A. qqline()
92.

Which of the following is false?

A. Data visualization includes the ability to absorb information quickly
B. Data visualization is another form of visual art
C. Data visualization decreases insights and leads to slower decisions
D. None of the above
Answer» C. Data visualization decreases insights and leads to slower decisions
93.

Common use cases for data visualization include?

A. Politics
B. Sales and marketing
C. Healthcare
D. All of the above
Answer» D. All of the above
94.

Which of the following plots are often used for checking randomness in time series?

A. Autocausation
B. Autorank
C. Autocorrelation
D. None of the above
Answer» C. Autocorrelation
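An autocorrelation plot is built from lag-k autocorrelation coefficients, which can be computed directly. The sketch below uses the common sample estimator and two invented series (`trend` and `alternating` are illustrative names); values near zero at all lags would suggest randomness, while large values indicate structure.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    mu = mean(series)
    denominator = sum((x - mu) ** 2 for x in series)
    numerator = sum(
        (series[t] - mu) * (series[t + lag] - mu)
        for t in range(len(series) - lag)
    )
    return numerator / denominator


trend = [float(t) for t in range(20)]   # steadily increasing, clearly not random
alternating = [1.0, -1.0] * 10          # flips sign at every step

print(autocorrelation(trend, 1))
print(autocorrelation(alternating, 1))
```

The trending series gives a strongly positive lag-1 coefficient and the alternating series a strongly negative one, so neither would pass as random in an autocorrelation plot.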
Chapter: Introduction to Machine Learning
95.

To find the minimum or the maximum of a function, we set the gradient to zero because:

A. The value of the gradient at extrema of a function is always zero
B. Depends on the type of problem
C. Both A and B
D. None of the above
Answer» A. The value of the gradient at extrema of a function is always zero
96.

Which of the following techniques cannot be used for normalization in text mining?

A. Stemming
B. Lemmatization
C. Stop Word Removal
D. None of the above
Answer» C. Stop Word Removal
97.

In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes

A. 1 and 2
B. 2 and 3
C. 1 and 3
D. All of the above
Answer» D. All of the above
98.

Which of the following is a reasonable way to select the number of principal components "k"?

A. Choose k to be the smallest value so that at least 99% of the variance is retained.
B. Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
C. Choose k to be the largest value so that 99% of the variance is retained.
D. Use the elbow method.
Answer» A. Choose k to be the smallest value so that at least 99% of the variance is retained.
99.

Which of the following is false?

A. Subsetting can be used to select and exclude variables and observations
B. Raw data should be processed only one time.
C. Merging concerns combining datasets on the same observations to produce a result with more variables
D. None of the above
Answer» B. Raw data should be processed only one time.
Chapter: Data Analytics with R Weka Machine Learning
100.

According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?

A. Big data management and data mining
B. Data warehousing and business intelligence
C. Management of Hadoop clusters
D. Collecting and storing unstructured data
Answer» A. Big data management and data mining