Question 1:

Which HDFS feature protects against user errors causing accidental loss of data?

A. Encryption

B. Replication

C. NameNode federation

D. Snapshots

Correct Answer: D

Question 2:

A data engineer is asked to process several large datasets using MapReduce. On initial inspection, the engineer realizes that there are complex interdependencies between the datasets.

Why is this a problem?

A. MapReduce works best on unstructured data

B. There is no problem; MapReduce accommodates all the data

C. MapReduce can only parse one file at a time.

D. MapReduce is not ideal when the processing of one dataset depends on another.

Correct Answer: D

Question 3:

What is a property of a good color model for ordinal data?

A. Uses a rainbow-like color map for distinction of categories

B. Uses a rainbow-like color map for ease of display and printing

C. Uses perceptually ordinal colors with just-noticeable increments

D. Uses perceptually ordinal colors with linear, perceptual increments

Correct Answer: D

Question 4:

What is the most likely reason for an HBase table to contain millions of columns?

A. Data is imported from a relational database table

B. Data is stored in the column qualifier

C. There are thousands of column families

D. The column names are randomly generated

Correct Answer: B
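To show why storing data in the column qualifier can produce millions of columns, here is a minimal conceptual sketch that models an HBase table as nested Python dictionaries (row key, then column family, then qualifier). It is not the HBase client API. The row key `user:42`, the family `follows`, and the `put` helper are hypothetical. Each distinct value written into the qualifier becomes a new column in that row.

```python
from collections import defaultdict

# Conceptual model only, not the HBase client API:
# row key -> column family -> column qualifier -> cell value.
table = defaultdict(lambda: defaultdict(dict))


def put(row_key: str, family: str, qualifier: str, value) -> None:
    """Hypothetical helper: write one cell, creating the column if it is new."""
    table[row_key][family][qualifier] = value


# Hypothetical "follows" data: the followed user's ID is the column qualifier,
# so every distinct ID adds a new column to the row without any schema change.
for followed_id in range(1_000):
    put("user:42", "follows", f"user:{followed_id}", 1)

print(len(table["user:42"]["follows"]))  # 1000
```

Scale the loop to millions of IDs and the row holds millions of columns, all inside a single column family.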

Question 5:

Which scenario would be ideal for processing Hadoop data with Hive?

A. Structured data; real-time processing

B. Unstructured data; batch processing

C. Unstructured data; real-time processing

D. Structured data; batch processing

Correct Answer: D

Question 6:

Why would a company decide to use HBase to replace an existing relational database?

A. It is required for performing ad-hoc queries.

B. Varying formats of input data require columns to be added in real time.

C. The company's employees are already fluent in SQL.

D. Existing SQL code will run unchanged on HBase.

Correct Answer: B

Question 7:

Which graph structure would best model the relationship between job seekers and employers?

A. Bipartite

B. Weighted

C. Directed acyclic

D. Ranked

Correct Answer: A
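A bipartite graph splits its nodes into two sets, with edges running only between the sets. That matches job seekers on one side and employers on the other. The sketch below assumes a hypothetical `applications` mapping from each seeker to the employers they applied to. It builds an undirected adjacency list and checks bipartiteness with a breadth-first two-coloring.

```python
from collections import deque


def build_graph(applications: dict) -> dict:
    """Undirected adjacency list: seeker <-> employer for each application."""
    adj = {}
    for seeker, employers in applications.items():
        adj.setdefault(seeker, [])
        for employer in employers:
            adj[seeker].append(employer)
            adj.setdefault(employer, []).append(seeker)
    return adj


def is_bipartite(adj: dict) -> bool:
    """Two-color every connected component with BFS; a same-color edge means not bipartite."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in color:
                    color[nbr] = 1 - color[node]
                    queue.append(nbr)
                elif color[nbr] == color[node]:
                    return False
    return True


# Hypothetical applications: seekers only ever connect to employers.
applications = {"alice": ["acme", "globex"], "bob": ["acme"]}
print(is_bipartite(build_graph(applications)))  # True
```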

Question 8:

What is an ideal use case for HDFS?

A. Storing files that are updated frequently

B. Storing files that are written once and read many times

C. Storing results between Map steps and Reduce steps

D. Storing application files in memory

Correct Answer: B

Question 9:

A marketing team creates a graph using a square for each data point, where the length of each side is set to the data value. The data values are 10 and 20.

What is the lie factor of the graph?

A. 1

B. 2

C. 3

D. 6

Correct Answer: C
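Under Tufte's definition, the lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data. The size of an effect is |second - first| / first. Assuming the reader perceives each square by its area (side squared), a short Python check gives:

```python
def effect_size(first: float, second: float) -> float:
    """Relative change from the first value to the second."""
    return abs(second - first) / first


def lie_factor(data_values: tuple, graphic_sizes: tuple) -> float:
    """Tufte's lie factor: effect shown in the graphic / effect in the data."""
    return effect_size(*graphic_sizes) / effect_size(*data_values)


values = (10, 20)
# Assumption: the perceived size of each square is its area, side ** 2.
areas = tuple(v ** 2 for v in values)  # (100, 400)
print(lie_factor(values, areas))  # 3.0
```

The data double (a 100% increase), but the areas quadruple (a 300% increase), so the graphic exaggerates the effect threefold.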

Question 10:

What advantage does replication provide while storing a file in HDFS?

A. Data protection and scheduling flexibility

B. Elimination of requirement for a combiner process

C. Elimination of requirement for Shuffle and Sort process

D. Memory optimization and minimizing tasks to run

Correct Answer: A

Question 11:

What is an important simulation design consideration?

A. Ensure model inputs align with reality

B. Use different seed values to regenerate results

C. For rare event models, minimize number of trials

D. A complex model is better than a simple model

Correct Answer: A

Question 12:

A hotel chain runs a simulation on room pricing. They want to estimate revenue, per hotel, within ±$10 with 95% confidence (zα/2 = 1.96). The estimated revenue standard deviation is $5000 based on previous booking data.

What is the optimal number of simulation trials to run?

A. A 32-bit operating system was used

B. The same number of trials was used

C. A linear congruential generator (LCG) was used (for pseudo-random number generation)

D. Different seeds for the random number generator were used.

Correct Answer: C
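The number of trials follows from the margin-of-error formula E = zα/2 · σ / √n, solved for n: n = (zα/2 · σ / E)². Below is a minimal Python sketch with the stated values (zα/2 = 1.96, σ = $5000, E = $10). It uses exact fractions so that floating-point rounding cannot push the result across an integer boundary before rounding up. The `trials_needed` helper is hypothetical.

```python
import math
from fractions import Fraction


def trials_needed(z, sigma, margin) -> int:
    """Smallest n with z * sigma / sqrt(n) <= margin, i.e. n >= (z * sigma / margin) ** 2."""
    # Fraction("1.96") is exactly 49/25, so the arithmetic below is exact.
    n = (Fraction(z) * Fraction(sigma) / Fraction(margin)) ** 2
    return math.ceil(n)


print(trials_needed("1.96", 5000, 10))  # 960400
```

Here 1.96 × 5000 / 10 = 980, and 980² = 960,400 trials. Rounding up matters whenever the quotient is not a whole number.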

Question 13:

What is NOT a category of a NoSQL data store?

A. Columnar

B. Document

C. Key/Value

D. Flat File

Correct Answer: D

Question 14:

What is the maximum number of edges in a simple undirected graph of 10 nodes?

A. 45

B. 90

C. 100

D. 9

Correct Answer: A
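Assuming a simple graph (no self-loops, no parallel edges), each unordered pair of nodes contributes at most one edge, so the maximum is n(n - 1)/2. The sketch below computes the formula and cross-checks it by enumerating every pair in the complete graph K10 with `itertools.combinations`. The `max_undirected_edges` helper is hypothetical.

```python
from itertools import combinations


def max_undirected_edges(n: int) -> int:
    """One edge per unordered pair of distinct nodes: n choose 2."""
    return n * (n - 1) // 2


nodes = range(10)
# Every unordered pair of distinct nodes is one edge of the complete graph K10.
complete_edges = list(combinations(nodes, 2))
print(max_undirected_edges(10), len(complete_edges))  # 45 45
```

Counting ordered pairs instead gives 90, the maximum for a directed graph without self-loops.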

Question 15:

What is an intended application of the MapReduce framework?

A. Processing can be broken into smaller pieces

B. Processing a large number of small files

C. Processing in real time is required

D. Processing a small subset of data

Correct Answer: A
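MapReduce fits work that can be split into independent pieces. Each map task handles its own input split, and reducers combine the grouped results. Below is a single-process Python sketch of the classic word count, with hypothetical input splits, showing the map, shuffle, and reduce phases. It illustrates the idea only and is not Hadoop's Java API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str) -> list:
    """Emit (word, 1) for every word in one independent input split."""
    return [(word, 1) for word in split.lower().split()]


def shuffle(pairs) -> dict:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> tuple:
    """Combine every value for one key."""
    return key, sum(values)


# Hypothetical input splits; on a cluster each could be mapped on a different node.
splits = ["big data big ideas", "data pipelines move data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 3, 'ideas': 1, 'pipelines': 1, 'move': 1}
```

Because `map_phase` never looks outside its own split, the splits can be processed in parallel. Only the shuffle brings related keys together.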