Big Data MCQs

Question 31 :
Which of the following is based on the grid-like street geography of New York?

Manhattan Distance
Edit Distance
Hamming distance
Lp distance
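
Manhattan (L1) distance counts the blocks walked along a street grid, which is the sum of the absolute differences of the coordinates. Here is a minimal sketch. The function name and the example coordinates are illustrative.

```python
def manhattan(p, q):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Walking from corner (1, 1) to corner (4, 5) on a street grid:
# 3 blocks east + 4 blocks north = 7 blocks.
print(manhattan((1, 1), (4, 5)))  # 7
```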

Question 32 :
Which of the following statements about standard Bloom filters is correct?

It is possible to delete an element from a Bloom filter.
A Bloom filter always returns the correct result.
It is possible to alter the hash functions of a full Bloom filter to create more space.
A Bloom filter always returns TRUE when testing for a previously added element.
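
The properties in this question (and the initialization asked about in Question 38) can be seen in a minimal standard Bloom filter. This sketch uses salted SHA-256 digests as its k hash functions. That is an illustrative choice, and the class and method names are hypothetical. The bit array starts as all 0s. Adding an element sets k bits, so a previously added element always tests TRUE. An unseen element may still test TRUE as a false positive.

```python
import hashlib

class BloomFilter:
    """Minimal standard Bloom filter sketch: n bits, k hash functions."""

    def __init__(self, n_bits, k_hashes):
        self.n = n_bits
        self.k = k_hashes
        self.bits = [0] * n_bits  # every bit starts as 0

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different salts
        # (an illustrative hash family, not a prescribed one).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.n

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present" (false positives can occur);
        # False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(n_bits=64, k_hashes=3)
for word in ["hadoop", "yarn", "hdfs"]:
    bf.add(word)
print(all(bf.might_contain(w) for w in ["hadoop", "yarn", "hdfs"]))  # True
```

There is no `remove` method because clearing an element's bits could also clear bits shared with other elements. That would introduce false negatives, which is why a standard Bloom filter does not support deletion.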

Question 33 :
The term used to describe Hadoop hardware requirements is

Commodity firmware
Commodity software
Commodity hardware
Cluster hardware

Question 34 :
A ________________ query Q is a query that is issued once over a database D, and then logically runs continuously over the data in D until Q is terminated.

One-time Query
Standing Query
Ad hoc Query
General Query
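
A standing query can be pictured as a generator that stays attached to the stream and updates its answer as each element arrives. A one-time query, for example `max(readings)`, is instead evaluated once over whatever data is present. The running-maximum query, the function name, and the readings below are illustrative assumptions.

```python
def standing_max(stream):
    """Standing query: after each new element, report the maximum seen so far."""
    current = None
    for x in stream:
        if current is None or x > current:
            current = x
        yield current  # the answer is refreshed on every arrival

readings = [21, 19, 24, 23, 27, 25]
print(list(standing_max(readings)))  # [21, 21, 24, 24, 27, 27]
```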

Question 35 :
Find the Hamming distance between the vectors A = 100101011 and B = 100010010.

2
4
3
1
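
The Hamming distance is the number of positions at which two equal-length vectors differ. Here, A and B differ at positions 4, 5, 6 and 9, which gives 4. The small helper below (its name is illustrative) does the same count.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))

print(hamming("100101011", "100010010"))  # 4
```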

Question 36 :
Which of the following statements about data streaming is true?

Stream data is always unstructured data.
Stream data often has a high velocity.
Stream elements cannot be stored on disk.
Stream data is always structured data.

Question 37 :
Sliding window operations typically fall into which category?

OLTP Transactions
Big Data Batch Processing
Big Data Real Time Processing
Small Batch Processing
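
Sliding-window operations are real-time (stream) processing. Each arriving element updates the result immediately, and only the most recent window is held in memory. Below is a minimal sketch of a count-based sliding-window mean. The function name and window size are illustrative.

```python
from collections import deque

def sliding_window_means(stream, window_size):
    """Yield the mean of the last `window_size` elements as each element arrives."""
    window = deque(maxlen=window_size)  # oldest element drops out automatically
    total = 0.0
    for x in stream:
        if len(window) == window_size:
            total -= window[0]  # element about to be evicted by append()
        window.append(x)
        total += x
        yield total / len(window)

print(list(sliding_window_means([2, 4, 6, 8, 10], window_size=3)))
# [2.0, 3.0, 4.0, 6.0, 8.0]
```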

Question 38 :
In a Bloom filter, an array of n bits is initialized with

all 0s
all 1s
half 0s and half 1s
all -1

Question 39 :
The Jaccard similarity of two sets A and B is also known as the __________.

Jaccard Index
Primary Index
Secondary Index
Clustered Index
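
The Jaccard index of two sets is the size of their intersection divided by the size of their union. Here is a minimal sketch. The function name is illustrative, and returning 1.0 for two empty sets is a convention assumed here.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(a | b)

print(jaccard({1, 2, 3, 4}, {3, 4, 5}))  # 0.4  (2 shared / 5 total)
```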

Question 40 :
Find the L1 and L2 distances between the points (5, 6, 7) and (8, 2, 4).

L1 = 10, L2 = 5.83
L1 = 10, L2 = 5
L1 = 11, L2 = 4.9
L1 = 9, L2 = 5.83
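
Both answers come from the Lp distance, (sum of |x_i - y_i|^p)^(1/p). Setting p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance. The coordinate differences here are 3, 4 and 3, so L1 = 10 and L2 = sqrt(34) ≈ 5.83. A minimal sketch follows. The function name is illustrative.

```python
def lp_distance(p, q, order):
    """Lp (Minkowski) distance: (sum |p_i - q_i|^order)^(1/order)."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1 / order)

x, y = (5, 6, 7), (8, 2, 4)
l1 = lp_distance(x, y, 1)   # 3 + 4 + 3 = 10
l2 = lp_distance(x, y, 2)   # sqrt(9 + 16 + 9) = sqrt(34)
print(l1, round(l2, 2))     # 10.0 5.83
```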

Question 41 :
Which of the following is a NoSQL database type?

SQL
JSON
Document databases
CSV

Question 42 :
Which of the following is responsible for managing cluster resources and using them to schedule users' applications?

Hadoop Common
YARN
HDFS
MapReduce

Question 43 :
What do you mean by sampling of stream data?

Sampling reduces the amount of data fed to a subsequent data mining algorithm.
Sampling reduces the diversity of the data stream.
Sampling aims to keep statistical properties of the data intact.
Sampling algorithms often don't need multiple passes over the data.
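
Reservoir sampling is one common way to sample a stream. It is shown here only as an example, not as the sole technique the question has in mind. It reads the stream once and keeps a fixed-size sample, which reduces the data passed downstream. Every element has the same chance of being in the sample, so statistical properties are preserved. The function name is illustrative.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream in one pass."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)        # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                sample[j] = item       # keep item i with probability k/(i+1)
    return sample

print(reservoir_sample(range(1_000_000), k=5, rng=random.Random(42)))
```

Seeding `random.Random(42)` only makes the run repeatable. The sample itself is random.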

Question 44 :
_____________ is a batch-based, distributed computing framework modeled after a paper published by Google.

MapCompute
MapReuse
MapCluster
MapReduce
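
Word count is the usual first MapReduce example. The sketch below simulates the map, shuffle, and reduce phases inside a single Python process. It does not use Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit (word, 1) for every word in the input line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key (the framework does this in Hadoop).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: sum the counts for one word.
    return key, sum(values)

lines = ["big data big clusters", "data streams"]
pairs = (pair for line in lines for pair in map_phase(line))
counts = dict(reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items()))
print(counts)  # {'big': 2, 'clusters': 1, 'data': 2, 'streams': 1}
```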

Question 45 :
If the file size is 4 GB and the block size is 64 MB, the number of mappers required for the MapReduce task is

8
16
32
64
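
This assumes one map task per input split, with the split size equal to the HDFS block size (the common default). The count is then 4 GB / 64 MB = 4096 MB / 64 MB = 64. The helper name below is illustrative.

```python
def num_map_tasks(file_size_bytes: int, block_size_bytes: int) -> int:
    # Ceiling division: a final partial block still needs its own map task.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes

MB = 1024 ** 2
GB = 1024 ** 3
print(num_map_tasks(4 * GB, 64 * MB))  # 64
```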

Question 46 :
Which of the following is not a class of points in the BFR algorithm?

Discard Set (DS)
Compression Set (CS)
Isolation Set (IS)
Retained Set (RS)
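
In BFR, points assigned to the Discard Set and the Compression Set are not stored individually. Each set is summarized by N (the number of points), SUM (the per-dimension sum), and SUMSQ (the per-dimension sum of squares). The centroid and variance can be derived from these three values. This sketch covers only that summary, not the full algorithm, and the class name is hypothetical.

```python
class ClusterSummary:
    """BFR-style summary of a set of d-dimensional points: N, SUM, SUMSQ."""

    def __init__(self, dims):
        self.n = 0
        self.sum = [0.0] * dims
        self.sumsq = [0.0] * dims

    def add(self, point):
        # The point is folded into the statistics and can then be discarded.
        self.n += 1
        for i, x in enumerate(point):
            self.sum[i] += x
            self.sumsq[i] += x * x

    def centroid(self):
        return [s / self.n for s in self.sum]

    def variance(self):
        # Per-dimension variance: SUMSQ/N - (SUM/N)^2
        return [sq / self.n - (s / self.n) ** 2
                for s, sq in zip(self.sum, self.sumsq)]

ds = ClusterSummary(dims=2)
for p in [(1.0, 2.0), (3.0, 2.0), (2.0, 5.0)]:
    ds.add(p)
print(ds.n, ds.centroid(), ds.variance())
```

Two summaries can be merged by adding their N, SUM and SUMSQ values component-wise. This is why the summary form suits a single pass over the data.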

Question 47 :
Which of the following decides the number of partitions that are created on the local file system of the worker nodes?

Number of map tasks
Number of reduce tasks
Number of file input splits
Number of distinct keys in the intermediate key-value pairs
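
Hadoop's default HashPartitioner assigns each intermediate key to partition hash(key) mod R, where R is the number of reduce tasks. Each map task therefore writes R partitions to its local disk. The sketch below imitates that idea with `zlib.crc32` standing in for Java's `hashCode()`. The function names are hypothetical.

```python
import zlib

def get_partition(key: str, num_reduce_tasks: int) -> int:
    # hash(key) mod R, the idea behind Hadoop's default HashPartitioner;
    # crc32 stands in for Java's hashCode() so results are stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks

def partition_map_output(pairs, num_reduce_tasks):
    # One (possibly empty) partition per reduce task.
    partitions = [[] for _ in range(num_reduce_tasks)]
    for key, value in pairs:
        partitions[get_partition(key, num_reduce_tasks)].append((key, value))
    return partitions

pairs = [("big", 1), ("data", 1), ("big", 1), ("yarn", 1), ("hdfs", 1)]
parts = partition_map_output(pairs, num_reduce_tasks=3)
print(len(parts))  # 3, one partition per reduce task
```

All pairs that share a key land in the same partition, so a single reducer sees every value for that key.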

Question 48 :
During start up, the ___________ loads the file system state from the fsimage and the edits log file.

Datanode
Namenode
Secondary Namenode
Rack awareness policy

Question 49 :
Which of the following is not a characteristic of stream data?

Continuous
Ordered
Persistent
Huge

Question 50 :
What is finally produced by Hierarchical Agglomerative Clustering?

Final estimate of cluster centroids
Assignment of each point to clusters
Tree showing how close things are to each other
Group of clusters
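
Agglomerative clustering starts with every point in its own cluster and repeatedly merges the closest pair of clusters. The recorded sequence of merges is the tree (dendrogram) this question refers to. This sketch assumes 1-D points and single linkage only to keep it short. The function name is illustrative.

```python
def agglomerative(points):
    """Single-linkage agglomerative clustering on 1-D points.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    which is the information a dendrogram draws.
    """
    clusters = [(p,) for p in points]  # start: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters (single linkage = closest members).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

for a, b, d in agglomerative([1, 2, 6, 7, 15]):
    print(a, "+", b, "at distance", d)
```

For the points 1, 2, 6, 7 and 15, the merges run as follows. {1, 2} and {6, 7} merge at distance 1. Those two clusters then merge at distance 4, and 15 joins at distance 8. Reading these merges bottom-up gives the tree.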
