Question 31 :
Which of the following distance measures is based on the grid-like street geography of New York?
- Manhattan Distance
- Edit Distance
- Hamming distance
- Lp distance
Question 32 :
Which of the following statements about standard Bloom filters is correct?
- It is possible to delete an element from a Bloom filter.
- A Bloom filter always returns the correct result.
- It is possible to alter the hash functions of a full Bloom filter to create more space.
- A Bloom filter always returns TRUE when testing for a previously added element.
Question 33 :
The term used to describe Hadoop's hardware requirements is
- Commodity firmware
- Commodity software
- Commodity hardware
- Cluster hardware
Question 34 :
A ________________ query Q is a query that is issued once over a database D, and then logically runs continuously over the data in D until Q is terminated.
- One-time Query
- Standing Query
- Ad hoc Query
- General Query
Question 35 :
Find the Hamming distance between the vectors A = 100101011 and B = 100010010.
- 2
- 4
- 3
- 1
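As a way to check an answer to the question above, here is a minimal sketch of a Hamming distance computation. It assumes the vectors are given as equal-length bit strings; the function name `hamming_distance` is illustrative and not part of the question.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Compare the strings position by position and count mismatches.
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # The two vectors from the question above.
    print(hamming_distance("100101011", "100010010"))
```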
Question 36 :
Which of the following statements about data streaming is true?
- Stream data is always unstructured data.
- Stream data often has a high velocity.
- Stream elements cannot be stored on disk.
- Stream data is always structured data.
Question 37 :
Sliding window operations typically fall in the category
- OLTP Transactions
- Big Data Batch Processing
- Big Data Real Time Processing
- Small Batch Processing
Question 38 :
In a Bloom filter, an array of n bits is initialized with
- all 0s
- all 1s
- half 0s and half 1s
- all -1
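To make the bit array in the question above concrete, here is a minimal sketch of a standard Bloom filter. The class name `BloomFilter`, the method names, and the use of salted SHA-256 digests as hash functions are all assumptions made for illustration; real implementations usually use faster hash functions and a packed bit array.

```python
import hashlib


class BloomFilter:
    """Illustrative standard Bloom filter (no deletion support)."""

    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        # The array of n bits in its initial state.
        self.bits = [0] * n_bits

    def _positions(self, item: str):
        # Assumption: derive k hash functions by salting SHA-256 with an index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        # Set the bit at every hashed position.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True only if every hashed position is set; false positives are possible.
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(n_bits=64, n_hashes=3)
    bf.add("apple")
    print(bf.might_contain("apple"))
```

Because `add` only ever sets bits and nothing clears them, a lookup for an item that was added finds all of its positions set.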
Question 39 :
The Jaccard similarity of two non-binary sets A and B is defined by the __________
- Jaccard Index
- Primary Index
- Secondary Index
- Clustered Index
Question 40 :
Find the L1 and L2 distances between the points (5, 6, 7) and (8, 2, 4).
- L1 = 10, L2 = 5.83
- L1 = 10, L2 = 5
- L1 = 11, L2 = 4.9
- L1 = 9, L2 = 5.83
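The distances in the question above can be checked with a small sketch of the general Lp distance, of which L1 (r = 1) and L2 (r = 2) are special cases. The function name `lp_distance` and the parameter name `r` are illustrative choices.

```python
def lp_distance(p, q, r):
    """Lp distance: (sum over i of |p_i - q_i|^r) ^ (1/r)."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return sum(abs(x - y) ** r for x, y in zip(p, q)) ** (1 / r)


if __name__ == "__main__":
    a, b = (5, 6, 7), (8, 2, 4)  # the two points from the question
    print("L1 =", lp_distance(a, b, 1))
    print("L2 =", round(lp_distance(a, b, 2), 2))
```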
Question 41 :
Which of the following is a NoSQL database type?
- SQL
- JSON
- Document databases
- CSV
Question 42 :
Which of the following is responsible for managing the cluster resources and using them to schedule users’ applications?
- Hadoop Common
- YARN
- HDFS
- MapReduce
Question 43 :
What is meant by sampling of stream data?
- Sampling reduces the amount of data fed to a subsequent data mining algorithm.
- Sampling reduces the diversity of the data stream.
- Sampling aims to keep statistical properties of the data intact.
- Sampling algorithms often don't need multiple passes over the data.
Question 44 :
_____________ is a batch-based, distributed computing framework modeled after Google’s paper.
- MapCompute
- MapReuse
- MapCluster
- MapReduce
Question 45 :
If the size of a file is 4 GB and the block size is 64 MB, then the number of mappers required for the MapReduce task is
- 8
- 16
- 32
- 64
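The arithmetic behind the question above can be sketched as follows. The sketch assumes one map task per input split, a split size equal to the block size, and 1 GB = 1024 MB; the function name `num_map_tasks` is illustrative and is not a Hadoop API.

```python
import math


def num_map_tasks(file_size_mb: float, split_size_mb: float) -> int:
    """Number of input splits, assuming one mapper per split."""
    # A partially filled final split still needs its own mapper, so round up.
    return math.ceil(file_size_mb / split_size_mb)


if __name__ == "__main__":
    # 4 GB file with 64 MB blocks, as in the question.
    print(num_map_tasks(4 * 1024, 64))
```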
Question 46 :
Which of the following is not a class of points in the BFR algorithm?
- Discard Set (DS)
- Compressed Set (CS)
- Isolation Set (IS)
- Retained Set (RS)
Question 47 :
Which of the following decides the number of partitions that are created on the local file system of the worker nodes?
- Number of map tasks
- Number of reduce tasks
- Number of file input splits
- Number of distinct keys in the intermediate key-value pairs
Question 48 :
During start up, the ___________ loads the file system state from the fsimage and the edits log file.
- Datanode
- Namenode
- Secondary Namenode
- Rack awareness policy
Question 49 :
Which of the following is not a characteristic of stream data?
- Continuous
- Ordered
- Persistent
- Huge
Question 50 :
What is finally produced by Hierarchical Agglomerative Clustering?
- final estimate of cluster centroids
- assignment of each point to clusters
- tree showing how close things are to each other
- Group of clusters