Big Data MCQs




Question 1 :
Which of the following is not a Hadoop distribution?


  1. MapR
  2. Cloudera
  3. Hortonworks
  4. RMAP
  

Question 2 :
___________ refers to the inconsistency that data can exhibit, which hampers the analysis process and creates hurdles for those who wish to analyze this form of data.


  1. Variability
  2. Variety
  3. Volume
  4. Complexity
  

Question 3 :
Which of the following statements about data streaming is true?


  1. Stream data is always unstructured data.
  2. Stream data often has a high velocity.
  3. Stream elements cannot be stored on disk.
  4. Stream data is always structured data.
  

Question 4 :
The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window of size N. Which of the following statements is true about estimating the number of 0's based on DGIM?


  1. The number of 0's cannot be estimated at all.
  2. The number of 0's can be estimated with a maximum guaranteed error.
  3. To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's.
  4. Determine whether an element has already occurred in previous stream data.
  

Question 5 :
Pick a hash function h that maps each of the N elements to at least log2 N bits. If R is the maximum number of trailing 0's (the tail length) in any hash value seen so far, the estimated number of distinct elements is


  1. 2^R
  2. 2^(-R)
  3. 1-(2^R)
  4. 1-(2^(-R))
  

Question 6 :
What is the edit distance between A = "father" and B = "feather"?


  1. 5
  2. 1
  3. 4
  4. 2
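
To check an answer to a question like this one, the distance can be computed with dynamic programming. The following is a minimal sketch that assumes the Levenshtein definition (insertions, deletions, and substitutions each cost 1); some textbooks allow only insertions and deletions, which can give a different value for other string pairs. The function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed definition:
    insertions, deletions, and substitutions each cost 1)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete ch_a
                curr[j - 1] + 1,                  # insert ch_b
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("father", "feather"))
```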
  

Question 7 :
NoSQL stands for


  1. Not only SQL
  2. Not SQL
  3. Not Over SQL
  4. No SQL
  

Question 8 :
Neo4j is an example of which of the following NoSQL architectural patterns?


  1. Key-value store
  2. Graph Store
  3. Document Store
  4. Column-based Store
  

Question 9 :
Hadoop is the solution for:


  1. Database software
  2. Big Data Software
  3. Data Mining software
  4. Distribution software
  

Question 10 :
Which of the following is not a default daemon of Hadoop?


  1. Namenode
  2. Datanode
  3. Job Tracker
  4. Job history server
  

Question 11 :
Which of the following is not one of the 5 V's of big data?


  1. Volume
  2. Variable
  3. Velocity
  4. Value
  

Question 12 :
The graphical representation of an SNA is made up of links and _____________.


  1. People
  2. Networks
  3. Nodes
  4. Computers
  

Question 13 :
Hadoop is a framework that works with a variety of related tools. Common Hadoop ecosystem components include ____________


  1. MapReduce, Hummer and Iguana
  2. MapReduce, Hive and HBase
  3. MapReduce, MySQL and Google Apps
  4. MapReduce, Heron and Trumpet
  

Question 14 :
_________ systems focus on the relationship between users and items for recommendation.


  1. DGIM
  2. Collaborative-Filtering
  3. Content Based and Collaborative Filtering
  4. Content Based
  

Question 15 :
A Reduce task receives


  1. one or more keys and their associated value list
  2. key value pair
  3. list of keys and their associated values
  4. list of key value pairs
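
The data flow into a reducer can be illustrated with a small in-memory simulation. This is only a sketch of the map, shuffle, and reduce phases in plain Python, not Hadoop's API; the helper names `map_words`, `shuffle`, `reduce_count`, and `word_count` are hypothetical.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_count(key, values):
    """Reduce phase: called once per key with the list of its values."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(pairs)
    return dict(reduce_count(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(shuffle(map_words("big data big ideas")))
    print(word_count(["big data big ideas", "data streams"]))
```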
  

Question 16 :
The FM-sketch algorithm can be used to:


  1. Estimate the number of distinct elements.
  2. Sample data with a time-sensitive window.
  3. Estimate the frequent elements.
  4. Determine whether an element has already occurred in previous stream data.
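
For readers who want to experiment with the FM (Flajolet-Martin) sketch, the following is a minimal single-hash version. It assumes SHA-256 truncated to 64 bits as the hash function and treats a hash value of 0 as having tail length 0; both are choices made for this sketch. Practical implementations combine many hash functions (for example, a median of averages) to reduce the variance of the estimate.

```python
import hashlib


def trailing_zeros(x: int) -> int:
    """Number of trailing 0 bits in x (0 is treated as having none here)."""
    if x == 0:
        return 0
    return (x & -x).bit_length() - 1


def fm_estimate(stream, seed: str = "") -> int:
    """Single-hash Flajolet-Martin estimate of the number of distinct items."""
    max_tail = 0
    for item in stream:
        digest = hashlib.sha256((seed + str(item)).encode("utf-8")).digest()
        hash_value = int.from_bytes(digest[:8], "big")  # keep 64 bits
        max_tail = max(max_tail, trailing_zeros(hash_value))
    # The estimate depends only on the longest tail of 0's observed.
    return 2 ** max_tail


if __name__ == "__main__":
    # Repeated elements hash to the same value, so they do not change the result.
    print(fm_estimate(["a", "b", "c", "a", "b"]))
```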
  

Question 17 :
Which algorithm is used to find fully connected subgraphs in social media mining?


  1. CURE
  2. CPM
  3. SimRank
  4. Girvan-Newman Algorithm
  

Question 18 :
If a distance measure satisfies d(x, y) = d(y, x), then it is called


  1. Symmetric
  2. Identical
  3. Positiveness
  4. Triangle inequality
  

Question 19 :
'Sharding' a database across many server instances can be achieved with _______________


  1. MAN
  2. LAN
  3. WAN
  4. SAN
  

Question 20 :
What is the effect of a spider trap on PageRank?


  1. A particular page gets the highest PageRank
  2. All the pages of the web will get a PageRank of 0
  3. No effect on any page
  4. Affects a particular set of pages
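
The behaviour of a spider trap can be explored with a small power-iteration sketch. The three-page graph below is a made-up example in which page C links only to itself; the function name `pagerank` and the parameter `beta` (the probability of following a link rather than teleporting) are assumptions of this sketch, which also assumes every page has at least one out-link.

```python
def pagerank(links, beta=1.0, iterations=100):
    """Power iteration on a dict {page: [pages it links to]}.

    beta = 1.0 means no teleporting (no taxation).
    Assumes every page has at least one out-link (no dead ends).
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with its teleport share.
        new_rank = {page: (1.0 - beta) / n for page in pages}
        for src, targets in links.items():
            share = beta * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: C is a spider trap because it links only to itself.
    graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["C"]}
    print(pagerank(graph))             # without taxation
    print(pagerank(graph, beta=0.85))  # with taxation
```

Comparing the two printed results shows how much rank the trap accumulates without taxation and how teleporting limits that accumulation.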
  

Question 21 :
CSV and JSON can be described as


  1. Structured data
  2. Unstructured data
  3. Semi-structured data
  4. Multi-structured data
  

Question 22 :
_________ systems focus on the relationship between users and items for recommendation.


  1. DGIM
  2. Collaborative-Filtering
  3. Content Based and Collaborative Filtering
  4. Content Based
  

Question 23 :
ETL stands for ________________


  1. Extraction, transformation, and loading
  2. Extract Taken Lend
  3. Enterprise Transfer Load
  4. Entertainment Transference Load
  

Question 24 :
Which of the following statements about MongoDB is correct?


  1. MongoDB is a column-oriented data store
  2. MongoDB uses XML more than JSON
  3. MongoDB is a document store database
  4. MongoDB is a key-value data store
  

Question 25 :
________ stores are used to store information about networks, such as social connections.


  1. Key-value
  2. Wide-column
  3. Document
  4. Graph
  

Question 26 :
The time between elements of one stream


  1. need not be uniform
  2. need to be uniform
  3. must be 1 ms
  4. must be 1 ns
  

Question 27 :
Which of the following is a column-oriented database that runs on top of HDFS?


  1. Hive
  2. Sqoop
  3. HBase
  4. Flume
  

Question 28 :
Techniques for fooling search engines into believing your page is about something it is not are called _____________.


  1. term spam
  2. page rank
  3. phishing
  4. dead ends
  

Question 29 :
The police set up checkpoints at randomly selected road locations, then inspected every driver at those locations. What type of sample is this?


  1. Simple Random Sample
  2. Stratified Random Sample
  3. Cluster Random Sample
  4. Uniform sampling
  

Question 30 :
Which of the following operations can be implemented with combiners?


  1. Selection
  2. Projection
  3. Natural Join
  4. Union
  