Hello friends, if you are looking for Big Data multiple choice questions (MCQs) with answers, this collection covers the following topics.
Big Data Concepts: Big Data Essentials
1] Which statements are true about unstructured data?
Ans: Web pages, video files, and audio files are examples of unstructured data;
Unstructured data is very often linked to structured data. An example is how
X-ray images at a hospital are linked to patient IDs or health card numbers
2] Which statements accurately describe the differences between big data and data warehousing?
Ans: Data warehouses only handle structured data (relational or non-relational),
whereas big data can handle structured, unstructured, or semi-structured data;
While only DBMS-compatible data is stored in data warehouses, all kinds of
data including transactional data, social media data (including audio and video),
machinery data, or any DBMS data can be stored and managed using big data
3] Which statement about parallel or distributed computing is true?
Ans: Distributed computing can allow an application on one machine to
leverage processing power, memory, or storage on another machine
4] Which statement about horizontal and vertical scaling is true?
Ans: Horizontal scaling is typically the easiest scaling option
5] Match each Hadoop component with its respective layer in the Hadoop
ecosystem. One layer will not be used.
Ans: Data access layer
Ans: Data processing layer
Ans: Data storage layer
Ans: Data management layer
6] Which statements are correct about HDFS?
Ans: HDFS provides high throughput access to application data by providing the
data access in parallel, HDFS provides a fault-tolerant storage layer for Hadoop
and its other components
7] What are the benefits of migrating from Hadoop to the cloud?
Ans: Better scalability, Long-term cost savings, Better collaboration, Easy access
and resource availability
8] What are the differences between Hadoop and cloud computing?
Ans: Cloud computing focuses on on-demand, scalable, and adaptable service
models, while Hadoop is all about extracting value out of volume, variety, and
velocity, Cloud computing constitutes various computing concepts. This
naturally involves a large number of computers that are usually connected
through a real-time communication network. Hadoop, on the other hand, is a
framework that uses simple programming models to process large data sets
across clusters of computers
9] What are the most important features of HDFS?
Ans: Scalability, Replication, Distributed storage, High availability
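Note: the replication and distributed storage features listed above can be illustrated with a minimal sketch. This is not the HDFS API. The helper names (`split_into_blocks`, `place_replicas`, `readable_blocks`), the node names, and the round-robin placement are assumptions made for illustration; real HDFS uses much larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut a file's bytes into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int = 3) -> dict:
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement rule for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: dict, failed_nodes: set) -> list:
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]


blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=3)
# With three replicas per block, losing two nodes leaves every block readable.
print(readable_blocks(placement, failed_nodes={"n1", "n2"}))
```

With a replication factor of 3, any two nodes can fail without data loss in this toy layout, which is the idea behind replication supporting high availability.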
10] Which statement is true about in-memory storage systems?
Ans: Data storage in an in-memory database is reliant on random access memory (RAM)
Big Data Concepts: Getting to Know Big Data
1] What could be accomplished by using big data technologies?
Ans: New product development and optimized offerings, Cost reductions
2] What are some examples of big data sources?
Ans: Email, Social media, Open data, Sensor data
3] Four of the seven characteristics of big data are listed. Match each
characteristic with its description. One description will not be used.
Ans: Volume: the amount of data that exists
Ans: Velocity: the speed at which data is processed and becomes accessible
Ans: Veracity: making sure the data is accurate, which requires processes to keep bad
data from accumulating in your systems
Ans: Variety: the different types of data from XML to video to SMS
4] Which statements are true about unstructured data?
Ans: Unstructured data is information that does not have a predefined data
model, Common examples of unstructured data include audio, video files, or
5] What are the main deliverables of big data?
Ans: Text/image analytics, Multivariate analysis, Predictive models
6] What are the most important advantages of big data, according to the
International Institute for Analytics (IIA)?
Ans: Big data leads to cost reductions, Big data enables faster, better decision
making, Big data helps to identify what customers need and to introduce new
products and services accordingly
7] What are some of the main business domains that use big data tools today?
Ans: Aviation industry, Credit scoring agencies, E-commerce industry,
8] Which statements are correct about how Netflix utilizes big data?
Ans: Netflix has screenshots of scenes people might have viewed repeatedly,
the associated ratings, and the number of searches and the search topics;
Netflix uses what is known as the big data recommendation algorithm to
suggest TV shows and movies based on a user’s preferences
9] What has Amazon been able to achieve by utilizing big data?
Ans: Gathering of information on what each customer is likely to purchase
based on what other people with similar interests have purchased;
Gathering of data on the search patterns of its customers
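Note: the idea of predicting purchases from what customers with similar interests bought can be sketched as a simple overlap-based recommender. This is not Amazon's actual algorithm; the function name `recommend`, the scoring rule, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def recommend(purchases: dict, customer: str, top_n: int = 3) -> list:
    """Suggest items bought by customers whose baskets overlap with `customer`.

    Each candidate item is scored by the overlap size of every similar
    customer who bought it (a hypothetical scoring rule).
    """
    own = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(own & items)  # shared purchases measure similarity
        if overlap == 0:
            continue                # ignore customers with nothing in common
        for item in items - own:    # only suggest items not already owned
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]


# Made-up purchase histories for the example.
history = {
    "ana": {"book", "lamp"},
    "ben": {"book", "lamp", "desk"},
    "cy": {"book", "pen"},
    "dee": {"sofa"},
}
print(recommend(history, "ana"))  # ['desk', 'pen']
```

Here "desk" ranks first because it comes from the customer sharing two purchases with "ana", while "pen" comes from a customer sharing only one.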
10] What are the main challenges that companies experience with big data?
Ans: Unfamiliarity with big data and confusing it with traditional methods,
Integrating data from a variety of sources, Data security issues, Unprecedented data growth.
Spark for High-speed Big Data Analytics
1] What are some components of Apache Spark?
Ans: GraphX, Spark SQL
2] Which statements are true about resilient distributed datasets (RDDs) and
directed acyclic graphs (DAGs)?
Ans: An RDD is an immutable (read-only), fundamental collection of elements or
items that can be operated on by many devices at the same time (parallel
processing); Compared with MapReduce, which creates a graph in two stages,
Apache Spark can create DAGs that contain many stages
3] As Spark usage grew at Uber, users encountered an increasing number of
issues. What were some of those issues/challenges?
Ans: Multiple Spark versions, Multiple compute clusters
4] What are some examples of metrics that Alibaba measures by utilizing Spark?
Ans: Connected components, Degree distribution
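Note: the two graph metrics named above can be shown on a tiny undirected graph. This is a plain Python sketch, not Spark or GraphX code, and the function names and sample edges are assumptions made for illustration.

```python
from collections import Counter, defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree_distribution(adj):
    """Map each degree to the number of vertices that have it."""
    return Counter(len(neighbours) for neighbours in adj.values())


def connected_components(adj):
    """Find groups of vertices reachable from one another (breadth-first search)."""
    seen = set()
    components = []
    for start in list(adj):
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


graph = build_adjacency([("a", "b"), ("b", "c"), ("d", "e")])
print(degree_distribution(graph))        # four vertices of degree 1, one of degree 2
print(len(connected_components(graph)))  # 2
```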
5] What are some advantages that Spark provides to modern healthcare
Ans: Behind the scenes distributed execution, A user-friendly API, Convenient
6] Which statement is correct about how Spark and Hadoop are different?
Ans: The Hadoop MapReduce model provides a batch engine, hence it is
dependent on different engines for other requirements, whereas Spark
performs batch, interactive, machine learning and streaming all in the same
7] What are some predominant industries that use Spark today?
Ans: Media and entertainment industry, Finance industry
8] What are some characteristics of Spark that help improve performance?
Ans: Lazy loading behavior, Cache appropriately
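Note: lazy behavior and caching can be illustrated with a toy dataset class. This is not the Spark API; `LazyDataset` and its methods are hypothetical names that only mimic the idea that transformations are recorded first and run when a result is requested, and that a cached intermediate result is not recomputed.

```python
class LazyDataset:
    """Toy lazily evaluated dataset (an illustrative stand-in, not Spark)."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function that yields the data
        self._use_cache = False
        self._cached = None

    @classmethod
    def from_list(cls, items):
        items = list(items)
        return cls(lambda: list(items))

    def map(self, fn):
        # Nothing runs here: we only describe how to compute the result later.
        return LazyDataset(lambda: [fn(x) for x in self.collect()])

    def filter(self, predicate):
        return LazyDataset(lambda: [x for x in self.collect() if predicate(x)])

    def cache(self):
        self._use_cache = True
        return self

    def collect(self):
        """Trigger the computation, reusing the cached result when enabled."""
        if self._use_cache and self._cached is not None:
            return list(self._cached)
        result = self._compute()
        if self._use_cache:
            self._cached = result
        return list(result)


calls = []


def expensive_square(x):
    calls.append(x)  # record each invocation so laziness is observable
    return x * x


squares = LazyDataset.from_list([1, 2, 3, 4]).map(expensive_square).cache()
# At this point expensive_square has not been called yet.
evens = squares.filter(lambda v: v % 2 == 0).collect()
plus_one = squares.map(lambda v: v + 1).collect()
print(evens, plus_one, len(calls))  # [4, 16] [2, 5, 10, 17] 4
```

The second `collect()` reuses the cached squares, so the expensive function runs four times in total rather than eight.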
9] What are the three API types that are compatible with Spark?
Ans: RDD, DataFrame, Dataset
10] What are some of the most important best practices when it comes to using Apache Spark?
Ans: Proper tuning, Using the right level of parallelism, Joining a large and a
medium size RDD.
Techniques for Big Data Analytics
1] What are the biggest challenges associated with traditional data analytics?
Ans: Scalability, consistency, reliability, efficiency, and maintainability
2] Place the layers of big data analytics architecture in the correct order from
the bottom to the top.
Ans: Data monitoring, Data security, Data storage, Data processing, Data query,
3] What are the parameters of data ingestion?
Ans: Data size, Data format, Data frequency, Data velocity
4] What are some ways in which big data processing can be performed?
Ans: Stream processing, Batch processing
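Note: the difference between the two processing styles can be sketched with an average over sensor readings. The names `batch_average` and `StreamingAverage` and the sample values are assumptions made for illustration.

```python
def batch_average(readings):
    """Batch style: wait for the complete dataset, then compute once."""
    readings = list(readings)
    return sum(readings) / len(readings)


class StreamingAverage:
    """Stream style: update the answer as each record arrives."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count  # current average so far


readings = [10, 20, 30]
print(batch_average(readings))  # 20.0

stream = StreamingAverage()
print([stream.update(r) for r in readings])  # [10.0, 15.0, 20.0]
```

The batch version needs the entire input before it can answer, while the streaming version keeps only a running count and total and yields an up-to-date result after every record.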
5] Which statement about data storage systems is correct?
Ans: The Hadoop distributed file system (HDFS) is the primary data storage
system used by Hadoop applications
6] Which are the main components of the big data architecture?
Ans: The data model, Big data security, Big data analytics
7] Which are the main reasons for using batch processing?
Ans: To join tables in relational databases, To run complex algorithms on large datasets which require access to the entire batch
8] Which is correct about stream processing?
Ans: Stream processing provides analytical insights before the data storage
9] Which statement is true about the Lambda architecture?
Ans: Data that enters the system is dispatched to two layers in the Lambda
architecture: the batch layer and the speed layer, The Lambda architecture
provides fault-tolerance against possible hardware failures and human errors
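Note: the dispatch of incoming data to a batch layer and a speed layer can be sketched as follows. `LambdaPipeline` and its attributes are hypothetical names, and event counting stands in for whatever computation a real system would run.

```python
from collections import Counter


class LambdaPipeline:
    """Toy Lambda architecture: every event goes to both layers."""

    def __init__(self):
        self.master_log = []         # append-only record of all raw events
        self.batch_view = Counter()  # recomputed from the full log
        self.speed_view = Counter()  # incremental counts since the last batch run

    def ingest(self, event):
        self.master_log.append(event)  # batch layer input
        self.speed_view[event] += 1    # speed layer updates immediately

    def run_batch(self):
        """Rebuild the batch view from scratch and reset the speed view."""
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()

    def query(self, key):
        """Serve a result by merging the batch and speed views."""
        return self.batch_view[key] + self.speed_view[key]


pipeline = LambdaPipeline()
for event in ["click", "click", "view"]:
    pipeline.ingest(event)
pipeline.run_batch()
pipeline.ingest("click")
print(pipeline.query("click"))  # 3 (2 from the batch view, 1 from the speed view)
```

Because the batch view is always rebuilt from the raw log, a view corrupted by a bug or an operator mistake can be regenerated by running the batch job again, which is the sense in which the design tolerates human errors.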
10] Which statement is true about the Kappa architecture?
Ans: The Kappa architecture uses stream processing to manage data flows
through a single path
Data Silos, Lakes, & Streams Introduction
1] Which of the following is a characteristic of a data silo?
Ans: Data may be in a raw, native format and not useful unless processed, Data is not easily accessible using common tools, Data is stored in isolation and cannot be combined with other sources
2] Which of the following are valid data types that can be stored in a data lake?
Ans: Unstructured data, Structured data, Semi-structured data
3] Which of the following is not a characteristic of a data lake?
Ans: Data is not easily searchable
4] Which of the following are challenges involved in designing and building data lakes?
Ans: Data lakes need to work with different data types and sparse and
incomplete data, Data lakes need to be able to support a huge volume of data,
Data lakes need to maintain data security and compliance
5] Which of the following are valid differences between a traditional relational database and a data warehouse?
Ans: A database supports ACID properties and a data warehouse does not, A
data warehouse is optimized for read access, a database is optimized for read as well as write access
6] Which of the following statements about data lakes and data warehouses are true?
Ans: Data warehouses hold fairly structured data optimized for analysis, Data
lakes need to maintain security and ensure compliance of the data stored within
them, Data lakes promote shared data stewardship
7] Which of the following is not an example of a data stream?
Ans: Census data stored in a database
8] Which of the following is not a valid service used to ingest data into the AWS cloud?
9] Which of the following correctly defines AWS Glue?
Ans: A single catalog which indexes data from multiple sources to make it
10] Which of the following AWS services can be used to visualize data stored in a data lake on AWS?
Ans: Amazon QuickSight