Big Data Hadoop Questions and Answers

Hello friends, if you are looking for Big Data Hadoop MCQs with answers, multiple choice questions, objective type questions, or Accenture test answers, you will find them here.

Q1. What is the default replication factor to achieve fault tolerance?

Ans – 3

Q2. HBase is ______; it defines only column families.

Ans – Schema-less

Q3. Which of the following is a distributed machine learning framework on top of Spark?

Ans – ML, MLlib

Q4. In which language is the Hadoop framework written?

Ans – Java

Q5. How can we load data into Hive tables?

A. From files
B. From other tables
C. From databases
D. None of the above

Ans: ac

Q6. Which of the following is not a category of NoSQL database?

Ans – Scientific

Q7. When does the Shuffle and Sort process happen?

A. Before the Mapper phase
B. After the Mapper phase
C. After the Reducer phase
D. Before the Reducer phase

Ans: bd
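To see why shuffle and sort sits between the two phases, here is a minimal pure-Python sketch of a word count. It only imitates the MapReduce data flow on one machine; the function names (map_phase, shuffle_and_sort, reduce_phase) and the sample lines are made up for illustration and are not Hadoop APIs.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would."""
    return [(word, 1) for line in lines for word in line.split()]

def shuffle_and_sort(pairs):
    """Group values by key and order the keys: runs after map, before reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_phase(grouped):
    """Sum the grouped values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped}

# Hypothetical input lines, used only for this illustration.
lines = ["big data hadoop", "hadoop spark", "big data"]
word_counts = reduce_phase(shuffle_and_sort(map_phase(lines)))
print(word_counts)  # {'big': 2, 'data': 2, 'hadoop': 2, 'spark': 1}
```

The reducer can only sum the counts for a word once all pairs for that word have been grouped, which is why the grouping step has to come after the mapper and before the reducer.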

Q8. Which datatype can be used to implement a bounded variable length character string?

Ans – varchar

Q9. Which file(s) is/are used for storing metadata in the NameNode?

Ans – edit logs and fsimage

Q10. Which of the following is/are machine learning algorithms? Select all that apply.


Ans: ab

Q11. Which of the following statements is true about Apache Kafka?

Ans – Apache Kafka retains all published messages regardless of consumption.

Q12. You have developed an application based on a windowing operation. If you set the batch interval to 10 seconds and the window length to 60 seconds, how many RDDs will be processed in a window?

Ans – 6
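The arithmetic behind this question can be checked with a short sketch. Each batch interval produces one RDD, so a window covers window length divided by batch interval RDDs. The helper name rdds_per_window is hypothetical and is not part of the Spark API.

```python
def rdds_per_window(window_length_s, batch_interval_s):
    """Return how many batch RDDs fall inside one window."""
    # Spark Streaming requires the window length to be a multiple
    # of the batch interval, so reject anything else.
    if window_length_s % batch_interval_s != 0:
        raise ValueError("window length must be a multiple of the batch interval")
    return window_length_s // batch_interval_s

rdd_count = rdds_per_window(window_length_s=60, batch_interval_s=10)
print(rdd_count)  # 6
```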

Q13. What will hdfs dfs -chmod 600 filename.txt result in?

Ans – Gives the owner read and write access, with no access for the group or others
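An octal mode such as 600 can be decoded with Python's standard stat module. This is only a local sketch of how the three digits map to permission bits; it does not talk to HDFS, and it assumes a regular file.

```python
import stat

mode = 0o600  # owner digit 6 = read (4) + write (2); group and others = 0

# stat.filemode expects the file type bits too, so mark it as a regular file.
permission_string = stat.filemode(stat.S_IFREG | mode)
owner_can_execute = bool(mode & stat.S_IXUSR)

print(permission_string)   # -rw-------
print(owner_can_execute)   # False
```

The digit 6 does not include the execute bit (1), so the owner gets read and write only; execute for the owner would require 700.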

Q14. Which of the following processes schedules tasks across nodes in a Hadoop system?

Ans – Scheduler

Q15. Which processes need to be active for successful call to Hive?

Ans – NameNode, DataNode, Secondary NameNode, ResourceManager, NodeManager

Q16. Which of the following phases occur simultaneously in the MapReduce framework? Choose the most appropriate option.

Ans – Shuffle and sort

Q17. What are the capabilities of Kafka?

A. Kafka runs as a cluster on one or more servers that can span multiple datacenters.
B. The Kafka cluster stores streams of records in categories called topics.
C. Each record consists of a key, a value, and a timestamp.
D. All of the above

Ans: d

Q18. A network of 10000 sensors has been deployed on a connected fleet of cars. Which of the following technologies can be used to ingest the generated data into the analytics platform?

Ans – Hadoop

Q19. What will happen when you execute the command hadoop fs -put emp.csv /dataset?

Ans – The emp.csv file is copied from the local file system to the HDFS /dataset directory

Q20. Assume that we have the Employee data below in a CSV file named "EmployeeData.csv" in the HDFS folder "/SparkDemos":

101,Alex,20000
102,Adam,30000
103,Rose,40000
101,Linda,35000
102,Jack,45000
103,Sam,50000

Which of the code fragments below will help create an appropriate DataFrame named "EmployeeDF" for the same?

Ans –
schema1 = StructType([StructField("Dept_ID", IntegerType(), True), \
StructField("EmpName", StringType(), True), \
StructField("Salary", IntegerType(), True)])
EmployeeDF = spark.read.schema(schema1).load("/SparkDemos/E…

Q21. Which APIs are available in PySpark for data processing?


Ans: abd

Q22. Hadoop is a storage as well as a processing framework. True or False?

Ans – True

Q23. Which types of programming does Python support?

A. Object-oriented programming
B. Structured programming
C. Functional programming
D. All of the mentioned

Ans: d
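As a quick illustration of the three styles, the sketch below computes the same sum of squares in an object-oriented, a structured, and a functional way. The class and function names are invented for this example.

```python
from functools import reduce

# Object-oriented style: data and behaviour live together in a class.
class SquareSummer:
    def __init__(self, numbers):
        self.numbers = list(numbers)

    def total(self):
        return sum(n * n for n in self.numbers)

# Structured style: a plain function with a loop and a running total.
def sum_squares_loop(numbers):
    total = 0
    for n in numbers:
        total += n * n
    return total

# Functional style: fold the sequence with reduce and a lambda.
def sum_squares_functional(numbers):
    return reduce(lambda acc, n: acc + n * n, numbers, 0)

numbers = [1, 2, 3]
print(SquareSummer(numbers).total())    # 14
print(sum_squares_loop(numbers))        # 14
print(sum_squares_functional(numbers))  # 14
```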

Q24. Which of the following are valid aggregate functions?


Ans: acd

Q25. Which of the following are examples of ETL tools?

Ans – Oracle

Q26. Select True or False. We can secure data in a data lake with mechanisms such as authentication and authorization.

Ans – True

Q27. Which of the following is the correct extension of a Python file?

Ans – .py

Q28. Which of the following languages is not supported by Spark?

Ans – Pascal

Q29. Which of the following is not a DDL command?

Ans – Update

Q30. Which statement is used to delete all rows in a table without having the action logged?

Ans – Truncate

Q31. What are the different types of partitioning methods?

Ans – Row-based and column-based

Q32. Select True or False. Primary Key column supports NULL values.

Ans – False

Q33. Which Talend components are used to process input/output for delimited files?


Ans: ad

Q34. ____ is a distributed graph processing framework on top of Spark.

Ans – GraphX

Q35. Which of the following built-in functions is used to identify the datatype of a variable in Python?

Ans – type()
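A short sketch of type() in use; the variable names and values are arbitrary examples.

```python
count = 42
name = "hadoop"
ratio = 3.5

# type() returns the class of the object a variable refers to.
print(type(count))  # <class 'int'>
print(type(name))   # <class 'str'>
print(type(ratio))  # <class 'float'>
```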

Q36. What will be the value of the following Python expression? 4+3%5

Ans – 7
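The result follows from operator precedence: % binds tighter than +, so 3 % 5 is evaluated first. The sketch below compares the expression with both possible groupings.

```python
result = 4 + 3 % 5            # % is applied before +
explicit = 4 + (3 % 5)        # same grouping, written out: 4 + 3
left_to_right = (4 + 3) % 5   # what a naive left-to-right reading would give

print(result)         # 7
print(explicit)       # 7
print(left_to_right)  # 2
```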

Q37. What will be the output of the following Python code snippet if x = 1? x << 2

Ans – 4
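A left shift by n bits multiplies an integer by 2 to the power n, which the following sketch confirms for x = 1.

```python
x = 1
shifted = x << 2         # binary 0b1 becomes 0b100
multiplied = x * 2 ** 2  # the equivalent multiplication

print(shifted)       # 4
print(bin(shifted))  # 0b100
print(multiplied)    # 4
```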

Q38. The different types of analysis which are supported through data lakes are:

D. Continuous

Ans: ac

Q39. Which of the following is not a constraint in SQL?

Ans – UNION

Q40. Identify the advantages of in-memory processing?

A. Decreasing costs
B. High performance hardware
C. Inefficient to scale up
D. Easy to scale up
E. Easy to maintain and deploy
F. Easy access to data analytics

Ans: abef
