AWS Data Engineer School


Hello friends, if you are looking for AWS Data Engineer School question answers | AWS Data Engineer School multiple choice questions | AWS Data Engineer School objective type questions | AWS Data Engineer MCQs with answers, you will find them below.

Q1. Your customer wants to import an existing virtual machine into the AWS cloud. As a Solutions Architect, you need to choose a migration method. Which service should you use?

A. AWS Import/Export

B. VM Import/Export

C. VPC Peering

D. Direct Connect

Ans: b
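
For context on answer B, VM Import/Export is driven through the EC2 API once the exported disk image has been uploaded to Amazon S3. A minimal boto3 sketch; the bucket, key, and description below are placeholders, not part of the question:

```python
import boto3

# Assumes the exported VM disk image (e.g. a VMDK) is already in S3.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.import_image(
    Description="On-premises web server VM",       # placeholder description
    DiskContainers=[
        {
            "Description": "First disk",
            "Format": "vmdk",                       # OVA, VHD and RAW are also supported
            "UserBucket": {
                "S3Bucket": "my-vm-import-bucket",  # placeholder bucket
                "S3Key": "exports/webserver-disk1.vmdk",
            },
        }
    ],
)

# The import runs asynchronously; track it with describe_import_image_tasks.
print(response["ImportTaskId"])
```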

Q2. You are migrating a database system in your on-premises environment to AWS. You chose RDS as your database and configured your EC2 instance to access RDS to perform data processing. As a security requirement, network traffic between RDS and the instance must be encrypted using SSL, and access to the database must use profile credentials specific to the EC2 instance rather than a password. Choose a method to meet this requirement.

A. Enable IAM DB Authentication
B. Enable RDS encryption
C. Allow EC2 instance access in the RDS security group
D. Enable network access control list

Ans: a
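
As a rough illustration of answer A, the EC2 instance profile credentials are exchanged for a short-lived authentication token instead of a password, and the connection is forced over SSL. A minimal sketch assuming a MySQL-compatible RDS instance; the endpoint, database user, and CA bundle path are placeholders:

```python
import boto3
import pymysql  # third-party MySQL driver

RDS_HOST = "mydb.abc123xyz.us-east-1.rds.amazonaws.com"  # placeholder RDS endpoint
DB_USER = "app_user"   # DB user created for IAM authentication (AWSAuthenticationPlugin)
REGION = "us-east-1"

# boto3 picks up the EC2 instance profile credentials automatically.
rds = boto3.client("rds", region_name=REGION)

# Short-lived IAM authentication token used in place of a password.
token = rds.generate_db_auth_token(
    DBHostname=RDS_HOST, Port=3306, DBUsername=DB_USER, Region=REGION
)

# Connect over SSL using the RDS certificate bundle (path is a placeholder).
connection = pymysql.connect(
    host=RDS_HOST,
    user=DB_USER,
    password=token,
    port=3306,
    ssl={"ca": "/opt/certs/global-bundle.pem"},
)
```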

Q3. A business wishes to transition from on-premises Oracle to Amazon Aurora PostgreSQL. The transfer must be conducted with the least amount of downtime possible by using AWS DMS. Prior to the cutover, a Database Specialist must verify that the data was correctly migrated from the source to the destination. The migration should have a negligible effect on the source database’s performance. Which approach should the Database Specialist use?

A. Use the AWS Schema Conversion Tool (AWS SCT) to convert source Oracle database schemas to the target Aurora DB cluster. Verify the datatype of the columns.
B. Use the table metrics of the AWS DMS task created for migrating the data to verify the statistics for the tables being migrated and to verify that the data definition language (DDL) statements are completed.
C. Enable the AWS Schema Conversion Tool (AWS SCT) premigration validation and review the premigration checklist to make sure there are no issues with the conversion.
D. Enable AWS DMS data validation on the task so the AWS DMS task compares the source and target records, and reports any mismatches.

Ans: d
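
Data validation in answer D is simply a setting on the AWS DMS replication task, so DMS compares source and target rows and reports mismatches while the task runs. A hedged boto3 sketch; the ARNs, identifiers, and table mapping below are placeholders:

```python
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Placeholder selection rule: migrate every table in the SALES schema.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-sales-schema",
            "object-locator": {"schema-name": "SALES", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

# EnableValidation turns on row-by-row comparison between source and target.
task_settings = {"ValidationSettings": {"EnableValidation": True, "ThreadCount": 5}}

dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-to-aurora-pg",                          # placeholder
    SourceEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:SOURCE",   # placeholder
    TargetEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:TARGET",   # placeholder
    ReplicationInstanceArn="arn:aws:dms:us-east-1:111122223333:rep:INSTANCE", # placeholder
    MigrationType="full-load-and-cdc",                                        # CDC keeps downtime minimal
    TableMappings=json.dumps(table_mappings),
    ReplicationTaskSettings=json.dumps(task_settings),
)
```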

Q4. Which index has the same partition key as the base table but a different sort key?

A. Global secondary index
B. Local secondary index
C. Both A and B
D. None of the above

Ans: b
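
To make answer B concrete, here is a minimal DynamoDB sketch in which the local secondary index shares the table's partition key (CustomerId) but uses a different sort key (OrderDate); the table and attribute names are illustrative only:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.create_table(
    TableName="Orders",  # illustrative table name
    AttributeDefinitions=[
        {"AttributeName": "CustomerId", "AttributeType": "S"},
        {"AttributeName": "OrderId", "AttributeType": "S"},
        {"AttributeName": "OrderDate", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "CustomerId", "KeyType": "HASH"},  # partition key
        {"AttributeName": "OrderId", "KeyType": "RANGE"},    # base table sort key
    ],
    LocalSecondaryIndexes=[
        {
            "IndexName": "OrderDateIndex",
            "KeySchema": [
                {"AttributeName": "CustomerId", "KeyType": "HASH"},  # same partition key
                {"AttributeName": "OrderDate", "KeyType": "RANGE"},  # different sort key
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",
)
```

Note that a local secondary index can only be created together with the table, which is another way it differs from a global secondary index.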

Q5. An organization is currently hosting a large amount of frequently accessed data consisting of key-value pairs and semi-structured documents in their data center. They are planning to move this data to AWS. Which one of the following services MOST effectively meets their needs?

A. Amazon Redshift
B. Amazon RDS
C. Amazon DynamoDB
D. Amazon Aurora

Ans: c

Q6. Which of the following statements is true about the Apache Hive service running on an EMR cluster?

A. Hive Interpreter converts every query into a MapReduce job.

B. Hive Interpreter converts the query into a MapReduce job only if required.

C. Hive query output cannot be written to an S3 bucket.

D. Hive is a service which can take data only from HDFS for processing.

E. All of the above

Ans: b

Q7. An algorithm has been written in Python to identify SPAM e-mails. The algorithm analyzes the free text contained within a sample set of 1 million e-mails stored on Amazon S3. The algorithm must be scaled across a production dataset of 5 PB, which also resides in Amazon S3 storage. Which AWS service strategy is best for this use case?

A. Copy the data into Amazon ElastiCache to perform text analysis on the in-memory data and export the results of the model into Amazon Machine Learning.

B. Use Amazon EMR to parallelize the text analysis tasks across the cluster using a streaming program step.

C. Use Amazon Elasticsearch Service to store the text and then use the Python Elasticsearch Client to run analysis against the text index.

D. Initiate a Python job from AWS Data Pipeline to run directly against the Amazon S3 text files.

Ans: b

Q8. Which storage approach is used by Amazon Redshift?

A. Database

B. Row

C. Columnar

D. Key-Value

Ans: c

Q9. Which of the following solutions will improve the data loading performance? Statement A: Compress .csv files and use an INSERT statement to ingest data into Amazon Redshift. Statement B: Split large .csv files, then use a COPY command to load data into Amazon Redshift.

A. Both Statement A and Statement B can be used to improve data loading performance

B. Statement B can be used to improve data loading performance

C. Statement A can be used to improve data loading performance

D. Neither Statement A nor Statement B can be used to improve data loading performance.

Ans: b
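
The COPY command in Statement B loads every file that shares a key prefix in parallel across the Redshift slices, which is why splitting (and optionally compressing) the files helps. A minimal sketch using psycopg2; the cluster endpoint, credentials, table, bucket prefix, and IAM role ARN are placeholders:

```python
import psycopg2  # standard PostgreSQL driver; works against the Redshift endpoint

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="analytics",
    user="etl_user",
    password="********",
)

copy_sql = """
    COPY sales
    FROM 's3://my-data-bucket/sales/part_'   -- prefix matches all the split files
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV
    GZIP;                                    -- files were gzip-compressed after splitting
"""

# Each split file is loaded in parallel by a different slice of the cluster.
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
```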

Q10. A data engineer in a manufacturing company is designing a data processing platform that receives a large volume of unstructured data. The data engineer must populate a well-structured star schema in Amazon Redshift. What is the most efficient architecture strategy for this purpose?

A. When the data is saved to Amazon S3, use S3 Event Notifications and AWS Lambda to transform the file contents. Insert the data into the analysis schema on Redshift.

B. Transform the unstructured data using Amazon EMR and generate CSV data. COPY the CSV data into the analysis schema within Redshift.

C. Load the unstructured data into Redshift, and use string parsing functions to extract structured data for inserting into the analysis schema.

D. Normalize the data using an AWS Marketplace ETL tool, persist the results to Amazon S3, and use AWS Lambda to INSERT the data into Redshift.

Ans: b

Q11. Which of the following services is used for data visualizations?

A. Amazon QuickSight

B. Athena

C. Kinesis Data Analytics

D. AWS Glue

Ans: a

Q12. How many data catalogs can be created in AWS Glue?

A. It depends upon the number of jobs created in AWS Glue

B. Central AWS data catalog repository per account

C. Only one data catalog repository per region

D. All of the above

Ans: c

Q13. A technology company is creating a dashboard that will visualize and analyze time-sensitive data. The data will come in through Amazon Kinesis Data Firehose with the buffer interval set to 60 seconds. The dashboard must support near-real-time data. Which visualization solution will meet these requirements?

A. Select Amazon Elasticsearch Service (Amazon ES) as the endpoint for Kinesis Data Firehose. Set up a Kibana dashboard using the data in Amazon ES with the desired analyses and visualizations.

B. Select Amazon S3 as the endpoint for Kinesis Data Firehose. Read data into an Amazon SageMaker Jupyter notebook and carry out the desired analyses and visualizations.

C. Select Amazon Redshift as the endpoint for Kinesis Data Firehose. Connect Amazon QuickSight with SPICE to Amazon Redshift to create the desired analyses and visualizations.

D. Select Amazon S3 as the endpoint for Kinesis Data Firehose. Use AWS Glue to catalog the data and Amazon Athena to query it. Connect Amazon QuickSight with SPICE to Athena to create the desired analyses and visualizations.

Ans: a
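
For reference, producers only put records onto the Firehose delivery stream; the 60-second buffer interval and the Amazon ES destination from answer A are configured on the stream itself. A small boto3 sketch with a placeholder stream name and payload:

```python
import json
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

event = {"metric": "page_load_ms", "value": 412, "ts": "2024-01-01T12:00:00Z"}  # sample payload

# Firehose buffers records (here up to the 60-second interval configured on the
# delivery stream) and then flushes them to the Amazon ES destination, where the
# Kibana dashboard reads them in near real time.
firehose.put_record(
    DeliveryStreamName="dashboard-metrics-stream",  # placeholder stream name
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```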

Q14. A software company hosts an application on AWS, and new features are released weekly. As part of the application testing process, a solution must be developed that analyzes logs from each Amazon EC2 instance to ensure that the application is working as expected after each deployment. The collection and analysis solution should be highly available with the ability to display new information with minimal delays. Which method should the company use to collect and analyze the logs?

A. Enable detailed monitoring on Amazon EC2, use the Amazon CloudWatch agent to store logs in Amazon S3, and use Amazon Athena for fast, interactive log analytics.

B. Use the Amazon Kinesis Producer Library (KPL) agent on Amazon EC2 to collect and send data to Kinesis Data Streams to further push the data to Amazon Elasticsearch Service and visualize using Amazon QuickSight

C. Use the Amazon Kinesis Producer Library (KPL) agent on Amazon EC2 to collect and send data to Kinesis Data Firehose to further push the data to Amazon Elasticsearch Service and Kibana.

D. Use Amazon CloudWatch subscriptions to get access to a real-time feed of logs and have the logs delivered to Amazon Kinesis Data Streams to further push the data to Amazon Elasticsearch Service and Kibana.

Ans: d
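
Answer D wires CloudWatch Logs to Kinesis Data Streams with a subscription filter. A hedged boto3 sketch; the log group, stream ARN, and IAM role are placeholders, and the stream and role must already exist:

```python
import boto3

logs = boto3.client("logs", region_name="us-east-1")

logs.put_subscription_filter(
    logGroupName="/app/web-tier",          # placeholder log group written by the EC2 instances
    filterName="ship-to-kinesis",
    filterPattern="",                      # empty pattern forwards every log event
    destinationArn="arn:aws:kinesis:us-east-1:111122223333:stream/app-logs",  # placeholder
    roleArn="arn:aws:iam::111122223333:role/CWLtoKinesisRole",  # role CloudWatch Logs assumes
)
```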

Q15. IAM stands for

A. Identity & Access Management

B. Identify the access management

C. Both

D. None of the above

Ans: a

Q16. You launch an Amazon EC2 instance without an assigned AWS Identity and Access Management (IAM) role. Later, you decide that the instance should be running with an IAM role. Which action must you take in order to have a running Amazon EC2 instance with an IAM role assigned to it?

A. Create an image of the instance, and register the image with an IAM role assigned and an Amazon EBS volume mapping.

B. Create a new IAM role with the same permissions as an existing IAM role, and assign it to the running instance.

C. Create an image of the instance, add a new IAM role with the same permissions as the desired IAM role, and deregister the image with the new role assigned.

D. Create an image of the instance, and use this image to launch a new instance with the desired IAM role assigned.

Ans: b

Q17. A company has a suite of web applications that uses Amazon DynamoDB as its database. The SysOps Administrator will be using EC2 instances to host an application and will access the newly created DynamoDB table. Which of the following will the SysOps Administrator need to do to ensure that the application has the relevant permissions to access the DynamoDB table?

A. Create an IAM user with the required permissions to DynamoDB and then attach it to the EC2 instance
B. Create an IAM group with the required permissions to DynamoDB and ensure the application runs on behalf of the IAM group on the EC2 instance
C. Create access keys with the required permissions to DynamoDB and ensure that the keys are embedded in the application
D. Create an IAM role with the required permissions to DynamoDB. Grant the IAM role to the EC2 instance

Ans: D
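
With answer D the application code never touches access keys; boto3 resolves temporary credentials from the instance profile attached to the EC2 instance. A minimal sketch against a placeholder table:

```python
import boto3

# No access keys in code or config files: credentials come from the EC2 instance
# profile (the IAM role granted to the instance).
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("WebAppSessions")  # placeholder table name

table.put_item(Item={"SessionId": "abc-123", "UserId": "u-42"})
response = table.get_item(Key={"SessionId": "abc-123"})
print(response.get("Item"))
```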

Q18. A company has 1 million scanned documents stored as image files in Amazon S3. The documents contain typewritten application forms with information including the applicant's first name, applicant's last name, application date, application type, and application text. The company has developed a machine learning algorithm to extract the metadata values from the scanned documents. The company wants to allow internal data analysts to analyze and find applications using the applicant name, application date, or application text. The original images should also be downloadable. Cost control is secondary to query performance. Which solution organizes the images and metadata to drive insights while meeting the requirements?

A. For each image, use object tags to add the metadata. Use Amazon S3 Select to retrieve the files based on the applicant name and application date.
B. Index the metadata and the Amazon S3 location of the image file in Amazon Elasticsearch Service. Allow the data analysts to use Kibana to submit queries to the Elasticsearch cluster.
C. Store the metadata and the Amazon S3 location of the image file in an Amazon Redshift table. Allow the data analysts to run ad-hoc queries on the table.
D. Store the metadata and the Amazon S3 location of the image files in an Apache Parquet file in Amazon S3, and define a table in the AWS Glue Data Catalog. Allow data analysts to use Amazon Athena to submit custom queries.

Ans: B
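
Answer B keeps the scanned image in S3 and indexes only the extracted metadata plus the S3 location, so analysts can search by name, date, or free text in Kibana and still download the original. A rough sketch with the elasticsearch Python client; the domain endpoint, index, and field names are illustrative, and the domain is assumed to allow this client to connect directly:

```python
from elasticsearch import Elasticsearch

# Placeholder Amazon ES domain endpoint.
es = Elasticsearch(["https://search-applications-abc123.us-east-1.es.amazonaws.com:443"])

doc = {
    "first_name": "Jane",
    "last_name": "Doe",
    "application_date": "2023-06-14",
    "application_type": "loan",
    "application_text": "Free text extracted from the form by the ML pipeline ...",
    "s3_location": "s3://scanned-apps/2023/06/14/jane-doe.tiff",  # original image stays in S3
}

# Index the metadata document; analysts query this index from Kibana.
es.index(index="applications", id="jane-doe-2023-06-14", body=doc)
```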

Q19. How long will a read or write operation take with Elasticsearch?

Ans: Seconds

Q20. A company developed a new election reporting website that uses Amazon Kinesis Data Firehose to deliver full logs from AWS WAF to an Amazon S3 bucket. The company is now seeking a low-cost option to perform this infrequent data analysis with visualizations of logs in a way that requires minimal development effort. Which solution meets these requirements?

A. Use an AWS Glue crawler to create and update a table in the Glue Data Catalog from the logs. Use Amazon Athena to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.
B. Create a second Kinesis Data Firehose delivery stream to deliver the log files to Amazon Elasticsearch Service (Amazon ES). Use Amazon ES to perform text-based searches of the logs for ad-hoc analyses and use Kibana for data visualizations.
C. Create an AWS Lambda function to convert the logs into .csv format. Then add the function to the Kinesis Data Firehose transformation configuration. Use Amazon Redshift to perform ad-hoc analyses of the logs using SQL queries and use Amazon QuickSight to develop data visualizations.
D. Create an Amazon EMR cluster and use Amazon S3 as the data source. Create an Apache Spark job to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.

Ans: A
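
Once the AWS Glue crawler in answer A has catalogued the Firehose output, an ad-hoc query is a single Amazon Athena API call, and Amazon QuickSight can point at the same table. A hedged boto3 sketch; the database, table, and results bucket are placeholders:

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

query = """
    SELECT action, COUNT(*) AS requests
    FROM waf_logs                        -- table created by the Glue crawler (placeholder)
    GROUP BY action
    ORDER BY requests DESC;
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "waf_logs_db"},  # placeholder Glue database
    ResultConfiguration={"OutputLocation": "s3://athena-query-results-bucket/waf/"},  # placeholder
)
```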

Q21. An organization is currently using an Amazon EMR long-running cluster with the latest Amazon EMR release for analytic jobs and is storing data as external tables on Amazon S3. The company needs to launch multiple transient EMR clusters to access the same tables concurrently, but the metadata about the Amazon S3 external tables is defined and stored on the long-running cluster. Which solution will expose the Hive metastore with the LEAST operational effort?

A. Export Hive metastore information to Amazon DynamoDB and configure the Amazon EMR hive-site classification to point to the Amazon DynamoDB table.
B. Create and configure an AWS Glue Data Catalog as a Hive metastore for Amazon EMR.
C. Export Hive metastore information to a MySQL table on Amazon RDS and configure the Amazon EMR hive-site classification to point to the Amazon RDS database.
D. Launch an Amazon EC2 instance, install and configure Apache Derby, and export the Hive metastore information to Derby.

Ans: B
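
Answer B needs no extra infrastructure: each transient cluster is simply launched with a hive-site classification that points Hive at the AWS Glue Data Catalog. A hedged boto3 sketch; the cluster name, release, sizing, and roles are placeholders:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.run_job_flow(
    Name="transient-analytics-cluster",   # placeholder name
    ReleaseLabel="emr-6.15.0",            # placeholder EMR release
    Applications=[{"Name": "Hive"}],
    Configurations=[
        {
            "Classification": "hive-site",
            "Properties": {
                # Point Hive at the shared Glue Data Catalog instead of the
                # metastore living on the long-running cluster.
                "hive.metastore.client.factory.class":
                    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
            },
        }
    ],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,  # transient: terminate when the steps finish
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```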

Q22. You are working for a hospital which uses a 3-node DS2.XLARGE Redshift cluster to store the medical history of patients. There are some BI dashboards which query this data and show some key metrics such as total patients admitted and the number of patients discharged successfully. These dashboards are updated every hour through SQL queries. There is also a group of data scientists who query the database intermittently to analyse data of patients with high risk. Recently, the data scientists have complained of slow queries. Which solution would you recommend?

A. Create a separate Redshift cluster for data scientists and ask them to use that for their queries
B. Create separate WLM queues for the data scientists and the BI dashboards and configure Automatic WLM
C. Change the node type of the Redshift cluster to DC2.XLARGE using Elastic Resize.
D. Change the Redshift storage to provisioned IOPS (Input/Output Operations per Second) to increase the I/O

Ans: B

Q23. The Redshift cluster you are managing for your company has a variety of workloads. There are users who run long, complex analytical queries as well as some automated jobs which run short, read-only queries. You are observing that during high workload, these short-running queries are timing out and the related jobs are failing. Which solution is the simplest one to remediate this problem?

A. Use Short Query Acceleration
B. Add more nodes to your Redshift cluster using Elastic Resize
C. Use manual Workload Management with concurrency scaling
D. Use automated Workload Management with concurrency scaling

Ans: A
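
Short Query Acceleration from answer A is part of the cluster's WLM configuration (it is on by default with automatic WLM). A hedged sketch that enables it through the wlm_json_configuration parameter of a custom parameter group; the group name and queue layout are placeholders:

```python
import json
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# A simple manual WLM layout plus Short Query Acceleration, so short read-only
# queries bypass the queues instead of waiting behind long analytical queries.
wlm_config = [
    {"query_group": [], "user_group": [], "query_concurrency": 5},
    {"short_query_queue": True},
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="analytics-wlm-params",  # placeholder parameter group
    Parameters=[
        {
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm_config),
            "ApplyType": "dynamic",
        }
    ],
)
```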

Q24. A data engineer in a manufacturing company is designing a data processing platform that receives a large volume of unstructured data. The data engineer must populate a well-structured star schema in Amazon Redshift. What is the most efficient architecture strategy for this purpose?

A. Transform the unstructured data using Amazon EMR and generate CSV data. COPY the CSV data into the analysis schema within Redshift.
B. Load the unstructured data into Redshift, and use string parsing functions to extract structured data for inserting into the analysis schema.
C. When the data is saved to Amazon S3, use S3 Event Notifications and AWS Lambda to transform the file contents. Insert the data into the analysis schema on Redshift.
D. Normalize the data using an AWS Marketplace ETL tool, persist the results to Amazon S3, and use AWS Lambda to INSERT the data into Redshift.

Ans: A

Q25. Company A operates in Country X. Company A maintains a large dataset of historical purchase orders that contains personal data of their customers in the form of full names and telephone numbers. The dataset consists of 5 text files, 1 TB each. Currently the dataset resides on-premises due to legal requirements of storing personal data in-country. The research and development department needs to run a clustering algorithm on the dataset and wants to use the Elastic MapReduce service in the closest AWS region. Due to geographic distance, the minimum latency between the on-premises system and the closest AWS region is 200 ms.
Which option allows Company A to do clustering in the AWS Cloud and meet the legal requirement of maintaining personal data in-country?

A. Establish a Direct Connect link between the on-premises system and the AWS region to reduce latency. Have the EMR cluster read the data directly from the on-premises storage system over Direct Connect.
B. Anonymize the personal data portions of the dataset and transfer the data files into Amazon S3 in the AWS region. Have the EMR cluster read the dataset using EMRFS.
C. Encrypt the data files according to encryption standards of Country X and store them in Amazon S3 in the AWS region. Have the EMR cluster read the dataset using EMRFS.
D. Use an AWS Import/Export Snowball device to securely transfer the data to the AWS region and copy the files onto an EBS volume. Have the EMR cluster read the dataset using EMRFS.

Ans: B

Q26. An Amazon EMR cluster using EMRFS has access to petabytes of data on Amazon S3, originating from multiple unique data sources. The customer needs to query common fields across some of the data sets to be able to perform interactive joins and then display results quickly.
Which technology is most appropriate to enable this capability?

A. Presto
B. MicroStrategy
C. Pig
D. Hive

Ans: A

Q27. Which of the following are not appropriate use cases for Amazon Simple Storage Service (Amazon S3)? (Choose 2 answers)

A. Storing web content
B. Storing a file system mounted to an Amazon Elastic Compute Cloud (Amazon EC2) instance
C. Storing backups for a relational database
D. Primary storage for a database

Ans: B, D

Q28. A media advertising company handles a large number of real-time messages sourced from over 200 websites. The company's data engineer needs to collect and process records in real time for analysis using Spark Streaming on Amazon Elastic MapReduce (EMR). The data engineer needs to store ALL raw messages as they are received as a top priority. Which Amazon Kinesis configuration meets these requirements?

A. Publish messages to Amazon Kinesis Firehose backed by Amazon Simple Storage Service (S3). Pull messages off Firehose with Spark Streaming in parallel to persistence to Amazon S3.
B. Publish messages to Amazon Kinesis Streams. Pull messages off Streams with Spark Streaming in parallel to AWS Lambda pushing messages from Streams to Firehose backed by Amazon Simple Storage Service (S3).
C. Publish messages to Amazon Kinesis Firehose backed by Amazon Simple Storage Service (S3). Use AWS Lambda to pull messages from Firehose to Streams for processing with Spark Streaming.
D. Publish messages to Amazon Kinesis Streams, pull messages off with Spark Streaming, and write raw data to Amazon Simple Storage Service (S3) before and after processing.

Ans: D
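
In answer D the websites publish into a Kinesis data stream, Spark Streaming on EMR consumes it, and the raw payloads are written to S3 before and after processing so no message is lost. A minimal producer-side boto3 sketch with a placeholder stream name:

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

message = {"site": "site-017", "event": "ad_click", "ts": "2024-01-01T12:00:00Z"}  # sample record

# Partitioning by site spreads the 200+ websites across the stream's shards.
kinesis.put_record(
    StreamName="ad-events-stream",  # placeholder stream name
    Data=json.dumps(message).encode("utf-8"),
    PartitionKey=message["site"],
)
```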

