[Q37-Q54] Master 2022 Latest The Questions AWS Certified Data Analytics and Pass AWS-Certified-Data-Analytics-Specialty Real Exam!


NEW QUESTION 37
A retail company has 15 stores across 6 cities in the United States. Once a month, the sales team requests a visualization in Amazon QuickSight that provides the ability to easily identify revenue trends across cities and stores. The visualization also helps identify outliers that need to be examined with further analysis.
Which visual type in QuickSight meets the sales team's requirements?

  • A. Geospatial chart
  • B. Heat map
  • C. Line chart
  • D. Tree map

Answer: A

 

NEW QUESTION 38
An ecommerce company is migrating its business intelligence environment from on premises to the AWS Cloud. The company will use Amazon Redshift in a public subnet and Amazon QuickSight. The tables already are loaded into Amazon Redshift and can be accessed by a SQL tool.
The company starts QuickSight for the first time. During the creation of the data source, a data analytics specialist enters all the information and tries to validate the connection. An error with the following message occurs: "Creating a connection to your data source timed out." How should the data analytics specialist resolve this error?

  • A. Add the QuickSight IP address range into the Amazon Redshift security group.
  • B. Create an IAM role for QuickSight to access Amazon Redshift.
  • C. Use a QuickSight admin user for creating the dataset.
  • D. Grant the SELECT permission on Amazon Redshift tables.

Answer: A

Explanation:
A timeout while validating the connection points to a network problem rather than a permissions problem: missing IAM or SELECT privileges would typically surface as an authorization error, not a timeout. Because the Amazon Redshift cluster sits in a public subnet, its security group needs an inbound rule that allows the QuickSight IP address range for the AWS Region on the cluster's port.

 

NEW QUESTION 39
A media company wants to perform machine learning and analytics on the data residing in its Amazon S3 data lake. There are two data transformation requirements that will enable the consumers within the company to create reports:
* Daily transformations of 300 GB of data with different file formats landing in Amazon S3 at a scheduled time.
* One-time transformations of terabytes of archived data residing in the S3 data lake.
Which combination of solutions cost-effectively meets the company's requirements for transforming the data?
(Choose three.)

  • A. For daily incoming data, use AWS Glue crawlers to scan and identify the schema.
  • B. For archived data, use Amazon SageMaker to perform data transformations.
  • C. For archived data, use Amazon EMR to perform data transformations.
  • D. For daily incoming data, use Amazon Redshift to perform transformations.
  • E. For daily incoming data, use AWS Glue workflows with AWS Glue jobs to perform transformations.
  • F. For daily incoming data, use Amazon Athena to scan and identify the schema.

Answer: A,C,E

 

NEW QUESTION 40
A manufacturing company has many IoT devices in different facilities across the world. The company is using Amazon Kinesis Data Streams to collect the data from the devices. The company's operations team has started to observe many WriteThroughputExceeded exceptions. The operations team determines that the reason is the number of records that are being written to certain shards. The data contains device ID, capture date, measurement type, measurement value, and facility ID. The facility ID is used as the partition key.
Which action will resolve this issue?

  • A. Change the partition key from facility ID to capture date
  • B. Archive the data on the producers' side
  • C. Increase the number of shards
  • D. Change the partition key from facility ID to a randomly generated key

Answer: D
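
Kinesis Data Streams assigns each record to a shard by taking the MD5 hash of its partition key, so a key with only a handful of distinct values (such as a facility ID) can concentrate writes on a few shards. The Python sketch below illustrates that effect under stated assumptions: the shards split the 128-bit hash space evenly, and the names `shard_for`, `shard_load`, and the `facility-N` key format are hypothetical, not part of any AWS SDK.

```python
import hashlib
import uuid
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") * shard_count // HASH_SPACE


def shard_load(keys, shard_count: int) -> Counter:
    """Count how many records land on each shard."""
    return Counter(shard_for(key, shard_count) for key in keys)


# Assumption for illustration: 9,000 records from only 3 facilities, 16 shards.
facility_keys = [f"facility-{i % 3}" for i in range(9000)]
random_keys = [str(uuid.uuid4()) for _ in range(9000)]

by_facility = shard_load(facility_keys, 16)
by_random = shard_load(random_keys, 16)

print("shards used with facility ID key:", len(by_facility))
print("shards used with random key:", len(by_random))
```

With three distinct facility IDs, at most three shards can ever receive traffic no matter how many shards the stream has, whereas random keys spread records across the whole hash space.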

 

NEW QUESTION 41
A hospital is building a research data lake to ingest data from electronic health records (EHR) systems from multiple hospitals and clinics. The EHR systems are independent of each other and do not have a common patient identifier. The data engineering team is not experienced in machine learning (ML) and has been asked to generate a unique patient identifier for the ingested records.
Which solution will accomplish this task?

  • A. An AWS Glue ETL job with the FindMatches transform
  • B. An AWS Glue ETL job with the ResolveChoice transform
  • C. Amazon SageMaker Ground Truth
  • D. Amazon Kendra

Answer: A

Explanation:
Matching Records with AWS Lake Formation FindMatches

 

NEW QUESTION 42
A human resources company maintains a 10-node Amazon Redshift cluster to run analytics queries on the company's data. The Amazon Redshift cluster contains a product table and a transactions table, and both tables have a product_sku column. The tables are over 100 GB in size. The majority of queries run on both tables.
Which distribution style should the company use for the two tables to achieve optimal query performance?

  • A. A KEY distribution style for both tables
  • B. An EVEN distribution style for the product table and a KEY distribution style for the transactions table
  • C. An EVEN distribution style for both tables
  • D. An ALL distribution style for the product table and an EVEN distribution style for the transactions table

Answer: A

 

NEW QUESTION 43
A company wants to run analytics on its Elastic Load Balancing logs stored in Amazon S3. A data analyst needs to be able to query all data from a desired year, month, or day. The data analyst should also be able to query a subset of the columns. The company requires minimal operational overhead and the most cost-effective solution.
Which approach meets these requirements for optimizing and querying the log data?

  • A. Use an AWS Glue job nightly to transform new log files into Apache Parquet format and partition by year, month, and day. Use AWS Glue crawlers to detect new partitions. Use Amazon Athena to query data.
  • B. Launch a transient Amazon EMR cluster nightly to transform new log files into Apache ORC format and partition by year, month, and day. Use Amazon Redshift Spectrum to query the data.
  • C. Use an AWS Glue job nightly to transform new log files into .csv format and partition by year, month, and day. Use AWS Glue crawlers to detect new partitions. Use Amazon Athena to query data.
  • D. Launch a long-running Amazon EMR cluster that continuously transforms new log files from Amazon S3 into its Hadoop Distributed File System (HDFS) storage and partitions by year, month, and day. Use Apache Presto to query the optimized format.

Answer: A
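
Partitioning by year, month, and day is usually expressed in Amazon S3 as Hive-style `key=value` prefixes, which AWS Glue crawlers and Amazon Athena can register as partitions. The sketch below builds such a prefix in Python; the function name `partition_prefix` and the bucket `example-bucket` are hypothetical and used only for illustration.

```python
from datetime import date


def partition_prefix(base: str, day: date) -> str:
    """Return a Hive-style year/month/day prefix under the given base path."""
    return (
        f"{base.rstrip('/')}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


# Hypothetical bucket and date, for illustration only.
example_prefix = partition_prefix("s3://example-bucket/elb-logs", date(2022, 3, 7))
print(example_prefix)
```

A query that filters on `year`, `month`, or `day` then only needs to read the objects under the matching prefixes, and a columnar format keeps the scan limited to the requested columns.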

 

NEW QUESTION 44
An operations team notices that a few AWS Glue jobs for a given ETL application are failing. The AWS Glue jobs read a large number of small JSON files from an Amazon S3 bucket and write the data to a different S3 bucket in Apache Parquet format with no major transformations. Upon initial investigation, a data engineer notices the following error message in the History tab on the AWS Glue console: "Command Failed with Exit Code 1." Upon further investigation, the data engineer notices that the driver memory profile of the failed jobs crosses the safe threshold of 50% usage quickly and reaches 90-95% soon after. The average memory usage across all executors continues to be less than 4%.
The data engineer also notices the following error while examining the related Amazon CloudWatch Logs.
What should the data engineer do to solve the failure in the MOST cost-effective way?

  • A. Change the worker type from Standard to G.2X.
  • B. Modify the AWS Glue ETL code to use the 'groupFiles': 'inPartition' feature.
  • C. Modify maximum capacity to increase the total maximum data processing units (DPUs) used.
  • D. Increase the fetch size setting by using an AWS Glue dynamic frame.

Answer: B

Explanation:
https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-debug-oom-abnormalities.html#monitor-debug-oom-fix

 

NEW QUESTION 45
A mobile gaming company wants to capture data from its gaming app and make the data available for analysis immediately. The data record size will be approximately 20 KB. The company is concerned about achieving optimal throughput from each device. Additionally, the company wants to develop a data stream processing application with dedicated throughput for each consumer.
Which solution would achieve this goal?

  • A. Have the app call the PutRecords API to send data to Amazon Kinesis Data Streams. Use the enhanced fan-out feature while consuming the data.
  • B. Have the app use Amazon Kinesis Producer Library (KPL) to send data to Kinesis Data Firehose. Use the enhanced fan-out feature while consuming the data.
  • C. Have the app call the PutRecordBatch API to send data to Amazon Kinesis Data Firehose. Submit a support case to enable dedicated throughput on the account.
  • D. Have the app call the PutRecords API to send data to Amazon Kinesis Data Streams. Host the stream-processing application on Amazon EC2 with Auto Scaling.

Answer: A

Explanation:
https://docs.aws.amazon.com/streams/latest/dev/enhanced-consumers.html
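
A PutRecords request accepts up to 500 records and up to 5 MiB in total, including partition keys, so with records of roughly 20 KB the size limit is reached well before the record-count limit. The following Python sketch groups records into request-sized batches; it makes no AWS call, and `batch_records` and the `device-N` keys are hypothetical names chosen for illustration.

```python
MAX_RECORDS = 500              # PutRecords limit on records per request
MAX_BYTES = 5 * 1024 * 1024    # PutRecords limit on request size (5 MiB)


def batch_records(records):
    """Yield lists of PutRecords-style entries that respect both limits.

    `records` is an iterable of (partition_key, data_bytes) pairs.
    """
    batch, size = [], 0
    for key, data in records:
        record_size = len(key.encode("utf-8")) + len(data)
        if batch and (len(batch) >= MAX_RECORDS or size + record_size > MAX_BYTES):
            yield batch
            batch, size = [], 0
        batch.append({"PartitionKey": key, "Data": data})
        size += record_size
    if batch:
        yield batch


# Assumption for illustration: 600 records of 20 KB each.
records = [(f"device-{i}", b"x" * 20 * 1024) for i in range(600)]
batches = list(batch_records(records))
print("requests needed:", len(batches))
```

Each batch could then be passed as the records of one PutRecords call, which keeps the number of requests per device low.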

 

NEW QUESTION 46
Three teams of data analysts use Apache Hive on an Amazon EMR cluster with the EMR File System (EMRFS) to query data stored within each team's Amazon S3 bucket. The EMR cluster has Kerberos enabled and is configured to authenticate users from the corporate Active Directory. The data is highly sensitive, so access must be limited to the members of each team.
Which steps will satisfy the security requirements?

  • A. For the EMR cluster Amazon EC2 instances, create a service role that grants full access to Amazon S3.
    Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the base IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
  • B. For the EMR cluster Amazon EC2 instances, create a service role that grants no access to Amazon S3.
    Create three additional IAM roles, each granting access to each team's specific bucket. Add the additional IAM roles to the cluster's EMR role for the EC2 trust policy. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
  • C. For the EMR cluster Amazon EC2 instances, create a service role that grants full access to Amazon S3.
    Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the additional IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
  • D. For the EMR cluster Amazon EC2 instances, create a service role that grants no access to Amazon S3.
    Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the additional IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.

Answer: D

 

NEW QUESTION 47
A company is hosting an enterprise reporting solution with Amazon Redshift. The application provides reporting capabilities to three main groups: an executive group to access financial reports, a data analyst group to run long-running ad-hoc queries, and a data engineering group to run stored procedures and ETL processes.
The executive team requires queries to run with optimal performance. The data engineering team expects queries to take minutes.
Which Amazon Redshift feature meets the requirements for this task?

  • A. Concurrency scaling
  • B. Short query acceleration (SQA)
  • C. Workload management (WLM)
  • D. Materialized views

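Answer: C

Explanation:
Workload management (WLM) lets the company define separate query queues for the executive, data analyst, and data engineering groups and give each queue its own priority or share of cluster resources, so long-running ad hoc and ETL queries do not delay the executives' reports.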

 

NEW QUESTION 48
A company's marketing team has asked for help in identifying a high-performing long-term storage service for its data based on the following requirements:
* The data size is approximately 32 TB uncompressed.
* There is a low volume of single-row inserts each day.
* There is a high volume of aggregation queries each day.
* Multiple complex joins are performed.
* The queries typically involve a small subset of the columns in a table.
Which storage service will provide the MOST performant solution?

  • A. Amazon Neptune
  • B. Amazon Elasticsearch
  • C. Amazon Redshift
  • D. Amazon Aurora MySQL

Answer: C

 

NEW QUESTION 49
A regional energy company collects voltage data from sensors attached to buildings. To address any known dangerous conditions, the company wants to be alerted when a sequence of two voltage drops is detected within 10 minutes of a voltage spike at the same building. It is important to ensure that all messages are delivered as quickly as possible. The system must be fully managed and highly available. The company also needs a solution that will automatically scale up as it covers additional cities with this monitoring feature. The alerting system is subscribed to an Amazon SNS topic for remediation.
Which solution meets these requirements?

  • A. Create an Amazon Kinesis data stream to capture the incoming sensor data and create another stream for alert messages. Set up AWS Application Auto Scaling on both. Create a Kinesis Data Analytics for Java application to detect the known event sequence, and add a message to the message stream. Configure an AWS Lambda function to poll the message stream and publish to the SNS topic.
  • B. Create an Amazon Managed Streaming for Kafka cluster to ingest the data, and use an Apache Spark Streaming with Apache Kafka consumer API in an automatically scaled Amazon EMR cluster to process the incoming data. Use the Spark Streaming application to detect the known event sequence and send the SNS message.
  • C. Create a REST-based web service using Amazon API Gateway in front of an AWS Lambda function. Create an Amazon RDS for PostgreSQL database with sufficient Provisioned IOPS (PIOPS). In the Lambda function, store incoming events in the RDS database and query the latest data to detect the known event sequence and send the SNS message.
  • D. Create an Amazon Kinesis Data Firehose delivery stream to capture the incoming sensor data. Use an AWS Lambda transformation function to detect the known event sequence and send the SNS message.

Answer: A
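
The event sequence in this question (a spike followed by two drops within 10 minutes at the same building) requires per-building state, which is why a stateful stream-processing application is involved. The sketch below is a plain-Python stand-in for that logic, not Kinesis Data Analytics or Apache Flink code; the event tuple layout, the `kind` labels, and the function name `detect_alerts` are assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 10 * 60  # two drops must follow the spike within 10 minutes


def detect_alerts(events):
    """Return (building, spike_ts, second_drop_ts) for each matched sequence.

    `events` is an iterable of (timestamp_seconds, building_id, kind) tuples,
    sorted by timestamp, where kind is "spike", "drop", or anything else.
    """
    last_spike = {}                      # building -> timestamp of latest spike
    drops_since_spike = defaultdict(int)
    alerts = []
    for ts, building, kind in events:
        if kind == "spike":
            last_spike[building] = ts
            drops_since_spike[building] = 0
        elif kind == "drop" and building in last_spike:
            if ts - last_spike[building] > WINDOW_SECONDS:
                # The spike is too old; forget it and wait for the next one.
                del last_spike[building]
                drops_since_spike[building] = 0
                continue
            drops_since_spike[building] += 1
            if drops_since_spike[building] == 2:
                alerts.append((building, last_spike[building], ts))
                del last_spike[building]
                drops_since_spike[building] = 0
    return alerts


# Hypothetical readings: b1 matches the pattern, b2's second drop arrives too late.
sample_events = [
    (0, "b1", "spike"),
    (10, "b2", "spike"),
    (120, "b1", "drop"),
    (300, "b1", "drop"),
    (400, "b2", "drop"),
    (900, "b2", "drop"),
]
sample_alerts = detect_alerts(sample_events)
print(sample_alerts)
```

In a managed pipeline, each returned alert would correspond to a message written to the alert stream and then published to the SNS topic.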

 

NEW QUESTION 50
A company wants to improve the data load time of a sales data dashboard. Data has been collected as .csv files and stored within an Amazon S3 bucket that is partitioned by date. The data is then loaded to an Amazon Redshift data warehouse for frequent analysis. The data volume is up to 500 GB per day.
Which solution will improve the data loading performance?

  • A. Split large .csv files, then use a COPY command to load data into Amazon Redshift.
  • B. Compress .csv files and use an INSERT statement to ingest data into Amazon Redshift.
  • C. Use Amazon Kinesis Data Firehose to ingest data into Amazon Redshift.
  • D. Load the .csv files in an unsorted key order and vacuum the table in Amazon Redshift.

Answer: A

Explanation:
https://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
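
Splitting a large .csv file into several smaller files lets a COPY command load the parts in parallel. The following Python sketch splits CSV text into a chosen number of parts in memory, assuming the first row is a header that is kept out of the parts; `split_csv` and the sample data are hypothetical, and a real job would stream files to and from Amazon S3 instead.

```python
import csv
import io


def split_csv(text: str, parts: int):
    """Split CSV text into `parts` header-less chunks, distributing rows round-robin.

    Returns (header_row, list_of_csv_strings).
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    chunks = []
    for i in range(parts):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerows(body[i::parts])  # every parts-th row, starting at row i
        chunks.append(out.getvalue())
    return header, chunks


# Hypothetical sales data for illustration.
sample_csv = "id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n"
sample_header, sample_chunks = split_csv(sample_csv, 2)
print(sample_header, sample_chunks)
```

The number of parts is a tuning choice; the AWS loading best practices linked above discuss how to size and count the files.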

 

NEW QUESTION 51
A large retailer has successfully migrated to an Amazon S3 data lake architecture. The company's marketing team is using Amazon Redshift and Amazon QuickSight to analyze data, and derive and visualize insights. To ensure the marketing team has the most up-to-date actionable information, a data analyst implements nightly refreshes of Amazon Redshift using terabytes of updates from the previous day.
After the first nightly refresh, users report that half of the most popular dashboards that had been running correctly before the refresh are now running much slower. Amazon CloudWatch does not show any alerts.
What is the MOST likely cause for the performance degradation?

  • A. The dashboards are suffering from inefficient SQL queries.
  • B. The nightly data refreshes left the dashboard tables in need of a vacuum operation that could not be automatically performed by Amazon Redshift due to ongoing user workloads.
  • C. The cluster is undersized for the queries being run by the dashboards.
  • D. The nightly data refreshes are causing a lingering transaction that cannot be automatically closed by Amazon Redshift due to ongoing user workloads.

Answer: B

Explanation:
https://github.com/awsdocs/amazon-redshift-developer-guide/issues/21

 

NEW QUESTION 52
A banking company wants to collect large volumes of transactional data using Amazon Kinesis Data Streams for real-time analytics. The company uses PutRecord to send data to Amazon Kinesis, and has observed network outages during certain times of the day. The company wants to obtain exactly-once semantics for the entire processing pipeline.
What should the company do to obtain these characteristics?

  • A. Design the data producer so events are not ingested into Kinesis Data Streams multiple times.
  • B. Rely on the exactly-once processing semantics of Apache Flink and Apache Spark Streaming included in Amazon EMR.
  • C. Design the application so it can remove duplicates during processing by embedding a unique ID in each record.
  • D. Rely on the processing semantics of Amazon Kinesis Data Analytics to avoid duplicate processing of events.

Answer: C
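
Producer retries after a network outage can write the same event to the stream more than once, so the consumer needs a way to recognize repeats. The sketch below shows one way to do that in Python by tracking a unique ID carried in each record; the `id` field, the function name `process_once`, and the in-memory `set` are assumptions for illustration, and a production system would need a durable store for the seen IDs.

```python
def process_once(records, seen_ids, handler):
    """Call `handler` for each record whose ID has not been seen before.

    Returns the number of records actually processed.
    """
    processed = 0
    for record in records:
        record_id = record["id"]
        if record_id in seen_ids:
            continue  # duplicate from a producer retry or a consumer replay
        handler(record)
        seen_ids.add(record_id)
        processed += 1
    return processed


# Hypothetical transactions: tx-1 arrives twice because of a retry.
incoming = [
    {"id": "tx-1", "amount": 10},
    {"id": "tx-2", "amount": 5},
    {"id": "tx-1", "amount": 10},
]
amounts = []
seen = set()
processed_count = process_once(incoming, seen, lambda r: amounts.append(r["amount"]))
print(processed_count, sum(amounts))
```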

 

NEW QUESTION 53
A retail company is building its data warehouse solution using Amazon Redshift. As a part of that effort, the company is loading hundreds of files into the fact table created in its Amazon Redshift cluster. The company wants the solution to achieve the highest throughput and optimally use cluster resources when loading data into the company's fact table.
How should the company meet these requirements?

  • A. Use a single COPY command to load the data into the Amazon Redshift cluster.
  • B. Use multiple COPY commands to load the data into the Amazon Redshift cluster.
  • C. Use S3DistCp to load multiple files into the Hadoop Distributed File System (HDFS) and use an HDFS connector to ingest the data into the Amazon Redshift cluster.
  • D. Use LOAD commands equal to the number of Amazon Redshift cluster nodes and load the data in parallel into each node.

Answer: A

Explanation:
https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-single-copy-command.html
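
A single COPY command can load many files in parallel, either from a shared key prefix or from a manifest that lists each file. The Python sketch below builds a manifest document and the text of a matching COPY statement; the bucket, table name, and IAM role ARN are hypothetical placeholders, and the helper names `build_manifest` and `copy_statement` are not part of any AWS library.

```python
import json


def build_manifest(urls):
    """Return a Redshift COPY manifest (JSON text) listing every file to load."""
    entries = [{"url": url, "mandatory": True} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


def copy_statement(table: str, manifest_url: str, iam_role_arn: str) -> str:
    """Return one COPY statement that loads all files named in the manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "MANIFEST;"
    )


# Hypothetical file names, table, and role for illustration only.
file_urls = [f"s3://example-bucket/sales/part-{i:04d}.csv" for i in range(3)]
manifest_text = build_manifest(file_urls)
statement = copy_statement(
    "sales_fact",
    "s3://example-bucket/sales/load.manifest",
    "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(manifest_text)
print(statement)
```

Issuing one statement for all files lets Amazon Redshift divide the work across the cluster itself, instead of several concurrent COPY commands competing for the same table.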

 

NEW QUESTION 54
......
