Free DAS-C01 Test Questions and Answers

We provide a free Amazon Web Services DAS-C01 practice exam that is ideal for clearing the DAS-C01 test and getting certified as an AWS Certified Data Analytics - Specialty. The DAS-C01 Questions & Answers cover all the knowledge points of the real DAS-C01 exam. Crack your Amazon Web Services DAS-C01 exam with the latest dumps, guaranteed!

Free demo questions from the Amazon Web Services DAS-C01 exam dumps are below:

NEW QUESTION 1
A financial company uses Amazon S3 as its data lake and has set up a data warehouse using a multi-node Amazon Redshift cluster. The data files in the data lake are organized in folders based on the data source of each data file. All the data files are loaded to one table in the Amazon Redshift cluster using a separate COPY command for each data file location. With this approach, loading all the data files into Amazon Redshift takes a long time to complete. Users want a faster solution with little or no increase in cost while maintaining the segregation of the data files in the S3 data lake.
Which solution meets these requirements?

  • A. Use Amazon EMR to copy all the data files into one folder and issue a COPY command to load the data into Amazon Redshift.
  • B. Load all the data files in parallel to Amazon Aurora, and run an AWS Glue job to load the data into Amazon Redshift.
  • C. Use an AWS Glue job to copy all the data files into one folder and issue a COPY command to load the data into Amazon Redshift.
  • D. Create a manifest file that contains the data file locations and issue a COPY command to load the data into Amazon Redshift.

Answer: D

Explanation:
https://docs.aws.amazon.com/redshift/latest/dg/loading-data-files-using-manifest.html "You can use a manifest to ensure that the COPY command loads all of the required files, and only the required files, for a data load"
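
For reference, a manifest-based COPY can be sketched as follows. This is a minimal, hypothetical example (the bucket, prefixes, IAM role, cluster, and table names are placeholders): it builds a manifest listing the files under each source folder, uploads it, and issues a single COPY through the Amazon Redshift Data API, leaving the per-source folder layout in S3 untouched.

    import json
    import boto3

    # Hypothetical names; replace with real resources.
    BUCKET = "example-data-lake"
    SOURCE_PREFIXES = ["source_a/", "source_b/", "source_c/"]
    MANIFEST_KEY = "manifests/daily_load.manifest"
    IAM_ROLE = "arn:aws:iam::123456789012:role/RedshiftCopyRole"

    s3 = boto3.client("s3")

    # A manifest entry must point at an individual object, so list each
    # source folder and add its files; the S3 folder layout is unchanged.
    entries = []
    for prefix in SOURCE_PREFIXES:
        for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=prefix):
            for obj in page.get("Contents", []):
                entries.append({"url": f"s3://{BUCKET}/{obj['Key']}", "mandatory": True})

    s3.put_object(Bucket=BUCKET, Key=MANIFEST_KEY,
                  Body=json.dumps({"entries": entries}).encode("utf-8"))

    # One COPY with MANIFEST replaces the separate per-folder COPY commands.
    boto3.client("redshift-data").execute_statement(
        ClusterIdentifier="example-cluster",      # hypothetical cluster
        Database="dev",
        DbUser="awsuser",
        Sql=f"COPY sales FROM 's3://{BUCKET}/{MANIFEST_KEY}' "
            f"IAM_ROLE '{IAM_ROLE}' MANIFEST;",   # add format options as needed
    )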

NEW QUESTION 2
A company is migrating from an on-premises Apache Hadoop cluster to an Amazon EMR cluster. The cluster runs only during business hours. Due to a company requirement to avoid intraday cluster failures, the EMR cluster must be highly available. When the cluster is terminated at the end of each business day, the data must persist.
Which configurations would enable the EMR cluster to meet these requirements? (Choose three.)

  • A. EMR File System (EMRFS) for storage
  • B. Hadoop Distributed File System (HDFS) for storage
  • C. AWS Glue Data Catalog as the metastore for Apache Hive
  • D. MySQL database on the master node as the metastore for Apache Hive
  • E. Multiple master nodes in a single Availability Zone
  • F. Multiple master nodes in multiple Availability Zones

Answer: ACE

Explanation:
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-ha.html "Note : The cluster can reside only in one Availability Zone or subnet."
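
A hedged sketch of how the three selected choices might appear in an EMR launch call (release label, instance types, subnet, and bucket names are hypothetical): multiple master nodes in a single subnet and Availability Zone, the AWS Glue Data Catalog as the Hive metastore, and logs and data kept on Amazon S3 through EMRFS so they survive cluster termination.

    import boto3

    emr = boto3.client("emr")

    # Minimal sketch; all names, types, and the subnet are placeholders.
    emr.run_job_flow(
        Name="business-hours-cluster",
        ReleaseLabel="emr-6.10.0",
        LogUri="s3://example-logs/emr/",                  # EMRFS path on S3
        ServiceRole="EMR_DefaultRole",
        JobFlowRole="EMR_EC2_DefaultRole",
        Applications=[{"Name": "Hive"}],
        Instances={
            "Ec2SubnetId": "subnet-0123456789abcdef0",    # one subnet = one AZ
            "KeepJobFlowAliveWhenNoSteps": True,
            "InstanceGroups": [
                # Three master nodes give an HA cluster within a single AZ.
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 3},
                {"InstanceRole": "CORE",   "InstanceType": "m5.xlarge", "InstanceCount": 4},
            ],
        },
        Configurations=[{
            # Point Hive at the Glue Data Catalog so table metadata
            # persists after the cluster is terminated each evening.
            "Classification": "hive-site",
            "Properties": {
                "hive.metastore.client.factory.class":
                    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
            },
        }],
    )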

NEW QUESTION 3
A retail company’s data analytics team recently created multiple product sales analysis dashboards for the average selling price per product using Amazon QuickSight. The dashboards were created from .csv files uploaded to Amazon S3. The team is now planning to share the dashboards with the respective external product owners by creating individual users in Amazon QuickSight. For compliance and governance reasons, restricting access is a key requirement. The product owners should view only their respective product analysis in the dashboard reports.
Which approach should the data analytics team take to allow product owners to view only their products in the dashboard?

  • A. Separate the data by product and use S3 bucket policies for authorization.
  • B. Separate the data by product and use IAM policies for authorization.
  • C. Create a manifest file with row-level security.
  • D. Create dataset rules with row-level security.

Answer: D

Explanation:
https://docs.aws.amazon.com/quicksight/latest/user/restrict-access-to-a-data-set-using-row-level-security.html
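
Row-level security in Amazon QuickSight is driven by a dataset of rules that maps each user (or group) to the rows they may see. Below is a minimal sketch of such a rules file and its upload; the user names, product values, and bucket are assumptions, and the rules dataset still has to be attached to the sales dataset in QuickSight as dataset rules with the GRANT_ACCESS policy.

    import boto3

    # Hypothetical rules file: each product owner sees only their product.
    # "UserName" matches the QuickSight user name; "product" must match a
    # column in the dashboard's dataset.
    rules_csv = (
        "UserName,product\n"
        "owner-widgets,Widgets\n"
        "owner-gadgets,Gadgets\n"
    )

    boto3.client("s3").put_object(
        Bucket="example-quicksight-rls",              # hypothetical bucket
        Key="rules/product_owner_rules.csv",
        Body=rules_csv.encode("utf-8"),
    )
    # Next, create a QuickSight dataset from this file and attach it to the
    # sales dataset as row-level security dataset rules (GRANT_ACCESS).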

NEW QUESTION 4
An operations team notices that a few AWS Glue jobs for a given ETL application are failing. The AWS Glue jobs read a large number of small JSON files from an Amazon S3 bucket and write the data to a different S3 bucket in Apache Parquet format with no major transformations. Upon initial investigation, a data engineer notices the following error message in the History tab on the AWS Glue console: “Command Failed with Exit Code 1.”
Upon further investigation, the data engineer notices that the driver memory profile of the failed jobs crosses the safe threshold of 50% usage quickly and reaches 90–95% soon after. The average memory usage across all executors continues to be less than 4%.
The data engineer also notices the following error while examining the related Amazon CloudWatch Logs. What should the data engineer do to solve the failure in the MOST cost-effective way?

  • A. Change the worker type from Standard to G.2X.
  • B. Modify the AWS Glue ETL code to use the ‘groupFiles’: ‘inPartition’ feature.
  • C. Increase the fetch size setting by using AWS Glue dynamics frame.
  • D. Modify maximum capacity to increase the total maximum data processing units (DPUs) used.

Answer: B

Explanation:
https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-debug-oom-abnormalities.html#monitor-debug-oom
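
Inside the Glue job script, the grouping feature is enabled on the S3 source so that many small JSON files are coalesced into larger groups per task, which keeps the file listing load off the driver. A minimal sketch, with hypothetical bucket paths and a group size chosen purely for illustration:

    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Group the many small JSON files so executors, not the driver,
    # handle them in larger batches.
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={
            "paths": ["s3://example-input-bucket/events/"],   # hypothetical
            "recurse": True,
            "groupFiles": "inPartition",
            "groupSize": "134217728",      # ~128 MB per group (illustrative)
        },
        format="json",
    )

    # Write out as Parquet with no major transformations.
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": "s3://example-output-bucket/events-parquet/"},
        format="parquet",
    )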

NEW QUESTION 5
A university intends to use Amazon Kinesis Data Firehose to collect JSON-formatted batches of water quality readings in Amazon S3. The readings are from 50 sensors scattered across a local lake. Students will query the stored data using Amazon Athena to observe changes in a captured metric over time, such as water temperature or acidity. Interest has grown in the study, prompting the university to reconsider how data will be stored.
Which data format and partitioning choices will MOST significantly reduce costs? (Choose two.)

  • A. Store the data in Apache Avro format using Snappy compression.
  • B. Partition the data by year, month, and day.
  • C. Store the data in Apache ORC format using no compression.
  • D. Store the data in Apache Parquet format using Snappy compression.
  • E. Partition the data by sensor, year, month, and day.

Answer: CD
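
To illustrate the columnar format and date partitioning these options discuss, one way to rewrite the raw JSON readings is an Athena CTAS statement producing Snappy-compressed Parquet partitioned by year, month, and day. This is a sketch only; the database, table, column, and bucket names are hypothetical.

    import boto3

    athena = boto3.client("athena")

    # Hypothetical database, tables, and buckets. The CTAS rewrites the JSON
    # readings as Snappy-compressed Parquet, partitioned by date parts.
    ctas = """
        CREATE TABLE lake_readings_parquet
        WITH (
            format = 'PARQUET',
            parquet_compression = 'SNAPPY',
            partitioned_by = ARRAY['year', 'month', 'day'],
            external_location = 's3://example-lake-data/readings-parquet/'
        ) AS
        SELECT sensor_id, temperature, ph, reading_time,
               year(reading_time) AS year,
               month(reading_time) AS month,
               day(reading_time) AS day
        FROM lake_readings_raw
    """

    athena.start_query_execution(
        QueryString=ctas,
        QueryExecutionContext={"Database": "water_quality"},     # hypothetical
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )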

NEW QUESTION 6
A media content company has a streaming playback application. The company wants to collect and analyze the data to provide near-real-time feedback on playback issues. The company needs to consume this data and return results within 30 seconds according to the service-level agreement (SLA). The company needs the consumer to identify playback issues, such as degraded quality during a specified time frame. The data will be emitted as JSON and may change schemas over time.
Which solution will allow the company to collect data for processing while meeting these requirements?

  • A. Send the data to Amazon Kinesis Data Firehose with delivery to Amazon S3. Configure an S3 event to trigger an AWS Lambda function to process the data. The Lambda function will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon S3.
  • B. Send the data to Amazon Managed Streaming for Apache Kafka and configure an Amazon Kinesis Analytics for Java application as the consumer. The application will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon DynamoDB.
  • C. Send the data to Amazon Kinesis Data Firehose with delivery to Amazon S3. Configure Amazon S3 to trigger an event for AWS Lambda to process the data. The Lambda function will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon DynamoDB.
  • D. Send the data to Amazon Kinesis Data Streams and configure an Amazon Kinesis Analytics for Java application as the consumer. The application will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon S3.

Answer: D

Explanation:
https://aws.amazon.com/blogs/aws/new-amazon-kinesis-data-analytics-for-java/
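
For context, records reach a Kinesis Data Streams consumer, such as a Kinesis Data Analytics for Java application, through ordinary PutRecord calls from the playback clients or a collector service. A minimal, hypothetical producer sketch (the stream name and payload fields are assumptions):

    import json
    import boto3

    kinesis = boto3.client("kinesis")

    # Hypothetical playback event; the JSON schema may evolve over time,
    # which is fine because the stream stores opaque blobs.
    event = {
        "session_id": "abc-123",
        "timestamp": "2023-01-01T12:00:00Z",
        "bitrate_kbps": 1800,
        "buffering_ms": 450,
    }

    kinesis.put_record(
        StreamName="playback-events",             # hypothetical stream
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["session_id"],
    )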

NEW QUESTION 7
A company wants to provide its data analysts with uninterrupted access to the data in its Amazon Redshift cluster. All data is streamed to an Amazon S3 bucket with Amazon Kinesis Data Firehose. An AWS Glue job that is scheduled to run every 5 minutes issues a COPY command to move the data into Amazon Redshift.
The amount of data delivered is uneven throughout the day, and cluster utilization is high during certain periods. The COPY command usually completes within a couple of seconds. However, when a load spike occurs, locks can occur and data can be missed. Currently, the AWS Glue job is configured to run without retries, with a timeout of 5 minutes, and with concurrency of 1.
How should a data analytics specialist configure the AWS Glue job to optimize fault tolerance and improve data availability in the Amazon Redshift cluster?

  • A. Increase the number of retries. Decrease the timeout value. Increase the job concurrency.
  • B. Keep the number of retries at 0. Decrease the timeout value. Increase the job concurrency.
  • C. Keep the number of retries at 0. Decrease the timeout value. Keep the job concurrency at 1.
  • D. Keep the number of retries at 0. Increase the timeout value. Keep the job concurrency at 1.

Answer: B
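
The retry, timeout, and concurrency settings the options refer to are plain job properties. A hedged sketch of how they might be set when the job is defined (the job name, role, and script location are hypothetical):

    import boto3

    glue = boto3.client("glue")

    glue.create_job(
        Name="redshift-copy-job",                               # hypothetical
        Role="arn:aws:iam::123456789012:role/GlueJobRole",      # hypothetical
        Command={
            "Name": "glueetl",
            "ScriptLocation": "s3://example-scripts/redshift_copy.py",
        },
        MaxRetries=0,                 # retries stay at 0
        Timeout=2,                    # minutes; shorter than the 5-minute schedule
        ExecutionProperty={"MaxConcurrentRuns": 3},   # allow overlapping runs
    )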

NEW QUESTION 8
A large company receives files from external parties in Amazon EC2 throughout the day. At the end of the day, the files are combined into a single file, compressed into a gzip file, and uploaded to Amazon S3. The total size of all the files is close to 100 GB daily. Once the files are uploaded to Amazon S3, an AWS Batch program executes a COPY command to load the files into an Amazon Redshift cluster.
Which program modification will accelerate the COPY process?

  • A. Upload the individual files to Amazon S3 and run the COPY command as soon as the files become available.
  • B. Split the number of files so they are equal to a multiple of the number of slices in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.
  • C. Split the number of files so they are equal to a multiple of the number of compute nodes in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.
  • D. Apply sharding by breaking up the files so the distkey columns with the same values go to the same file. Gzip and upload the sharded files to Amazon S3. Run the COPY command on the files.

Answer: B
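
As an illustration of the split-and-compress approach: once the day's data has been written as multiple gzip parts under a common prefix (ideally a multiple of the cluster's slice count), a single COPY against the prefix loads them in parallel. The bucket, table, delimiter, and role names below are hypothetical.

    import boto3

    # The number of slices can be checked in Redshift with:
    #   SELECT COUNT(*) FROM stv_slices;
    # and the daily file count chosen as a multiple of that number.
    copy_sql = (
        "COPY daily_trades "
        "FROM 's3://example-ingest/2023-01-01/part_' "       # hypothetical prefix
        "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "
        "GZIP DELIMITER '|';"
    )

    boto3.client("redshift-data").execute_statement(
        ClusterIdentifier="example-cluster",
        Database="dev",
        DbUser="awsuser",
        Sql=copy_sql,
    )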

NEW QUESTION 9
A data analytics specialist is setting up workload management in manual mode for an Amazon Redshift environment. The data analytics specialist is defining query monitoring rules to manage system performance and user experience of an Amazon Redshift cluster.
Which elements must each query monitoring rule include?

  • A. A unique rule name, a query runtime condition, and an AWS Lambda function to resubmit any failed queries in off hours
  • B. A queue name, a unique rule name, and a predicate-based stop condition
  • C. A unique rule name, one to three predicates, and an action
  • D. A workload name, a unique rule name, and a query runtime-based condition

Answer: C
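
In manual WLM, query monitoring rules live inside the wlm_json_configuration parameter, and each rule carries a name, up to three predicates, and an action. A hedged sketch follows; the queue layout and thresholds are illustrative, and the field names should be checked against the current wlm_json_configuration documentation.

    import json
    import boto3

    # One manual queue with a single query monitoring rule: abort queries
    # that run longer than 120 seconds.
    wlm_config = [
        {
            "query_group": [],
            "user_group": [],
            "query_concurrency": 5,
            "rules": [
                {
                    "rule_name": "abort_long_running",
                    "predicate": [
                        {"metric_name": "query_execution_time",
                         "operator": ">", "value": 120}
                    ],
                    "action": "abort",
                }
            ],
        }
    ]

    boto3.client("redshift").modify_cluster_parameter_group(
        ParameterGroupName="example-parameter-group",     # hypothetical
        Parameters=[{
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm_config),
        }],
    )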

NEW QUESTION 10
A manufacturing company has been collecting IoT sensor data from devices on its factory floor for a year and is storing the data in Amazon Redshift for daily analysis. A data analyst has determined that, at an expected ingestion rate of about 2 TB per day, the cluster will be undersized in less than 4 months. A long-term solution is needed. The data analyst has indicated that most queries only reference the most recent 13 months of data, yet there are also quarterly reports that need to query all the data generated from the past 7 years. The chief technology officer (CTO) is concerned about the costs, administrative effort, and performance of a long-term solution.
Which solution should the data analyst use to meet these requirements?

  • A. Create a daily job in AWS Glue to UNLOAD records older than 13 months to Amazon S3 and delete those records from Amazon Redshift. Create an external table in Amazon Redshift to point to the S3 location. Use Amazon Redshift Spectrum to join to data that is older than 13 months.
  • B. Take a snapshot of the Amazon Redshift cluster. Restore the cluster to a new cluster using dense storage nodes with additional storage capacity.
  • C. Execute a CREATE TABLE AS SELECT (CTAS) statement to move records that are older than 13 months to quarterly partitioned data in Amazon Redshift Spectrum backed by Amazon S3.
  • D. Unload all the tables in Amazon Redshift to an Amazon S3 bucket using S3 Intelligent-Tiering. Use AWS Glue to crawl the S3 bucket location to create external tables in an AWS Glue Data Catalog. Create an Amazon EMR cluster using Auto Scaling for any daily analytics needs, and use Amazon Athena for the quarterly reports, with both using the same AWS Glue Data Catalog.

Answer: A
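
A sketch of the SQL that the daily Glue job in option A might issue through the Redshift Data API: unload rows older than 13 months to Amazon S3, delete them from the local table, and expose them again through a Redshift Spectrum external schema. All table names, roles, databases, and paths are hypothetical.

    import boto3

    rsd = boto3.client("redshift-data")
    IAM_ROLE = "arn:aws:iam::123456789012:role/RedshiftSpectrumRole"   # hypothetical

    statements = [
        # 1. Move cold rows to S3 as Parquet.
        f"""UNLOAD ('SELECT * FROM sensor_readings
                     WHERE reading_date < DATEADD(month, -13, CURRENT_DATE)')
            TO 's3://example-iot-archive/sensor_readings/'
            IAM_ROLE '{IAM_ROLE}' FORMAT AS PARQUET;""",

        # 2. Remove the unloaded rows from the local table.
        """DELETE FROM sensor_readings
           WHERE reading_date < DATEADD(month, -13, CURRENT_DATE);""",

        # 3. An external schema over the Glue Data Catalog lets Spectrum join
        #    hot (local) and cold (S3) data in one query.
        f"""CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_archive
            FROM DATA CATALOG DATABASE 'iot_archive'
            IAM_ROLE '{IAM_ROLE}'
            CREATE EXTERNAL DATABASE IF NOT EXISTS;""",
    ]

    for sql in statements:
        rsd.execute_statement(ClusterIdentifier="example-cluster",
                              Database="dev", DbUser="awsuser", Sql=sql)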

NEW QUESTION 11
An Amazon Redshift database contains sensitive user data. Logging is necessary to meet compliance requirements. The logs must contain database authentication attempts, connections, and disconnections. The logs must also contain each query run against the database and record which database user ran each query.
Which steps will create the required logs?

  • A. Enable Amazon Redshift Enhanced VPC Routing. Enable VPC Flow Logs to monitor traffic.
  • B. Allow access to the Amazon Redshift database using AWS IAM only. Log access using AWS CloudTrail.
  • C. Enable audit logging for Amazon Redshift using the AWS Management Console or the AWS CLI.
  • D. Enable and download audit reports from AWS Artifact.

Answer: C
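
For reference, a hedged sketch of turning on the two pieces of Amazon Redshift audit logging: connection, authentication, and disconnection logs delivered to S3, plus the enable_user_activity_logging parameter so every query and the database user who ran it are recorded. The cluster, bucket, and parameter group names are hypothetical.

    import boto3

    redshift = boto3.client("redshift")

    # Connection, authentication, and disconnection events go to S3.
    redshift.enable_logging(
        ClusterIdentifier="example-cluster",           # hypothetical
        BucketName="example-redshift-audit-logs",
        S3KeyPrefix="audit/",
    )

    # User activity logging records each query and the user who ran it.
    redshift.modify_cluster_parameter_group(
        ParameterGroupName="example-parameter-group",
        Parameters=[{
            "ParameterName": "enable_user_activity_logging",
            "ParameterValue": "true",
        }],
    )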

NEW QUESTION 12
A large ride-sharing company has thousands of drivers globally serving millions of unique customers every day. The company has decided to migrate an existing data mart to Amazon Redshift. The existing schema includes the following tables.
  • A trips fact table for information on completed rides.
  • A drivers dimension table for driver profiles.
  • A customers fact table holding customer profile information.
The company analyzes trip details by date and destination to examine profitability by region. The drivers data rarely changes. The customers data frequently changes.
What table design provides optimal query performance?

  • A. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers and customers tables.
  • B. Use DISTSTYLE EVEN for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
  • C. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
  • D. Use DISTSTYLE EVEN for the drivers table and sort by date. Use DISTSTYLE ALL for both fact tables.

Answer: C

Explanation:
https://www.matillion.com/resources/blog/aws-redshift-performance-choosing-the-right-distribution-styles/ https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-best-dist-key.html
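
The distribution and sort choices in these options translate directly into table DDL. A hedged sketch with simplified, hypothetical column lists, executed through the Redshift Data API:

    import boto3

    ddl_statements = [
        # Fact table: co-locate rows for the same destination and sort by date.
        """CREATE TABLE trips (
               trip_id     BIGINT,
               trip_date   DATE,
               destination VARCHAR(64),
               driver_id   BIGINT,
               customer_id BIGINT,
               fare        DECIMAL(10,2))
           DISTSTYLE KEY DISTKEY (destination) SORTKEY (trip_date);""",

        # Small, rarely changing dimension: replicate to every node.
        """CREATE TABLE drivers (
               driver_id BIGINT,
               name      VARCHAR(128))
           DISTSTYLE ALL;""",

        # Frequently changing customer data: spread evenly across slices.
        """CREATE TABLE customers (
               customer_id BIGINT,
               name        VARCHAR(128))
           DISTSTYLE EVEN;""",
    ]

    rsd = boto3.client("redshift-data")
    for sql in ddl_statements:
        rsd.execute_statement(ClusterIdentifier="example-cluster",   # hypothetical
                              Database="dev", DbUser="awsuser", Sql=sql)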

NEW QUESTION 13
A company operates toll services for highways across the country and collects data that is used to understand usage patterns. Analysts have requested the ability to run traffic reports in near-real time. The company is interested in building an ingestion pipeline that loads all the data into an Amazon Redshift cluster and alerts operations personnel when toll traffic for a particular toll station does not meet a specified threshold. Station data and the corresponding threshold values are stored in Amazon S3.
Which approach is the MOST efficient way to meet these requirements?

  • A. Use Amazon Kinesis Data Firehose to collect data and deliver it to Amazon Redshift and Amazon Kinesis Data Analytics simultaneously. Create a reference data source in Kinesis Data Analytics to temporarily store the threshold values from Amazon S3 and compare the count of vehicles for a particular toll station against its corresponding threshold value. Use AWS Lambda to publish an Amazon Simple Notification Service (Amazon SNS) notification if the threshold is not met.
  • B. Use Amazon Kinesis Data Streams to collect all the data from toll stations. Create a stream in Kinesis Data Streams to temporarily store the threshold values from Amazon S3. Send both streams to Amazon Kinesis Data Analytics to compare the count of vehicles for a particular toll station against its corresponding threshold value. Use AWS Lambda to publish an Amazon Simple Notification Service (Amazon SNS) notification if the threshold is not met. Connect Amazon Kinesis Data Firehose to Kinesis Data Streams to deliver the data to Amazon Redshift.
  • C. Use Amazon Kinesis Data Firehose to collect data and deliver it to Amazon Redshift. Then, automatically trigger an AWS Lambda function that queries the data in Amazon Redshift, compares the count of vehicles for a particular toll station against its corresponding threshold values read from Amazon S3, and publishes an Amazon Simple Notification Service (Amazon SNS) notification if the threshold is not met.
  • D. Use Amazon Kinesis Data Firehose to collect data and deliver it to Amazon Redshift and Amazon Kinesis Data Analytics simultaneously. Use Kinesis Data Analytics to compare the count of vehicles against the threshold value for the station stored in a table as an in-application stream based on information stored in Amazon S3. Configure an AWS Lambda function as an output for the application that will publish an Amazon Simple Queue Service (Amazon SQS) notification to alert operations personnel if the threshold is not met.

Answer: D
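
Most of these options end with an AWS Lambda function alerting operations personnel when a station's vehicle count falls below its threshold. A minimal, hypothetical sketch of an Amazon SNS-based alert (the topic ARN and payload fields are assumptions):

    import boto3

    def notify_if_below_threshold(station_id, vehicle_count, threshold):
        """Publish an alert when a toll station misses its threshold."""
        if vehicle_count >= threshold:
            return
        boto3.client("sns").publish(
            TopicArn="arn:aws:sns:us-east-1:123456789012:toll-alerts",  # hypothetical
            Subject=f"Low traffic at toll station {station_id}",
            Message=(f"Station {station_id} recorded {vehicle_count} vehicles, "
                     f"below the threshold of {threshold}."),
        )

    # Example call, e.g. from a Lambda handler fed by the analytics application:
    notify_if_below_threshold("station-042", vehicle_count=17, threshold=50)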

NEW QUESTION 14
A financial services company needs to aggregate daily stock trade data from the exchanges into a data store.
The company requires that data be streamed directly into the data store, but also occasionally allows data to be modified using SQL. The solution should integrate complex, analytic queries running with minimal latency. The solution must provide a business intelligence dashboard that enables viewing of the top contributors to anomalies in stock prices.
Which solution meets the company’s requirements?

  • A. Use Amazon Kinesis Data Firehose to stream data to Amazon S3. Use Amazon Athena as a data source for Amazon QuickSight to create a business intelligence dashboard.
  • B. Use Amazon Kinesis Data Streams to stream data to Amazon Redshift. Use Amazon Redshift as a data source for Amazon QuickSight to create a business intelligence dashboard.
  • C. Use Amazon Kinesis Data Firehose to stream data to Amazon Redshift. Use Amazon Redshift as a data source for Amazon QuickSight to create a business intelligence dashboard.
  • D. Use Amazon Kinesis Data Streams to stream data to Amazon S3. Use Amazon Athena as a data source for Amazon QuickSight to create a business intelligence dashboard.

Answer: C
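
As a reference for the Kinesis Data Firehose-to-Amazon Redshift delivery described in option C, here is a hedged sketch of the delivery stream definition. Every ARN, endpoint, and credential below is a placeholder; Firehose stages the records in S3 and then issues the configured COPY into the cluster.

    import boto3

    boto3.client("firehose").create_delivery_stream(
        DeliveryStreamName="trades-to-redshift",               # hypothetical
        DeliveryStreamType="DirectPut",
        RedshiftDestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/FirehoseRedshiftRole",
            "ClusterJDBCURL": "jdbc:redshift://example-cluster.abc123.us-east-1"
                              ".redshift.amazonaws.com:5439/dev",
            "CopyCommand": {
                "DataTableName": "stock_trades",
                "CopyOptions": "FORMAT AS JSON 'auto'",
            },
            "Username": "firehose_user",
            "Password": "REPLACE_ME",
            "S3Configuration": {
                # Firehose stages batches here before running COPY.
                "RoleARN": "arn:aws:iam::123456789012:role/FirehoseRedshiftRole",
                "BucketARN": "arn:aws:s3:::example-firehose-staging",
            },
        },
    )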

NEW QUESTION 15
A company is streaming its high-volume billing data (100 MBps) to Amazon Kinesis Data Streams. A data analyst partitioned the data on account_id to ensure that all records belonging to an account go to the same Kinesis shard and order is maintained. While building a custom consumer using the Kinesis Java SDK, the data analyst notices that, sometimes, the messages arrive out of order for account_id. Upon further investigation, the data analyst discovers the messages that are out of order seem to be arriving from different shards for the same account_id and are seen when a stream resize runs.
What is an explanation for this behavior and what is the solution?

  • A. There are multiple shards in a stream and order needs to be maintained in the shard. The data analyst needs to make sure there is only a single shard in the stream and no stream resize runs.
  • B. The hash key generation process for the records is not working correctly. The data analyst should generate an explicit hash key on the producer side so the records are directed to the appropriate shard accurately.
  • C. The records are not being received by Kinesis Data Streams in order. The producer should use the PutRecords API call instead of the PutRecord API call with the SequenceNumberForOrdering parameter.
  • D. The consumer is not processing the parent shard completely before processing the child shards after a stream resize. The data analyst should process the parent shard completely first before processing the child shards.

Answer: D

Explanation:
https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-after-resharding.html the parent shards that remain after the reshard could still contain data that you haven't read yet that was added to the stream before the reshard. If you read data from the child shards before having read all data from the parent shards, you could read data for a particular hash key out of the order given by the data records' sequence numbers. Therefore, assuming that the order of the data is important, you should, after a reshard, always continue to read data from the parent shards until it is exhausted. Only then should you begin reading data from the child shards.
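
A hedged sketch of the ordering-aware consumer logic the explanation describes, using the plain Kinesis API (the stream name and record handling are hypothetical): shards without an open parent are read first, and a closed parent shard is only considered finished when get_records stops returning a shard iterator.

    import time
    import boto3

    kinesis = boto3.client("kinesis")
    STREAM = "billing-events"                      # hypothetical stream name

    def process(record):
        # Placeholder for the real per-record logic.
        print(record["PartitionKey"], record["SequenceNumber"])

    def read_shard_to_end(shard_id):
        """Read a shard until it is exhausted (closed shard) or caught up."""
        iterator = kinesis.get_shard_iterator(
            StreamName=STREAM, ShardId=shard_id,
            ShardIteratorType="TRIM_HORIZON")["ShardIterator"]
        while iterator:
            response = kinesis.get_records(ShardIterator=iterator, Limit=1000)
            for record in response["Records"]:
                process(record)
            # A closed parent shard eventually returns no NextShardIterator;
            # an open child shard is left once the consumer has caught up.
            if not response["Records"] and response.get("MillisBehindLatest", 0) == 0:
                break
            iterator = response.get("NextShardIterator")
            time.sleep(0.2)

    shards = kinesis.list_shards(StreamName=STREAM)["Shards"]
    shard_ids = {s["ShardId"] for s in shards}

    # Read shards whose parent is absent (or already trimmed) before their
    # children, so per-account_id ordering survives a resharding event.
    for shard in sorted(shards, key=lambda s: s.get("ParentShardId") in shard_ids):
        read_shard_to_end(shard["ShardId"])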

NEW QUESTION 16
A company has a data lake on AWS that ingests sources of data from multiple business units and uses Amazon Athena for queries. The storage layer is Amazon S3 using the AWS Glue Data Catalog. The company wants to make the data available to its data scientists and business analysts. However, the company first needs to manage data access for Athena based on user roles and responsibilities.
What should the company do to apply these access controls with the LEAST operational overhead?

  • A. Define security policy-based rules for the users and applications by role in AWS Lake Formation.
  • B. Define security policy-based rules for the users and applications by role in AWS Identity and Access Management (IAM).
  • C. Define security policy-based rules for the tables and columns by role in AWS Glue.
  • D. Define security policy-based rules for the tables and columns by role in AWS Identity and Access Management (IAM).

Answer: D
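
For comparison, the AWS Lake Formation approach in option A grants table-level (and optionally column-level) access to a role with a single API call. A minimal sketch, with a hypothetical role, database, and table:

    import boto3

    # Hedged sketch of option A's approach: grant a data-scientist role
    # SELECT on one catalog table through AWS Lake Formation.
    boto3.client("lakeformation").grant_permissions(
        Principal={
            "DataLakePrincipalIdentifier":
                "arn:aws:iam::123456789012:role/DataScientistRole"
        },
        Resource={
            "Table": {
                "DatabaseName": "sales_lake",
                "Name": "transactions",
            }
        },
        Permissions=["SELECT"],
    )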

NEW QUESTION 17
A company has developed an Apache Hive script to batch process data stored in Amazon S3. The script needs to run once every day and store the output in Amazon S3. The company tested the script, and it completes within 30 minutes on a small local three-node cluster.
Which solution is the MOST cost-effective for scheduling and executing the script?

  • A. Create an AWS Lambda function to spin up an Amazon EMR cluster with a Hive execution step. Set KeepJobFlowAliveWhenNoSteps to false and disable the termination protection flag. Use Amazon CloudWatch Events to schedule the Lambda function to run daily.
  • B. Use the AWS Management Console to spin up an Amazon EMR cluster with Python, Hue, Hive, and Apache Oozie. Set the termination protection flag to true and use Spot Instances for the core nodes of the cluster. Configure an Oozie workflow in the cluster to invoke the Hive script daily.
  • C. Create an AWS Glue job with the Hive script to perform the batch operation. Configure the job to run once a day using a time-based schedule.
  • D. Use AWS Lambda layers and load the Hive runtime to AWS Lambda and copy the Hive script. Schedule the Lambda function to run daily by creating a workflow using AWS Step Functions.

Answer: C
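
For reference, the transient-cluster pattern described in option A, launched from a scheduled Lambda function, can be sketched as below. The release label, instance sizing, and S3 paths are hypothetical; with KeepJobFlowAliveWhenNoSteps set to false, the cluster terminates itself after the Hive step finishes.

    import boto3

    def handler(event, context):
        """Scheduled daily by an Amazon CloudWatch Events (EventBridge) rule."""
        boto3.client("emr").run_job_flow(
            Name="daily-hive-batch",
            ReleaseLabel="emr-6.10.0",
            Applications=[{"Name": "Hive"}],
            ServiceRole="EMR_DefaultRole",
            JobFlowRole="EMR_EC2_DefaultRole",
            LogUri="s3://example-logs/emr/",
            Instances={
                "InstanceGroups": [
                    {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                    {"InstanceRole": "CORE",   "InstanceType": "m5.xlarge", "InstanceCount": 2},
                ],
                # The cluster shuts down once the step list is empty.
                "KeepJobFlowAliveWhenNoSteps": False,
                "TerminationProtected": False,
            },
            Steps=[{
                "Name": "run-hive-script",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["hive-script", "--run-hive-script", "--args",
                             "-f", "s3://example-scripts/daily_batch.q"],
                },
            }],
        )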

NEW QUESTION 18
A manufacturing company wants to create an operational analytics dashboard to visualize metrics from equipment in near-real time. The company uses Amazon Kinesis Data Streams to stream the data to other applications. The dashboard must automatically refresh every 5 seconds. A data analytics specialist must design a solution that requires the least possible implementation effort.
Which solution meets these requirements?

  • A. Use Amazon Kinesis Data Firehose to store the data in Amazon S3. Use Amazon QuickSight to build the dashboard.
  • B. Use Apache Spark Streaming on Amazon EMR to read the data in near-real time. Develop a custom application for the dashboard by using D3.js.
  • C. Use Amazon Kinesis Data Firehose to push the data into an Amazon Elasticsearch Service (Amazon ES) cluster. Visualize the data by using a Kibana dashboard.
  • D. Use AWS Glue streaming ETL to store the data in Amazon S3. Use Amazon QuickSight to build the dashboard.

Answer: B

NEW QUESTION 19
A banking company is currently using an Amazon Redshift cluster with dense storage (DS) nodes to store sensitive data. An audit found that the cluster is unencrypted. Compliance requirements state that a database with sensitive data must be encrypted through a hardware security module (HSM) with automated key rotation.
Which combination of steps is required to achieve compliance? (Choose two.)

  • A. Set up a trusted connection with HSM using a client and server certificate with automatic key rotation.
  • B. Modify the cluster with an HSM encryption option and automatic key rotation.
  • C. Create a new HSM-encrypted Amazon Redshift cluster and migrate the data to the new cluster.
  • D. Enable HSM with key rotation through the AWS CLI.
  • E. Enable Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) encryption in the HSM.

Answer: BD
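
For reference, the HSM-related settings appear on the Amazon Redshift cluster APIs as an HSM client certificate identifier and an HSM configuration identifier. A hedged sketch of launching an HSM-encrypted cluster and requesting key rotation; all identifiers below are placeholders, and data from the existing unencrypted cluster would still need to be migrated into such a cluster.

    import boto3

    redshift = boto3.client("redshift")

    # New cluster encrypted through an HSM (the identifiers are hypothetical
    # and must already exist as a Redshift HSM client certificate and
    # HSM configuration).
    redshift.create_cluster(
        ClusterIdentifier="sensitive-data-encrypted",
        NodeType="ds2.xlarge",
        MasterUsername="awsuser",
        MasterUserPassword="REPLACE_ME_1!",
        NumberOfNodes=4,
        Encrypted=True,
        HsmClientCertificateIdentifier="example-hsm-client-cert",
        HsmConfigurationIdentifier="example-hsm-config",
    )

    # Key rotation can then be requested for the encrypted cluster.
    redshift.rotate_encryption_keys(ClusterIdentifier="sensitive-data-encrypted")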

NEW QUESTION 20
......

P.S. Easily pass the DAS-C01 exam with the 130 Q&As in the DumpSolutions.com dumps & PDF version. Welcome to download the newest DumpSolutions.com DAS-C01 dumps: https://www.dumpsolutions.com/DAS-C01-dumps/ (130 New Questions)