Free online access: Amazon.Data-Engineer-Associate-KR.v2026-04-11.q93 practice exam (Page 14)

Data-Engineer-Associate-KR Question 61

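A data engineer must build an extract, transform, and load (ETL) pipeline to process and load data from 10 source systems into 10 tables that are in an Amazon Redshift database. All the source systems generate .csv, JSON, or Apache Parquet files every 15 minutes. The source systems all deliver files into one Amazon S3 bucket. The file sizes range from 10 MB to 20 GB. The ETL pipeline must function correctly despite changes to the data schema.
Which data pipeline solutions will meet these requirements? (Choose two.)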

A. Use an Amazon EventBridge rule to run an AWS Glue job every 15 minutes. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.

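B. Use an Amazon EventBridge rule to invoke an AWS Glue workflow job every 15 minutes. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.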

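C. Configure an AWS Lambda function to invoke an AWS Glue crawler when a file is loaded into the S3 bucket. Configure an AWS Glue job to process and load the data into the Amazon Redshift tables. Create a second Lambda function to run the AWS Glue job. Create an Amazon EventBridge rule to invoke the second Lambda function when the AWS Glue crawler finishes running successfully.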

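D. Configure an AWS Lambda function to invoke an AWS Glue workflow when a file is loaded into the S3 bucket. Configure the AWS Glue workflow to have an on-demand trigger that runs an AWS Glue crawler and then runs an AWS Glue job when the crawler finishes running successfully. Configure the AWS Glue job to process and load the data into the Amazon Redshift tables.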

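E. Configure an AWS Lambda function to invoke an AWS Glue job when a file is loaded into the S3 bucket. Configure the AWS Glue job to read the files from the S3 bucket into an Apache Spark DataFrame. Configure the AWS Glue job to also put smaller partitions of the DataFrame into an Amazon Kinesis Data Firehose delivery stream. Configure the delivery stream to load data into the Amazon Redshift tables.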

Answer: A,B

Both selected options rely on an Amazon EventBridge rule that fires every 15 minutes, either to run an AWS Glue job directly or to invoke an AWS Glue workflow. AWS Glue is a serverless ETL service that can process data from various sources and load it into various targets, including Amazon Redshift. AWS Glue handles different data formats, such as CSV, JSON, and Parquet, and supports schema evolution, meaning it can adapt to changes in the data schema over time. AWS Glue also uses Apache Spark to distribute the processing and transformation of large datasets. AWS Glue integrates with Amazon EventBridge, a serverless event bus service that triggers actions based on rules and schedules. With a scheduled EventBridge rule, you can start an AWS Glue job or workflow every 15 minutes, and configure the workflow to run an AWS Glue crawler and then the job that loads the data into the Amazon Redshift tables. This gives you a cost-effective and scalable ETL pipeline that handles data from 10 source systems and functions correctly despite changes to the data schema.
The other options do not meet the requirements as well. Option C uses one AWS Lambda function to invoke an AWS Glue crawler when a file is loaded into the S3 bucket and a second Lambda function to run the AWS Glue job. This is not a feasible solution, because it requires many Lambda invocations and extra coordination, and Lambda limits on execution time, memory, and concurrency can affect the performance and reliability of the ETL pipeline. Option D uses a Lambda function to invoke an AWS Glue workflow when a file is loaded into the S3 bucket. The Lambda function is unnecessary, because an Amazon EventBridge rule can invoke the AWS Glue workflow directly. Option E uses a Lambda function to invoke an AWS Glue job when a file is loaded into the S3 bucket, and the job puts smaller partitions of the DataFrame into an Amazon Kinesis Data Firehose delivery stream. This is not cost-effective, because it incurs additional costs for Lambda invocations and data delivery. Moreover, using Amazon Kinesis Data Firehose to load data into Amazon Redshift is not suitable for frequent and small batches of data, as it can cause performance issues and data fragmentation.
References:
* AWS Glue
* Amazon EventBridge
* Using AWS Glue to run ETL jobs against non-native JDBC data sources
* AWS Lambda quotas
* Amazon Kinesis Data Firehose quotas
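
The scheduled workflow described for option B can be sketched as plain request payloads. This is a minimal illustration rather than a definitive setup: the resource names are hypothetical placeholders, the dictionaries are meant as keyword arguments for the EventBridge PutRule and AWS Glue CreateTrigger APIs, and nothing here calls AWS or wires the rule to the workflow as a target.

```python
# Hypothetical resource names, used only for illustration.
WORKFLOW = "redshift-etl-workflow"
CRAWLER = "s3-landing-crawler"
JOB = "load-redshift-job"


def schedule_rule(name, minutes):
    """Build keyword arguments for an EventBridge PutRule call on a fixed rate."""
    unit = "minute" if minutes == 1 else "minutes"  # rate() uses singular for 1
    return {
        "Name": name,
        "ScheduleExpression": f"rate({minutes} {unit})",
        "State": "ENABLED",
    }


def workflow_triggers(workflow, crawler, job):
    """Build keyword arguments for two Glue CreateTrigger calls in one workflow."""
    # On-demand start trigger: runs the crawler when the workflow is started.
    start = {
        "Name": f"{workflow}-start",
        "WorkflowName": workflow,
        "Type": "ON_DEMAND",
        "Actions": [{"CrawlerName": crawler}],
    }
    # Conditional trigger: runs the job only after the crawler succeeds.
    after_crawl = {
        "Name": f"{workflow}-after-crawl",
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler,
                    "CrawlState": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": job}],
    }
    return [start, after_crawl]


if __name__ == "__main__":
    print(schedule_rule("every-15-minutes", 15))
    for trigger in workflow_triggers(WORKFLOW, CRAWLER, JOB):
        print(trigger["Name"], trigger["Type"], trigger["Actions"])
```

Running the crawler before the job is what lets the catalog pick up schema changes before each load; the job itself would still need to read its schema from the catalog.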

Data-Engineer-Associate-KR Question 62

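A data engineer needs to build an enterprise data catalog based on the company's Amazon S3 buckets and Amazon RDS databases. The data catalog must include storage format metadata for the data in the catalog.
Which solution will meet these requirements with the LEAST effort?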

A. Use an AWS Glue crawler to scan the S3 buckets and RDS databases and build a data catalog. Use data stewards to inspect the data and update the data catalog with the data format.

B. Use an AWS Glue crawler to build a data catalog. Use AWS Glue crawler classifiers to recognize the format of the data and store the format in the catalog.

C. Use Amazon Macie to build a data catalog and to identify sensitive data elements. Collect the data format information from Macie.

D. Use scripts to scan data elements and to assign data classifications based on the format of the data.

Data-Engineer-Associate-KR Question 63

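A company has a frontend ReactJS website that uses Amazon API Gateway to invoke REST APIs. The APIs perform the functionality of the website. A data engineer needs to write a Python script that can be occasionally invoked through API Gateway. The code must return results to API Gateway.
Which solution will meet these requirements with the LEAST operational overhead?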

A. Deploy a custom Python script on an Amazon Elastic Container Service (Amazon ECS) cluster.

B. Create an AWS Lambda Python function with provisioned concurrency.

C. Deploy a custom Python script that can integrate with API Gateway on Amazon Elastic Kubernetes Service (Amazon EKS).

D. Create an AWS Lambda function. Ensure that the function is warm by scheduling an Amazon EventBridge rule to invoke the Lambda function every 5 minutes by using mock events.

Answer: B

AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers. You can use Lambda to create functions that perform custom logic and integrate with other AWS services, such as API Gateway. Lambda automatically scales your application by running code in response to each trigger. You pay only for the compute time you consume [1].
Amazon ECS is a fully managed container orchestration service that allows you to run and scale containerized applications on AWS. You can use ECS to deploy, manage, and scale Docker containers using either Amazon EC2 instances or AWS Fargate, a serverless compute engine for containers [2].
Amazon EKS is a fully managed Kubernetes service that allows you to run Kubernetes clusters on AWS without needing to install, operate, or maintain your own Kubernetes control plane. You can use EKS to deploy, manage, and scale containerized applications using Kubernetes on AWS [3].
The solution that meets the requirements with the least operational overhead is to create an AWS Lambda Python function with provisioned concurrency. This solution has the following advantages:
* It does not require you to provision, manage, or scale any servers or clusters, as Lambda handles all the infrastructure for you. This reduces the operational complexity and cost of running your code.
* It allows you to write your Python script as a Lambda function and integrate it with API Gateway using a simple configuration. API Gateway can invoke your Lambda function synchronously or asynchronously, and return the results to the frontend website.
* It ensures that your Lambda function is ready to respond to API requests without cold start delays, by using provisioned concurrency. Provisioned concurrency keeps your function initialized and ready to respond in double-digit milliseconds. You specify the number of concurrent executions that you want to provision for your function.
Option A is incorrect because it requires you to deploy a custom Python script on an Amazon ECS cluster. This solution has the following disadvantages:
* It requires you to provision, manage, and scale your own ECS cluster, either using EC2 instances or Fargate. This increases the operational complexity and cost of running your code.
* It requires you to package your Python script as a Docker container image and store it in a container registry, such as Amazon ECR or Docker Hub. This adds an extra step to your deployment process.
* It requires you to configure your ECS cluster to integrate with API Gateway, either using an Application Load Balancer or a Network Load Balancer. This adds another layer of complexity to your architecture.
Option C is incorrect because it requires you to deploy a custom Python script that can integrate with API Gateway on Amazon EKS. This solution has the following disadvantages:
* It requires you to provision, manage, and scale your own EKS cluster, either using EC2 instances or Fargate. This increases the operational complexity and cost of running your code.
* It requires you to package your Python script as a Docker container image and store it in a container registry, such as Amazon ECR or Docker Hub. This adds an extra step to your deployment process.
* It requires you to configure your EKS cluster to integrate with API Gateway, either using an Application Load Balancer, a Network Load Balancer, or a service of type LoadBalancer. This adds another layer of complexity to your architecture.
Option D is incorrect because it requires you to create an AWS Lambda function and keep it warm by scheduling an Amazon EventBridge rule to invoke the function every 5 minutes with mock events. This solution has the following disadvantages:
* It does not guarantee that your Lambda function will always be warm. Lambda can still recycle an idle execution environment, and a scheduled ping keeps at most one environment initialized, so requests from API Gateway may still hit cold start delays.
* It incurs unnecessary costs, as you pay for the compute time of your Lambda function every time it is invoked by the EventBridge rule, even if it does not perform any useful work [1].
[1]: AWS Lambda - Features
[2]: Amazon Elastic Container Service - Features
[3]: Amazon Elastic Kubernetes Service - Features
[4]: Building API Gateway REST API with Lambda integration - Amazon API Gateway
[5]: Improving latency with Provisioned Concurrency - AWS Lambda
[6]: Integrating Amazon ECS with Amazon API Gateway - Amazon Elastic Container Service
[7]: Integrating Amazon EKS with Amazon API Gateway - Amazon Elastic Kubernetes Service
[8]: Managing concurrency for a Lambda function - AWS Lambda
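
The Lambda side of option B can be sketched as a handler that returns a response in the shape API Gateway expects from a Lambda proxy integration. This is only an illustration under assumptions: the `name` query parameter and the greeting payload are hypothetical, and the source does not say which integration type the company uses.

```python
import json


def lambda_handler(event, context):
    """Handle a request forwarded by API Gateway (proxy integration assumed)."""
    # queryStringParameters is None when the request carries no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # hypothetical parameter for illustration

    result = {"message": f"hello, {name}"}

    # A proxy integration expects a status code and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }


if __name__ == "__main__":
    response = lambda_handler({"queryStringParameters": {"name": "data"}}, None)
    print(response["statusCode"], response["body"])
```

Provisioned concurrency is not set in the handler code. It is configured on a published version or alias of the function, so the API Gateway integration would need to point at that version or alias.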

Data-Engineer-Associate-KR Question 64

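A company's data engineer needs to optimize the performance of table SQL queries. The company stores data in an Amazon Redshift cluster. The data engineer cannot increase the size of the cluster because of budget constraints.
The company stores the data in multiple tables and loads the data by using the EVEN distribution style. Some tables are hundreds of gigabytes in size. Other tables are less than 10 MB in size.
Which solution will meet these requirements?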

A. Keep using the EVEN distribution style for all tables. Specify primary and foreign keys for all tables.

B. Use the ALL distribution style for large tables. Specify primary and foreign keys for all tables.

C. Use the ALL distribution style for rarely updated small tables. Specify primary and foreign keys for all tables.

D. Specify a combination of distribution, sort, and partition keys for all tables.

Data-Engineer-Associate-KR Question 65

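A company is building a data warehouse solution by using Amazon Redshift. The company is loading hundreds of files into a fact table that is in a Redshift cluster.
The company wants the data warehouse solution to achieve the greatest possible throughput. The solution must use cluster resources optimally when the company loads data into the fact table.
Which solution will meet these requirements?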

A. Use multiple COPY commands to load the data into the Redshift cluster.

B. Use S3DistCp to load multiple files into Hadoop Distributed File System (HDFS). Use an HDFS connector to ingest the data into the Redshift cluster.

C. Use a number of INSERT statements equal to the number of Redshift cluster nodes. Load the data in parallel into each node.

D. Use a single COPY command to load the data into the Redshift cluster.
