
Practice Test 2 | Google Cloud Certified Professional Data Engineer | Dumps | Mock Test


A company has over 25 TB of data in Avro format stored on on-premises disks. You are migrating the tech stack to Google Cloud. The current on-premises data pipeline performs the required data transformation and enrichment using Apache Spark, and you decide to use Dataproc for data processing. When management approved the migration, one of the base requirements was that the data be highly available and that cross-zone durability be guaranteed. What should you do?

A. Use Cloud Storage to store the data. Allow the Dataproc cluster to access the data from Cloud Storage.
B. Use BigQuery to store the data. Install the BigQuery connector on Dataproc to access the data.
C. Use the Dataproc cluster's HDFS to store the data.
D. Use Bigtable to store the data. Use the Bigtable connector to access the data from Dataproc.

Answer: A.

Description:

When you move Hadoop and Spark workloads from an on-premises environment to Google Cloud Platform (GCP), the recommended approach is to use Dataproc to run your Apache Spark and Hadoop clusters.
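To make the pattern concrete, below is a minimal PySpark sketch of such a job; the bucket names and the enrichment step are hypothetical placeholders, and it assumes the spark-avro package is available on the cluster. The Spark logic is unchanged from an on-premises job; only the I/O paths move from hdfs:// to gs://.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("avro-enrichment").getOrCreate()

# Read the migrated Avro files directly from Cloud Storage. Dataproc images
# include the Cloud Storage connector, so gs:// paths resolve out of the box.
# (example-bucket and the paths below are hypothetical placeholders.)
events = spark.read.format("avro").load("gs://example-bucket/raw/events/")

# Placeholder enrichment step: tag each record with its processing date.
enriched = events.withColumn("ingest_date", F.current_date())

# Write results back to Cloud Storage so they outlive the cluster.
(enriched.write
    .mode("overwrite")
    .format("avro")
    .save("gs://example-bucket/enriched/events/"))
```

Because both input and output live in Cloud Storage rather than in cluster-local HDFS, the Dataproc cluster holds no state and can be deleted once the job finishes.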

Cloud Storage is a good option if:

  • Your data in ORC, Parquet, Avro, or any other format will be used by different clusters or jobs, and you need data persistence if the cluster terminates (see the job-submission sketch after this list).
  • You need high throughput and your data is stored in files larger than 128 MB.
  • You need cross-zone durability for your data.
  • You need data to be highly available—for example, you want to eliminate HDFS NameNode as a single point of failure.
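As a sketch of the persistence point above, the job can be submitted with the google-cloud-dataproc Python client; the project, region, cluster, bucket, and file names are hypothetical placeholders. Since the data never touches cluster-local HDFS, the cluster can be torn down after the run without any data loss.

```python
from google.cloud import dataproc_v1

project = "example-project"   # hypothetical project ID
region = "us-central1"        # hypothetical region

# The job controller endpoint is regional.
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": "example-cluster"},  # hypothetical cluster
    "pyspark_job": {
        "main_python_file_uri": "gs://example-bucket/jobs/avro_enrichment.py",
        # spark-avro is an external package; pin a version that matches the
        # cluster's Spark/Scala build.
        "properties": {
            "spark.jars.packages": "org.apache.spark:spark-avro_2.12:3.1.3"
        },
    },
}

# Submit as a long-running operation and block until the job completes.
operation = job_client.submit_job_as_operation(
    request={"project_id": project, "region": region, "job": job}
)
response = operation.result()
print(f"Job finished with state: {response.status.state.name}")
```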

Source(s):

Migrating Apache Spark Jobs to Cloud Dataproc:

https://cloud.google.com/solutions/migration/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc

