Practice Test 2 | Google Cloud Certified Professional Data Engineer | Dumps | Mock Test
You are using the Dataflow SDK to analyze customer segmentation data. You need to extract certain fields from the data files for further transformation.
Which operation should you use?
A. ParDo
B. PCollection
C. Transform
D. Pipeline
Answer: A.
ParDo is the core parallel processing operation in the Apache Beam SDKs. It invokes a user-specified function on each element of the input PCollection and collects the zero or more output elements produced per input into an output PCollection. The ParDo transform processes elements independently and possibly in parallel.
In the Google Dataflow SDK (now Apache Beam), ParDo enables parallel processing. It acts on one element at a time (like the map phase in MapReduce). ParDo is useful for:
- Filtering a dataset and emitting only the elements of interest.
- Converting elements from one type to another.
- Extracting parts of each input element and computing values from those parts.
Source(s):
Dataflow – Programming Model for Apache Beam: https://cloud.google.com/dataflow/docs/concepts/beam-programming-model