You need to copy millions of sensitive patient records from a relational database to BigQuery. The total size of the database
is 10 TB. You need to design a solution that is secure and time-efficient. What should you do?
A
A shipping company has live package-tracking data that is sent to an Apache Kafka stream in real time. This is then loaded
into BigQuery. Analysts in your company want to query the tracking data in BigQuery to analyze geospatial trends in the
lifecycle of a package. The table was originally created with ingest-date partitioning. Over time, the query processing time
has increased. You need to implement a change that would improve query performance in BigQuery. What should you do?
B
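Note: the usual fix for this scenario is to cluster the table on the column the analysts filter by and, where possible, partition on the event timestamp rather than the ingest date, so queries prune both partitions and blocks. A minimal sketch with the BigQuery Python client; the table and column names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical schema for the package-tracking table.
schema = [
    bigquery.SchemaField("package_id", "STRING"),
    bigquery.SchemaField("location", "GEOGRAPHY"),
    bigquery.SchemaField("event_time", "TIMESTAMP"),
]
table = bigquery.Table("my-project.tracking.package_events", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_time",  # partition on the event timestamp, not ingest time
)
table.clustering_fields = ["package_id"]  # cluster on the common filter column
client.create_table(table)
```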
You are building a real-time prediction engine that streams files, which may contain PII (personally identifiable
information), into Cloud Storage and eventually into BigQuery. You want to ensure that the sensitive data is masked but still
maintains referential integrity, because names and emails are often used as join keys. How should you use the Cloud Data
Loss Prevention API (DLP API) to ensure that the PII data is not accessible by unauthorized individuals?
B
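Note: the way to mask PII with the DLP API while keeping referential integrity is deterministic tokenization, where the same input always yields the same token, so names and emails still work as join keys. A minimal sketch with the google-cloud-dlp Python client; the project, key ring, and surrogate type names are hypothetical:

```python
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"  # hypothetical project

info_types = [{"name": "PERSON_NAME"}, {"name": "EMAIL_ADDRESS"}]
deidentify_config = {
    "info_type_transformations": {
        "transformations": [{
            "info_types": info_types,
            "primitive_transformation": {
                "crypto_deterministic_config": {
                    # Deterministic: equal inputs produce equal tokens, which
                    # preserves joins across tables.
                    "crypto_key": {
                        "kms_wrapped": {
                            "wrapped_key": b"...",  # KMS-wrapped key (placeholder)
                            "crypto_key_name": "projects/my-project/locations/global/keyRings/kr/cryptoKeys/dlp-key",
                        }
                    },
                    "surrogate_info_type": {"name": "PII_TOKEN"},
                }
            },
        }]
    }
}

response = dlp.deidentify_content(
    request={
        "parent": parent,
        "deidentify_config": deidentify_config,
        "inspect_config": {"info_types": info_types},
        "item": {"value": "Alice Smith <alice@example.com>"},
    }
)
print(response.item.value)  # tokens appear in place of the name and email
```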
You are updating the code for a subscriber to a Pub/Sub feed. You are concerned that upon deployment the subscriber may
erroneously acknowledge messages, leading to message loss. Your subscriber is not set up to retain acknowledged
messages. What should you do to ensure that you can recover from errors after deployment?
C
Explanation:
Reference: https://cloud.google.com/pubsub/docs/replay-overview
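A minimal sketch of the recovery pattern described in the replay-overview reference, with the Pub/Sub Python client and hypothetical project, subscription, and snapshot names: snapshot the subscription before deploying, then seek back to the snapshot if the new subscriber code misacknowledges messages.

```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "tracking-sub")
snapshot_path = subscriber.snapshot_path("my-project", "pre-deploy-snapshot")

# Before rolling out the new subscriber code, capture the subscription's
# unacknowledged-message state.
subscriber.create_snapshot(
    request={"name": snapshot_path, "subscription": subscription_path}
)

# If the new code erroneously acknowledges messages, replay them by seeking
# the subscription back to the snapshot.
subscriber.seek(request={"subscription": subscription_path, "snapshot": snapshot_path})
```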
You are operating a Cloud Dataflow streaming pipeline. The pipeline aggregates events from a Cloud Pub/Sub subscription
source, within a window, and sinks the resulting aggregation to a Cloud Storage bucket. The source has consistent
throughput. You want to monitor and alert on the pipeline's behavior with Cloud Stackdriver to ensure that it is
processing data. Which Stackdriver alerts should you create?
B
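One signal that fits a steady-throughput source is the job's system lag: if it stays high, the pipeline has stopped keeping up. A hedged sketch of such an alert with the Cloud Monitoring Python client (hypothetical project; the exact conditions in the intended answer may differ):

```python
import datetime

from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()

# Fire when system lag exceeds 60 s for five consecutive minutes.
condition = monitoring_v3.AlertPolicy.Condition(
    display_name="Dataflow system lag above 60s",
    condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
        filter='metric.type = "dataflow.googleapis.com/job/system_lag"',
        comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
        threshold_value=60,
        duration=datetime.timedelta(minutes=5),
    ),
)
policy = monitoring_v3.AlertPolicy(
    display_name="Streaming pipeline stalled",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[condition],
)
client.create_alert_policy(name="projects/my-project", alert_policy=policy)
```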
Your company has a hybrid cloud initiative. You have a complex data pipeline that moves data between cloud provider
services and leverages services from each of the cloud providers. Which cloud-native service should you use to orchestrate
the entire pipeline?
D
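For orchestrating a pipeline that spans cloud providers, the cloud-native choice is typically Cloud Composer (managed Apache Airflow), since Airflow ships operators for other providers' services. A minimal DAG sketch; the task bodies are placeholders for the real provider-specific operators:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hybrid_data_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Placeholders: real tasks would use provider operators (e.g. S3, BigQuery).
    extract = BashOperator(task_id="extract_from_other_cloud", bash_command="echo extract")
    load = BashOperator(task_id="load_into_gcp", bash_command="echo load")
    extract >> load
```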
You need to choose a database to store time series CPU and memory usage for millions of computers. You need to store
this data in one-second interval samples. Analysts will be performing real-time, ad hoc analytics against the database. You
want to avoid being charged for every query executed and ensure that the schema design will allow for future growth of the
dataset. Which database and data model should you choose?
D
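A database that fits these constraints is Cloud Bigtable with a tall, narrow time-series schema: there is no per-query charge, and the row-key design accommodates growth. A sketch with the Bigtable Python client, assuming hypothetical instance, table, and column-family names:

```python
import time

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("metrics-instance").table("machine_metrics")

# Tall, narrow design: one row per machine per second, keyed so a machine's
# samples are contiguous and the most recent sort first.
machine_id = "machine-0042"
reversed_ts = 2**32 - int(time.time())
row = table.direct_row(f"{machine_id}#{reversed_ts}".encode())
row.set_cell("stats", "cpu_pct", b"87")
row.set_cell("stats", "mem_pct", b"41")
row.commit()
```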
You decided to use Cloud Datastore to ingest vehicle telemetry data in real time. You want to build a storage system that will
account for the long-term data growth, while keeping the costs low. You also want to create snapshots of the data
periodically, so that you can perform a point-in-time (PIT) recovery or clone a copy of the data for Cloud Datastore in a
different environment. You want to archive these snapshots for a long time. Which two methods can accomplish this?
(Choose two.)
C E
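The snapshot mechanism behind such answers is the managed export, which writes a consistent copy of the Datastore contents to a Cloud Storage bucket; the bucket can use an archival storage class for long-term retention. A minimal sketch with the Datastore Admin Python client (hypothetical project and bucket):

```python
from google.cloud import datastore_admin_v1

client = datastore_admin_v1.DatastoreAdminClient()

# Managed export to Cloud Storage; rerun periodically for PIT snapshots.
operation = client.export_entities(
    request={
        "project_id": "my-project",
        "output_url_prefix": "gs://my-archive-bucket/datastore-snapshots",
    }
)
response = operation.result()  # blocks until the export finishes
print(response.output_url)     # import this URL to clone into another environment
```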
MJTelco Case Study
Company Overview
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world. The company
has patents for innovative optical communications hardware. Based on these patents, they can create many reliable,
high-speed backbone links with inexpensive hardware.
Company Background
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications
challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time
analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive,
they plan to overdeploy the network allowing them to account for the impact of dynamic regional politics on location
availability and cost.
Their management and operations teams are situated all around the globe, creating a many-to-many relationship between
data consumers and providers in their system. After careful consideration, they decided the public cloud is the perfect
environment to support their needs.
Solution Concept
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
- Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
- Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments (development/test, staging, and production) to meet the
needs of running experiments, deploying new features, and serving production customers.
Business Requirements
- Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
- Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
- Provide reliable and timely access to data for analysis from distributed research workers.
- Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.
Technical Requirements
- Ensure secure and efficient transport and storage of telemetry data.
- Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
- Allow analysis and presentation against data tables tracking up to 2 years of data, storing approximately 100m records/day.
- Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems, both in telemetry flows and in production learning cycles.
CEO Statement
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized
to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to
meet our reliability and capacity commitments.
CTO Statement
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure. We also
need environments in which our data scientists can carefully study and quickly adapt our models. Because we rely on
automation to process our data, we also need our development and test environments to work as we iterate.
CFO Statement
The project is too large for us to maintain the hardware and software required for the data and analysis. Also, we cannot
afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google
Cloud's machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with
our data pipelines.
MJTelco is building a custom interface to share data. They have these requirements:
1. They need to do aggregations over their petabyte-scale datasets.
2. They need to scan specific time range rows with a very fast response time (milliseconds).
Which combination of Google Cloud Platform products should you recommend?
C
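The usual pairing for these two requirements is BigQuery for petabyte-scale aggregations and Cloud Bigtable for millisecond scans of specific time ranges. If row keys embed a timestamp, a time-range query becomes a contiguous row-range read; a sketch with hypothetical names:

```python
from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

client = bigtable.Client(project="my-project")
table = client.instance("telemetry-instance").table("flows")

# Row keys of the form <link_id>#<timestamp> make a time range a single
# contiguous scan.
row_set = RowSet()
row_set.add_row_range_from_keys(
    start_key=b"link-001#20240601000000",
    end_key=b"link-001#20240602000000",
)
for row in table.read_rows(row_set=row_set):
    print(row.row_key)
```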
You are deploying 10,000 new Internet of Things devices to collect temperature data in your warehouses globally. You need
to process, store and analyze these very large datasets in real time. What should you do?
B
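A common shape for this workload is devices publishing to Pub/Sub, a streaming Dataflow job processing the events, and BigQuery or Bigtable for analysis. A minimal device-side publisher sketch (hypothetical project, topic, and payload fields):

```python
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "warehouse-temps")

# Each device publishes its readings; downstream, a streaming Dataflow job
# can aggregate them and write to BigQuery or Bigtable.
reading = {"device_id": "sensor-17", "warehouse": "FRA-2", "temp_c": 4.3}
future = publisher.publish(topic_path, json.dumps(reading).encode("utf-8"))
print(future.result())  # message ID once the publish is acknowledged
```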
MJTelco Case Study (see the full case study above)
Given the volume of records MJTelco wants to ingest each day, they are concerned about Google BigQuery costs
increasing. MJTelco asks you to provide a design solution. They require a single large data table called tracking_table.
Additionally, they want to minimize the cost of daily queries while performing fine-grained analysis of each day's events.
They also want to use streaming ingestion. What should you do?
B
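Partitioning tracking_table by day keeps each day's fine-grained analysis to a single partition's worth of scanned bytes, and streaming inserts land in the current partition. A sketch that verifies the cost of such a query with a dry run (hypothetical project and dataset; _PARTITIONTIME applies to ingestion-time partitioned tables):

```python
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT COUNT(*) AS events
    FROM `my-project.telemetry.tracking_table`
    WHERE _PARTITIONTIME = TIMESTAMP("2024-06-01")
"""
# Dry run: estimates scanned bytes without executing or billing the query.
job = client.query(query, job_config=bigquery.QueryJobConfig(dry_run=True))
print(f"This query would process {job.total_bytes_processed} bytes")
```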
Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits the
training data well. However, when tested against new data, it performs poorly. What method can you employ to address this?
C
Explanation:
Reference: https://medium.com/mlreview/a-simple-deep-learning-model-for-stock-price-prediction-using-tensorflow-30505541d877
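The standard remedy when a large network fits the training data but generalizes poorly is regularization, most commonly dropout. A minimal Keras sketch (layer sizes and input shape are illustrative):

```python
import tensorflow as tf

# Dropout randomly zeroes a fraction of activations during training, which
# discourages co-adaptation of neurons and reduces overfitting.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```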
You're using Bigtable for a real-time application, and you have a heavy load that is a mix of reads and writes. You've
recently identified an additional use case and need to run an hourly analytical job to calculate certain statistics across the
whole database. You need to ensure the reliability of both your production application and the analytical workload.
What should you do?
B
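A common way to isolate the hourly scan is to add a replicated cluster and route the analytical traffic to it through a dedicated app profile, leaving the serving cluster's capacity untouched. A sketch of the read side, assuming a hypothetical app profile pinned to the replica cluster:

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("prod-instance")

# Reads through this app profile are served by the replica cluster, so the
# full-table statistics job does not compete with production traffic.
table = instance.table("app_data", app_profile_id="analytics-profile")
for row in table.read_rows():
    pass  # compute the hourly statistics here
```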
Your company's customer and order databases are often under heavy load. This makes performing analytics against them
difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump.
You want to perform analytics with minimal impact on operations. What should you do?
C
You are implementing security best practices on your data pipeline. Currently, you are manually executing jobs as the
Project Owner. You want to automate these jobs by taking nightly batch files containing non-public information from Google
Cloud Storage, processing them with a Spark Scala job on a Google Cloud Dataproc cluster, and depositing the results into
Google BigQuery.
How should you securely run this workload?
B
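Rather than running the jobs as the Project Owner, the workload can run under a dedicated least-privilege service account attached to the Dataproc cluster. A sketch with the Dataproc Python client; the account and resource names are hypothetical, and the account would hold only the Cloud Storage, Dataproc worker, and BigQuery roles the job needs:

```python
from google.cloud import dataproc_v1

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": "my-project",
    "cluster_name": "nightly-etl",
    "config": {
        "gce_cluster_config": {
            # Least-privilege service account the cluster VMs run as.
            "service_account": "etl-batch@my-project.iam.gserviceaccount.com",
        },
    },
}
operation = client.create_cluster(
    request={"project_id": "my-project", "region": "us-central1", "cluster": cluster}
)
operation.result()  # wait for the cluster to be ready
```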