Google Professional Machine Learning Engineer

Practice Test #3

Simulate the real exam experience with 50 questions and a 120-minute time limit. Practice with AI-verified answers and detailed explanations.

50 Questions · 120 Minutes · 700/1000 Passing Score


Practice Questions

Question 1

You deployed a TensorFlow recommendation model to a Vertex AI Prediction endpoint in us-central1 with autoscaling enabled. Over the last week, you observed sustained traffic of ~1,200 requests per minute (about 20 RPS) during business hours, which is 2x higher than your original estimate, and you need to keep P95 latency under 150 ms during future surges. You want the endpoint to scale efficiently to handle this higher baseline and upcoming spikes without causing user-visible latency. What should you do?

Deploying a second model to the same endpoint is mainly for A/B testing, canarying, or gradual rollouts via traffic splitting. It does not inherently guarantee more serving capacity or lower P95 latency unless you also increase total replicas. It also adds operational complexity (two model versions, monitoring two deployments) without directly addressing the need for warm baseline capacity.

Correct. Setting minReplicaCount ensures a baseline number of replicas is always running and ready to serve, preventing cold-start/warm-up delays and reducing queueing during predictable business-hour load. This is the standard approach to protect tail latency (P95) when traffic is consistently higher than expected. Autoscaling can still add replicas above the minimum for future surges.

Increasing the target utilization percentage delays scale-out, meaning each replica runs hotter before new replicas are added. That can increase queueing and push P95 latency above the 150 ms SLO during spikes. While it may reduce cost by using fewer replicas, it conflicts with the requirement to avoid user-visible latency during surges and is generally the opposite of what you want for strict latency targets.

Switching to GPU-accelerated machines can reduce inference time for compute-heavy models, but it’s not the first lever for a scaling/latency issue caused by insufficient warm capacity. GPUs increase cost and may have quota/availability constraints in us-central1. If the model already meets latency when enough CPU replicas are warm, GPUs won’t solve autoscaling reaction time or cold-start effects.

Question Analysis

Core Concept: This question tests Vertex AI Prediction online serving autoscaling behavior and how to meet latency SLOs under a higher steady-state load. Key knobs are minimum/maximum replica counts and autoscaling signals (utilization/QPS), which directly affect cold-start risk and tail latency.

Why the Answer is Correct: With a sustained new baseline (~20 RPS during business hours) that is 2x the original estimate, relying purely on reactive autoscaling can cause periods where too few replicas are available, leading to queueing and cold-start/warm-up delays that inflate P95 latency. Setting minReplicaCount to match the new baseline ensures enough replicas are always provisioned (“warm”) to absorb steady traffic and small surges immediately. Autoscaling can still add replicas for larger spikes, but the floor prevents user-visible latency while new replicas start.

Key Features / Best Practices:
- Configure minReplicaCount based on observed steady-state throughput per replica and latency headroom. Use load testing to determine safe RPS/replica at P95 < 150 ms.
- Keep maxReplicaCount high enough for anticipated surges; ensure quotas (CPU/GPU, regional) support it.
- Monitor endpoint metrics (latency percentiles, replica count, utilization, request backlog/errors) and adjust.
- This aligns with Google Cloud Architecture Framework reliability and performance principles: provision for predictable load, autoscale for variability, and design to meet SLOs.

Common Misconceptions: It’s tempting to “scale later” (option C) to save cost, but that worsens latency during spikes. Similarly, adding another model (option A) doesn’t increase capacity unless it results in more replicas, and it complicates routing/versioning. GPUs (option D) can reduce per-request latency for some models, but they don’t address cold-start and may be unnecessary/costly for a TensorFlow recommender that already meets latency when adequately provisioned.
Exam Tips: For online prediction, when you see sustained baseline traffic plus strict tail-latency targets, think “set min replicas to cover baseline” and “autoscale for spikes.” Use GPUs only when profiling shows compute-bound inference and CPU can’t meet latency at reasonable replica counts. Always consider warm capacity, scaling reaction time, and quotas in the chosen region.
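As a rough illustration of the sizing logic above, a minimal sketch; the ~20 RPS baseline comes from the scenario, but the per-replica capacity and headroom figures are made-up assumptions you would replace with load-test results:

```python
import math

def min_replicas(baseline_rps, safe_rps_per_replica, headroom=1.3):
    """Warm-replica floor: enough always-on replicas to absorb the steady
    baseline plus some headroom, so autoscaling only handles larger spikes."""
    return math.ceil(baseline_rps * headroom / safe_rps_per_replica)

# Hypothetical figures: ~20 RPS business-hour baseline (from the scenario) and
# a load-tested ~8 RPS per replica at P95 < 150 ms (assumed; measure yours).
print(min_replicas(20, 8))  # ceil(26 / 8) = 4
```

The resulting number would then be set as the deployment's minimum replica count (e.g., the min_replica_count argument when deploying with the Vertex AI Python SDK), with the maximum kept high enough for anticipated surges.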

Question 2

You are part of a data science team at a ride‑sharing platform and need to train and compare multiple TensorFlow models on Vertex AI using 850 million labeled trip records (≈2.3 TB) stored in a BigQuery table; training will run on 4–8 workers and you want to minimize data‑ingestion bottlenecks while ensuring the pipeline remains scalable and repeatable. What should you do?

Loading 2.3 TB into a pandas DataFrame is not feasible (memory limits, single-node bottleneck) and does not scale to 4–8 distributed workers. tf.data.Dataset.from_tensor_slices() is appropriate for small in-memory datasets or prototypes, not production-scale training. This approach also makes repeatability and fault tolerance difficult because the dataset must be rebuilt in memory each run.

Exporting to CSV in Cloud Storage improves decoupling from BigQuery, but CSV is inefficient for ML training at this scale. Text parsing is CPU-heavy, files are larger than binary formats, and schema/typing issues are common. While tf.data.TextLineDataset can be parallelized, overall throughput is typically worse than TFRecord, increasing the risk of input bottlenecks on multi-worker training.

Correct. Sharded TFRecords in Cloud Storage are a best-practice format for high-throughput TensorFlow training. Sharding (e.g., 1–2 GB) enables parallel reads across workers, reduces single-file contention, and supports repeatable experiments by reusing the same immutable dataset version. Using TFRecordDataset with parallel interleave and prefetch overlaps I/O and compute, minimizing data-ingestion bottlenecks and improving scalability.

Streaming directly from BigQuery during training can create ingestion bottlenecks due to BigQuery read throughput limits, concurrency/quotas, and per-request overhead, especially with multiple workers. It also couples training stability to BigQuery availability and query performance, reducing repeatability. While TensorFlow I/O can work for smaller datasets or experimentation, for TB-scale multi-worker training it is generally safer to materialize to Cloud Storage in an efficient format.

Question Analysis

Core concept: This question tests scalable input pipelines for distributed TensorFlow training on Vertex AI. The key is decoupling training from the source system (BigQuery) and using an efficient, parallelizable file format and tf.data best practices to avoid input bottlenecks at multi-worker scale.

Why the answer is correct: With 850M rows (~2.3 TB) and 4–8 workers, streaming directly from BigQuery or materializing into a single-node structure will bottleneck on network, per-request overhead, and/or BigQuery concurrency limits. Sharded TFRecords in Cloud Storage are a standard, repeatable “training-ready” dataset format: they enable high-throughput sequential reads, easy parallelization across workers, and deterministic reuse across experiments. Proper sharding (e.g., 1–2 GB) balances metadata overhead (too many small files) against parallelism (too few large files). Using tf.data.TFRecordDataset with parallel interleave, map, and prefetch allows overlapping I/O and compute, maximizing accelerator/CPU utilization.

Key features / best practices:
- Store training data in Cloud Storage in a binary, splittable format (TFRecord) with compression (e.g., GZIP) when appropriate.
- Use many shards and let each worker read different shards (via file patterns and dataset sharding options) to reduce contention.
- Use tf.data optimizations: parallel_interleave (or Dataset.interleave with num_parallel_calls), map with AUTOTUNE, prefetch(AUTOTUNE), and optionally cache only when it fits.
- Make the pipeline repeatable: a one-time (or scheduled) export/transform step from BigQuery to TFRecords can be orchestrated (e.g., Vertex AI Pipelines / Dataflow) and versioned.

Common misconceptions:
- “Directly read from BigQuery” sounds convenient, but it couples training throughput to BigQuery read performance, quotas, and transient query/streaming behavior, which is risky at scale.
- “CSV is universal,” but it is inefficient: large text parsing overhead, larger storage footprint, and slower input pipelines.
- “Load into pandas” is a common prototype pattern but fails for multi-terabyte datasets and distributed training.

Exam tips: For large-scale training on Vertex AI, prefer Cloud Storage + TFRecords (or similarly efficient formats) with tf.data performance patterns. Choose architectures that separate data preparation from training, support multi-worker parallel reads, and minimize per-record parsing overhead. When you see TB-scale data and multiple workers, avoid pandas and avoid text formats unless explicitly required.
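As a back-of-the-envelope illustration of the sharding guidance above, a minimal sketch; the 1.5 GB target and the `trips-*` filename pattern are assumptions, not from the scenario:

```python
import math

def plan_shards(total_bytes, target_shard_bytes=int(1.5e9)):
    """Split a dataset export into ~1.5 GB TFRecord shards so 4-8 workers
    can each read a disjoint subset of files in parallel."""
    n = math.ceil(total_bytes / target_shard_bytes)
    names = [f"trips-{i:05d}-of-{n:05d}.tfrecord" for i in range(n)]
    return n, names

n_shards, names = plan_shards(int(2.3e12))  # ~2.3 TB, as in the scenario
print(n_shards, names[0])  # 1534 trips-00000-of-01534.tfrecord
```

At training time each worker would then list its subset of these files and read them with tf.data.TFRecordDataset plus interleave/prefetch, as described above.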

Question 3

You are designing a TensorFlow Extended (TFX) pipeline with standard TFX components for a global media-streaming platform that analyzes user interaction logs; the pipeline includes feature engineering and data validation steps, and after promotion to production it must process up to 120 TB of historical clickstream data per day stored in BigQuery across 12 daily partitions (with an additional 2 TB ingested each day); you need the preprocessing steps to scale efficiently, automatically publish metrics and parameters to Vertex AI Experiments, and track all artifacts with Vertex ML Metadata. How should you configure the pipeline run?

Vertex AI Pipelines is the right orchestrator, but configuring distributed Vertex AI Training jobs addresses model training scale, not the heavy preprocessing/validation workload. TFX preprocessing components (Transform, StatisticsGen, ExampleValidator) are Beam-based and need a scalable Beam runner. Without explicitly running Beam on Dataflow, preprocessing may run locally or on limited resources, becoming the bottleneck at 120 TB/day.

Correct. This matches the intended architecture: orchestrate the standard TFX pipeline with Vertex AI Pipelines (managed orchestration + ML Metadata integration) and configure Apache Beam pipeline_args so Beam-based components execute on Dataflow. Dataflow provides autoscaling and distributed processing suitable for TB-scale BigQuery partitions. This best satisfies efficient scaling for preprocessing while keeping Vertex AI Experiments/MLMD integration aligned with Vertex AI Pipelines runs.

Dataproc can run Spark/Hadoop workloads and can host custom orchestration, but using a Beam TFX orchestrator on Dataproc is not the standard managed path for TFX on Google Cloud. It increases operational overhead (cluster lifecycle, scaling, upgrades) and weakens the “automatic” integration story with Vertex AI Pipelines, Experiments, and ML Metadata compared to the native Vertex AI Pipelines + Dataflow runner approach.

Running the TFX orchestrator itself on Dataflow is not the typical or recommended pattern. Dataflow is designed to execute Beam pipelines (the data processing steps), not to serve as the primary orchestrator for multi-step ML pipelines with artifact lineage and experiment tracking. This option also risks losing the managed orchestration, UI, and MLMD-first integration that Vertex AI Pipelines provides.

Question Analysis

Core concept: This question tests how to run a standard TFX pipeline on Google Cloud so that (1) large-scale preprocessing/validation can elastically scale, and (2) the run is natively integrated with Vertex AI for orchestration, Experiments tracking, and ML Metadata artifact lineage.

Why the answer is correct: Vertex AI Pipelines is the managed orchestration layer for TFX on Google Cloud and integrates with Vertex ML Metadata for artifact tracking. For preprocessing and validation at the stated scale (up to ~120 TB/day across partitions in BigQuery), the critical requirement is to execute TFX’s Beam-based components (e.g., ExampleGen with BigQuery, StatisticsGen, SchemaGen, ExampleValidator, Transform) on a scalable distributed runner. Configuring Apache Beam pipeline arguments to use the Dataflow runner is the standard, best-practice approach: Dataflow provides autoscaling, parallelism, and managed execution for Beam, which is exactly what these components use under the hood.

Key features / configurations:
- Orchestrate with Vertex AI Pipelines (managed Kubeflow Pipelines) to get MLMD integration and reproducible runs.
- Set Beam pipeline_args for Dataflow (runner=DataflowRunner, project, region, temp_location, staging_location, service_account, network/subnetwork if needed, worker settings, autoscaling). This ensures Transform/validation steps scale to TB-scale data.
- Use the Vertex AI Pipelines/TFX integration to publish run parameters and metrics; Experiments tracking is best achieved when the pipeline steps log metrics/params to Vertex AI (often via built-in integrations or custom components) while MLMD captures artifacts and lineage.

Common misconceptions: Option A sounds plausible because “distributed training” is important, but the bottleneck described is preprocessing/validation over massive BigQuery data, not model training. Distributed training does not automatically scale Beam-based data processing. Options C and D propose using Beam orchestrators directly on Dataproc/Dataflow, but that bypasses the primary requirement of using standard TFX components with Vertex AI Pipelines’ managed orchestration and tight MLMD/Experiments integration.

Exam tips:
- For TFX on Google Cloud: Vertex AI Pipelines is the orchestrator; Dataflow is the scalable runner for Beam-based TFX components.
- When you see TB-scale feature engineering/validation, think “Beam on Dataflow,” not “bigger training jobs.”
- Map requirements to layers: orchestration (Vertex AI Pipelines), data processing (Dataflow), metadata/lineage (Vertex ML Metadata), and experiment tracking (Vertex AI Experiments).
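A minimal sketch of the Beam arguments described above, with hypothetical project, region, and bucket names; in TFX such a list is typically passed to the pipeline via its beam_pipeline_args parameter (verify flag names against your Beam/TFX versions):

```python
# Hypothetical project/region/bucket values; the flags themselves are
# standard Dataflow runner options for Apache Beam pipelines.
beam_pipeline_args = [
    "--runner=DataflowRunner",
    "--project=my-project",              # assumption: your GCP project ID
    "--region=us-central1",              # assumption: your Dataflow region
    "--temp_location=gs://my-bucket/tmp",
    "--staging_location=gs://my-bucket/staging",
]
print(beam_pipeline_args[0])  # --runner=DataflowRunner
```

With these arguments set, Beam-based components (ExampleGen, StatisticsGen, ExampleValidator, Transform) execute on Dataflow with autoscaling rather than on the pipeline task's local resources.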

Question 4

You are part of an operations team managing a fleet of 250 refrigerated delivery trucks. Each truck’s refrigeration unit streams telemetry at 10-second intervals, including compressor current (A), condenser coil temperature (°C), discharge pressure (kPa), and vibration RMS (g), resulting in roughly 14 months of historical data per truck. No breakdowns or incident events have been hand-labeled yet. Management asks for a predictive maintenance solution that can detect potential refrigeration unit failures with at least a 24-hour lead time so that routes can be rescheduled. What should you do first?

Correct. Forecasting-based anomaly detection works without labeled failures by learning normal temporal behavior and alerting on large residuals. It provides immediate value, supports a 24-hour lead time by forecasting ahead, and creates a pipeline to surface candidate incidents for later confirmation/labeling. It is a standard first step in predictive maintenance when only telemetry exists.

Not best as a first step for a predictive maintenance solution. Pure heuristics can be deployed quickly, but they are brittle, require constant tuning across trucks and seasons, and often generate many false positives/negatives. They also don’t leverage temporal dynamics well and may not provide reliable 24-hour early warning beyond simple threshold breaches.

Tempting, but risky. Training on heuristic-generated labels usually causes the model to replicate the heuristic rather than learn true failure precursors. This can create a false sense of model quality because offline metrics reflect the heuristic, not real failures. It can be useful later as weak supervision, but only after establishing validation with real confirmed events.

Impractical and not necessary as a first step. Manually labeling 14 months of high-frequency telemetry for 250 trucks is expensive, slow, and ambiguous without clear definitions of “failure” and lead-time windows. A better approach is to start with unsupervised detection to triage and prioritize a much smaller set of segments for expert review.

Question Analysis

Core concept: This question tests how to start a predictive maintenance initiative when you have abundant telemetry but no labeled failure events. In this situation, the first practical ML approach is typically unsupervised or self-supervised anomaly detection/forecasting on time-series, rather than supervised classification.

Why the answer is correct: With no hand-labeled breakdowns, you cannot directly train a supervised “failure in 24 hours” classifier. A strong first step is to build a time-series forecasting baseline per signal (or multivariate) and alert on statistically significant residuals (actual minus predicted). This creates an initial detection capability and, importantly, a mechanism to generate candidate incidents for investigation and future labeling. It also aligns with the business requirement (24-hour lead time): you can forecast 24+ hours ahead and flag trajectories likely to exceed normal operating envelopes.

Key features / best practices: Use a holdout period per truck and evaluate residual distributions to set thresholds that control false positives. Consider seasonality (ambient temperature, route patterns) and per-asset normalization. In Google Cloud, this is commonly implemented with Vertex AI (custom training or AutoML tabular/time-series where applicable), plus a feature store or BigQuery for historical aggregation, and Cloud Monitoring/Alerting for operationalization. Architecturally, start with a simple, explainable baseline (Architecture Framework: reliability and operational excellence) and iterate.

Common misconceptions: Rule-based heuristics (B) can be tempting because they are fast, but they are brittle and often miss multivariate patterns and drift. Using heuristics to create labels then training a model (C) risks “learning the heuristic,” not true failures, producing overconfident models with poor real-world performance. Manual labeling at full scale (D) is usually prohibitively expensive and slow, especially without clear failure definitions.

Exam tips: When labels are missing, prefer unsupervised/self-supervised approaches first (forecasting, reconstruction error, clustering) to bootstrap a feedback loop for labeling. Look for answers that create an iterative path: baseline detection → collect confirmed events → improve to supervised prediction. Also consider operational constraints: cost, time-to-value, and maintainability.
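As a toy illustration of the residual-thresholding idea above: all numbers are synthetic, and in practice the predictions would come from a trained forecasting model with thresholds tuned per truck and season.

```python
from statistics import mean, stdev

def residual_anomalies(actual, predicted, z_threshold=3.0):
    """Flag points whose forecast residual is a large outlier relative to the
    residual distribution (a simple proxy for 'abnormal' telemetry)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mu, sigma = mean(residuals), stdev(residuals)
    return [i for i, r in enumerate(residuals)
            if sigma > 0 and abs(r - mu) / sigma > z_threshold]

# Synthetic compressor-current readings: the forecast is flat at 10 A and the
# actuals contain one large excursion at index 10.
predicted = [10.0] * 12
actual = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.2, 9.8, 16.0, 10.1]
print(residual_anomalies(actual, predicted))  # [10]
```

Each flagged segment becomes a candidate incident for expert review, bootstrapping the labeled dataset needed for a later supervised model.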

Question 5

A logistics platform has trained three versions of an ETA prediction model (v1, v2, v3), imported them into Vertex AI Model Registry, and deployed them to a single online prediction endpoint; you expect about 120,000 prediction requests per day and want to run a 7-day A/B/n test by initially routing 50%/25%/25% of traffic to v1/v2/v3 while tracking per-version accuracy and p95 latency with the least engineering overhead. What should you do to identify the best-performing model using the simplest approach?

Correct. Vertex AI Endpoints support deploying multiple model versions and configuring weighted traffic splitting (e.g., 50/25/25) for A/B/n tests. You can monitor serving metrics like p95 latency via Cloud Monitoring integrations and use prediction logging to attribute requests to each deployed model/version for accuracy analysis once ground truth is available. This is the simplest managed approach with minimal custom infrastructure.

Incorrect. GKE plus Traffic Director can do sophisticated traffic management, but it introduces significant engineering overhead: cluster operations, service mesh/proxy configuration, scaling, and security hardening. It also duplicates capabilities already provided by Vertex AI Endpoints for multi-model deployments and traffic splitting. For an exam scenario emphasizing simplicity and existing Vertex AI deployment, this is over-architected.

Incorrect as the “simplest approach.” Exporting logs and building Looker Studio dashboards can work for analysis, especially for accuracy, but it adds extra steps and ongoing maintenance (log sinks, schema management, dashboard upkeep). It also doesn’t address traffic splitting by itself; you’d still need a routing mechanism. Vertex AI’s built-in endpoint traffic splitting and monitoring reduce the need for custom dashboards for core serving metrics.

Incorrect. Cloud Run traffic splitting across revisions is useful for web services, but it requires packaging and operating your own model server per version, handling autoscaling behavior, and ensuring consistent model loading and performance. Since the models are already in Vertex AI Model Registry and deployed to a Vertex endpoint, moving to Cloud Run increases engineering effort and loses Vertex AI’s purpose-built model deployment and management features.

Question Analysis

Core Concept: This question tests Vertex AI online prediction deployment patterns: using a single Endpoint with multiple deployed models (or model versions) and Endpoint traffic splitting to run A/B/n experiments, while observing operational metrics (latency) and outcome metrics (accuracy) with minimal custom infrastructure.

Why the Answer is Correct: Option A uses Vertex AI Endpoint weighted traffic splitting to route 50%/25%/25% to v1/v2/v3. This is the simplest, lowest-overhead approach because traffic management is a first-class feature of Vertex AI Endpoints: you can deploy multiple models to one endpoint and set per-deployed-model traffic percentages. For p95 latency, Vertex AI integrates with Cloud Monitoring/Cloud Logging to provide request/response and serving metrics without building a custom serving stack. For per-version accuracy, you can attribute predictions to the deployed model ID/version via prediction logging and compare against ground truth once labels arrive (common in ETA problems), minimizing engineering by leveraging managed logging/monitoring rather than bespoke routing layers.

Key Features / Best Practices:
- Vertex AI Endpoint trafficSplit: configure weights per deployed model for A/B/n testing and adjust gradually (supports canary-style rollouts).
- Prediction logging: enable request/response logging (with appropriate privacy controls) to join predictions with later-arriving actual ETAs for accuracy by model version.
- Cloud Monitoring metrics: monitor latency distributions (including p95), error rates, and throughput per endpoint and (via labels/deployed_model_id) per deployment.
- Architecture Framework alignment: operational excellence (managed serving, fewer moving parts), reliability (Google-managed autoscaling), and cost optimization (no extra clusters/services).

Common Misconceptions: It’s tempting to assume you must use GKE/Traffic Director or Cloud Run traffic splitting for A/B tests. Those are valid general-purpose patterns, but they add unnecessary components when Vertex AI already provides model-level traffic splitting and managed observability. Another misconception is that “accuracy monitoring” must be fully automated in Vertex AI; in practice, accuracy often requires joining predictions with ground truth later, but Vertex AI logging makes that straightforward without building a custom router.

Exam Tips: When the question says “least engineering overhead” and the models are already in Vertex AI Model Registry and deployed to a single endpoint, prefer native Vertex AI Endpoint capabilities (multiple deployments + traffic split + managed monitoring/logging). Reach for GKE/Cloud Run/Traffic Director only when you need custom serving logic, non-Vertex runtimes, or advanced mesh features beyond Vertex AI’s managed serving.
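A minimal sketch of the weighted-split configuration above; the deployed-model IDs are hypothetical, and the sum-to-100 check mirrors how Vertex endpoint traffic splits are expressed as percentages:

```python
def make_traffic_split(weights):
    """Validate an A/B/n split: non-negative integer percentages summing
    to 100, keyed by deployed-model ID (IDs here are hypothetical)."""
    assert all(isinstance(w, int) and w >= 0 for w in weights.values())
    assert sum(weights.values()) == 100, "traffic percentages must sum to 100"
    return dict(weights)

split = make_traffic_split({"dm-v1": 50, "dm-v2": 25, "dm-v3": 25})
print(split)  # {'dm-v1': 50, 'dm-v2': 25, 'dm-v3': 25}
```

A map like this is what you would supply when updating the endpoint's traffic split via the Vertex AI SDK or gcloud CLI (check the current parameter/flag names against the reference documentation).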


Question 6

Your supply chain analytics team plans to run 180 training jobs per day for 10 days (3 feature sets × 4 model architectures × 15 hyperparameter grids) using containerized trainers; they must log per-run metrics (AUC, F1, and loss) with timestamps and be able to query trends over time (for example, 7-day rolling averages and the top 10 configurations by mean F1 in the last 30 days) via an API while minimizing manual effort. Which approach should they use to track and report these experiments?

Vertex AI Pipelines is strong for orchestration and repeatability, but it is not an analytics datastore. While pipeline runs can record metrics, querying complex trends (7-day rolling averages, top-N by mean over 30 days across many dimensions) via the Pipelines API is awkward and not what it’s optimized for. You would typically export pipeline metadata to BigQuery or another store for analytics rather than rely on the Pipelines API for reporting.

Correct. Vertex AI Training runs the containerized trainers at scale, and BigQuery is ideal for storing per-run metrics with timestamps and rich metadata. BigQuery SQL supports window functions for rolling averages, time filtering (last 30 days), and ranking (top 10 configurations). Partitioning by time and clustering by configuration fields minimizes cost and improves performance. The BigQuery API enables programmatic access with minimal manual effort.

Cloud Monitoring custom metrics are designed for operational telemetry, dashboards, and alerting, not deep experiment analytics. High-cardinality labels (feature set, architecture, hyperparameter grid, run_id) can hit quotas and become expensive or unwieldy. Querying “top 10 configurations by mean F1 in last 30 days” and computing rolling averages across many dimensions is more natural in BigQuery than in Monitoring’s time-series model.

Workbench notebooks plus Google Sheets is highly manual and does not scale to 180 runs/day with reliable, consistent logging. Sheets lacks strong schema enforcement, lineage tracking, and efficient analytical querying for rolling windows and top-N across large datasets. It also introduces collaboration and data integrity risks. This option violates the requirement to minimize manual effort and is not aligned with production-grade experiment tracking.

Question Analysis

Core Concept: This question tests experiment tracking and time-series analytics for ML training runs at scale. The key needs are: (1) automated capture of per-run metrics with timestamps, (2) flexible querying for trends (rolling averages, top-N over time windows), and (3) API access with minimal manual effort.

Why the Answer is Correct: Option B best fits because BigQuery is purpose-built for analytical queries over large, append-only datasets and supports SQL window functions for rolling averages, time-bounded aggregations (e.g., last 30 days), and ranking (top 10 configurations by mean F1). Writing one row per run (run_id, feature_set, architecture, hyperparameter_grid_id, timestamp, auc, f1, loss, plus any metadata) enables straightforward trend analysis and serving results via the BigQuery API (or a thin service layer). This aligns with the Google Cloud Architecture Framework: operational excellence (repeatable ingestion), performance efficiency (columnar analytics), and reliability (durable storage).

Key Features / Best Practices: Use Vertex AI Training for containerized custom training jobs and emit metrics at the end of each run (or periodically) to BigQuery via a lightweight client or Cloud Logging sink + Dataflow/BigQuery. Partition the table by date/timestamp and cluster by model/feature identifiers to control cost and improve query performance. Enforce a consistent schema and include run lineage fields (dataset version, code version, container image digest) for reproducibility.

Common Misconceptions: Cloud Monitoring (option C) is excellent for operational monitoring and alerting but is not ideal for complex analytical queries like rolling averages across many dimensions and top-N by mean over long windows; it also has cardinality and quota considerations for custom metrics. Vertex AI Pipelines (option A) orchestrates workflows, but the Pipelines API is not intended as an analytics store for arbitrary metric trend queries. Sheets (option D) is manual and not scalable.

Exam Tips: When you see “rolling averages,” “top-N over last X days,” and “query via API,” think “analytics database” (BigQuery) rather than monitoring systems. Use Vertex AI for execution, but store experiment results in a query-optimized system. Also watch for metric cardinality and cost: BigQuery partitioning/clustering is a common exam best practice for time-series experiment tables.
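To make the query patterns above concrete, a sketch with assumed table and column names (`my-project.experiments.runs`, `config_id`, `f1`, `run_ts` are illustrative, not from the scenario), plus a tiny local mirror of the top-N logic:

```python
# Illustrative SQL: top 10 configurations by mean F1 over the last 30 days.
TOP_CONFIGS_SQL = """
SELECT config_id, AVG(f1) AS mean_f1
FROM `my-project.experiments.runs`
WHERE run_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY config_id
ORDER BY mean_f1 DESC
LIMIT 10
"""

# Illustrative SQL: 7-day rolling F1 average per configuration via a window frame.
ROLLING_F1_SQL = """
SELECT config_id, DATE(run_ts) AS run_date,
       AVG(f1) OVER (
         PARTITION BY config_id
         ORDER BY UNIX_DATE(DATE(run_ts))
         RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS f1_7d_avg
FROM `my-project.experiments.runs`
"""

def top_configs(rows, n=10):
    """Local mirror of TOP_CONFIGS_SQL: rows are (config_id, f1) pairs."""
    totals = {}
    for cfg, f1 in rows:
        s, c = totals.get(cfg, (0.0, 0))
        totals[cfg] = (s + f1, c + 1)
    means = [(cfg, s / c) for cfg, (s, c) in totals.items()]
    return sorted(means, key=lambda t: -t[1])[:n]

print(top_configs([("a", 0.9), ("a", 0.88), ("b", 0.7)], n=1))  # "a" ranks first
```

In production the SQL would be issued through the BigQuery client API; the Python mirror just shows that the ranking logic is a plain group-by-then-sort.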

Question 7

You are the lead ML engineer at a smart-meter analytics company; you trained a TensorFlow model that flags consumption anomalies, and each day at 01:00 UTC your ETL job writes the previous day’s readings (~1.5 million records, ~60 GB) as newline-delimited JSON to Cloud Storage under prefixes like gs://meter-prod/daily/2025-08-31/*.jsonl; you need to run inference over the entire daily batch with minimal manual intervention and do not require low-latency, per-request responses; what should you do?

Correct. Vertex AI Batch Prediction is purpose-built for offline scoring at scale. It can read JSONL from a Cloud Storage URI/prefix (matching the daily folder pattern) and write sharded prediction outputs back to Cloud Storage. It’s managed, scales horizontally, supports large datasets, and integrates cleanly with schedulers/orchestrators for daily runs, minimizing manual intervention and operational overhead.

Running scheduled inference on Compute Engine VMs is feasible but requires you to manage instance sizing, autoscaling, retries, logging, patching, and failure handling. You also need to build the data download/streaming logic and ensure throughput for 60 GB daily. This increases operational burden and is less aligned with managed best practices compared to Vertex AI batch prediction.

Triggering a Cloud Function per file write creates an event-driven fan-out that can be hard to control for large daily batches. Cloud Functions have execution time and resource constraints and are not ideal for heavy, long-running inference workloads. You’d also need aggregation/coordination across many files and handle retries/idempotency carefully, increasing complexity versus a single managed batch job.

Online prediction endpoints are optimized for low-latency, per-request serving. Invoking the endpoint per record for 1.5 million records adds significant request overhead, can run into quota/QPS limits, and is typically more expensive than batch prediction for large offline workloads. It also introduces unnecessary complexity (client batching, retries, throttling) when latency is not required.

Question Analysis

Core concept: This question tests selecting the right serving pattern for non-interactive, high-throughput inference. In Google Cloud, Vertex AI Batch Prediction is designed for offline/batch scoring from Cloud Storage (or BigQuery) and for writing outputs back to Cloud Storage/BigQuery without building and operating custom infrastructure.

Why the answer is correct: You have a daily, large batch (~60 GB, ~1.5M records) landing in Cloud Storage under a predictable prefix, with no low-latency requirement. Vertex AI batch prediction can read newline-delimited JSON from a GCS URI/prefix, scale compute for distributed inference, and write predictions to a destination GCS path. This minimizes manual intervention: you schedule a recurring batch prediction job (commonly via Cloud Scheduler + Cloud Functions/Cloud Run, or via Vertex AI Pipelines/Workflows) that points at the day's prefix and runs automatically. It aligns with the Google Cloud Architecture Framework by improving operational excellence (managed service), reliability (retries, managed execution), and cost optimization (ephemeral compute rather than always-on endpoints).

Key features / configurations:
- Input: GCS source with JSONL; specify the instances format and schema as needed.
- Output: GCS destination prefix; Vertex AI writes sharded prediction files.
- Scaling: managed worker pool; choose machine types/accelerators if the model benefits.
- Orchestration: schedule the job daily at 01:00 UTC; parameterize the date-based prefix.
- Monitoring: job status, logs, and metrics via Vertex AI and Cloud Logging.

Common misconceptions: It is tempting to deploy an online endpoint (D) because it is "serving," but per-record calls add overhead, can hit QPS quotas, and cost more for large batches. Event-driven per-file Cloud Functions (C) seem automated, but functions are not suited to long-running, compute-heavy inference and can create uncontrolled parallelism and operational complexity. Custom VMs (B) can work but defeat the managed-service intent and increase maintenance burden.

Exam tips: When you see "daily batch," "GCS prefix," "no low latency," and "minimal ops," default to Vertex AI Batch Prediction. Use online prediction only for interactive, low-latency use cases. For automation, pair batch prediction with Cloud Scheduler/Workflows/Vertex AI Pipelines rather than building bespoke VM fleets.

Question 8

You are fine-tuning a Vision Transformer classifier on 1.2 million 224x224 product images using Keras; on a single NVIDIA T4 GPU with a global batch size of 64, each epoch takes about 90 minutes, and you have already enabled tf.data prefetch(AUTOTUNE), caching, and mixed precision. You switch to a VM with 4 T4 GPUs and wrap model creation/training with tf.distribute.MirroredStrategy, making no other changes and keeping the global batch size at 64; however, the epoch time remains ~90 minutes and per-GPU utilization hovers at 30–40%. Disk throughput and input pipeline profiling show no bottlenecks. What should you do to reduce wall-clock training time?

Using experimental_distribute_dataset (or the modern strategy.distribute_datasets_from_function) is mainly relevant when you need the strategy to shard/distribute input across replicas. With MirroredStrategy and model.fit, Keras typically handles distribution automatically. Since profiling shows no input bottleneck and GPUs are underutilized due to small per-replica work, changing dataset distribution won’t materially reduce epoch time.

A custom training loop (GradientTape) can provide flexibility, but it does not inherently increase GPU utilization or reduce all-reduce overhead. If the root issue is that each replica only processes batch=16, the GPUs will still be underfed. Custom loops can also introduce extra Python overhead unless carefully graph-compiled, so this is not the best first action for performance.

Moving to TPU with TPUStrategy can accelerate training, but it’s not the targeted fix for the observed symptom. TPUs also require sufficiently large per-core batch sizes to be efficient, and migration adds operational complexity (TPU-compatible ops, input pipeline changes, XLA behavior). The question asks what to do now on 4 GPUs where utilization is low; batch scaling is the direct remedy.

Increasing the global batch size increases per-replica batch size under MirroredStrategy, raising compute per step and improving GPU occupancy. This amortizes fixed overheads (kernel launches, framework overhead, gradient all-reduce) and typically yields near-linear speedup when the model is compute-bound. After scaling batch size, also scale/tune learning rate (often linear scaling + warmup) to preserve convergence.

Question Analysis

Core Concept: This question tests distributed training performance with tf.distribute.MirroredStrategy (data-parallel synchronous training) and how global batch size affects GPU utilization and step time. In synchronous multi-GPU training, each step splits the global batch across replicas, runs forward/backward on each GPU, then performs an all-reduce to aggregate gradients.

Why the Answer is Correct: With 4 GPUs and a fixed global batch size of 64, the per-replica batch becomes 16. For a ViT at 224x224, that per-step workload can be too small to saturate each T4, especially with mixed precision (faster math) and relatively high per-step overhead (kernel launches, framework overhead, and all-reduce). The result is low utilization (30–40%) and little to no speedup, matching the symptom that epoch time stays ~90 minutes. Increasing the global batch size increases the per-replica batch (e.g., global 256 -> per-replica 64), improving arithmetic intensity and amortizing overhead and communication, which typically reduces wall-clock time per epoch.

Key Features / Best Practices:
- MirroredStrategy expects you to scale the global batch size with the number of replicas to keep the per-replica batch roughly constant. A common rule is: new_global_batch = old_global_batch * num_replicas.
- After increasing batch size, adjust the learning rate (often the linear scaling rule) and consider warmup to maintain convergence.
- Ensure you are not inadvertently limiting parallelism with small steps or too-frequent host/device sync points (e.g., overly chatty callbacks), but the primary lever here is batch sizing.
- This aligns with Google Cloud Architecture Framework performance principles: maximize accelerator utilization and reduce per-step overhead.

Common Misconceptions:
- It is tempting to blame the input pipeline and reach for dataset distribution APIs, but profiling already ruled out I/O bottlenecks.
- A custom training loop rarely fixes underutilization caused by too-small per-replica batches; it can even reduce performance if not carefully optimized.
- Switching to TPU does not address the root cause; TPUs also need sufficiently large per-core batch sizes and face similar overhead/communication patterns.

Exam Tips: When multi-GPU speedup is poor and input is not the bottleneck, check (1) per-replica batch size, (2) all-reduce/communication overhead, and (3) per-step overhead. For MirroredStrategy, scaling the global batch size (and tuning the learning rate accordingly) is the most common and expected fix.
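A minimal sketch of the batch-size and learning-rate scaling discussed above. The arithmetic helper is plain Python; `train_distributed` shows where it plugs into a Keras `model.fit` workflow under MirroredStrategy. The dataset and model builder arguments are hypothetical placeholders, and the linear LR scaling rule is an assumed (common, not universal) convergence fix.

```python
def scaled_hyperparams(base_global_batch: int, base_lr: float, num_replicas: int):
    """Scale a single-GPU recipe to N replicas: grow the global batch so the
    per-replica batch stays constant, and apply the linear LR scaling rule."""
    global_batch = base_global_batch * num_replicas   # 64 -> 256 on 4 GPUs
    per_replica = global_batch // num_replicas        # stays 64 per GPU
    learning_rate = base_lr * num_replicas            # linear scaling rule
    return global_batch, per_replica, learning_rate


def train_distributed(make_dataset, make_model, epochs: int = 10):
    # Imported here so scaled_hyperparams stays usable without TensorFlow.
    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()
    global_batch, _, lr = scaled_hyperparams(
        base_global_batch=64, base_lr=1e-3,
        num_replicas=strategy.num_replicas_in_sync,
    )
    ds = make_dataset(global_batch).prefetch(tf.data.AUTOTUNE)
    with strategy.scope():  # variables and optimizer created under the strategy
        model = make_model()
        model.compile(
            optimizer=tf.keras.optimizers.Adam(lr),
            loss="sparse_categorical_crossentropy",
        )
    # Keras splits each global batch across the replicas automatically.
    return model.fit(ds, epochs=epochs)


gb, per, lr = scaled_hyperparams(64, 1e-3, 4)
print(gb, per, f"{lr:.4f}")  # 256 64 0.0040
```

With 4 replicas the per-replica batch returns to 64, the value that already kept a single T4 busy, which is exactly the occupancy fix the analysis calls for; in practice you would also add LR warmup when the scaled rate is large.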

Question 9

You work for an online marketplace that must automatically flag product photos containing restricted brand logos; each image belongs to exactly one class (logo present vs. not present). You trained a convolutional neural network, deployed a model version to Vertex AI Prediction, and attached a model evaluation job to that version. At a softmax decision threshold of 0.50, the evaluation reports precision = 0.71, but the business requires precision >= 0.90. To increase precision by changing only the final layer softmax threshold, what should happen as a consequence of your adjustment?

Increasing the threshold to improve precision generally does not increase recall. Recall depends on capturing as many true positives as possible. A higher threshold makes the classifier stricter, so some borderline true positives will be classified as negative, reducing TP and increasing FN. That typically lowers recall rather than raising it, assuming the underlying model is unchanged.

Correct. To raise precision from 0.71 toward >= 0.90 by only adjusting the softmax threshold, you typically increase the threshold so the model predicts “logo present” less often. This reduces false positives (improving precision) but increases false negatives, which decreases recall (TP/(TP+FN)). This is the classic precision–recall tradeoff for a fixed classifier.

Raising the threshold to increase precision usually decreases the number of false positives, not increases it. False positives occur when negative images are predicted as positive. A stricter threshold means fewer positives are predicted overall, so fewer negative examples will cross the threshold and be incorrectly flagged. Increasing false positives would generally reduce precision, the opposite of the goal.

Adjusting the threshold upward to increase precision typically increases false negatives, not decreases them. False negatives are positives predicted as negative; when the threshold is higher, some true logo images with moderate confidence will fall below the cutoff and be missed. Decreasing false negatives would usually require lowering the threshold, which tends to increase recall but can hurt precision.

Question Analysis

Core concept: This question tests classification thresholding and the precision–recall tradeoff for a binary classifier (logo present vs. not present). In Vertex AI Model Evaluation, metrics like precision and recall are computed from the confusion matrix at a chosen decision threshold (e.g., softmax probability >= 0.50 predicts "logo present").

Why the answer is correct: Precision = TP / (TP + FP). The business requires precision >= 0.90, higher than the current 0.71. If you are allowed to change only the final-layer softmax threshold (not retrain the model), the primary lever is to make the classifier more conservative about predicting the positive class ("logo present"). That means increasing the threshold above 0.50 so fewer images are labeled positive. As the threshold increases, false positives typically decrease faster than true positives, which raises precision. However, because fewer examples are predicted positive, some true positives will now fall below the threshold and be predicted negative, increasing false negatives and therefore decreasing recall (recall = TP / (TP + FN)). Thus, the expected consequence is decreased recall.

Key features / best practices: Vertex AI evaluation supports threshold-based metrics and curves (precision-recall curve, ROC curve) to select an operating point aligned with business goals. For high-precision requirements (e.g., compliance or restricted content), it is common to accept lower recall and route uncertain cases to human review. This aligns with the Google Cloud Architecture Framework's emphasis on designing for business requirements and risk management.

Common misconceptions: People often assume "improving precision" also improves recall, but with a fixed model, raising the threshold usually trades recall for precision. Another confusion is thinking threshold changes alter the model's learned parameters; they do not: only the decision rule changes.

Exam tips: Memorize how threshold shifts affect FP/FN: raising the threshold reduces predicted positives → FP down, FN up → precision up, recall down. Use PR curves to justify threshold selection, especially when the positive class is rare or the cost of false positives is high.
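The threshold mechanics above can be checked on a toy score set (the labels and scores below are invented for illustration): raising the cutoff removes false positives faster than true positives, so precision rises while recall falls.

```python
def precision_recall(examples, threshold):
    """Compute precision and recall when 'positive' means score >= threshold.
    `examples` is a list of (label, score) pairs, label 1 = logo present."""
    tp = sum(1 for y, s in examples if y == 1 and s >= threshold)
    fp = sum(1 for y, s in examples if y == 0 and s >= threshold)
    fn = sum(1 for y, s in examples if y == 1 and s < threshold)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model scores: five true logos (label 1), five non-logos (label 0).
examples = [
    (1, 0.95), (1, 0.85), (1, 0.65), (1, 0.55), (1, 0.45),
    (0, 0.75), (0, 0.60), (0, 0.40), (0, 0.30), (0, 0.20),
]

for t in (0.5, 0.8):
    p, r = precision_recall(examples, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
# threshold=0.5: precision=0.67 recall=0.80
# threshold=0.8: precision=1.00 recall=0.40
```

Moving the cutoff from 0.5 to 0.8 drops both borderline negatives (the two false positives) and one moderate-confidence true logo, reproducing the exam's expected outcome: precision up, recall down, model weights untouched.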

Question 10

You work for a nationwide e-commerce marketplace. After receiving approval to collect the necessary customer behavior data, you trained a Vertex AI AutoML Tabular model to predict the probability that an order will be returned within 30 days. You deployed the model to online prediction, and it serves about 200,000 predictions per day. Seasonal promotions and marketing campaigns may change how features such as discount_rate, shipping_speed, and product_category interact, which could degrade accuracy over time. You want to be alerted if feature interactions change and to understand which features drive the predictions, while keeping monitoring costs low. What should you do?

Feature drift monitoring with sampling rate 1 (100%) will detect shifts in input feature distributions, but it won’t directly explain which features are driving predictions or capture changes in model reliance/interaction effects. It is also the most expensive option because it logs and analyzes all online predictions (~200k/day), increasing storage and monitoring costs. Weekly cadence helps, but full sampling still conflicts with “keep monitoring costs low.”

This option improves cost by sampling only 10% of predictions and running weekly, which is a good cost-control pattern. However, it still focuses on feature drift (input distribution changes) rather than how the model uses features. If feature interactions change without large marginal distribution shifts, feature drift may not alert appropriately, and it does not satisfy the requirement to understand which features drive predictions.

Feature attribution drift monitoring aligns with the need to understand which features drive predictions and to detect changes in model reliance over time. However, sampling rate 1 is unnecessarily costly for a high-volume online endpoint. Logging and computing attributions for every prediction increases monitoring and storage costs significantly. Weekly frequency reduces job runs, but full sampling still violates the “keep monitoring costs low” requirement.

Feature attribution drift monitoring directly addresses changing feature interactions and the need to understand drivers of predictions by tracking shifts in feature attributions over time. Using a 0.1 sampling rate reduces logging and monitoring costs substantially while still providing a large sample size given 200,000 predictions/day. Weekly monitoring is a reasonable cadence for campaign/seasonality-driven changes and further controls cost.

Question Analysis

Core Concept: This question tests Vertex AI Model Monitoring for online prediction, specifically the difference between feature drift and feature attribution drift, and how to control monitoring cost via sampling rate and monitoring frequency.

Why the Answer is Correct: The business concern is that "feature interactions" change due to seasonality and campaigns, degrading accuracy. Pure feature drift detects changes in the distribution of input features (e.g., discount_rate values shifting), but it does not directly tell you whether the model's reliance on features (including interaction effects learned by the model) has changed. Feature attribution drift monitoring tracks changes in feature attributions (e.g., how much discount_rate vs. shipping_speed contributes to predictions over time). This better matches the requirement to "understand which features drive the predictions" and to be alerted when the relationship between features and predictions changes. To keep monitoring costs low while serving ~200,000 predictions/day, you should not log and monitor 100% of requests. A sampling rate of 0.1 (10%) reduces the volume of logged predictions and monitoring computations by ~10x while still providing enough signal for weekly trend detection in a high-traffic system.

Key Features / Configurations:
- Vertex AI Model Monitoring supports drift monitoring on input features and on feature attributions.
- Attribution drift is especially useful when the model's decision logic changes due to changing interactions, even if marginal feature distributions don't shift dramatically.
- The sampling rate controls what fraction of prediction requests are logged and used for monitoring; lower sampling reduces BigQuery/logging/storage and monitoring job costs.
- A weekly monitoring frequency is reasonable for seasonal/campaign-driven shifts and further reduces cost compared to daily.

Common Misconceptions:
- Choosing feature drift because "features change" is tempting, but it misses the explicit need to understand drivers of predictions and interaction/importance changes.
- Setting sampling to 1 seems "more accurate," but it is unnecessarily expensive at this scale and not required for weekly alerting.

Exam Tips: When the prompt mentions "which features drive predictions," "model reliance," or "interactions changing," think feature attribution monitoring (and attribution drift). When cost is a constraint for high-QPS/volume endpoints, prefer lower sampling rates and an appropriate monitoring cadence rather than monitoring every request.
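A configuration sketch of the chosen setup, assuming the google-cloud-aiplatform Python SDK; the project, endpoint, thresholds, and alert email are placeholders, and the exact helper names should be verified against your installed SDK version. The small arithmetic helper shows why a 0.1 rate keeps weekly analysis cheap yet well powered.

```python
def sampled_predictions_per_week(daily_requests: int, sample_rate: float) -> int:
    """How many prediction records a given sampling rate logs per weekly run."""
    return int(daily_requests * sample_rate * 7)


def create_attribution_drift_monitor(endpoint_resource_name: str):
    # Imported here so the helper above stays usable without the SDK installed.
    from google.cloud import aiplatform
    from google.cloud.aiplatform import model_monitoring

    aiplatform.init(project="my-project", location="us-central1")
    endpoint = aiplatform.Endpoint(endpoint_resource_name)

    return aiplatform.ModelDeploymentMonitoringJob.create(
        display_name="returns-model-attribution-drift",
        endpoint=endpoint,
        # Log 10% of ~200k daily requests (~20k/day) to control cost.
        logging_sampling_strategy=model_monitoring.RandomSampleConfig(
            sample_rate=0.1
        ),
        # Run the monitoring analysis weekly (interval is expressed in hours).
        schedule_config=model_monitoring.ScheduleConfig(monitor_interval=168),
        objective_configs=model_monitoring.ObjectiveConfig(
            # Alert on shifts in feature *attributions*, not just raw inputs.
            drift_detection_config=model_monitoring.DriftDetectionConfig(
                attribute_drift_thresholds={
                    "discount_rate": 0.3,
                    "shipping_speed": 0.3,
                    "product_category": 0.3,
                }
            ),
            explanation_config=model_monitoring.ExplanationConfig(),
        ),
        alert_config=model_monitoring.EmailAlertConfig(
            user_emails=["ml-team@example.com"]
        ),
    )


print(sampled_predictions_per_week(200_000, 0.1))  # 140000
```

Even at 10% sampling, a weekly analysis window still sees on the order of 140,000 records, far more than needed to detect attribution shifts, which is why full sampling buys little beyond extra cost here.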

