A team wants to deploy a new feature to production for internal users only and be able to instantly
disable it if problems occur, without redeploying code. Which strategy is most suitable?
B
Explanation:
Feature flags are the most effective way to control feature exposure to specific users, such as internal
testers, while enabling fast rollback without redeployment. Option B is correct because feature flags
allow teams to decouple deployment from release, giving precise runtime control over feature
availability. This means that once the code is deployed, the team can toggle the feature on or off for
different cohorts (e.g., internal users) dynamically.
Option A (blue/green deployment) controls traffic between two environments but does not provide
user-level granularity. Option C (canary deployments) gradually exposes changes to a random subset
of users rather than to a targeted group such as internal employees. Option D requires
redeployment or rollback, which introduces risk and slows down incident response.
Feature flags are widely recognized in platform engineering as a core continuous delivery practice
that improves safety, accelerates experimentation, and enhances resilience by enabling immediate
mitigation of issues.
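As a minimal sketch (not tied to any particular flag provider), the Go snippet below illustrates the
idea: the feature is already deployed, but a runtime flag check plus a cohort rule decide whether an
internal user sees it, so disabling it is a configuration change rather than a redeploy. The flag store,
cohort rule, and names used here are illustrative assumptions, not a specific product's API.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// flagStore is a stand-in for a feature-flag service or config source
// that can be updated at runtime without redeploying the application.
type flagStore struct {
	mu    sync.RWMutex
	flags map[string]bool
}

func (s *flagStore) Enabled(name string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.flags[name]
}

func (s *flagStore) Set(name string, on bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.flags[name] = on
}

// isInternalUser is an illustrative cohort rule: internal testers are
// identified here by their email domain.
func isInternalUser(email string) bool {
	return strings.HasSuffix(email, "@example.internal")
}

func showNewCheckout(store *flagStore, email string) bool {
	return store.Enabled("new-checkout") && isInternalUser(email)
}

func main() {
	store := &flagStore{flags: map[string]bool{"new-checkout": true}}

	fmt.Println(showNewCheckout(store, "dev@example.internal")) // true: internal cohort
	fmt.Println(showNewCheckout(store, "user@customer.com"))    // false: not targeted

	// "Kill switch": flip the flag off at runtime, no redeploy needed.
	store.Set("new-checkout", false)
	fmt.Println(showNewCheckout(store, "dev@example.internal")) // false
}
```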
Reference:
— CNCF Platforms Whitepaper
— Cloud Native Platform Engineering Study Guide
— Continuous Delivery Foundation Guidance
In the context of observability, which telemetry signal is primarily used to record events that occur
within a system and are timestamped?
A
Explanation:
Logs are detailed, timestamped records of discrete events that occur within a system. They provide
granular insight into what has happened, making them crucial for debugging, auditing, and incident
investigations. Option A is correct because logs capture both normal and error events, often
containing contextual information such as error codes, user IDs, or request payloads.
Option B (alerts) refers to secondary outputs generated from telemetry signals such as logs or
metrics, not to raw data. Option C (traces) represents the flow of requests across distributed
systems, showing relationships and latency between services rather than arbitrary events. Option D
(metrics) covers numeric aggregates sampled over intervals (e.g., CPU usage, latency), not discrete,
timestamped events.
Observability guidance in cloud native systems emphasizes the "three pillars" of telemetry: logs,
metrics, and traces. Logs are indispensable for root cause analysis and compliance because they
preserve historical event context.
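For illustration, the Go standard library's log/slog package (Go 1.21+) emits exactly this kind of
timestamped, structured event record; the field names below (user_id, error_code, and so on) are
hypothetical examples of the contextual data a log line might carry.

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// JSON handler: every record carries a timestamp, level, and message,
	// plus arbitrary key/value context.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	// A normal event.
	logger.Info("order created", "user_id", "u-1234", "order_id", "o-9876")

	// An error event with contextual fields useful for debugging and audits.
	logger.Error("payment failed", "user_id", "u-1234", "error_code", "card_declined")
}
```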
Reference:
— CNCF Observability Whitepaper
— OpenTelemetry Documentation (aligned with CNCF)
— Cloud Native Platform Engineering Study Guide
In assessing the effectiveness of platform engineering initiatives, which DORA metric most directly
corresponds to the time it takes for code to go from initial commit to deployment in production?
A
Explanation:
Lead Time for Changes is a DORA (DevOps Research and Assessment) metric that measures the time
from code commit to successful deployment in production. Option A is correct because it directly
reflects how quickly the platform enables developers to turn ideas into delivered software. Shorter
lead times indicate an efficient delivery pipeline, streamlined workflows, and effective automation.
Option B (Deployment Frequency) measures how often code is deployed, not how long it takes to
reach production. Option C (Mean Time to Recovery) measures operational resilience after failures.
Option D (Change Failure Rate) indicates stability by measuring the percentage of deployments
causing incidents. While all DORA metrics are valuable, only Lead Time for Changes measures end-
to-end speed of delivery.
In platform engineering, improving lead time often involves automating CI/CD pipelines,
implementing GitOps, and reducing manual approvals. It is a core measurement of developer
experience and platform efficiency.
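The metric itself is simple arithmetic: the elapsed time between a change's commit timestamp and
its production deployment timestamp, usually aggregated (for example, as a median) across many
changes. A small Go sketch with made-up timestamps:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical timestamps for a single change.
	committed, _ := time.Parse(time.RFC3339, "2024-05-01T09:15:00Z")
	deployed, _ := time.Parse(time.RFC3339, "2024-05-01T16:45:00Z")

	// Lead time for this change = deployment time - commit time.
	leadTime := deployed.Sub(committed)
	fmt.Println("lead time:", leadTime) // lead time: 7h30m0s
}
```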
Reference:
— CNCF Platforms Whitepaper
— Accelerate: State of DevOps Report (DORA Metrics)
— Cloud Native Platform Engineering Study Guide
In the context of observability for cloud native platforms, which of the following best describes the
role of OpenTelemetry?
C
Explanation:
OpenTelemetry is an open-source CNCF project that provides vendor-neutral, standardized APIs,
SDKs, and agents for collecting and exporting observability data such as metrics, logs, and traces.
Option C is correct because OpenTelemetry’s purpose is to unify how telemetry data is generated,
transmitted, and consumed, regardless of which backend (e.g., Prometheus, Jaeger, Elastic,
commercial APM tools) is used.
Option A is incorrect because OpenTelemetry supports all three signal types (metrics, logs, traces),
not just logs. Option B is incorrect because it is an open, community-driven standard and not tied to a
single vendor or cloud provider. Option D is misleading because OpenTelemetry covers distributed
applications, services, and infrastructure—far beyond just infrastructure monitoring.
OpenTelemetry reduces vendor lock-in and promotes interoperability, making it a cornerstone of
cloud native observability strategies. Platform engineering teams rely on it to ensure consistent data
collection, enabling better insights, faster debugging, and improved reliability of cloud native
platforms.
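A minimal Go tracing sketch follows, assuming the OpenTelemetry Go SDK and its stdout exporter
are available; swapping the exporter (for example, for an OTLP exporter pointed at a collector or
backend) changes where the data goes without changing the instrumentation code, which is the
vendor-neutral point of the project.

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// The exporter is pluggable: stdout here, OTLP in real deployments.
	exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
	if err != nil {
		log.Fatal(err)
	}

	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	defer tp.Shutdown(ctx)
	otel.SetTracerProvider(tp)

	// Instrumentation depends only on the vendor-neutral API.
	tracer := otel.Tracer("example-service")
	_, span := tracer.Start(ctx, "handle-request")
	span.End()
}
```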
Reference:
— CNCF Observability Whitepaper
— OpenTelemetry CNCF Project Documentation
— Cloud Native Platform Engineering Study Guide
A company is implementing a service mesh for secure service-to-service communication in their
cloud native environment. What is the primary benefit of using mutual TLS (mTLS) within this
context?
A
Explanation:
Mutual TLS (mTLS) is a core feature of service meshes, such as Istio or Linkerd, that enhances security
in cloud native environments by ensuring that both communicating services authenticate each other
and that the communication channel is encrypted. Option A is correct because mTLS delivers two
critical benefits: authentication (verifying the identity of both client and server services) and
encryption (protecting data in transit from interception or tampering).
Option B is incorrect because mTLS does not bypass security—it enforces it. Option C is partly true in
that service meshes often support observability and logging, but that is not the primary purpose of
mTLS. Option D relates to scaling, which is outside the scope of mTLS.
In platform engineering, mTLS is a fundamental security mechanism that provides zero-trust
networking between microservices, ensuring secure communication without requiring application-
level changes. It strengthens compliance with security and data protection requirements, which are
crucial in regulated industries.
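In a mesh, sidecar proxies handle certificates and verification transparently. As a conceptual sketch
only, Go's standard crypto/tls package shows what "mutual" means on the server side: the server
presents its own certificate and also requires and verifies a client certificate, so both identities are
checked and the channel is encrypted. The certificate file paths are placeholders.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"os"
)

func main() {
	// CA used to verify client certificates (placeholder path).
	caPEM, err := os.ReadFile("ca.pem")
	if err != nil {
		log.Fatal(err)
	}
	clientCAs := x509.NewCertPool()
	clientCAs.AppendCertsFromPEM(caPEM)

	server := &http.Server{
		Addr: ":8443",
		TLSConfig: &tls.Config{
			ClientCAs: clientCAs,
			// The "mutual" part: the client must present a certificate
			// that verifies against the CA pool above.
			ClientAuth: tls.RequireAndVerifyClientCert,
		},
	}

	// server.pem / server-key.pem identify this service to its clients.
	log.Fatal(server.ListenAndServeTLS("server.pem", "server-key.pem"))
}
```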
Reference:
— CNCF Service Mesh Whitepaper
— CNCF Platforms Whitepaper
— Cloud Native Platform Engineering Study Guide
What is the primary purpose of using multiple environments (e.g., development, staging,
production) in a cloud native platform?
A
Explanation:
The primary reason for implementing multiple environments in cloud native platforms is to isolate
the different phases of the software development lifecycle. Option A is correct because
environments such as development, staging, and production enable testing and validation at each
stage without impacting end users. Development environments allow rapid iteration, staging
environments simulate production for integration and performance testing, and production
environments serve real users.
Option B (reducing costs) may be a side effect but is not the main purpose. Option C (distributing
traffic) relates more to load balancing and high availability, not environment separation. Option D is
the opposite of the goal—different environments often require tailored infrastructure to meet their
distinct purposes.
Isolation through multiple environments is fundamental to reducing risk, supporting continuous
delivery, and ensuring stability. This practice also allows for compliance checks, automated testing,
and user acceptance validation before changes reach production.
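As one small, hedged illustration of how environment separation surfaces at the application level, a
service might select environment-specific settings (endpoints, replica counts, debug switches) based
on an APP_ENV variable; all names and values below are invented.

```go
package main

import (
	"fmt"
	"os"
)

// EnvConfig holds settings that legitimately differ per environment.
type EnvConfig struct {
	DatabaseURL string
	Replicas    int
	DebugMode   bool
}

// Invented per-environment values: development optimizes for iteration speed,
// staging mirrors production, production serves real users.
var configs = map[string]EnvConfig{
	"development": {DatabaseURL: "postgres://localhost/dev", Replicas: 1, DebugMode: true},
	"staging":     {DatabaseURL: "postgres://staging-db/app", Replicas: 3, DebugMode: false},
	"production":  {DatabaseURL: "postgres://prod-db/app", Replicas: 10, DebugMode: false},
}

func main() {
	env := os.Getenv("APP_ENV")
	cfg, ok := configs[env]
	if !ok {
		cfg = configs["development"] // safe default for local work
	}
	fmt.Printf("env=%s config=%+v\n", env, cfg)
}
```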
Reference:
— CNCF Platforms Whitepaper
— Team Topologies & Platform Engineering Guidance
— Cloud Native Platform Engineering Study Guide
As a Cloud Native Platform Associate, you need to implement an observability strategy for your
Kubernetes clusters. Which of the following tools is most commonly used for collecting and
monitoring metrics in cloud native environments?
D
Explanation:
Prometheus is the de facto standard for collecting and monitoring metrics in Kubernetes and other
cloud native environments. Option D is correct because Prometheus is a CNCF graduated project
designed for multi-dimensional data collection, time-series storage, and powerful querying using
PromQL. It integrates seamlessly with Kubernetes, automatically discovering targets such as Pods
and Services through service discovery.
Option A (Grafana) is widely used for visualization but relies on Prometheus or other data sources to
collect metrics. Option B (ELK Stack) is better suited to log aggregation than to real-time metrics collection.
Option C (OpenTelemetry) provides standardized instrumentation but is focused on generating and
exporting metrics, logs, and traces rather than storage, querying, and alerting.
Prometheus plays a central role in platform observability strategies, often paired with Alertmanager
for notifications and Grafana for dashboards. Together, they enable proactive monitoring, SLO/SLI
measurement, and incident detection, making Prometheus indispensable in cloud native platform
engineering.
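As a brief sketch of the usual pattern, an application exposes metrics over HTTP using the
Prometheus Go client library, and a Prometheus server (typically discovering the Pod through
Kubernetes service discovery) scrapes that endpoint; the metric name and port below are arbitrary
examples.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// An example counter; promauto registers it with the default registry.
var requestsTotal = promauto.NewCounter(prometheus.CounterOpts{
	Name: "myapp_requests_total",
	Help: "Total number of handled requests.",
})

func handler(w http.ResponseWriter, r *http.Request) {
	requestsTotal.Inc()
	w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/", handler)
	// Prometheus scrapes this endpoint; PromQL queries such as
	// rate(myapp_requests_total[5m]) then run on the stored time series.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":2112", nil)
}
```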
Reference:
— CNCF Observability Whitepaper
— Prometheus CNCF Project Documentation
— Cloud Native Platform Engineering Study Guide
Which platform component enables one-click provisioning of sandbox environments, including both
infrastructure and application code?
A
Explanation:
A CI/CD pipeline is the platform component that enables automated provisioning of sandbox
environments with both infrastructure and application code. Option A is correct because modern
pipelines integrate Infrastructure as Code (IaC) with application deployment, enabling “one-click” or
self-service provisioning of complete environments. This capability is central to platform engineering
because it empowers developers to spin up temporary or permanent sandbox environments quickly
for testing, experimentation, or demos.
Option B (service mesh) focuses on secure, observable service-to-service communication but does
not provision environments. Option C (service bus) is used for asynchronous communication
between services, not environment provisioning. Option D (observability pipeline) deals with
collecting telemetry data, not provisioning.
By leveraging CI/CD pipelines integrated with GitOps and IaC tools (such as Terraform, Crossplane, or
Kubernetes manifests), platform teams ensure consistency, compliance, and automation. Developers
benefit from reduced friction, faster feedback cycles, and a better overall developer experience.
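To make the idea concrete, here is a deliberately simplified Go sketch of what such a pipeline job
automates behind a single trigger: provision the sandbox infrastructure from IaC, then deploy the
application manifests into it. Directory and manifest paths are placeholders, and a real pipeline
would express these as declarative stages rather than a program.

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

// run executes one CLI step and streams its output, failing the "pipeline"
// on the first error, as a CI job would.
func run(dir string, name string, args ...string) {
	cmd := exec.Command(name, args...)
	cmd.Dir = dir
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatalf("step %s failed: %v", name, err)
	}
}

func main() {
	// 1. Provision sandbox infrastructure from IaC (placeholder directory).
	run("infra/sandbox", "terraform", "init")
	run("infra/sandbox", "terraform", "apply", "-auto-approve")

	// 2. Deploy the application into the new environment (placeholder path).
	run(".", "kubectl", "apply", "-f", "manifests/sandbox/")
}
```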
Reference:
— CNCF Platforms Whitepaper
— CNCF GitOps Principles
— Cloud Native Platform Engineering Study Guide
Development teams frequently raise support tickets for short-term access to staging clusters,
creating a growing burden on the platform team. What's the best long-term solution to balance
control, efficiency, and developer experience?
A
Explanation:
The most sustainable solution for managing developer access while balancing governance and self-
service is to adopt GitOps-based RBAC management. Option A is correct because it leverages Git as
the source of truth for access permissions, allowing developers to request access through pull
requests. For non-sensitive environments such as staging, approvals can be automated, ensuring
efficiency while still maintaining auditability. This approach aligns with platform engineering
principles of self-service, automation, and compliance.
Option B places the burden entirely on one engineer, which does not scale. Option C introduces
bottlenecks and delays and degrades the developer experience. Option D bypasses governance and
auditability, potentially creating security risks.
GitOps for RBAC not only improves developer experience but also ensures all changes are versioned,
reviewed, and auditable. This model supports compliance while reducing manual intervention from
the platform team, thus enhancing efficiency.
Reference:
— CNCF GitOps Principles
— CNCF Platforms Whitepaper
— Cloud Native Platform Engineering Study Guide
In a GitOps approach, how should the desired state of a system be managed and integrated?
D
Explanation:
The GitOps model is built on the principle that the desired state of infrastructure and applications
must be stored in Git as the single source of truth. Option D is correct because Git provides
versioning, immutability, and auditability, while reconciliation controllers (e.g., Argo CD or Flux) pull
the desired state into the system continuously. This ensures that actual cluster state always matches
the declared Git state.
Option A is partially correct but fails because GitOps eliminates manual push workflows—
automation ensures changes are pulled and reconciled. Option B describes Kubernetes CRDs, which
may be part of the system but do not embody GitOps on their own. Option C contradicts GitOps
principles, which rely on pull-based reconciliation, not centralized push.
Storing desired state in Git provides full traceability, automated rollbacks, and continuous
reconciliation, improving reliability and compliance. This makes GitOps a core practice for cloud
native platform engineering.
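Controllers such as Argo CD or Flux implement this pull-based loop in production. The toy Go sketch
below only illustrates its shape (fetch the desired state, compare with the actual state, converge);
every function is a stub standing in for real Git and cluster clients, not an actual API.

```go
package main

import (
	"fmt"
	"time"
)

// Stubs standing in for real Git and cluster clients (illustrative assumptions).
func fetchDesiredStateFromGit() string { return "replicas=3" }
func readActualClusterState() string   { return "replicas=2" }
func applyToCluster(desired string)    { fmt.Println("applying:", desired) }

// reconcile converges the cluster toward what Git declares.
func reconcile() {
	desired := fetchDesiredStateFromGit()
	actual := readActualClusterState()
	if desired != actual {
		// Drift (or a new commit) detected: pull the change into the cluster.
		applyToCluster(desired)
	}
}

func main() {
	// Continuous pull-based reconciliation loop.
	for i := 0; i < 3; i++ { // bounded here only so the example terminates
		reconcile()
		time.Sleep(time.Second)
	}
}
```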
Reference:
— CNCF GitOps Principles
— CNCF Platforms Whitepaper
— Cloud Native Platform Engineering Study Guide