NVIDIA NCP-AIO Practice Test

Exam Title: AI Operations

Last update: Nov 27, 2025
Question 1

What must be done before installing new versions of DOCA drivers on a BlueField DPU?

  • A. Uninstall any previous versions of DOCA drivers.
  • B. Re-flash the firmware every time.
  • C. Disable network interfaces during installation.
  • D. Reboot the host system.
Answer: A

Explanation:
Before installing a new version of the DOCA drivers on an NVIDIA BlueField DPU, any previously installed DOCA drivers must be uninstalled to prevent conflicts and ensure a clean upgrade. This guarantees that the new installation is not affected by leftover files or configuration from earlier versions. Re-flashing the firmware or disabling network interfaces is not required before every driver installation, and rebooting the host may be recommended after installation but is not a prerequisite.
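
As a rough illustration, the pre-install cleanup on a Debian/Ubuntu-based host might look like the following sketch (the doca-runtime and doca-sdk meta-package names are assumptions that vary by DOCA release; consult the DOCA installation guide for your version):

    # List DOCA packages left over from a previous release
    dpkg -l | grep -i doca

    # Remove the old DOCA driver packages before installing the new version
    sudo apt-get remove --purge -y doca-runtime doca-sdk
    sudo apt-get autoremove -y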

Question 2

A Slurm user needs to display real-time information about the running processes and resource usage
of a Slurm job.
Which command should be used?

  • A. smap -j <jobid>
  • B. scontrol show job <jobid>
  • C. sstat -j <job(.step)>
  • D. sinfo -j <jobid>
Answer: C

Explanation:
The Slurm command sstat provides real-time statistics for running jobs, including process-level details and resource usage such as CPU time and memory consumption. Running sstat -j <jobid> or sstat -j <jobid.step> reports the live resource consumption of an active job step.
smap was a legacy curses-based viewer for jobs and partitions (since deprecated and removed from Slurm) and does not report per-process resource usage.
scontrol show job displays job configuration and status, not real-time resource usage.
sinfo displays node and partition information and has no per-job option.
Therefore, sstat is the correct command for real-time monitoring of a job's processes and resource usage.
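
For example, to watch a running job's batch step in real time (job ID 123456 is a placeholder; the format fields are standard sstat options):

    # Show live CPU and memory statistics for the batch step of job 123456
    sstat -j 123456.batch --format=JobID,AveCPU,AveRSS,MaxRSS,MaxVMSize

    # Refresh the view every 5 seconds
    watch -n 5 'sstat -j 123456.batch --format=JobID,AveCPU,MaxRSS'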

Question 3

Which two (2) ways does the pre-configured GPU Operator in NVIDIA Enterprise Catalog differ from
the GPU Operator in the public NGC catalog? (Choose two.)

  • A. It is configured to use a prebuilt vGPU driver image.
  • B. It supports Mixed Strategies for Kubernetes deployments.
  • C. It automatically installs the NVIDIA Datacenter driver.
  • D. It is configured to use the NVIDIA License System (NLS).
  • E. It additionally installs Network Operator.
Answer: A, D

Explanation:
The pre-configured GPU Operator in the NVIDIA Enterprise Catalog differs from the public NGC catalog version in two ways: it is configured to use a prebuilt vGPU driver image, and it is configured to use the NVIDIA License System (NLS). These adaptations suit enterprise environments where vGPU functionality and license management are critical. Automatic installation of the Datacenter driver and bundled installation of the Network Operator are not differences between the two operators.
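
For contrast, here is a rough sketch of what configuring the public GPU Operator for vGPU and NLS by hand involves; the Enterprise Catalog version ships with the equivalent settings pre-applied. The registry path is a placeholder, and the value keys follow the GPU Operator documentation:

    # Create the ConfigMap holding the NLS client configuration token
    kubectl create configmap licensing-config -n gpu-operator \
        --from-file=client_configuration_token.tok --from-file=gridd.conf

    # Install the GPU Operator pointing at a prebuilt vGPU guest driver image
    helm install gpu-operator nvidia/gpu-operator -n gpu-operator \
        --set driver.repository=registry.example.com/nvidia \
        --set driver.image=vgpu-guest-driver \
        --set driver.licensingConfig.configMapName=licensing-config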

Question 4

You are managing multiple edge AI deployments using NVIDIA Fleet Command. You need to ensure
that each AI application running on the same GPU is isolated from others to prevent interference.
Which feature of Fleet Command should you use to achieve this?

  • A. Remote Console
  • B. Secure NFS support
  • C. Multi-Instance GPU (MIG) support
  • D. Over-the-air updates
Answer: C

Explanation:
NVIDIA Fleet Command is a cloud-native software platform designed to deploy, manage, and
orchestrate AI applications at the edge. When managing multiple AI applications on the same GPU,
Multi-Instance GPU (MIG) support is critical. MIG allows a single GPU to be partitioned into multiple
independent instances, each with dedicated resources (compute, memory, bandwidth), enabling
workload isolation and preventing interference between applications.
Remote Console allows remote access for management but does not provide GPU resource isolation.
Secure NFS support is for secure network file system sharing, unrelated to GPU resource partitioning.
Over-the-air updates are for updating software remotely, not for GPU resource management.
Therefore, to ensure application isolation on the same GPU in Fleet Command environments,
enabling MIG support (option C) is the recommended and standard practice.
This capability is emphasized in NVIDIA’s AI Operations and Fleet Command documentation for
managing edge AI deployments efficiently and securely.
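
Under the hood, MIG partitioning is the same mechanism exposed by nvidia-smi on any MIG-capable GPU. A minimal sketch for an A100 40GB follows (profile ID 9 is the 3g.20gb slice on this GPU; enabling MIG mode may require draining workloads and resetting the GPU):

    # Enable MIG mode on GPU 0
    sudo nvidia-smi -i 0 -mig 1

    # Create two 3g.20gb GPU instances together with their compute instances
    sudo nvidia-smi mig -cgi 9,9 -C

    # List the resulting MIG devices
    nvidia-smi -L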

Question 5

You are deploying AI applications at the edge and want to ensure they continue running even if one
of the servers at an edge location fails.
How can you configure NVIDIA Fleet Command to achieve this?

  • A. Use Secure NFS support for data redundancy.
  • B. Set up over-the-air updates to automatically restart failed applications.
  • C. Enable high availability for edge clusters.
  • D. Configure Fleet Command's multi-instance GPU (MIG) to handle failover.
Answer: C

Explanation:
To ensure continued operation of AI applications at the edge despite server failures, NVIDIA Fleet
Command allows administrators to enable high availability (HA) for edge clusters. This HA
configuration ensures redundancy and failover capabilities, so applications remain operational when
an edge server goes down.
Over-the-air updates handle software patching but do not inherently provide failover. MIG manages
GPU resource partitioning, not failover. Secure NFS supports storage redundancy but is not the
primary solution for application failover.

Question 6

You are an administrator managing a large-scale Kubernetes-based GPU cluster using Run:AI.
To automate repetitive administrative tasks and efficiently manage resources across multiple nodes,
which of the following is essential when using the Run:AI Administrator CLI for environments where
automation or scripting is required?

  • A. Use the runai-adm command to directly update Kubernetes nodes without requiring kubectl.
  • B. Use the CLI to manually allocate specific GPUs to individual jobs for better resource management.
  • C. Ensure that the Kubernetes configuration file is set up with cluster administrative rights before using the CLI.
  • D. Install the CLI on Windows machines to take advantage of its scripting capabilities.
Answer: C

Explanation:
When automating tasks with the Run:AI Administrator CLI, it is essential to ensure that the
Kubernetes configuration file (kubeconfig) is correctly set up with cluster administrative rights. This
enables the CLI to interact programmatically with the Kubernetes API for managing nodes, resources,
and workloads efficiently. Without proper administrative permissions in the kubeconfig, automated
operations will fail due to insufficient rights.
Manual GPU allocation is typically handled by scheduling policies rather than CLI manual
assignments. The CLI does not replace kubectl commands entirely, and installation on Windows is
not a critical requirement.
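
A quick sanity check before scripting against the cluster is to confirm that the active kubeconfig context really carries cluster-admin rights (the kubeconfig path below is an example):

    # Point the Administrator CLI and kubectl at the admin kubeconfig
    export KUBECONFIG=$HOME/.kube/admin-config

    # Verify the current context and its cluster-wide permissions
    kubectl config current-context
    kubectl auth can-i '*' '*' --all-namespaces   # should print "yes"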

Question 7

A Fleet Command system administrator wants to create an organization user that will have the
following rights:
  • Locations: read-only
  • Applications: read/write/admin
  • Deployments: read/write/admin
  • Dashboards: read-only
What role should the system administrator assign to this user?

  • A. Fleet Command Operator
  • B. Fleet Command Admin
  • C. Fleet Command Supporter
  • D. Fleet Command Viewer
Answer: A

Explanation:
The Fleet Command Operator role is designed to provide users with read-only access to locations
and dashboards while granting full read/write/admin rights for applications and deployments. This
matches the described access requirements where the user can manage applications and
deployments but only view locations and dashboards without modification rights. Other roles like
Fleet Command Admin have broader permissions, Supporter has more limited access, and Viewer is
primarily read-only for all resources.

Question 8

An organization only needs basic network monitoring and validation tools.
Which UFM platform should they use?

  • A. UFM Enterprise
  • B. UFM Telemetry
  • C. UFM Cyber-AI
  • D. UFM Pro
Answer: B

Explanation:
The UFM Telemetry platform provides basic network monitoring and validation capabilities, making
it suitable for organizations that require foundational insight into their network status without
advanced analytics or AI-driven cybersecurity features. Other platforms such as UFM Enterprise or
UFM Pro offer broader or more advanced functionalities, while UFM Cyber-AI focuses on AI-driven
cybersecurity.

Question 9

Your organization is running multiple AI models on a single A100 GPU using MIG in a multi-tenant
environment. One of the tenants reports a performance issue, but you notice that other tenants are
unaffected.
What feature of MIG ensures that one tenant's workload does not impact others?

  • A. Hardware-level isolation of memory, cache, and compute resources for each instance.
  • B. Dynamic resource allocation based on workload demand.
  • C. Shared memory access across all instances.
  • D. Automatic scaling of instances based on workload size.
Answer: A

Explanation:
NVIDIA's Multi-Instance GPU (MIG) technology provides hardware-level isolation of critical GPU
resources such as memory, cache, and compute units for each GPU instance. This ensures that
workloads running in one instance are fully isolated and cannot interfere with the performance of
workloads in other instances, supporting multi-tenancy without contention.
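
From a tenant's point of view, each MIG instance appears as a separate device with its own UUID, memory slice, and SM allocation, and a workload pinned to one instance cannot see the others (the MIG UUID and train.py below are placeholders):

    # List MIG devices and their UUIDs
    nvidia-smi -L

    # Pin a training process to a single MIG instance
    export CUDA_VISIBLE_DEVICES=MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    python train.py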

Question 10

You are deploying an AI workload on a Kubernetes cluster that requires access to GPUs for training
deep learning models. However, the pods are not able to detect the GPUs on the nodes.
What would be the first step to troubleshoot this issue?

  • A. Verify that the NVIDIA GPU Operator is installed and running on the cluster.
  • B. Ensure that all pods are using the latest version of TensorFlow or PyTorch.
  • C. Check if the nodes have sufficient memory allocated for AI workloads.
  • D. Increase the number of CPU cores allocated to each pod to ensure better resource utilization.
Answer: A

Explanation:
The first step in troubleshooting Kubernetes pods that cannot detect GPUs is to verify whether the
NVIDIA GPU Operator is properly installed and running. The GPU Operator manages the installation
and configuration of all NVIDIA GPU components in the cluster, including drivers, device plugins, and
monitoring tools. Without it, pods will not have access to GPU resources. Ensuring correct
installation and operational status of the GPU Operator is essential before checking application-level
versions or resource allocations.
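
In practice, the first checks might look like this (gpu-operator is the namespace conventionally used by the operator's Helm chart and is an assumption here):

    # Confirm the GPU Operator pods are present and healthy
    kubectl get pods -n gpu-operator

    # Inspect the operator's ClusterPolicy status
    kubectl get clusterpolicy

    # Verify that nodes advertise GPUs to the scheduler
    kubectl describe nodes | grep -A 3 "nvidia.com/gpu"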
