What is a benefit of HPE Machine Learning Development Environment, beyond open source
Determined AI?
C
Explanation:
The benefit of HPE Machine Learning Development Environment beyond open source Determined AI
is Distributed Training. Distributed training allows multiple machines to train a single model in
parallel, greatly increasing the speed and efficiency of the training process. HPE ML Development
Environment provides tools and support for distributed training, allowing users to make the most of
their resources and quickly train their models.
A customer is deploying HPE Machine Learning Development Environment on on-prem
infrastructure. The customer wants to run some experiments on servers with 8 NVIDIA A100 GPUs
and other experiments on servers with only 2 NVIDIA T4 GPUs. What should you recommend?
D
Explanation:
By establishing multiple compute resource pools on the cluster, you can ensure that the correct
servers are used for each experiment, depending on the number of GPUs required. This will help
ensure that the experiments are run on the servers with the correct resources without having to
manually assign each experiment to the appropriate server.
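For illustration, an experiment is routed to a specific pool through its config; a minimal fragment, assuming pools named a100-pool and t4-pool have already been created on the cluster (the pool names are hypothetical, but the `resources.resource_pool` key is part of the Determined-style experiment config schema):

```yaml
# Hypothetical experiment config fragment; pool names are assumptions.
resources:
  resource_pool: a100-pool   # route this experiment to the 8x A100 servers
```

An experiment destined for the T4 servers would instead set `resource_pool: t4-pool`, with no manual server assignment needed.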
Compared to Asynchronous Successive Halving Algorithm (ASHA), what is an advantage of Adaptive
ASHA?
B
Explanation:
Adaptive ASHA is an enhanced version of ASHA that uses a reinforcement learning approach to select
hyperparameter configurations. This allows Adaptive ASHA to identify higher-performing
configurations and clone them, yielding better results than standard ASHA.
What is a benefit of HPE Machine Learning Development Environment that tends to resonate with
executives?
B
Explanation:
HPE Machine Learning Development Environment is designed to deliver results more quickly than
traditional methods, allowing companies to get a return on their investment sooner and benefit from
their DL projects faster. This tends to be a benefit that resonates with executives, as it can help them
realize their goals more quickly and efficiently.
Your cluster uses Amazon S3 to store checkpoints. You ran an experiment on an HPE Machine
Learning Development Environment cluster, and you want to find the location of the best checkpoint
created during the experiment. What can you do?
D
Explanation:
HPE Machine Learning Development Environment uses Amazon S3 to store checkpoints. To find the
location of the best checkpoint created during an experiment, you need to look for a "determined-
checkpoint/" bucket within Amazon S3, referencing your experiment ID. This bucket will contain all of
the checkpoints that were created during the experiment.
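As a sketch of the lookup described above, the per-experiment S3 prefix can be assembled from the bucket name and experiment ID. The bucket name and exact prefix layout here are assumptions for illustration; check your cluster's checkpoint_storage configuration for the real values.

```python
# Sketch: build the S3 prefix under which an experiment's checkpoints are
# grouped, following the bucket layout described above. Bucket name and
# prefix scheme are illustrative assumptions.

def checkpoint_prefix(bucket: str, experiment_id: int) -> str:
    # Checkpoints for one experiment live under a per-experiment prefix.
    return f"s3://{bucket}/determined-checkpoint/{experiment_id}/"

# The objects under that prefix could then be listed with the AWS CLI, e.g.:
#   aws s3 ls s3://my-det-bucket/determined-checkpoint/42/
print(checkpoint_prefix("my-det-bucket", 42))
```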
What is a reason to use the best fit policy on an HPE Machine Learning Development Environment
resource pool?
D
Explanation:
The best fit policy on an HPE Machine Learning Development Environment resource pool ensures
that the highest priority experiments obtain access to more resources, while still ensuring that all
experiments receive their fair share. This allows you to make the most of your resources and
prioritize the experiments that are most important to you.
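In general schedulers, a best-fit placement rule picks, among the agents that can satisfy a request, the one with the fewest free slots, packing work tightly and leaving larger agents available for bigger jobs. A minimal sketch of that rule (agent names and slot counts are illustrative, not from a real cluster):

```python
# Sketch of best-fit placement: of the agents that can satisfy a request,
# choose the one with the fewest free GPU slots (the "tightest" fit).

def best_fit(free_slots: dict, needed: int):
    # Keep only agents with enough free slots for the request.
    candidates = {agent: slots for agent, slots in free_slots.items()
                  if slots >= needed}
    if not candidates:
        return None  # no single agent can satisfy the request
    # Tightest fit: the qualifying agent with the fewest free slots.
    return min(candidates, key=candidates.get)

agents = {"agent-a": 8, "agent-b": 4, "agent-c": 2}
print(best_fit(agents, 2))  # agent-c is the tightest fit for a 2-GPU task
```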
What is one of the responsibilities of the conductor of an HPE Machine Learning Development
Environment cluster?
D
Explanation:
The conductor of an HPE Machine Learning Development Environment cluster is responsible for
ensuring that all experiment metadata is stored and accessible. This includes tracking experiment
runs, storing configuration parameters, and ensuring results are stored for future reference.
What type of interconnect does HPE Machine Learning Development System use for high-speed,
agent-to-agent communications?
A
Explanation:
HPE Machine Learning Development System uses Remote Direct Memory Access (RDMA) over
Converged Ethernet (RoCE) for high-speed, agent-to-agent communications. This technology
allows data to be transferred directly between agents without the need for copying, which results in
improved performance and reduced latency.
An ML engineer is running experiments on HPE Machine Learning Development Environment. The
engineer notices that all of the checkpoints for a trial except one disappear after the trial ends. The
engineer wants to keep more of these checkpoints. What can you recommend?
A
Explanation:
The best recommendation is to adjust the experiment config's checkpoint storage settings to save
more of the latest and best checkpoints, which are otherwise removed by garbage collection when a
trial ends. The engineer can also monitor ongoing trials in the WebUI and flag individual checkpoints
to protect them. In addition, the engineer should confirm that the checkpoint storage location is
operating under 90% of total capacity, so that enough space remains to retain the extra
checkpoints. Finally, the checkpoint storage settings can be changed to save checkpoints to a shared
file system instead of cloud storage if desired.
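The garbage-collection settings referred to above live under `checkpoint_storage` in a Determined-style experiment config; a sketch with illustrative values:

```yaml
# Illustrative values; keys follow the Determined experiment config schema.
checkpoint_storage:
  save_experiment_best: 2   # keep the 2 best checkpoints in the experiment
  save_trial_best: 2        # keep each trial's 2 best checkpoints
  save_trial_latest: 2      # keep each trial's 2 most recent checkpoints
```

Raising these counts keeps more checkpoints per trial instead of the single one the engineer observed.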
The 10 agents in "my-compute-pool" have 8 GPUs each. You want to change an experiment config to
run on multiple GPUs at once. What is a valid setting for "resources_per_trial"?
A
Explanation:
A valid setting for "resources_per_trial" in "my-compute-pool" is 20. With 10 agents of 8 GPUs each,
the pool offers 80 GPUs in total, so a request for 20 GPUs fits within the available capacity and
allows the experiment config to run on multiple GPUs at once.
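Note that in the underlying Determined experiment config schema, the per-trial GPU request is written as `resources.slots_per_trial`; a fragment matching the value above:

```yaml
resources:
  slots_per_trial: 20   # request 20 GPU slots for each trial
```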