Clean, well-labeled datasets used for machine learning are partitioned into three subsets: training
sets, validation sets, and test sets. When your team does this, what's the best way to split up the
data?
B
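For reference, a minimal sketch of a three-way partition. The answer options are not reproduced
here, so the 60/20/20 ratio, the function name split_dataset, and the fixed seed are illustrative
assumptions only, not the keyed answer.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and cut them into train, validation, and test lists.

    The default fractions are assumptions for illustration; the remainder
    after the train and validation cuts becomes the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Shuffling before cutting keeps each subset representative of the whole, and every row lands in
exactly one subset, so the test set stays unseen during training and tuning.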
You need to hire a data scientist to join your team. What skill sets should you be looking for when
hiring and interviewing this person? (Select all that apply.)
B, C, D, F
Creating machine learning models can be complicated. Your team wants to use tools called
Automated Machine Learning (AutoML) to simplify the process. You know of another team that has
used AutoML tools, and doing so saved that team a lot of time.
However, what's the one area you should not have the AutoML tool help with?
D
One of the key elements of a data-centric methodology is the data requirements phase. During
CPMAI Phase II, several unexpected issues have developed and are now threatening the data
collection efforts.
What course of action might make these issues worse?
C
You're testing your model and it is overly sensitive to fluctuations in the data and has trouble
generalizing. What type of problem is this?
B
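The symptom in this question (a model that tracks noise in its training data and generalizes
poorly) is commonly called overfitting. A minimal sketch of how it shows up in practice follows.
The synthetic data (y = 2x plus noise) and the 1-nearest-neighbor "memorizing" model are
assumptions chosen for illustration and do not come from the exam.

```python
import random

rng = random.Random(0)


def make_points(n):
    # Hypothetical data: y = 2x plus Gaussian noise with standard deviation 1.
    return [(x, 2 * x + rng.gauss(0, 1)) for x in (rng.uniform(0, 10) for _ in range(n))]


train = make_points(30)
val = make_points(30)


def predict_1nn(x):
    # Memorize the training set: return the label of the closest training x.
    return min(train, key=lambda p: abs(p[0] - x))[1]


def mse(points):
    # Mean squared error of the memorizing model on the given points.
    return sum((predict_1nn(x) - y) ** 2 for x, y in points) / len(points)


train_mse = mse(train)
val_mse = mse(val)
print(f"train MSE = {train_mse:.3f}, validation MSE = {val_mse:.3f}")
```

The model reproduces every training label exactly, noise included, so its training error is
zero while its error on held-out data is not. A gap like this between training and validation
error is the usual sign that a model is fitting fluctuations rather than the underlying pattern.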
Major factors for the project you are currently working on are the training time, cost, and
complexity of training your models. Which algorithm is not the best choice given these constraints?
C
Your team is testing the NLP model they just created to make sure it's performing as expected. Some
of your team members want to move this model to production and move to the next iteration.
What's wrong with this workflow?
A
Your team is looking to develop an RPA bot to help with back-office processes such as data entry.
What type of bot should your team be creating?
A
The growth of Big Data has led to a desire to be able to do more to process and extract more value
from Big Data. Simply storing data and providing analytics is no longer enough to remain competitive.
To keep your organization competitive, you need to:
C
During CPMAI Phase II, it's important to understand not only the sources of your data but also what
data is required for training and which features are required.
When looking to gather data, what approach is best when determining how much data you need?
A