To perform an honest assessment of a predictive model, what is an acceptable division
between training, validation, and testing data?
D
Refer to the exhibit:
Based upon the comparative ROC plot for two competing models, which is the champion model and
why?
B
A marketing campaign will send brochures describing an expensive product to a set of customers.
The cost for mailing and production per customer is $50. The company makes $500 revenue for each
sale.
What is the profit matrix for a typical person in the population?
C
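The arithmetic behind such a profit matrix can be sketched as follows. This is a minimal illustration using the $50 cost and $500 revenue from the question; the dictionary layout and the labels ("buyer", "mail", and so on) are hypothetical, and it assumes that a customer who receives no brochure generates neither cost nor revenue.

```python
# Figures taken from the question text.
COST_PER_MAILING = 50    # mailing and production cost per customer, in dollars
REVENUE_PER_SALE = 500   # revenue for each sale, in dollars

# Assumption: a customer who is not mailed produces no cost and no revenue.
# Keys are (actual outcome, decision); values are profit in dollars.
profit_matrix = {
    ("buyer", "mail"): REVENUE_PER_SALE - COST_PER_MAILING,
    ("buyer", "no mail"): 0,
    ("non-buyer", "mail"): -COST_PER_MAILING,
    ("non-buyer", "no mail"): 0,
}

for (outcome, decision), profit in profit_matrix.items():
    print(f"{outcome:>9} | {decision:<7} | {profit:>5}")
```

Under these assumptions, mailing a customer with purchase probability p has expected profit 450p - 50(1 - p) = 500p - 50, so mailing pays off only when p exceeds 0.1.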
A confusion matrix is created for data that were oversampled due to a rare target.
What values are not affected by this oversampling?
D
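One way to see which confusion-matrix statistics depend on the class mix is to rescale the event counts and recompute the rates. The sketch below uses hypothetical counts (not taken from the question) and assumes oversampling changes only the proportion of events, leaving the model's behavior within each class unchanged.

```python
def rates(tp, fn, fp, tn):
    """Return (sensitivity, specificity, positive predicted value)."""
    sensitivity = tp / (tp + fn)   # conditions on actual events only
    specificity = tn / (tn + fp)   # conditions on actual non-events only
    ppv = tp / (tp + fp)           # mixes events and non-events
    return sensitivity, specificity, ppv

# Hypothetical counts at the population class mix (illustrative only).
tp, fn, fp, tn = 40, 10, 190, 760

# Oversample the rare event: keep every non-event, replicate events 10 times.
factor = 10
original = rates(tp, fn, fp, tn)
oversampled = rates(tp * factor, fn * factor, fp, tn)

for name, before, after in zip(
    ("sensitivity", "specificity", "PPV"), original, oversampled
):
    print(f"{name:<12} before={before:.3f} after={after:.3f}")
```

Statistics computed within a single actual class (sensitivity and specificity) come out the same before and after, while statistics that combine both classes, such as positive predicted value, shift with the event proportion.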
This question will ask you to provide missing code segments.
A logistic regression model was fit on a data set where 40% of the outcomes were events (TARGET=1)
and 60% were non-events (TARGET=0). The analyst knows that the population where the model will
be deployed has 5% events and 95% non-events. The analyst also knows that the company's profit
margin for correctly targeted events is nine times higher than the company's loss for incorrectly
targeted non-events.
Given the following SAS program:
What X and Y values should be added to the program to correctly score the data?
B
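The SAS program is not reproduced here, so the following does not say which statement options X and Y stand for. It is only a sketch of the two calculations the scenario involves: rescaling a predicted probability from the 40% modeling sample to the 5% population, and deriving a profit-based decision cutoff from the 9-to-1 ratio. The function names are hypothetical, and the cutoff assumes that not targeting a customer yields zero profit.

```python
SAMPLE_EVENT_RATE = 0.40       # from the question: 40% events in the modeling data
POPULATION_EVENT_RATE = 0.05   # from the question: 5% events at deployment
PROFIT_TO_LOSS_RATIO = 9       # from the question: profit is 9x the loss

def adjust_for_priors(p_sample, rho1=SAMPLE_EVENT_RATE, pi1=POPULATION_EVENT_RATE):
    """Rescale a sample-based event probability to the population priors."""
    event = p_sample * pi1 / rho1
    nonevent = (1 - p_sample) * (1 - pi1) / (1 - rho1)
    return event / (event + nonevent)

def profit_cutoff(ratio=PROFIT_TO_LOSS_RATIO):
    """Target when p * ratio - (1 - p) > 0, which gives p > 1 / (1 + ratio).

    Assumption: an untargeted customer contributes zero profit.
    """
    return 1 / (1 + ratio)

cutoff = profit_cutoff()
for p in (0.40, 0.60, 0.80):
    adjusted = adjust_for_priors(p)
    print(f"sample p={p:.2f} adjusted p={adjusted:.4f} target={adjusted > cutoff}")
```

A sample-based probability equal to the sample event rate (0.40) maps back to the population event rate (0.05), which is a quick sanity check on the adjustment.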
An analyst has a sufficient volume of data to perform a 3-way partition of the data into training,
validation, and test sets to perform honest assessment during the model building process.
What is the purpose of the training data set?
A
Refer to the confusion matrix:
Calculate the sensitivity. (0 - negative outcome, 1 - positive outcome)
A
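Sensitivity is the share of actual positive outcomes that the model classifies as positive. Since the confusion matrix itself is not shown, the counts in this sketch are hypothetical and serve only to illustrate the formula.

```python
def sensitivity(true_positives, false_negatives):
    """Sensitivity = true positives / all actual positives."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        raise ValueError("sensitivity is undefined without actual positives")
    return true_positives / actual_positives

# Hypothetical counts: 30 positives predicted as 1, 20 positives predicted as 0.
example = sensitivity(true_positives=30, false_negatives=20)
print(f"sensitivity = {example:.2f}")
```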
The total modeling data has been split into training, validation, and test data.
What is the best data to use for model assessment?
D
What is a drawback of performing data cleansing (imputation, transformations, etc.) on raw data
before partitioning it for honest assessment, rather than performing the data cleansing after
partitioning?
D
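For illustration, cleansing after partitioning might look like the sketch below: fill values are derived from the training rows only, and each candidate method is then applied to the validation rows so that the methods can be compared there. The column, its values, the 50/50 split, and the helper names are all hypothetical.

```python
import random
import statistics

def partition(rows, train_fraction=0.5, seed=0):
    """Shuffle a copy of the rows and split it into training and validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def impute(values, fill):
    """Replace missing values (None) with the given fill value."""
    return [fill if v is None else v for v in values]

# Hypothetical input column with missing values coded as None.
income = [42.0, None, 55.5, 61.0, None, 38.2, 47.9, 70.3, None, 52.4]

train, valid = partition(income)

# Candidate fill values are computed from the training rows only.
observed = [v for v in train if v is not None]
candidates = {
    "mean": statistics.mean(observed),
    "median": statistics.median(observed),
}

# Each candidate is applied to the validation rows for later comparison.
imputed_valid = {name: impute(valid, fill) for name, fill in candidates.items()}
for name, values in imputed_valid.items():
    print(name, values)
```

Had the imputation been applied to the raw data before the split, only one cleansed version of the validation rows would exist, leaving nothing to compare the candidate methods against.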
A company has branch offices in eight regions. Customers within each region are classified as either
"High Value" or "Medium Value" and are coded using the variable name VALUE. The total amount of
purchases per customer in the last year is used as the response variable.
Suppose there is a significant interaction between REGION and VALUE. What can you conclude?
B