Databricks Certified Machine Learning Professional Practice Test

Exam Title: Certified Machine Learning Professional

Last update: Dec 30, 2025
Question 1

A machine learning engineering team wants to build a continuous pipeline for data preparation of a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches.
Which of the following tools can be used to provide this type of continuous processing?

  • A. Spark UDFs
  • B. Structured Streaming
  • C. MLflow
  • D. Delta Lake
  • E. AutoML
Answer:

b

Question 2

A data scientist wants to remove the star_rating column from the Delta table at the location path. To do this, they need to load in data and drop the star_rating column.
Which of the following code blocks accomplishes this task?

  • A. spark.read.format("delta").load(path).drop("star_rating")
  • B. spark.read.format("delta").table(path).drop("star_rating")
  • C. Delta tables cannot be modified
  • D. spark.read.table(path).drop("star_rating")
  • E. spark.sql("SELECT * EXCEPT star_rating FROM path")
Answer:

a

Question 3

A machine learning engineer is migrating a machine learning pipeline to use Databricks Machine Learning. They have programmatically identified the best run from an MLflow Experiment and stored its URI in the model_uri variable and its Run ID in the run_id variable. They have also determined that the model was logged with the name model. Now, the machine learning engineer wants to register that model in the MLflow Model Registry with the name best_model.
Which of the following lines of code can they use to register the model to the MLflow Model Registry?

  • A. mlflow.register_model(model_uri, "best_model")
  • B. mlflow.register_model(run_id, "best_model")
  • C. mlflow.register_model(f"runs:/{run_id}/best_model", "model")
  • D. mlflow.register_model(model_uri, "model")
  • E. mlflow.register_model(f"runs:/{run_id}/model")
Answer:

a
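
The question turns on two different names: the artifact name inside the run ("model") and the name in the registry ("best_model"). The sketch below shows how a runs:/ URI is assembled from a run ID and an artifact name. The helper build_model_uri and the run ID value are hypothetical, introduced only for illustration; the mlflow.register_model call is left as a comment because it needs an MLflow installation and tracking store.

```python
def build_model_uri(run_id: str, artifact_path: str) -> str:
    """Return a 'runs:/<run_id>/<artifact_path>' URI for a logged model."""
    return f"runs:/{run_id}/{artifact_path}"


run_id = "abc123"  # hypothetical run ID, for illustration only
# "model" is the name the model was logged under in the run
model_uri = build_model_uri(run_id, "model")
print(model_uri)

# With MLflow available, registration would then pass the URI first and the
# registry name second:
#   import mlflow
#   mlflow.register_model(model_uri, "best_model")
```

The first argument locates the logged artifact; the second is the name the model receives in the Model Registry.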

Question 4

Which of the following lists all of the model stages available in the MLflow Model Registry?

  • A. Development, Staging, Production
  • B. None, Staging, Production
  • C. Staging, Production, Archived
  • D. None, Staging, Production, Archived
  • E. Development, Staging, Production, Archived
Answer:

d

Question 5

A data scientist has developed a model model and computed the RMSE of the model on the test set. They have assigned this value to the variable rmse. They now want to manually store the RMSE value with the MLflow run.
They write the following incomplete code block:

Which of the following lines of code can be used to fill in the blank so the code block can successfully complete the task?

  • A. log_artifact
  • B. log_model
  • C. log_metric
  • D. log_param
  • E. There is no way to store values like this.
Answer:

c

Question 6

A machine learning engineer wants to log and deploy a model as an MLflow pyfunc model. They have custom preprocessing that needs to be completed on feature variables prior to fitting the model or computing predictions using that model. They decide to wrap this preprocessing in a custom model class ModelWithPreprocess, where the preprocessing is performed when calling fit and when calling predict. They then log the fitted model of the ModelWithPreprocess class as a pyfunc model.
Which of the following is a benefit of this approach when loading the logged pyfunc model for downstream deployment?

  • A. The pyfunc model can be used to deploy models in a parallelizable fashion
  • B. The same preprocessing logic will automatically be applied when calling fit
  • C. The same preprocessing logic will automatically be applied when calling predict
  • D. This approach has no impact when loading the logged pyfunc model for downstream deployment
  • E. There is no longer a need for pipeline-like machine learning objects
Answer:

c
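
A minimal sketch of the wrapping idea, in plain Python so it runs without MLflow. The class body, the standardization step, and the one-feature linear fit are all assumptions made for illustration; the question only says that preprocessing happens inside fit and predict. In MLflow such a class would typically subclass mlflow.pyfunc.PythonModel and implement predict(self, context, model_input), which is omitted here.

```python
from statistics import mean, pstdev


class ModelWithPreprocess:
    """Toy model: standardizes one feature, then fits y = a + b * z."""

    def _preprocess(self, xs):
        # Same transformation is reused by fit and predict
        return [(x - self.mu) / self.sigma for x in xs]

    def fit(self, xs, ys):
        # Learn preprocessing statistics from the training data
        self.mu = mean(xs)
        self.sigma = pstdev(xs) or 1.0
        z = self._preprocess(xs)
        z_bar, y_bar = mean(z), mean(ys)
        var = sum((v - z_bar) ** 2 for v in z)
        cov = sum((v - z_bar) * (y - y_bar) for v, y in zip(z, ys))
        self.slope = cov / var if var else 0.0
        self.intercept = y_bar - self.slope * z_bar
        return self

    def predict(self, xs):
        # Callers pass raw features; preprocessing is applied here
        z = self._preprocess(xs)
        return [self.intercept + self.slope * v for v in z]


model = ModelWithPreprocess().fit([10.0, 20.0, 30.0], [1.0, 2.0, 3.0])
preds = model.predict([40.0])
print(preds)
```

Because the preprocessing lives inside predict, whoever loads the model downstream passes raw feature values and does not need to reimplement the transformation.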

Question 7

A data scientist has developed a model to predict ice cream sales using the expected temperature and expected number of hours of sun in the day. However, the expected temperature is dropping beneath the range of the input variable on which the model was trained.
Which of the following types of drift is present in the above scenario?

  • A. Label drift
  • B. None of these
  • C. Concept drift
  • D. Prediction drift
  • E. Feature drift
Answer:

e
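
One simple way to spot this kind of input shift is to check how many incoming values fall outside the range seen during training. The sketch below is only an illustration: the function name out_of_range_fraction and the temperature values are hypothetical, and a production monitor would more likely compare full distributions than a min/max range.

```python
def out_of_range_fraction(train_values, new_values):
    """Fraction of new values that lie outside the training min/max range."""
    lo, hi = min(train_values), max(train_values)
    outside = [v for v in new_values if v < lo or v > hi]
    return len(outside) / len(new_values)


# Hypothetical expected temperatures (training data vs. recent inputs)
train_temps = [18.0, 22.5, 25.0, 30.0, 33.5]
new_temps = [12.0, 15.5, 19.0, 21.0]

frac = out_of_range_fraction(train_temps, new_temps)
print(frac)  # 0.5: two of the four new values are below the training minimum
```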

Question 8

A machine learning engineer is in the process of implementing a concept drift monitoring solution. They are planning to use the following steps:
1. Deploy a model to production and compute predicted values
2. Obtain the observed (actual) label values
3. _____
4. Run a statistical test to determine if there are changes over time
Which of the following should be completed as Step #3?

  • A. Obtain the observed (actual) feature values
  • B. Measure the latency of the prediction time
  • C. Retrain the model
  • D. None of these should be completed as Step #3
  • E. Compute the evaluation metric using the observed and predicted values
Answer:

e
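
Computing an evaluation metric from observed and predicted values can be sketched as follows. RMSE is used as an example metric, and the window names and numbers are hypothetical; the statistical test from step 4 is not shown.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean squared error between observed labels and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical (observed, predicted) pairs grouped by time window
windows = {
    "week_1": ([10.0, 12.0, 14.0], [11.0, 11.0, 15.0]),
    "week_2": ([10.0, 12.0, 14.0], [13.0, 9.0, 17.0]),
}

# One metric value per window; these values feed the later statistical test
metric_by_window = {name: rmse(obs, pred) for name, (obs, pred) in windows.items()}
print(metric_by_window)
```

A metric that worsens from one window to the next, while the input distribution stays similar, suggests that the relationship between features and label has changed.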

Question 9

A data scientist would like to enable MLflow Autologging for all machine learning libraries used in a notebook. They want to ensure that MLflow Autologging is used no matter what version of the Databricks Runtime for Machine Learning is used to run the notebook and no matter what workspace-wide configurations are selected in the Admin Console.
Which of the following lines of code can they use to accomplish this task?

  • A. mlflow.sklearn.autolog()
  • B. mlflow.spark.autolog()
  • C. spark.conf.set("autologging", True)
  • D. It is not possible to automatically log MLflow runs.
  • E. mlflow.autolog()
Answer:

e

Question 10

Which of the following MLflow operations can be used to automatically calculate and log a Shapley feature importance plot?

  • A. mlflow.shap.log_explanation
  • B. None of these operations can accomplish the task.
  • C. mlflow.shap
  • D. mlflow.log_figure
  • E. client.log_artifact
Answer:

a

Viewing questions 1-10 out of 57