The term "greedy algorithms" refers to machine-learning algorithms that:
D
Explanation:
Greedy algorithms build the solution iteratively by choosing at each step the option that appears
best at that moment, without reconsidering earlier choices.
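The behavior described above can be sketched with a coin-change routine. This is only an illustration, not a machine-learning algorithm from the question: the function name `greedy_change` and the denominations are assumptions chosen for the example. It shows that a greedy choice is fast but never revisited, so the result is not always optimal.

```python
def greedy_change(amount, denominations):
    """Pick the largest coin that still fits; never revisit an earlier choice."""
    coins = []
    for coin in sorted(denominations, reverse=True):
        while amount >= coin:
            amount -= coin
            coins.append(coin)
    if amount != 0:
        raise ValueError("amount cannot be formed with these denominations")
    return coins


# With these denominations the greedy choice happens to be optimal.
standard = greedy_change(63, [25, 10, 5, 1])   # [25, 25, 10, 1, 1, 1]

# Here it is not: greedy commits to 4 first and ends with three coins,
# although 3 + 3 would need only two.
suboptimal = greedy_change(6, [4, 3, 1])       # [4, 1, 1]
```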
A data scientist is deploying a model that needs to be accessed by multiple departments with
minimal development effort by the departments. Which of the following APIs would be best for the
data scientist to use?
D
Explanation:
RESTful APIs use standard HTTP methods and lightweight data formats (typically JSON), making them
easy for diverse teams to integrate with minimal effort and without heavy tooling.
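As a rough sketch of why integration effort stays low, the following uses only the Python standard library. The `/predict` path, the `features` field, and the `predict` function are hypothetical placeholders, not part of any real deployment: the server side exposes the model over HTTP, and a consuming department needs nothing more than an HTTP POST with a JSON body.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Hypothetical stand-in for a real model: returns the mean of the inputs.
    return sum(features) / len(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        # Read the JSON request body and run the model on it.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def call_predict(base_url, features):
    """Client side: one POST request with a JSON body, no special tooling."""
    request = urllib.request.Request(
        base_url + "/predict",
        data=json.dumps({"features": features}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

To try it locally, one could start `HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()` in one process and call `call_predict("http://127.0.0.1:8000", [1.0, 2.0, 3.0])` from another; the port is an arbitrary choice.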
Which of the following compute delivery models allows packaging of only critical dependencies
while developing a reusable asset?
B
Explanation:
Containers encapsulate just the application and its critical dependencies on a lightweight runtime,
making the resulting asset portable and reusable without bundling an entire operating system.
A data analyst is analyzing data and would like to build conceptual associations. Which of the
following is the best way to accomplish this task?
A
Explanation:
N-grams capture contiguous sequences of words, revealing which terms co-occur and form
meaningful multi-word concepts. By analyzing these frequent word combinations, you directly
uncover conceptual associations in the text.
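A minimal sketch of the idea, assuming whitespace tokenization and an invented sample sentence: counting bigrams (n = 2) surfaces the word pair that recurs, which is the kind of association the explanation refers to.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented sample text, used only for illustration.
text = "machine learning models need data and machine learning needs compute"
tokens = text.lower().split()

bigram_counts = Counter(ngrams(tokens, 2))
top_pair, top_count = bigram_counts.most_common(1)[0]
# top_pair is ("machine", "learning") and top_count is 2; every other pair occurs once.
```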
Which of the following belong in a presentation to the senior management team and/or C-suite
executives? (Choose two.)
C
Explanation:
Senior leaders need actionable insights and the overarching outcomes, not the implementation
details, so you present your key recommendations alongside a summary of results at a high level.
During EDA, a data scientist wants to look for patterns, such as linearity, in the data. Which of the
following plots should the data scientist use?
C
Explanation:
Scatter plots display pairs of numeric values on two axes, letting you visually assess relationships and
patterns, such as linear trends, between variables.
Which of the following distribution methods or models can most effectively represent the actual
arrival times of a bus that runs on an hourly schedule?
C
Explanation:
Scheduled buses tend to arrive around a fixed time with random delays that cluster symmetrically
around the hour. A normal distribution effectively models those continuous, bell-shaped deviations
from the exact schedule.
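To make this concrete, the sketch below models the delay in minutes relative to the scheduled time with Python's `statistics.NormalDist`. The mean of 0 and the standard deviation of 3 minutes are assumed values chosen for illustration, not figures from the question.

```python
from statistics import NormalDist

# Delay relative to the schedule, in minutes (negative = early).
# mu and sigma are assumptions for this example.
delay = NormalDist(mu=0.0, sigma=3.0)

# Probability that the bus arrives within 5 minutes of the scheduled time.
p_within_5 = delay.cdf(5.0) - delay.cdf(-5.0)

# Symmetry around the schedule: early and late arrivals are equally likely.
p_early = delay.cdf(0.0)
```

Under these assumptions `p_within_5` comes out at roughly 0.90, and `p_early` is 0.5 because the distribution is symmetric around its mean.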
A data scientist has constructed a model that meets the minimum performance requirements
specified in the proposal for a prediction project. The data scientist thinks the model's accuracy
should be improved, but the proposed deadline is approaching. Which of the following actions
should the data scientist take first?
C
Explanation:
Since the model already meets the agreed-upon requirements and the deadline is near, the first step
is to confirm with the stakeholder whether pursuing further accuracy gains is worth the additional
time and resources. This ensures you align with business priorities before collecting more data,
requesting funding, or tweaking the model further.
Which of the following best describes the minimization of the residual term in a ridge linear
regression?
C
Explanation:
Ridge regression extends ordinary least squares by adding an L2 penalty on the coefficients, but it
still minimizes the sum of squared residuals (e²) as its loss term.
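The two parts of the objective can be seen in a small sketch. It assumes the simplest possible setting, a single feature with no intercept, and invented data; the names `ridge_objective` and `ridge_coefficient` are illustrative. The loss term is the same sum of squared residuals as in ordinary least squares, and the penalty only adds `lam * beta ** 2` on top.

```python
def ridge_objective(beta, xs, ys, lam):
    """Sum of squared residuals plus the L2 penalty on the coefficient."""
    squared_residuals = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
    return squared_residuals + lam * beta ** 2


def ridge_coefficient(xs, ys, lam):
    """Closed-form minimizer for one feature and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


# Invented data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

ols_beta = ridge_coefficient(xs, ys, lam=0.0)     # 28 / 14 = 2.0
ridge_beta = ridge_coefficient(xs, ys, lam=14.0)  # 28 / 28 = 1.0
```

With `lam=0.0` the result is the ordinary least squares slope; a positive `lam` shrinks the coefficient toward zero while the residual term itself stays squared.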
A statistician notices gaps in data associated with age-related illnesses and wants to further
aggregate these observations. Which of the following is the best technique to achieve this goal?
C
Explanation:
Binning groups continuous age values into discrete intervals (e.g., age ranges), filling gaps by
aggregating observations into broader categories. This directly addresses uneven or sparse age data
by creating consistent age groups.
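A minimal sketch of binning, assuming fixed 10-year intervals and an invented list of ages (the helper name `age_bin` is illustrative): each observation is mapped to its interval, and the counts per interval become the aggregated view.

```python
from collections import Counter


def age_bin(age, width=10):
    """Map an age to a fixed-width interval label such as '60-69'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Invented ages with a gap in the 50s, used only for illustration.
ages = [23, 27, 31, 38, 45, 62, 64, 67, 71]

counts = Counter(age_bin(a) for a in ages)
# 20-29: 2, 30-39: 2, 40-49: 1, 60-69: 3, 70-79: 1
```

Wider intervals (for example `width=20`) aggregate further and absorb sparse ranges, at the cost of resolution.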