What does CDH packaging do on install to facilitate Kerberos security setup?
B
You want to understand more about how users browse your public website. For example, you want to
know which pages they visit prior to placing an order. You have a server farm of 200 web servers
hosting your website. Which is the most efficient process to gather these web server access logs into
your Hadoop cluster for analysis?
A,B
Which three basic configuration parameters must you set to migrate your cluster from MapReduce 1
(MRv1) to MapReduce V2 (MRv2)?
A,B,D
You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB.
Because your Hadoop cluster isn’t optimized for storing and processing many small files, you decide
to do the following actions:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with Python
using Hadoop Streaming.
Which data serialization system gives the flexibility to do this?
A,B
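The grouping step in this question can be illustrated with a small sketch. It is plain Python rather than any Hadoop file format, and it is not one of the answer options: the function names (`pack_records`, `unpack_records`, `estimate_total_bytes`) and the length-prefixed layout are assumptions made purely for illustration. It shows only the general idea of writing many small named blobs into one larger stream and reading them back record by record.

```python
import io
import struct

# Hypothetical record layout (an assumption for this sketch):
# 4-byte name length, 4-byte payload length, name bytes, payload bytes.
HEADER = ">II"
HEADER_SIZE = struct.calcsize(HEADER)


def pack_records(records, out):
    """Write (name, payload) pairs as length-prefixed records into one stream."""
    for name, payload in records:
        name_bytes = name.encode("utf-8")
        out.write(struct.pack(HEADER, len(name_bytes), len(payload)))
        out.write(name_bytes)
        out.write(payload)


def unpack_records(stream):
    """Yield (name, payload) pairs back from a packed stream."""
    while True:
        header = stream.read(HEADER_SIZE)
        if not header:
            break  # clean end of stream
        if len(header) < HEADER_SIZE:
            raise ValueError("truncated record header")
        name_len, payload_len = struct.unpack(HEADER, header)
        name = stream.read(name_len).decode("utf-8")
        payload = stream.read(payload_len)
        if len(payload) < payload_len:
            raise ValueError("truncated payload")
        yield name, payload


def estimate_total_bytes(file_count, avg_size_bytes):
    """Rough total data volume for file_count files of a given average size."""
    return file_count * avg_size_bytes


if __name__ == "__main__":
    buffer = io.BytesIO()
    pack_records([("a.jpg", b"\xff\xd8one"), ("b.jpg", b"\xff\xd8two")], buffer)
    buffer.seek(0)
    for name, payload in unpack_records(buffer):
        print(name, len(payload))
    # 60,000,000 images at roughly 25 KB each (taking 1 KB as 1,000 bytes)
    print(estimate_total_bytes(60_000_000, 25_000))
```

Taking 1 KB as 1,000 bytes, the data set in the question comes to roughly 1.5 TB spread over 60 million files, which is why packing the images into far fewer, larger files matters. A real cluster would use a container format that Hadoop can split and that a Python streaming script can decode, not this ad hoc layout.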
Identify two features/issues that YARN is designed to address:
B,D
Explanation:
Reference:
http://www.revelytix.com/?q=content/hadoop-ecosystem (YARN, first para)
Which YARN daemon or service monitors a container’s per-application resource usage (e.g., memory,
CPU)?
A
Which is the default scheduler in YARN?
B
Explanation:
Reference:
http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
Which YARN process runs as “container 0” of a submitted job and is responsible for resource
requests?
C
Which scheduler would you deploy to ensure that your cluster allows short jobs to finish within a
reasonable time without starving long-running jobs?
C
Explanation:
Reference:
http://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html
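The trade-off this question describes can be seen in a toy model. The sketch below is not the algorithm of any Hadoop scheduler. It assumes a single unit of cluster capacity, all jobs submitted at time 0, and positive integer job lengths. The function names are hypothetical. It compares strict first-in-first-out execution with an idealized equal split of capacity among unfinished jobs.

```python
def fifo_completion_times(job_lengths):
    """Run jobs one at a time in submission order; return each job's finish time."""
    finish = []
    clock = 0
    for length in job_lengths:
        clock += length
        finish.append(clock)
    return finish


def equal_share_completion_times(job_lengths):
    """Split one unit of capacity equally among all unfinished jobs.

    Assumes every job length is a positive integer and all jobs start at time 0.
    """
    remaining = list(job_lengths)
    finish = [None] * len(remaining)
    active = list(range(len(remaining)))
    clock = 0
    while active:
        # The active job with the least remaining work finishes next.
        step = min(remaining[i] for i in active)
        # Each active job advances by `step`, which takes step * len(active) time.
        clock += step * len(active)
        for i in active:
            remaining[i] -= step
            if remaining[i] == 0:
                finish[i] = clock
        active = [i for i in active if remaining[i] > 0]
    return finish


if __name__ == "__main__":
    jobs = [100, 5, 5]  # one long job submitted ahead of two short ones
    print(fifo_completion_times(jobs))
    print(equal_share_completion_times(jobs))
```

In this model, the two short jobs wait behind the long one under FIFO, finishing at times 105 and 110. With an equal split they finish at time 15, and the long job still completes at time 110, so it is delayed but not starved.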
Your cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. What is the result
when you execute: hadoop jar SampleJar MyClass on a client machine?
A