Hadoop on OSX

Links:

  • http://dennyglee.com/2012/05/08/installing-hadoop-on-osx-lion-10-7/

Installation

Summary:

  • Java is needed
  • Xcode is needed
  • brew is needed (and needs to be healthy)
  • Remote Login needs to be enabled (a command-line sketch follows this list)
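
Remote Login (sshd) lives under System Preferences > Sharing, but it can also be enabled from the shell; a minimal sketch, assuming an admin account:

sudo systemsetup -setremotelogin on
systemsetup -getremotelogin    # should print "Remote Login: On"
ssh localhost true             # confirm that ssh to localhost works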

Check that java is installed:

Gizur-Laptop-5:cfengine jonas$ java -version
java version "1.6.0_41"
Java(TM) SE Runtime Environment (build 1.6.0_41-b02-445-11M4107)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-445, mixed mode)
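
Homebrew will point $JAVA_HOME at whatever /usr/libexec/java_home resolves to (see the caveats below), so it is worth checking what that is:

/usr/libexec/java_home
# prints the JDK path, e.g. /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home for Apple's Java 6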

Make sure that brew is healthy:

brew doctor

I always seem to have a number of errors to clean up first; I won't show that here.

Install Hadoop:

brew install hadoop
==> Downloading http://www.apache.org/dyn/closer.cgi?path=hadoop/core/hadoop-1.1.1/hadoop-1.1.1.tar.gz
==> Best Mirror http://apache.mirrors.spacedump.net/hadoop/core/hadoop-1.1.1/hadoop-1.1.1.tar.gz
######################################################################## 100,0%
==> Caveats
In Hadoop's config file:
  /usr/local/Cellar/hadoop/1.1.1/libexec/conf/hadoop-env.sh
$JAVA_HOME has been set to be the output of:
  /usr/libexec/java_home
==> Summary
🍺  /usr/local/Cellar/hadoop/1.1.1: 270 files, 75M, built in 113 seconds
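
After the install, the hadoop wrapper should be on the PATH (brew symlinks it into /usr/local/bin), which is easy to verify:

hadoop version
# should report Hadoop 1.1.1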

/usr/local/Cellar/hadoop/1.1.1/libexec/conf/hadoop-env.sh:

export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
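
This HADOOP_OPTS line works around the "Unable to load realm info from SCDynamicStore" warning that Hadoop prints on OSX; the Kerberos realm and KDC values are dummies. It can just as well be appended from the shell:

echo 'export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"' >> /usr/local/Cellar/hadoop/1.1.1/libexec/conf/hadoop-env.sh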

/usr/local/Cellar/hadoop/1.1.1/libexec/conf/core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/Users/${user.name}/hadoop-store</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
    </property>
</configuration>
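
hadoop.tmp.dir is the base directory HDFS stores its data under; Hadoop expands ${user.name} itself, so for user jonas this becomes /Users/jonas/hadoop-store (which matches the namenode -format output below). Creating it up front does no harm:

mkdir -p ~/hadoop-store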

/usr/local/Cellar/hadoop/1.1.1/libexec/conf/mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
    </property>

    <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>2</value>
    </property>

    <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>2</value>
    </property>
</configuration>

/usr/local/Cellar/hadoop/1.1.1/libexec/conf/hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
</configuration>
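
A quick sanity check that the three XML files are well formed; xmllint ships with OSX:

xmllint --noout /usr/local/Cellar/hadoop/1.1.1/libexec/conf/*-site.xml
# no output means all files parse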

Start Hadoop:

# Format HDFS and exit; an existing filesystem triggers the re-format prompt (answer with an uppercase Y)
hadoop namenode -format
13/03/01 17:31:31 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Gizur-Laptop-5.local/10.0.1.117
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.1.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1411108; compiled by 'hortonfo' on Mon Nov 19 10:48:11 UTC 2012
************************************************************/
Re-format filesystem in /Users/jonas/hadoop-store/dfs/name ? (Y or N) Y
13/03/01 17:31:37 INFO util.GSet: VM type       = 64-bit
13/03/01 17:31:37 INFO util.GSet: 2% max memory = 19.83375 MB
13/03/01 17:31:37 INFO util.GSet: capacity      = 2^21 = 2097152 entries
13/03/01 17:31:37 INFO util.GSet: recommended=2097152, actual=2097152
13/03/01 17:31:37 INFO namenode.FSNamesystem: fsOwner=jonas
13/03/01 17:31:37 INFO namenode.FSNamesystem: supergroup=supergroup
13/03/01 17:31:37 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/03/01 17:31:37 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/03/01 17:31:37 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/03/01 17:31:37 INFO namenode.NameNode: Caching file names occuring more than 10 times 
13/03/01 17:31:37 INFO common.Storage: Image file of size 111 saved in 0 seconds.
13/03/01 17:31:37 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/Users/jonas/hadoop-store/dfs/name/current/edits
13/03/01 17:31:37 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/Users/jonas/hadoop-store/dfs/name/current/edits
13/03/01 17:31:37 INFO common.Storage: Storage directory /Users/jonas/hadoop-store/dfs/name has been successfully formatted.
13/03/01 17:31:37 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Gizur-Laptop-5.local/10.0.1.117
************************************************************/

# Start all the daemons (namenode, datanode, secondarynamenode, jobtracker, tasktracker)
/usr/local/Cellar/hadoop/1.1.1/bin/start-all.sh
starting namenode, logging to /usr/local/Cellar/hadoop/1.1.1/libexec/bin/../logs/hadoop-jonas-namenode-Gizur-Laptop-5.local.out
Password:
localhost: starting datanode, logging to /usr/local/Cellar/hadoop/1.1.1/libexec/bin/../logs/hadoop-jonas-datanode-Gizur-Laptop-5.local.out
Password:
localhost: starting secondarynamenode, logging to /usr/local/Cellar/hadoop/1.1.1/libexec/bin/../logs/hadoop-jonas-secondarynamenode-Gizur-Laptop-5.local.out
starting jobtracker, logging to /usr/local/Cellar/hadoop/1.1.1/libexec/bin/../logs/hadoop-jonas-jobtracker-Gizur-Laptop-5.local.out
Password:
localhost: starting tasktracker, logging to /usr/local/Cellar/hadoop/1.1.1/libexec/bin/../logs/hadoop-jonas-tasktracker-Gizur-Laptop-5.local.out
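
The Password: prompts come from start-all.sh ssh-ing to localhost for each daemon. Key-based login gets rid of them; a minimal sketch (skip the keygen if you already have a key):

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa          # empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost true                                # should no longer ask for a password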


# Run the bundled pi example (10 maps, 100 samples per map)
hadoop jar /usr/local/Cellar/hadoop/1.1.1/libexec/hadoop-examples-1.1.1.jar pi 10 100
Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
13/03/01 17:34:38 INFO mapred.FileInputFormat: Total input paths to process : 10
13/03/01 17:34:39 INFO mapred.JobClient: Running job: job_201303011732_0001
13/03/01 17:34:40 INFO mapred.JobClient:  map 0% reduce 0%
13/03/01 17:34:45 INFO mapred.JobClient:  map 10% reduce 0%
13/03/01 17:34:46 INFO mapred.JobClient:  map 20% reduce 0%
13/03/01 17:34:49 INFO mapred.JobClient:  map 40% reduce 0%
13/03/01 17:34:52 INFO mapred.JobClient:  map 60% reduce 0%
13/03/01 17:34:54 INFO mapred.JobClient:  map 60% reduce 10%
13/03/01 17:34:55 INFO mapred.JobClient:  map 80% reduce 10%
13/03/01 17:34:57 INFO mapred.JobClient:  map 100% reduce 20%
13/03/01 17:35:00 INFO mapred.JobClient:  map 100% reduce 33%
13/03/01 17:35:02 INFO mapred.JobClient:  map 100% reduce 100%
13/03/01 17:35:03 INFO mapred.JobClient: Job complete: job_201303011732_0001
13/03/01 17:35:03 INFO mapred.JobClient: Counters: 27
13/03/01 17:35:03 INFO mapred.JobClient:   Job Counters 
13/03/01 17:35:03 INFO mapred.JobClient:     Launched reduce tasks=1
13/03/01 17:35:03 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=29317
13/03/01 17:35:03 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/03/01 17:35:03 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/03/01 17:35:03 INFO mapred.JobClient:     Launched map tasks=10
13/03/01 17:35:03 INFO mapred.JobClient:     Data-local map tasks=10
13/03/01 17:35:03 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=16443
13/03/01 17:35:03 INFO mapred.JobClient:   File Input Format Counters 
13/03/01 17:35:03 INFO mapred.JobClient:     Bytes Read=1180
13/03/01 17:35:03 INFO mapred.JobClient:   File Output Format Counters 
13/03/01 17:35:03 INFO mapred.JobClient:     Bytes Written=97
13/03/01 17:35:03 INFO mapred.JobClient:   FileSystemCounters
13/03/01 17:35:03 INFO mapred.JobClient:     FILE_BYTES_READ=226
13/03/01 17:35:03 INFO mapred.JobClient:     HDFS_BYTES_READ=2400
13/03/01 17:35:03 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=264981
13/03/01 17:35:03 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
13/03/01 17:35:03 INFO mapred.JobClient:   Map-Reduce Framework
13/03/01 17:35:03 INFO mapred.JobClient:     Map output materialized bytes=280
13/03/01 17:35:03 INFO mapred.JobClient:     Map input records=10
13/03/01 17:35:03 INFO mapred.JobClient:     Reduce shuffle bytes=280
13/03/01 17:35:03 INFO mapred.JobClient:     Spilled Records=40
13/03/01 17:35:03 INFO mapred.JobClient:     Map output bytes=180
13/03/01 17:35:03 INFO mapred.JobClient:     Total committed heap usage (bytes)=1931190272
13/03/01 17:35:03 INFO mapred.JobClient:     Map input bytes=240
13/03/01 17:35:03 INFO mapred.JobClient:     Combine input records=0
13/03/01 17:35:03 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1220
13/03/01 17:35:03 INFO mapred.JobClient:     Reduce input records=20
13/03/01 17:35:03 INFO mapred.JobClient:     Reduce input groups=20
13/03/01 17:35:03 INFO mapred.JobClient:     Combine output records=0
13/03/01 17:35:03 INFO mapred.JobClient:     Reduce output records=0
13/03/01 17:35:03 INFO mapred.JobClient:     Map output records=20
Job Finished in 24.631 seconds
Estimated value of Pi is 3.14800000000000000000
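
The web UIs are another quick check; in Hadoop 1.x the NameNode UI listens on port 50070 and the JobTracker UI on port 50030 by default:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/   # NameNode, expect 200
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50030/   # JobTracker, expect 200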


# Check that the five daemons are running; the grep itself accounts for the sixth line
ps ax | grep hadoop | wc -l
# expected output is 6
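
jps (bundled with the JDK) gives a cleaner view, one line per daemon plus Jps itself:

jps
# expect NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker

To stop everything again:

/usr/local/Cellar/hadoop/1.1.1/bin/stop-all.sh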