Hadoop Study Notes 1: How to Install Hadoop

Hadoop is a distributed system infrastructure that lets users develop distributed applications without having to understand the low-level details of distribution.

I have recently started digging into big data, beginning with the most fundamental piece, Hadoop; later I will gradually work through the other components of the Hadoop ecosystem.

Hadoop's two core parts are HDFS and MapReduce: HDFS handles storage, MapReduce handles computation.

Hadoop can be installed in local (standalone), pseudo-distributed, fully distributed, and highly-available distributed modes. Since this is for personal study (the truth is I don't have that many machines, and even with virtual machines the memory probably wouldn't be enough, T_T), only the local and pseudo-distributed installations are covered here.

1. Install the required dependency packages and software

The dependency packages to install are:

gcc, c++, autoconf, automake, libtool

The key steps of installing Hadoop are described below:

Environment preparation

# Operating system information
$ cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)

# Kernel version
$ uname -r
3.10.0-693.11.6.el7.x86_64

# Hostname ( set and check )
$ hostnamectl set-hostname v108.zlikun.com
$ hostnamectl status
   Static hostname: v108.zlikun.com
         Icon name: computer-vm
           Chassis: vm
        Machine ID: da1dac0e4969496a8906d711f95f2a7f
           Boot ID: 8ffc47fb1b7148ab992d8bf6f3f32ac1
    Virtualization: vmware
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 3.10.0-693.11.6.el7.x86_64
      Architecture: x86-64

# Map the hostname in the `/etc/hosts` file ( don't read anything into "v108"; it is simply the 8th virtual machine on my computer, ^_^ )
192.168.1.108   v108.zlikun.com v108
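
As a quick sanity check (not part of the original notes), you can confirm that the name in `/etc/hosts` actually resolves to the address configured above:

# Resolve the hostname through the local hosts file
$ getent hosts v108.zlikun.com
192.168.1.108   v108.zlikun.com v108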

Some companion software needs to be installed as well (Java and Maven, listed further below).

Installing Hadoop is actually not difficult; it mainly requires a few prerequisites. Once they are in place, configuring and running it according to the official documentation is quite easy.

Install Java

# Extract the `jdk-8u151-linux-x64.tar.gz` package and move it to the `/usr/local` directory
/usr/local/jdk1.8.0_151

# Configure the environment variables in `/etc/profile`
export JAVA_HOME=/usr/local/jdk1.8.0_151
export PATH=$PATH:$JAVA_HOME/bin

# Check the JDK version
$ java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
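
Since `/etc/profile` is only read by new login shells, a small extra step (not from the original notes) is to reload it in the current shell and confirm that JAVA_HOME points at the JDK:

# Reload the profile and verify the variable
$ source /etc/profile
$ echo $JAVA_HOME
/usr/local/jdk1.8.0_151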

Java 6 and Maven

1. A Java runtime environment; Sun's distribution is recommended.

Local Hadoop installation

# Hadoop 2.7.5 is used here; documentation: http://hadoop.apache.org/docs/r2.7.5/ . The installation reference is:
# http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/SingleCluster.html

# Installation: for a local install, just extract the `hadoop-2.7.5.tar.gz` package to the target directory
$ tar zxvf hadoop-2.7.5.tar.gz
$ mv hadoop-2.7.5 /opt/hadoop
# Delete all *.cmd files here ( these files are only used on Windows; deleting them is optional and purely personal preference )
$ rm -rf /opt/hadoop/*/*.cmd
# Configure the HADOOP_HOME environment variable
$ echo 'export HADOOP_HOME=/opt/hadoop' >> /etc/profile

# Hadoop's directory structure is shown below
/opt/hadoop/
├── bin
│   ├── container-executor
│   ├── hadoop
│   ├── hdfs
│   ├── mapred
│   ├── rcc
│   ├── test-container-executor
│   └── yarn
├── etc
│   └── hadoop
├── include
│   ├── hdfs.h
│   ├── Pipes.hh
│   ├── SerialUtils.hh
│   ├── StringUtils.hh
│   └── TemplateFactory.hh
├── lib
│   └── native
├── libexec
│   ├── hadoop-config.sh
│   ├── hdfs-config.sh
│   ├── httpfs-config.sh
│   ├── kms-config.sh
│   ├── mapred-config.sh
│   └── yarn-config.sh
├── LICENSE.txt
├── NOTICE.txt
├── README.txt
├── sbin
│   ├── distribute-exclude.sh
│   ├── hadoop-daemon.sh
│   ├── hadoop-daemons.sh
│   ├── hdfs-config.sh
│   ├── httpfs.sh
│   ├── kms.sh
│   ├── mr-jobhistory-daemon.sh
│   ├── refresh-namenodes.sh
│   ├── slaves.sh
│   ├── start-all.sh
│   ├── start-balancer.sh
│   ├── start-dfs.sh
│   ├── start-secure-dns.sh
│   ├── start-yarn.sh
│   ├── stop-all.sh
│   ├── stop-balancer.sh
│   ├── stop-dfs.sh
│   ├── stop-secure-dns.sh
│   ├── stop-yarn.sh
│   ├── yarn-daemon.sh
│   └── yarn-daemons.sh
└── share
    ├── doc
    └── hadoop

# Running the `bin/hadoop` command shows the help text
$ cd /opt/hadoop
$ bin/hadoop 
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

# Usually the first thing to do after installing Hadoop is to configure its JAVA_HOME setting
$ vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_151
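
With JAVA_HOME set, a simple way to confirm the standalone installation works is the `version` subcommand shown in the help output above (the build details printed after the first line will vary):

$ bin/hadoop version
Hadoop 2.7.5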

Regarding the dependency packages listed above: on Ubuntu, install them with sudo apt-get install *; on CentOS, install them with sudo yum install *.
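
As a concrete example (the package names below are the usual ones for these distributions; the C++ compiler package is called gcc-c++ on CentOS and g++ on Ubuntu), the dependency list from the top of this article can be installed like this:

# CentOS
$ sudo yum install -y gcc gcc-c++ autoconf automake libtool

# Ubuntu
$ sudo apt-get install -y gcc g++ autoconf automake libtool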

2. SSH public-key passwordless authentication

Word count example

# Prepare a local file; we will run a word count over it below
$ mkdir input
$ echo 'java golang ruby rust erlang java javascript lua rust java' > input/lang.txt

# Run the bundled `MapReduce` example program to do the word count
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar wordcount input output
18/01/30 08:42:34 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/01/30 08:42:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/01/30 08:42:34 INFO input.FileInputFormat: Total input paths to process : 1
18/01/30 08:42:34 INFO mapreduce.JobSubmitter: number of splits:1
18/01/30 08:42:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local935371141_0001
18/01/30 08:42:34 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/01/30 08:42:34 INFO mapreduce.Job: Running job: job_local935371141_0001
18/01/30 08:42:34 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/01/30 08:42:34 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/01/30 08:42:34 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/01/30 08:42:34 INFO mapred.LocalJobRunner: Waiting for map tasks
18/01/30 08:42:34 INFO mapred.LocalJobRunner: Starting task: attempt_local935371141_0001_m_000000_0
18/01/30 08:42:34 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/01/30 08:42:34 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/01/30 08:42:34 INFO mapred.MapTask: Processing split: file:/opt/hadoop/input/lang.txt:0+59
18/01/30 08:42:34 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/01/30 08:42:34 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/01/30 08:42:34 INFO mapred.MapTask: soft limit at 83886080
18/01/30 08:42:34 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/01/30 08:42:34 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/01/30 08:42:34 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/01/30 08:42:34 INFO mapred.LocalJobRunner: 
18/01/30 08:42:34 INFO mapred.MapTask: Starting flush of map output
18/01/30 08:42:34 INFO mapred.MapTask: Spilling map output
18/01/30 08:42:34 INFO mapred.MapTask: bufstart = 0; bufend = 99; bufvoid = 104857600
18/01/30 08:42:34 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214360(104857440); length = 37/6553600
18/01/30 08:42:34 INFO mapred.MapTask: Finished spill 0
18/01/30 08:42:34 INFO mapred.Task: Task:attempt_local935371141_0001_m_000000_0 is done. And is in the process of committing
18/01/30 08:42:35 INFO mapred.LocalJobRunner: map
18/01/30 08:42:35 INFO mapred.Task: Task 'attempt_local935371141_0001_m_000000_0' done.
18/01/30 08:42:35 INFO mapred.Task: Final Counters for attempt_local935371141_0001_m_000000_0: Counters: 18
        File System Counters
                FILE: Number of bytes read=296042
                FILE: Number of bytes written=585271
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=1
                Map output records=10
                Map output bytes=99
                Map output materialized bytes=92
                Input split bytes=96
                Combine input records=10
                Combine output records=7
                Spilled Records=7
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=16
                Total committed heap usage (bytes)=165744640
        File Input Format Counters 
                Bytes Read=59
18/01/30 08:42:35 INFO mapred.LocalJobRunner: Finishing task: attempt_local935371141_0001_m_000000_0
18/01/30 08:42:35 INFO mapred.LocalJobRunner: map task executor complete.
18/01/30 08:42:35 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/01/30 08:42:35 INFO mapred.LocalJobRunner: Starting task: attempt_local935371141_0001_r_000000_0
18/01/30 08:42:35 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/01/30 08:42:35 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/01/30 08:42:35 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@475ebd65
18/01/30 08:42:35 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/01/30 08:42:35 INFO reduce.EventFetcher: attempt_local935371141_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/01/30 08:42:35 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local935371141_0001_m_000000_0 decomp: 88 len: 92 to MEMORY
18/01/30 08:42:35 INFO reduce.InMemoryMapOutput: Read 88 bytes from map-output for attempt_local935371141_0001_m_000000_0
18/01/30 08:42:35 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 88, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->88
18/01/30 08:42:35 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/01/30 08:42:35 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/30 08:42:35 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
18/01/30 08:42:35 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
        at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
18/01/30 08:42:35 INFO mapred.Merger: Merging 1 sorted segments
18/01/30 08:42:35 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/01/30 08:42:35 INFO reduce.MergeManagerImpl: Merged 1 segments, 88 bytes to disk to satisfy reduce memory limit
18/01/30 08:42:35 INFO reduce.MergeManagerImpl: Merging 1 files, 92 bytes from disk
18/01/30 08:42:35 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/01/30 08:42:35 INFO mapred.Merger: Merging 1 sorted segments
18/01/30 08:42:35 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/01/30 08:42:35 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/30 08:42:35 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/01/30 08:42:35 INFO mapred.Task: Task:attempt_local935371141_0001_r_000000_0 is done. And is in the process of committing
18/01/30 08:42:35 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/30 08:42:35 INFO mapred.Task: Task attempt_local935371141_0001_r_000000_0 is allowed to commit now
18/01/30 08:42:35 INFO output.FileOutputCommitter: Saved output of task 'attempt_local935371141_0001_r_000000_0' to file:/opt/hadoop/output/_temporary/0/task_local935371141_0001_r_000000
18/01/30 08:42:35 INFO mapred.LocalJobRunner: reduce > reduce
18/01/30 08:42:35 INFO mapred.Task: Task 'attempt_local935371141_0001_r_000000_0' done.
18/01/30 08:42:35 INFO mapred.Task: Final Counters for attempt_local935371141_0001_r_000000_0: Counters: 24
        File System Counters
                FILE: Number of bytes read=296258
                FILE: Number of bytes written=585433
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Combine input records=0
                Combine output records=0
                Reduce input groups=7
                Reduce shuffle bytes=92
                Reduce input records=7
                Reduce output records=7
                Spilled Records=7
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=2
                Total committed heap usage (bytes)=165744640
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Output Format Counters 
                Bytes Written=70
18/01/30 08:42:35 INFO mapred.LocalJobRunner: Finishing task: attempt_local935371141_0001_r_000000_0
18/01/30 08:42:35 INFO mapred.LocalJobRunner: reduce task executor complete.
18/01/30 08:42:35 INFO mapreduce.Job: Job job_local935371141_0001 running in uber mode : false
18/01/30 08:42:35 INFO mapreduce.Job:  map 100% reduce 100%
18/01/30 08:42:35 INFO mapreduce.Job: Job job_local935371141_0001 completed successfully
18/01/30 08:42:35 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=592300
                FILE: Number of bytes written=1170704
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=1
                Map output records=10
                Map output bytes=99
                Map output materialized bytes=92
                Input split bytes=96
                Combine input records=10
                Combine output records=7
                Reduce input groups=7
                Reduce shuffle bytes=92
                Reduce input records=7
                Reduce output records=7
                Spilled Records=14
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=18
                Total committed heap usage (bytes)=331489280
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=59
        File Output Format Counters 
                Bytes Written=70

# View the word count result
$ cat output/*
erlang  1
golang  1
java    3
javascript      1
lua     1
ruby    1
rust    2
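
Besides the counts, the output directory also contains the usual MapReduce marker file; the file names below are the framework defaults and are shown only as an illustration:

# The reducer output plus the _SUCCESS marker written when the job completes
$ ls output/
_SUCCESS  part-r-00000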

At this point the local Hadoop installation is complete and some simple tests can be run; the official demo examples (such as the word count above) can serve as a reference.

For installing the companion Java and Maven, refer to the blog post "Linux下Java、Maven、Tomcat的安装" (Installing Java, Maven and Tomcat on Linux).

Once the prerequisites above are sorted out, all that is left is Hadoop's configuration; that part may differ between versions, so consult the official documentation for details.
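
As a taste of what that configuration looks like, the pseudo-distributed section of the official SingleCluster guide linked above starts from a minimal `etc/hadoop/core-site.xml` along these lines (quoted here for orientation only; the pseudo-distributed setup itself is left for a later note):

<!-- etc/hadoop/core-site.xml : point the default filesystem at a local HDFS instance -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>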

2. Download snappy-1.1.2

Environment

Virtual machine: VMware 10.0.1 build-1379776

Operating system: CentOS 7, 64-bit


Install the Java environment



Choose the download package that matches your operating system version; if your system supports rpm packages, download the rpm directly, or install straight from the rpm URL:

rpm -ivh http://download.oracle.com/otn-pub/java/jdk/8u20-b26/jdk-8u20-linux-x64.rpm

The JDK is updated continually, so to install the latest version you will need to get the rpm URL of the latest package from the official website yourself.


Configure SSH public-key passwordless authentication

CentOS ships with openssh-server, openssh-clients and rsync by default; if your system does not have them, look up how to install them yourself.
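
On CentOS 7 these packages, if missing, can be installed with yum (standard package names, shown here only for completeness):

$ sudo yum install -y openssh-server openssh-clients rsync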


Create a common account

Create a hadoop account (the name is up to you) on all machines, and set the password to hadoop across the board as well.

useradd -d /home/hadoop -s /usr/bin/bash -g wheel hadoop
passwd hadoop


SSH configuration

vi /etc/ssh/sshd_config

Find the following configuration options and change them to the settings shown below. If an option is commented out, remove the leading # to uncomment it so the setting takes effect.

RSAAuthentication yes
PubkeyAuthentication yes

# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# but this is overridden so installations will only check .ssh/authorized_keys
AuthorizedKeysFile      .ssh/authorized_keys

`.ssh/authorized_keys` is the path where the public keys are stored.
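
After editing sshd_config, the SSH daemon has to re-read its configuration for the changes to take effect; on CentOS 7 this is done through systemd (an extra step not spelled out in the original text):

# Restart the SSH daemon so the new settings are picked up
$ sudo systemctl restart sshd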


Generate the key pair

Log in as the hadoop account.

cd ~
ssh-keygen -t rsa -P ''

Save the generated ~/.ssh/id_rsa.pub file as ~/.ssh/authorized_keys:

cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

Use scp to copy the .ssh directory to the other machines; the lazy approach is to give every machine the same key pair and simply share the public key.

scp ~/.ssh/* hadoop@slave1:~/.ssh/

Make sure the permissions on ~/.ssh/id_rsa are 600, so that no other user can access it.
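
A minimal way to tighten those permissions and then verify the passwordless login (reusing the slave1 host from the scp example above):

# Restrict access to the key material, then confirm ssh works without a password prompt
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/id_rsa ~/.ssh/authorized_keys
$ ssh hadoop@slave1 hostname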


Hadoop installation

Refer to the official configuration documentation.


3. Compile and install the snappy dynamic library

After downloading, extract it into a directory; here we assume it was extracted into the home directory. Then run the following commands:

$ cd ~/snappy-1.1.2
$ ./configure
$ make
$ sudo make install

Then run the following commands to check whether the installation succeeded.

$ cd /usr/local/lib
$ ll libsnappy.*
-rw-r--r-- 1 root root 233506 Aug 7 11:56 libsnappy.a
-rwxr-xr-x 1 root root    953 Aug 7 11:56 libsnappy.la
lrwxrwxrwx 1 root root     18 Aug 7 11:56 libsnappy.so -> libsnappy.so.1.2.1
lrwxrwxrwx 1 root root     18 Aug 7 11:56 libsnappy.so.1 -> libsnappy.so.1.2.1
-rwxr-xr-x 1 root root 147758 Aug 7 11:56 libsnappy.so.1.2.1

If no errors occurred during the installation and the files above are present under /usr/local/lib, the installation succeeded.
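
Because the libraries land in /usr/local/lib, it can also be worth refreshing the dynamic linker cache and checking that libsnappy is visible (an extra check, not from the original article; on some systems /usr/local/lib must additionally be listed under /etc/ld.so.conf.d/ for this to succeed):

# Rebuild the shared-library cache and look snappy up in it
$ sudo ldconfig
$ ldconfig -p | grep snappy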

4. Build hadoop-snappy from source

1) Download the source code; there are two ways:

a. Install svn: on Ubuntu, use sudo apt-get install subversion; on CentOS, use sudo yum install subversion.

b. Use svn to check out the source from Google's svn repository with the following command:

$ svn checkout
hadoop-snappy

This checks the hadoop-snappy source out into a hadoop-snappy directory under the directory where the command was run.

However, because Google's services are frequently unreachable from mainland China, you can also choose to download the source archive directly instead.

2) Build the hadoop-snappy source

Change into the hadoop-snappy source directory and run one of the following commands:

a. If snappy was installed to the default path above, the command is:

mvn package

b. If snappy was installed to a custom path, the command is:

mvn package [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR]
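
For example, if snappy had been built with --prefix=/opt/snappy (a hypothetical path used purely for illustration), the invocation would be:

$ mvn package -Dsnappy.prefix=/opt/snappy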
