Building a Hadoop and Spark Cluster on Windows 10 with Docker Toolbox
I. Installing Docker Toolbox
Download link: (DockerToolbox-18.03.0-ce.exe)
1. Run the installer
Double-click the installer and, on the setup screen, check the Git option (skip it if Git is already installed).
Note: Docker requires CPU virtualization to be enabled in the BIOS, and the Windows Hyper-V feature must be turned off, otherwise it will fail with an error!!!
Success screen:
2. Replace the startup script
(1) Create a file named start.sh and copy the following code into it
#!/bin/bash

trap '[ "$?" -eq 0 ] || read -p "Looks like something went wrong in step ´$STEP´... Press any key to continue..."' EXIT

#Quick Hack: used to convert e.g. "C:\Program Files\Docker Toolbox" to "/c/Program Files/Docker Toolbox"
win_to_unix_path(){
  wd="$(pwd)"
  cd "$1"
  the_path="$(pwd)"
  cd "$wd"
  echo $the_path
}

# This is needed to ensure that binaries provided
# by Docker Toolbox over-ride binaries provided by
# Docker for Windows when launching using the Quickstart.
export PATH="$(win_to_unix_path "${DOCKER_TOOLBOX_INSTALL_PATH}"):$PATH"

VM=${DOCKER_MACHINE_NAME-default}
DOCKER_MACHINE="${DOCKER_TOOLBOX_INSTALL_PATH}\docker-machine.exe"

STEP="Looking for vboxmanage.exe"
if [ ! -z "$VBOX_MSI_INSTALL_PATH" ]; then
  VBOXMANAGE="${VBOX_MSI_INSTALL_PATH}VBoxManage.exe"
else
  VBOXMANAGE="${VBOX_INSTALL_PATH}VBoxManage.exe"
fi

BLUE='\033[1;34m'
GREEN='\033[0;32m'
NC='\033[0m'

#clear all_proxy if not socks address
if [[ $ALL_PROXY != socks* ]]; then
  unset ALL_PROXY
fi
if [[ $all_proxy != socks* ]]; then
  unset all_proxy
fi

if [ ! -f "${DOCKER_MACHINE}" ]; then
  echo "Docker Machine is not installed. Please re-run the Toolbox Installer and try again."
  exit 1
fi

if [ ! -f "${VBOXMANAGE}" ]; then
  echo "VirtualBox is not installed. Please re-run the Toolbox Installer and try again."
  exit 1
fi

"${VBOXMANAGE}" list vms | grep \""${VM}"\" &> /dev/null
VM_EXISTS_CODE=$?

set -e

STEP="Checking if machine $VM exists"
if [ $VM_EXISTS_CODE -eq 1 ]; then
  "${DOCKER_MACHINE}" rm -f "${VM}" &> /dev/null || :
  rm -rf ~/.docker/machine/machines/"${VM}"
  #set proxy variables if they exists
  if [ "${HTTP_PROXY}" ]; then
    PROXY_ENV="$PROXY_ENV --engine-env HTTP_PROXY=$HTTP_PROXY"
  fi
  if [ "${HTTPS_PROXY}" ]; then
    PROXY_ENV="$PROXY_ENV --engine-env HTTPS_PROXY=$HTTPS_PROXY"
  fi
  if [ "${NO_PROXY}" ]; then
    PROXY_ENV="$PROXY_ENV --engine-env NO_PROXY=$NO_PROXY"
  fi
  "${DOCKER_MACHINE}" create -d virtualbox --virtualbox-no-vtx-check $PROXY_ENV "${VM}"
fi

STEP="Checking status on $VM"
VM_STATUS="$( set +e ; "${DOCKER_MACHINE}" status "${VM}" )"
if [ "${VM_STATUS}" != "Running" ]; then
  "${DOCKER_MACHINE}" start "${VM}"
  yes | "${DOCKER_MACHINE}" regenerate-certs "${VM}"
fi

STEP="Setting env"
eval "$("${DOCKER_MACHINE}" env --shell=bash --no-proxy "${VM}" | sed -e "s/export/SETX/g" | sed -e "s/=/ /g")" &> /dev/null #for persistent Environment Variables, available in next sessions
eval "$("${DOCKER_MACHINE}" env --shell=bash --no-proxy "${VM}")" #for transient Environment Variables, available in current session

STEP="Finalize"
clear
cat << EOF


                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""\___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
           \______ o           __/
             \    \         __/
              \____\_______/

EOF
echo -e "${BLUE}docker${NC} is configured to use the ${GREEN}${VM}${NC} machine with IP ${GREEN}$("${DOCKER_MACHINE}" ip ${VM})${NC}"
echo "For help getting started, check out the docs at https://docs.docker.com"
echo
echo
#cd #Bad: working dir should be whatever directory was invoked from rather than fixed to the Home folder

docker () {
  MSYS_NO_PATHCONV=1 docker.exe "$@"
}
export -f docker

if [ $# -eq 0 ]; then
  echo "Start interactive shell"
  exec "$BASH" --login -i
else
  echo "Start shell with command"
  exec "$BASH" -c "$*"
fi
(2) Drag start.sh into the Docker Toolbox installation directory, replacing the existing startup script. (Note: administrator privileges are required!!)
3. Start Docker
(1) Double-click the Docker Quickstart Terminal icon on the desktop to launch it
(2) After a long wait the screen below appears, which means startup succeeded
PS: Startup may hang at "waiting for an IP…"; if that happens, you can disconnect from the network and start again. This step downloads/upgrades boot2docker.iso from GitHub, so it is worth testing your connection to GitHub before installing Docker. If the connection is good, there is no need to disconnect; just wait about five or six minutes.
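A quick way to make that pre-check concrete: the sketch below (my addition, not from the original) probes the boot2docker release page on GitHub with curl, which ships with Git for Windows, before you launch the Quickstart Terminal.

```shell
# Pre-flight check: docker-machine fetches boot2docker.iso from GitHub on
# first start, so verify GitHub is reachable before launching the terminal.
if curl -fsI --max-time 10 https://github.com/boot2docker/boot2docker/releases >/dev/null 2>&1; then
  status=ok       # reachable: just wait out the "waiting for an IP..." phase
else
  status=blocked  # unreachable: consider the disconnect-and-retry workaround
fi
echo "GitHub connectivity: $status"
```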
(3) Run the command docker-machine ls to check the machine
(4) For reference, the Runoob (菜鸟教程) Docker command reference is a handy resource for learning the commands.
II. Building the Hadoop Cluster
1. Stop the Docker machine
Command: docker-machine stop default
2. Create shared folders
(1) Create the three folders shown below; the directory layout must not be changed
(2) Open Oracle VM VirtualBox and configure the shared folders for the docker VM
(3) You can also adjust the VM's other settings while you are here; when done, click the Docker Quickstart Terminal icon to start it
3. Create the docker-compose.yml file
(1) Create the directory e:\spark\docker\hadoop
(2) Create a docker-compose.yml file
Enter the following content into the file:
version: "2"
services:
  #hue:
  #  image: gethue/hue
  #  hostname: hue
  #  container_name: hue
  #  volumes:
  #    - ./hue.ini:/usr/share/hue/desktop/conf/z-hue.ini
  #  ports:
  #    - "8888:8888"
  #filebrowser:
  #  image: bde2020/hdfs-filebrowser
  #  hostname: filebrowser
  #  container_name: filebrowser
  #  domainname: hadoop
  #  #net: hadoop
  #  environment:
  #    - NAMENODE_HOST=namenode
  #  ports:
  #    - "18088:8088"
  namenode:
    image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
    container_name: namenode
    volumes:
      - /data-volumes/hadoop_namenode/:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    ports:
      - "9999:50070"
      - "8020:8020"
  resourcemanager:
    image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
    container_name: resourcemanager
    depends_on:
      - namenode
      - datanode1
      - datanode2
    env_file:
      - ./hadoop.env
    ports:
      - "8088:8088"
  #historyserver:
  #  image: bde2020/hadoop-historyserver:1.1.0-hadoop2.7.1-java8
  #  container_name: historyserver
  #  depends_on:
  #    - namenode
  #    - datanode1
  #    - datanode2
  #  volumes:
  #    - ../data-volumes/hadoop_historyserver/:/hadoop/yarn/timeline
  #  env_file:
  #    - ./hadoop.env
  #  ports:
  #    - "8188:8188"
  nodemanager1:
    image: bde2020/hadoop-nodemanager:1.1.0-hadoop2.7.1-java8
    container_name: nodemanager1
    depends_on:
      - namenode
      - datanode1
      - datanode2
    env_file:
      - ./hadoop.env
    ports:
      - "8042:8042"
  datanode1:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
    container_name: datanode1
    depends_on:
      - namenode
    volumes:
      - /data-volumes/hadoop_datanode1/:/hadoop/dfs/data
    env_file:
      - ./hadoop.env
    ports:
      - "50075:50075"
  datanode2:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
    container_name: datanode2
    depends_on:
      - namenode
    volumes:
      - /data-volumes/hadoop_datanode2/:/hadoop/dfs/data
    env_file:
      - ./hadoop.env

networks:
  default:
    external:
      name: hdfs_share
#docker network create hdfs_share
4. Create the virtual network
(1) Open a Git Bash window
(2) Run the command docker network create hdfs_share
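The compose files in this guide declare hdfs_share as an external network, so docker-compose will refuse to start if it does not exist yet. A small guarded sketch (my addition) that creates it only when missing, and degrades gracefully outside the Docker terminal:

```shell
# Create the hdfs_share network only if it does not already exist.
# The command -v guard lets this run harmlessly where docker is absent.
if command -v docker >/dev/null 2>&1; then
  docker network inspect hdfs_share >/dev/null 2>&1 || docker network create hdfs_share
  network=checked
else
  echo "docker CLI not found; run this inside the Docker Quickstart Terminal"
  network=skipped
fi
```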
5. Load the Hadoop images into Docker
(1) Download the images
Link: https://pan.baidu.com/s/1zUvtvZt7REIYi***UrLc_Q
Extraction code: vz4z
(2) Drag the four image files into the directory shown below
Put hadoop.env and hue.ini into the folder shown in the screenshot below
(3) In the Docker terminal, run the following commands one by one:
docker load -i E:/spark/docker/images/bde2020_hadoop-namenode_1.1.0-hadoop2.7.1-java8.tar.gz
docker load -i E:/spark/docker/images/bde2020_hadoop-datanode_1.1.0-hadoop2.7.1-java8.tar.gz
docker load -i E:/spark/docker/images/bde2020_hadoop-resourcemanager_1.1.0-hadoop2.7.1-java8.tar.gz
docker load -i E:/spark/docker/images/bde2020_hadoop-nodemanager_1.1.0-hadoop2.7.1-java8.tar.gz
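The four docker load calls above can equally be written as one loop over the tarballs (a sketch that assumes the exact filenames listed above):

```shell
# Load every Hadoop image tarball in one pass; the glob matches the four
# files above (namenode, datanode, resourcemanager, nodemanager).
loaded=0
for img in E:/spark/docker/images/bde2020_hadoop-*_1.1.0-hadoop2.7.1-java8.tar.gz; do
  [ -f "$img" ] || continue   # skip if the glob matched nothing
  docker load -i "$img"
  loaded=$((loaded + 1))
done
echo "loaded $loaded image archive(s)"
```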
(4) Run docker images -a to verify that the images were loaded
(5) Change into E:/spark/docker/hadoop
Command: cd E:/spark/docker/hadoop
(6) Run docker-compose.exe -f docker-compose.yml up -d
This initializes the Hadoop cluster (the filename after -f must match the compose file created in step 3)
(7) Verify the containers with docker ps -a
(8) Visit http://192.168.99.100:9999/dfshealth.html#tab-datanode and check that both Hadoop datanodes are reported as live.
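If you prefer the command line over the web UI, the same health check can be done through the namenode container (a sketch using the container_name from docker-compose.yml; it assumes the hdfs CLI is on the PATH inside the bde2020 image):

```shell
# Report HDFS cluster health from inside the namenode container.
# With both datanodes registered, the report should mention 2 live datanodes.
if command -v docker >/dev/null 2>&1; then
  docker exec namenode hdfs dfsadmin -report | grep -i "live datanodes" || true
  check=ran
else
  echo "docker CLI not found; run this inside the Docker Quickstart Terminal"
  check=skipped
fi
```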
III. Building the Spark Cluster
1. Create a directory
Create the directory E:\spark\docker\spark
2. Create a docker-compose.yml file
In the new directory, create docker-compose.yml and enter the following content:
version: '2'
services:
  spark-master:
    image: bde2020/spark-master:2.4.4-hadoop2.7
    container_name: spark-master
    ports:
      - "8080:8080"
      - "7077:7077"
    environment:
      - INIT_DAEMON_STEP=setup_spark
  spark-worker-1:
    image: bde2020/spark-worker:2.4.4-hadoop2.7
    container_name: spark-worker-1
    depends_on:
      - spark-master
    ports:
      - "8081:8081"
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"

networks:
  default:
    external:
      name: hdfs_share
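If you later want a second worker, the spark-worker-1 entry can be cloned under services: with a new name and host port. This fragment is a sketch; the name spark-worker-2 and host port 8082 are my choices, not part of the original setup:

```yaml
  spark-worker-2:
    image: bde2020/spark-worker:2.4.4-hadoop2.7
    container_name: spark-worker-2
    depends_on:
      - spark-master
    ports:
      - "8082:8081"   # host port must differ from worker 1's 8081
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"
```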
3. Load the Spark images into Docker
(1) Download the images
Link: https://pan.baidu.com/s/1zUvtvZt7REIYi***UrLc_Q
Extraction code: vz4z
(2) Place the two image files in the directory shown below
(3) Open the Docker terminal and run:
docker load -i E:/spark/docker/images/bde2020_spark-master_2.4.4-hadoop2.7.tar.gz
docker load -i E:/spark/docker/images/bde2020_spark-worker_2.4.4-hadoop2.7.tar.gz
(4) Run docker images -a to verify that the images were loaded
(5) Change into E:\spark\docker\spark
Command: cd E:/spark/docker/spark
(6) Run docker-compose.exe -f docker-compose.yml up -d
This starts the Spark cluster (the filename after -f must match the compose file created in step 2)
(7) Check container status with docker ps -a
4. Verify the Spark service
Visit http://192.168.99.100:8080/ and check that the Spark master and worker nodes are up.
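Beyond the web UI, you can smoke-test the cluster by submitting the bundled SparkPi example to the master. This is a sketch only: the /spark install path and the examples jar name are assumptions about the bde2020 image, worth verifying first with docker exec spark-master ls /spark.

```shell
# Submit the SparkPi example to the standalone master (paths are assumed).
if command -v docker >/dev/null 2>&1; then
  docker exec spark-master /spark/bin/spark-submit \
    --master spark://spark-master:7077 \
    --class org.apache.spark.examples.SparkPi \
    /spark/examples/jars/spark-examples_2.11-2.4.4.jar 10 || true
  outcome=submitted
else
  echo "docker CLI not found; run this inside the Docker Quickstart Terminal"
  outcome=skipped
fi
```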
PS: When Docker runs inside a Linux VM like this, do not use localhost or 127.0.0.1 in the browser to reach a container's service; use the IP address of the Linux VM that Docker created in Oracle VM VirtualBox (here, 192.168.99.100).