I. Installing Docker Toolbox

Download link: DockerToolbox-18.03.0-ce.exe

1. Run the installer

Double-click the installer and tick the Git option on the setup screen (no need to tick it if you already have Git).

Note: Docker needs CPU virtualization enabled in the BIOS, and the Windows Hyper-V feature must be turned off, otherwise it will fail with an error!

Success message:

2. Modify the startup script

(1) Create a file named start.sh and copy the following code into it

#!/bin/bash

trap '[ "$?" -eq 0 ] || read -p "Looks like something went wrong in step ´$STEP´... Press any key to continue..."' EXIT

#Quick Hack: used to convert e.g. "C:\Program Files\Docker Toolbox" to "/c/Program Files/Docker Toolbox"
win_to_unix_path(){ 
	wd="$(pwd)"
	cd "$1"
		the_path="$(pwd)"
	cd "$wd"
	echo "$the_path"
}

# This is needed to ensure that binaries provided
# by Docker Toolbox over-ride binaries provided by
# Docker for Windows when launching using the Quickstart.
export PATH="$(win_to_unix_path "${DOCKER_TOOLBOX_INSTALL_PATH}"):$PATH"
VM=${DOCKER_MACHINE_NAME-default}
DOCKER_MACHINE="${DOCKER_TOOLBOX_INSTALL_PATH}\docker-machine.exe"

STEP="Looking for vboxmanage.exe"
if [ ! -z "$VBOX_MSI_INSTALL_PATH" ]; then
  VBOXMANAGE="${VBOX_MSI_INSTALL_PATH}VBoxManage.exe"
else
  VBOXMANAGE="${VBOX_INSTALL_PATH}VBoxManage.exe"
fi

BLUE='\033[1;34m'
GREEN='\033[0;32m'
NC='\033[0m'

#clear all_proxy if not socks address
if  [[ $ALL_PROXY != socks* ]]; then
  unset ALL_PROXY
fi
if  [[ $all_proxy != socks* ]]; then
  unset all_proxy
fi

if [ ! -f "${DOCKER_MACHINE}" ]; then
  echo "Docker Machine is not installed. Please re-run the Toolbox Installer and try again."
  exit 1
fi

if [ ! -f "${VBOXMANAGE}" ]; then
  echo "VirtualBox is not installed. Please re-run the Toolbox Installer and try again."
  exit 1
fi

"${VBOXMANAGE}" list vms | grep \""${VM}"\" &> /dev/null
VM_EXISTS_CODE=$?

set -e

STEP="Checking if machine $VM exists"
if [ $VM_EXISTS_CODE -eq 1 ]; then
  "${DOCKER_MACHINE}" rm -f "${VM}" &> /dev/null || :
  rm -rf ~/.docker/machine/machines/"${VM}"
  #set proxy variables if they exist
  if [ "${HTTP_PROXY}" ]; then
    PROXY_ENV="$PROXY_ENV --engine-env HTTP_PROXY=$HTTP_PROXY"
  fi
  if [ "${HTTPS_PROXY}" ]; then
    PROXY_ENV="$PROXY_ENV --engine-env HTTPS_PROXY=$HTTPS_PROXY"
  fi
  if [ "${NO_PROXY}" ]; then
    PROXY_ENV="$PROXY_ENV --engine-env NO_PROXY=$NO_PROXY"
  fi
  "${DOCKER_MACHINE}" create -d virtualbox --virtualbox-no-vtx-check $PROXY_ENV "${VM}"
fi

STEP="Checking status on $VM"
VM_STATUS="$( set +e ; "${DOCKER_MACHINE}" status "${VM}" )"
if [ "${VM_STATUS}" != "Running" ]; then
  "${DOCKER_MACHINE}" start "${VM}"
  yes | "${DOCKER_MACHINE}" regenerate-certs "${VM}"
fi

STEP="Setting env"
eval "$("${DOCKER_MACHINE}" env --shell=bash --no-proxy "${VM}" | sed -e "s/export/SETX/g" | sed -e "s/=/ /g")" &> /dev/null #for persistent Environment Variables, available in next sessions
eval "$("${DOCKER_MACHINE}" env --shell=bash --no-proxy "${VM}")" #for transient Environment Variables, available in current session

STEP="Finalize"
clear
cat << EOF


                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""\___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
           \______ o           __/
             \    \         __/
              \____\_______/

EOF
echo -e "${BLUE}docker${NC} is configured to use the ${GREEN}${VM}${NC} machine with IP ${GREEN}$("${DOCKER_MACHINE}" ip ${VM})${NC}"
echo "For help getting started, check out the docs at https://docs.docker.com"
echo
echo 
#cd #Bad: working dir should be whatever directory was invoked from rather than fixed to the Home folder

docker () {
  MSYS_NO_PATHCONV=1 docker.exe "$@"
}
export -f docker

if [ $# -eq 0 ]; then
  echo "Start interactive shell"
  exec "$BASH" --login -i
else
  echo "Start shell with command"
  exec "$BASH" -c "$*"
fi
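As an aside, the win_to_unix_path helper at the top of start.sh can be tried on its own; below is a minimal standalone sketch, using a local directory (demo_dir is made up for illustration) instead of a real Windows path, since the trick only relies on cd/pwd normalizing whatever path it is given:

```shell
#!/bin/bash
# Standalone sketch of the win_to_unix_path trick from start.sh:
# cd into the target directory and let pwd print its normalized form,
# then cd back so the caller's working directory is untouched.
win_to_unix_path() {
  wd="$(pwd)"
  cd "$1" || return 1
  the_path="$(pwd)"
  cd "$wd" || return 1
  echo "$the_path"
}

# Hypothetical demo directory; a messy relative path comes back normalized.
mkdir -p demo_dir
win_to_unix_path ./demo_dir/../demo_dir
```

In the real script the same mechanism converts a path like "C:\Program Files\Docker Toolbox" (as seen by Git Bash) into "/c/Program Files/Docker Toolbox".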

(2) Drag start.sh into the Docker Toolbox installation directory, replacing the original startup file. (Note: administrator privileges required!)

3. Start Docker

(1) Double-click the Docker Quickstart Terminal icon on the desktop to launch it.

(2) After a fairly long wait you will see the screen below, which means the start-up succeeded.
PS: Start-up can hang at "waiting for an IP…". If that happens, you can disconnect from the network and start again. This step mainly uses GitHub to upgrade boot2docker.iso, so before installing Docker you can test your connection to GitHub; if the network is in good shape there is no need to disconnect, just wait around five or six minutes.

(3) You can run docker-machine ls to check the machine.

(4) For further study, here is the Runoob (菜鸟教程) Docker command reference.

II. Building the Hadoop Cluster

1. Stop the Docker machine

Command: docker-machine stop default

2. Create shared folders

(1) Create the three folders shown below; the paths must not be changed.

(2) Open Oracle VM VirtualBox and configure the shared folders for the docker VM.

(3) You can also adjust the VM's other settings at this point; when done, click the Docker Quickstart Terminal icon to start it again.

3. Create the docker-compose.yml file

(1) Create the directory e:\spark\docker\hadoop
(2) Create the docker-compose.yml file
Put the following content into the file:

version: "2"

services:
  
  #hue:
  #  image: gethue/hue
  #  hostname: hue
  #  container_name: hue
  #  volumes:
  #    - ./hue.ini:/usr/share/hue/desktop/conf/z-hue.ini
  #  ports:
  #    - "8888:8888"

  #filebrowser:
  #  image: bde2020/hdfs-filebrowser
  #  hostname: filebrowser
  #  container_name: filebrowser
  #  domainname: hadoop
  #  #net: hadoop
  #  environment:
  #    - NAMENODE_HOST=namenode
  #  ports:
  #    - "18088:8088"
    
  namenode:
    image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
    container_name: namenode
    volumes:
      - /data-volumes/hadoop_namenode/:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    ports:
      - "9999:50070"
      - "8020:8020"
  
  resourcemanager:
    image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
    container_name: resourcemanager
    depends_on:
      - namenode
      - datanode1
      - datanode2
    env_file:
      - ./hadoop.env
    ports:
      - "8088:8088"
      
  #historyserver:
  #  image: bde2020/hadoop-historyserver:1.1.0-hadoop2.7.1-java8
  #  container_name: historyserver
  #  depends_on:
  #    - namenode
  #    - datanode1
  #    - datanode2
  #  volumes:
  #    - ../data-volumes/hadoop_historyserver/:/hadoop/yarn/timeline
  #  env_file:
  #    - ./hadoop.env
  #  ports:
  #    - "8188:8188"
  
  nodemanager1:
    image: bde2020/hadoop-nodemanager:1.1.0-hadoop2.7.1-java8
    container_name: nodemanager1
    depends_on:
      - namenode
      - datanode1
      - datanode2
    env_file:
      - ./hadoop.env
    ports:
      - "8042:8042"
  
  datanode1:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
    container_name: datanode1
    depends_on:
      - namenode
    volumes:
      - /data-volumes/hadoop_datanode1/:/hadoop/dfs/data
    env_file:
      - ./hadoop.env
    ports:
      - "50075:50075"
  
  datanode2:
    image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
    container_name: datanode2
    depends_on:
      - namenode
    volumes:
      - /data-volumes/hadoop_datanode2/:/hadoop/dfs/data
    env_file:
      - ./hadoop.env 

networks:
  default:
    external:
      name: hdfs_share
      
#docker network create hdfs_share

4. Create the virtual network

(1) Open a Git Bash window
(2) Run the command docker network create hdfs_share

5. Load the Hadoop images into Docker

(1) Download the images
Link: https://pan.baidu.com/s/1zUvtvZt7REIYi***UrLc_Q
Extraction code: vz4z

(2) Drag the four files into the directory shown below

Drag hadoop.env and hue.ini into the folder shown below
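The compose file above references ./hadoop.env, whose contents are not shown here; use the downloaded hadoop.env as-is. For orientation only, a sketch of what such a file looks like for the bde2020 images: each line follows their SECTION_CONF_property naming convention (dots in the Hadoop property name written as underscores), and the specific values below are illustrative assumptions, not the real file:

```
CORE_CONF_fs_defaultFS=hdfs://namenode:8020
CORE_CONF_hadoop_http_staticuser_user=root
HDFS_CONF_dfs_webhdfs_enabled=true
HDFS_CONF_dfs_permissions_enabled=false
YARN_CONF_yarn_resourcemanager_hostname=resourcemanager
```

The images translate these environment variables back into core-site.xml, hdfs-site.xml, and yarn-site.xml entries at container start-up.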

(3) In the Docker terminal, run the following commands one by one:

docker load -i E:/spark/docker/images/bde2020_hadoop-namenode_1.1.0-hadoop2.7.1-java8.tar.gz

docker load -i E:/spark/docker/images/bde2020_hadoop-datanode_1.1.0-hadoop2.7.1-java8.tar.gz

docker load -i E:/spark/docker/images/bde2020_hadoop-resourcemanager_1.1.0-hadoop2.7.1-java8.tar.gz

docker load -i E:/spark/docker/images/bde2020_hadoop-nodemanager_1.1.0-hadoop2.7.1-java8.tar.gz
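Since the four docker load commands above differ only in the image name, they can also be generated with a short loop. The sketch below is a dry run (it only echoes the commands, so it is safe to run anywhere); remove the echo to actually import the images:

```shell
# Dry-run sketch: print the docker load command for each Hadoop image tarball.
IMAGES_DIR="E:/spark/docker/images"
TAG="1.1.0-hadoop2.7.1-java8"
for name in hadoop-namenode hadoop-datanode hadoop-resourcemanager hadoop-nodemanager; do
  cmd="docker load -i ${IMAGES_DIR}/bde2020_${name}_${TAG}.tar.gz"
  echo "$cmd"
done
```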

(4) Run docker images -a to check.

(5) cd into E:/spark/docker/hadoop
Command: cd E:/spark/docker/hadoop

(6) Run docker-compose.exe -f docker-compose.yml up -d to initialize the Hadoop cluster (the file name passed to -f must match the docker-compose.yml created above).

(7) Verify with the command docker ps -a

(8) Visit http://192.168.99.100:9999/dfshealth.html#tab-datanode and check whether the two Hadoop datanodes are healthy.

III. Building the Spark Cluster

1. Create a directory

Create the directory E:\Spark\docker\spark

2. Create the docker-compose.yml file

Create docker-compose.yml in the new directory with the following content:

version: '2'
services:
  spark-master:
    image: bde2020/spark-master:2.4.4-hadoop2.7
    container_name: spark-master
    ports:
      - "8080:8080"
      - "7077:7077"
    environment:
      - INIT_DAEMON_STEP=setup_spark
  spark-worker-1:
    image: bde2020/spark-worker:2.4.4-hadoop2.7
    container_name: spark-worker-1
    depends_on:
      - spark-master
    ports:
      - "8081:8081"
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"
      
networks:
  default:
    external:
      name: hdfs_share
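If one worker is not enough, the cluster can be grown by cloning the worker service in the compose file above. A sketch of a second worker to add under services: (the host port 8082 is an assumption, chosen only to avoid colliding with worker 1's 8081):

```yaml
  spark-worker-2:
    image: bde2020/spark-worker:2.4.4-hadoop2.7
    container_name: spark-worker-2
    depends_on:
      - spark-master
    ports:
      - "8082:8081"
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"
```

Each worker registers itself with the master via the SPARK_MASTER address, so no other change is needed.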
      

3. Load the Spark images into Docker

(1) Download the images
Link: https://pan.baidu.com/s/1zUvtvZt7REIYi***UrLc_Q
Extraction code: vz4z

(2) Put the two images into the directory shown below

(3) Open the Docker terminal and run:

docker load -i E:/spark/docker/images/bde2020_spark-master_2.4.4-hadoop2.7.tar.gz

docker load -i E:/spark/docker/images/bde2020_spark-worker_2.4.4-hadoop2.7.tar.gz  

(4) Run docker images -a to check.

(5) cd into E:\spark\docker\spark
Command: cd E:/spark/docker/spark

(6) Run docker-compose.exe -f docker-compose.yml up -d to start the Spark cluster.

(7) Run docker ps -a to check the status.

4. Verify the Spark services
Visit http://192.168.99.100:8080/ and check whether each Spark node is healthy.

PS: 192.168.99.100 is the IP of the Linux VM that Docker Toolbox creates in Oracle VM VirtualBox. If you run Docker inside a Linux VM, access a container's services in the browser via that VM's IP address, not localhost or 127.0.0.1.