Notes On Docker

Overview

We will look at deploying a Hadoop cluster using Docker, a classic use case that illustrates the power of Docker.

Work Flow

  • Install docker on your system, preferably from Docker's own package repository to get the latest version. Check the status:

    sudo systemctl status docker
    
  • Search for and identify the available images to pull. (Note: you can also build an image from your own Dockerfile.)

    docker search centos
    docker search hadoop
    
    # To display all tags of a specific repo:
    docker run harisekhon/pytools dockerhub_show_tags.py centos   
    docker run harisekhon/pytools dockerhub_show_tags.py harisekhon/hadoop
    
  • Pull an image from the repo, or build one from your Dockerfile:

    docker pull centos:7   # centos:7   ===  image_name:tag_name
    docker images -a       # List local images (-a also shows intermediate images).
    
    docker pull harisekhon/hadoop:2.9
    docker pull uhopper/hadoop
    
  • Start a container from an available image and attach stdin/stdout:

    docker container run -it uhopper/hadoop bash   # -i keeps STDIN open; -t allocates a pseudo-TTY.
    # -dt allocates a TTY but runs the container detached (-d), in the background.
    

Create Hadoop cluster

To create a cluster of 1 master + 2 slaves with the following services:

  • The master node runs the NameNode and the ResourceManager (in general they can run on different nodes). The NameNode and DataNodes belong to HDFS and are entirely independent of the YARN ResourceManager and NodeManagers.
  • Each slave node runs the NodeManager and DataNode services.
  • Total services: 2 + 2*2 = 6. These 6 services could run on 6 separate nodes, if need be.
  • We use the Docker cluster from https://github.com/Lewuathe/docker-hadoop-cluster.git (see also https://hub.docker.com/r/lewuathe/hadoop-base). It lets you build your own Docker configuration on top of the base image for your Hadoop version.
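A small shell sketch makes it easy to check the layout arithmetic. It prints one line per node and counts the daemons. The node names master and slaveN are illustrative only and are not container names from any of the repos mentioned here.

```shell
# Sketch: print which Hadoop daemons run on which node for 1 master + N slaves.
# Node names ("master", "slave1", ...) are illustrative, not real container names.

cluster_layout() {
  slaves=$1
  # HDFS NameNode and YARN ResourceManager both live on the master here.
  echo "master: NameNode ResourceManager"
  i=1
  while [ "$i" -le "$slaves" ]; do
    # Each slave carries one HDFS DataNode and one YARN NodeManager.
    echo "slave$i: DataNode NodeManager"
    i=$((i + 1))
  done
}

# Total daemons = 2 on the master + 2 per slave.
service_count() {
  echo $((2 + 2 * $1))
}

cluster_layout 2
service_count 2   # prints 6
```

Moving the NameNode and ResourceManager onto separate nodes changes only the node layout. The daemon count stays the same.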

  • Create a simple, primitive cluster with 2 slaves:

# Just creates a basic Hadoop cluster with HDFS.
# See https://github.com/codito/hadoop-expt

git clone https://github.com/codito/hadoop-expt
cd hadoop-expt
docker-compose -f hadoop-basic.yml up

docker inspect namenode | grep IPAddress

xdg-open http://172.18.0.3:50070/     # NameNode web UI (use the IP printed above); shows the DataNodes setup.

# See https://bitbucket.org/uhopper/hadoop-docker/src/master/README.md
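You can avoid grepping the raw docker inspect output by using a Go template that prints only the IP address. The following is a minimal sketch. It assumes the container is named namenode and that the web UI listens on port 50070, which is the Hadoop 2.x default (Hadoop 3.x moved it to 9870). The helpers container_ip and namenode_url are hypothetical names.

```shell
# Sketch: build the NameNode web UI URL from the container's IP address.
# Assumes a container named "namenode" and the Hadoop 2.x UI port 50070
# (Hadoop 3.x uses 9870); pass other values as arguments.

container_ip() {
  # The Go template loops over every network the container is attached to and
  # prints each IP; a docker-compose service is usually on exactly one network.
  docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}

namenode_url() {
  name=${1:-namenode}
  port=${2:-50070}
  ip=$(container_ip "$name") || return 1
  if [ -z "$ip" ]; then
    echo "no IP address found for container $name" >&2
    return 1
  fi
  echo "http://$ip:$port/"
}
```

Usage: xdg-open "$(namenode_url)" opens the same page as the hard-coded xdg-open line, without copying the IP by hand.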

Synopsis

# Install docker and docker-compose using yum install or apt install etc.

sudo systemctl status docker

docker run hello-world

docker search centos               # Search Docker Hub and see the most popular repos.
docker search hadoop


docker pull centos:7
docker images                         # Images are read-only templates; containers are created from them.
docker container run -it centos:7 bash  # A container is created from the image on "run".
docker rm container-id                # Removes a stopped container.
docker rmi image-id                   # Removes an image.


# Show system-wide details, including the data root dir (Docker Root Dir) and network plugins.
docker info


docker ps                             # Lists running containers.
docker container ls -a                # Lists all containers, including stopped ones.
docker container rm container-id      # Removes a container.
docker port namenode                  # Lists published port mappings (none yet).

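# See https://medium.freecodecamp.org/expose-vs-publish-docker-port-commands-explained-simply-434593dbc9a3
# EXPOSE (or --expose) only documents a port. Publishing it with -p (or "ports:" in docker-compose)
# is what maps it to a host port, and -p works even without a matching EXPOSE.
# Another option is ssh port forwarding through the Docker host.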
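You publish a port with -p HOST:CONTAINER on docker run, and docker port reads the mapping back. The following is a sketch with a hypothetical helper, host_port. The image name in the comment is a placeholder.

```shell
# Sketch: read back the host port that a container port was published on.
# Publish first, for example (IMAGE is a placeholder):
#   docker run -d --name namenode -p 50070:50070 IMAGE
# Then "docker port namenode 50070" prints lines such as "0.0.0.0:50070"
# (recent Docker versions may add a "[::]:50070" line for IPv6).

host_port() {
  # $1 = container name or ID, $2 = container port (e.g. 50070 or 50070/tcp)
  mapping=$(docker port "$1" "$2" 2>/dev/null | head -n 1)
  if [ -z "$mapping" ]; then
    echo "port $2 of $1 is not published" >&2
    return 1
  fi
  # Keep only the text after the last ":", i.e. the host port.
  echo "${mapping##*:}"
}
```

Taking the text after the last colon works for both the IPv4 form and the bracketed IPv6 form.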



# Here’s how to set the HDFS block size.

# Set the following in the hadoop.env file:
HDFS_CONF_dfs_blocksize=1m

# Press Ctrl-C to stop the running cluster, then start it again:
docker-compose up  # Recreates the Hadoop config files under /etc/ from the .env file values.
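Each hadoop.env variable name encodes both the target config file and the property name. The sketch below assumes the naming convention used by the uhopper/hadoop entrypoint: the prefix selects the *-site.xml file, "___" becomes "-", "__" becomes "_", and a single "_" becomes ".". Verify this against the README linked above before relying on it. The function names are hypothetical.

```shell
# Sketch of the (assumed) env-var naming convention:
# prefix -> *-site.xml file; "___" -> "-", "__" -> "_", "_" -> ".".

env_to_property() {
  # $1 = variable name without its prefix, e.g. dfs_blocksize.
  # "@" is a temporary placeholder so "__" is not turned into "..".
  printf '%s\n' "$1" | sed -e 's/___/-/g' -e 's/__/@/g' -e 's/_/./g' -e 's/@/_/g'
}

prefix_to_file() {
  case "$1" in
    CORE_CONF) echo core-site.xml ;;
    HDFS_CONF) echo hdfs-site.xml ;;
    YARN_CONF) echo yarn-site.xml ;;
    *) return 1 ;;
  esac
}

# Split a full name such as HDFS_CONF_dfs_blocksize into file + property.
describe_setting() {
  prefix=$(printf '%s\n' "$1" | cut -d_ -f1-2)
  rest=${1#"$prefix"_}
  file=$(prefix_to_file "$prefix") || { echo "unknown prefix: $prefix" >&2; return 1; }
  echo "$file: $(env_to_property "$rest")"
}

describe_setting HDFS_CONF_dfs_blocksize   # prints: hdfs-site.xml: dfs.blocksize
```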

docker exec -it namenode bash

root@namenode:/$ cd tmp
Download the Complete Works of Mark Twain and push the file to HDFS.

# Download Complete Works of Mark Twain from Project Gutenberg
root@namenode:/tmp$ curl -Lo marktwain.txt http://www.gutenberg.org/cache/epub/3200/pg3200.txt
root@namenode:/tmp$ hdfs dfs -mkdir -p hdfs://namenode:8020/project/wordcount
root@namenode:/tmp$ hdfs dfs -cp file:///tmp/marktwain.txt hdfs://namenode:8020/project/wordcount/marktwain.txt
Check from the command line that the file was uploaded:

root@namenode:/tmp$ hdfs dfs -ls -R hdfs://namenode:8020/project
drwxr-xr-x   - root supergroup          0 2017-10-28 14:21 hdfs://namenode:8020/project/wordcount
-rw-r--r--   3 root supergroup   16013958 2017-10-28 14:21 hdfs://namenode:8020/project/wordcount/marktwain.txt
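With dfs.blocksize set to 1m (1048576 bytes), the 16013958-byte file should be split into many blocks instead of fitting into a single default-sized block. This assumes the file was written after the block-size change. The helper below is a hypothetical sketch of the ceiling division. On the cluster, hdfs fsck /project/wordcount/marktwain.txt -files -blocks lists the blocks that were actually stored.

```shell
# Sketch: number of HDFS blocks a file occupies for a given block size.
# blocks = ceil(file_size / block_size), using integer arithmetic only.

expected_blocks() {
  # $1 = file size in bytes, $2 = block size in bytes
  echo $(( ($1 + $2 - 1) / $2 ))
}

# marktwain.txt is 16013958 bytes (from the listing above); 1m = 1048576 bytes.
expected_blocks 16013958 1048576     # prints 16
# With the default 128 MB (134217728-byte) block size it would be a single block.
expected_blocks 16013958 134217728   # prints 1
```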

centos

  • 10M+ downloads.

This is the official CentOS repository, and many Dockerfiles are built on top of it. See https://hub.docker.com/_/centos/ Docker Hub shows a size of only about 75 MB because it reports the compressed download size; after the pull, docker images shows around 200 MB (uncompressed).

uhopper/hadoop

  • 1M+ Downloads

A base Hadoop image configured dynamically through environment variables. It makes it easy to compose Hadoop clusters with docker-compose files that reuse the same image and configure each node through environment variables.

harisekhon/hadoop

  • 10K+ Downloads

A generic container with Hadoop installed.

The same author maintains many other useful resources. See https://github.com/HariSekhon/Dockerfiles

To fire up a new node, all you may have to do is:

  • cd zookeeper && docker-compose up
  • cd zookeeper && make run # If you want an interactive shell.

The above is a shortcut for: docker run -ti -p 2181:2181 harisekhon/zookeeper

cloudera/quickstart

1M+ downloads. The size is 4 GB+, as it contains the entire Cloudera QuickStart VM. Updated 3+ years ago.

FAQ

How do I change Docker's data root dir on Ubuntu 18.04?

sudo systemctl stop docker
sudo mv /var/lib/docker /raid/docker-data
sudo ln -s /raid/docker-data /var/lib/docker
sudo systemctl start docker

# Alternative: update (or create, if it does not exist) /etc/docker/daemon.json, then restart docker.
# Not recommended, since third-party tools may expect /var/lib/docker.
{
  "data-root": "/raid/docker-data"
}
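You can script the daemon.json alternative. The following is a minimal sketch with a hypothetical helper, write_data_root. It only creates the file when none exists, because overwriting an existing daemon.json would drop any other settings in it. Changing data-root does not move existing images and containers, so copy /var/lib/docker to the new location first if you want to keep them.

```shell
# Sketch: write a minimal daemon.json that sets "data-root".
# Refuses to overwrite an existing file, which may hold other settings.
# Assumes a plain directory path (no quotes or backslashes to escape).

write_data_root() {
  # $1 = new data root dir, $2 = config file (default /etc/docker/daemon.json)
  dir=$1
  conf=${2:-/etc/docker/daemon.json}
  if [ -e "$conf" ]; then
    echo "$conf already exists; add \"data-root\" to it by hand" >&2
    return 1
  fi
  mkdir -p "$(dirname "$conf")" || return 1
  printf '{\n  "data-root": "%s"\n}\n' "$dir" > "$conf"
}
```

Usage, in a root shell: systemctl stop docker, then write_data_root /raid/docker-data, then systemctl start docker. Afterwards, docker info --format '{{.DockerRootDir}}' should print the new directory.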

How do I start a temporary environment?

docker container run -it --rm centos bash
# As soon as you exit the shell, the container is removed.
# docker run is the older syntax; docker container run is the newer one.

How do I attach and detach?

Use docker attach to attach your terminal’s standard input, output, and error (or any combination of the three) to a running container, using the container’s ID or name.

The container has a TTY only if it was started with the -t option.

Ctrl-C sends an interrupt to the process in the container; if that process exits, the container stops.

To detach without stopping the container, use Ctrl-p Ctrl-q.
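A common pattern is to start a container detached with both -i and -t, so that it keeps a shell you can attach to later. The sketch below uses a hypothetical helper, reattach. It checks that the container is running before attaching, and box is an illustrative container name.

```shell
# Sketch: attach to a container only if it is running.
# Start one first, e.g.:  docker container run -dit --name box centos:7 bash
# ("box" is an illustrative name). Detach again with Ctrl-p Ctrl-q.

reattach() {
  # .State.Running is "true" for a running container.
  running=$(docker container inspect -f '{{.State.Running}}' "$1" 2>/dev/null)
  if [ "$running" != "true" ]; then
    echo "container $1 is not running" >&2
    return 1
  fi
  docker container attach "$1"
}
```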

What is the difference between docker container ls and docker ps?

They are the same. docker container ls is the newer style, consistent with the other management commands:

docker network ls
docker image   ls
docker volume  ls

Similarly, these commands are equivalent:

Old                      New
run                      container run