Notes On Docker
We will look at deploying a Hadoop cluster using Docker, a classic use case that illustrates the power of Docker.
Install Docker on your system, preferably from the official Docker repository so that you get the latest version. Then check the service status:
sudo systemctl status docker
Search for available images to pull. (Note: you can also build an image from your own Dockerfile.)
docker search centos
docker search hadoop
# If you want to display all tags of specific repo ...
docker run harisekhon/pytools dockerhub_show_tags.py centos
docker run harisekhon/pytools dockerhub_show_tags.py harisekhon/hadoop
Pull an image from a repository, or build one from your Dockerfile:
docker pull centos:7 # centos:7 === image_name:tag_name
docker images -a # List all available images.
docker pull harisekhon/hadoop:2.9
docker pull uhopper/hadoop
Start a container from an available image and attach stdin/stdout:
docker container run -it uhopper/hadoop bash # -i interactive (keeps stdin open); -t allocates a pseudo-tty
# The option -dt means: allocate a tty, but run in the background (-d, detached mode).
To create a cluster of 1 master + 2 slaves:
* Create a simple, primitive cluster with 2 slaves:
# Creates a basic Hadoop cluster with HDFS only.
# See https://github.com/codito/hadoop-expt
git clone https://github.com/codito/hadoop-expt
cd hadoop-expt
docker-compose -f hadoop-basic.yml up
docker inspect namenode | grep IPAddress
xdg-open http://172.18.0.3:50070/ # NameNode web UI: shows NameNode and DataNode details. Substitute the IP printed by the previous command.
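The container IP is assigned by Docker and can differ between runs, so hard-coding 172.18.0.3 is fragile. A minimal sketch of looking it up instead, assuming the container is named namenode as in the commands above. The helper names container_ip and namenode_url are hypothetical, and 50070 is simply the web UI port used in the URL above.

```shell
# Hypothetical helper: print the IP address of a running container.
# Uses docker inspect's Go-template output instead of grepping the JSON.
# (If the container sits on several networks, the IPs are printed back to back.)
container_ip() {
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}

# Hypothetical helper: build the NameNode web UI URL from an IP address.
namenode_url() {
    printf 'http://%s:50070/\n' "$1"
}

# Only talk to Docker when it is installed and the namenode container exists.
if command -v docker >/dev/null 2>&1 && ip=$(container_ip namenode 2>/dev/null) && [ -n "$ip" ]; then
    namenode_url "$ip"    # pass this URL to xdg-open or a browser
fi
```

The URL printed at the end can be handed to xdg-open in place of the fixed address.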
# See https://bitbucket.org/uhopper/hadoop-docker/src/master/README.md
# Install docker and docker-compose using yum install or apt install etc.
sudo systemctl status docker
docker run hello-world
docker search centos # Search and see most popular repos.
docker search hadoop #
docker pull centos:7
docker images # Images are read-only templates; containers are created from them.
docker container run -it centos:7 bash # A container is created from the image on "run".
docker rm container-id # Remove a stopped container.
docker rmi image-id # Remove an image.
# Inspect data root dir and other network details
docker info
docker ps # lists running containers
docker container ls -a # List all containers, including stopped ones.
docker container rm container-id # Remove a container.
docker port namenode # No ports published yet.
# See https://medium.freecodecamp.org/expose-vs-publish-docker-port-commands-explained-simply-434593dbc9a3
# Publishing (-p or -P on docker run) is what maps a container port to the host. EXPOSE alone only documents the port, and -p works even without it.
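To make the publish step concrete, here is a hedged sketch. The nginx image, the container name web, and host port 8080 are illustrative choices and not part of the Hadoop setup in these notes. host_port and publish_demo are hypothetical helper names; publish_demo is only defined here, not run.

```shell
# Hypothetical helper: read `docker port` output lines such as
#   80/tcp -> 0.0.0.0:8080
# on stdin and print only the host-side port of each mapping.
host_port() {
    sed -n 's/.*:\([0-9][0-9]*\)$/\1/p'
}

# Hypothetical demo (call it yourself; it pulls and starts a container):
# -p HOST:CONTAINER publishes container port 80 on host port 8080.
publish_demo() {
    docker container run -d --name web -p 8080:80 nginx &&
        docker port web | host_port
}

# Example with a canned line of `docker port` output:
printf '80/tcp -> 0.0.0.0:8080\n' | host_port    # prints 8080
```

Running `docker port` on a container started without -p or -P prints nothing, which is what the namenode check above shows.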
# To use ssh to initiate port forwarding
# Here’s how to set the HDFS block size.
# Set the following in the hadoop.env file:
HDFS_CONF_dfs_blocksize=1m
# Stop the running cluster with Ctrl-C, then start it again:
docker-compose up # Regenerates the config files under /etc/ from the .env file values.
docker exec -it namenode bash
root@namenode:/$ cd tmp
Download the Complete Works of Mark Twain and push it to HDFS.
# Download Complete Works of Mark Twain from Project Gutenberg
root@namenode:/tmp$ curl -Lo marktwain.txt http://www.gutenberg.org/cache/epub/3200/pg3200.txt
root@namenode:/tmp$ hdfs dfs -mkdir -p hdfs://namenode:8020/project/wordcount
root@namenode:/tmp$ hdfs dfs -cp file:///tmp/marktwain.txt hdfs://namenode:8020/project/wordcount/marktwain.txt
Check from the command line that the file was uploaded:
root@namenode:/tmp$ hdfs dfs -ls -R hdfs://namenode:8020/project
drwxr-xr-x - root supergroup 0 2017-10-28 14:21 hdfs://namenode:8020/project/wordcount
-rw-r--r-- 3 root supergroup 16013958 2017-10-28 14:21 hdfs://namenode:8020/project/wordcount/marktwain.txt
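With the 1m block size set earlier, the 16013958-byte file listed above should be split across several HDFS blocks. A quick worked calculation, assuming 1m means 1048576 bytes and that the setting was in effect when the file was written; block_count is a hypothetical helper name.

```shell
# Hypothetical helper: number of HDFS blocks needed for a file,
# i.e. file size divided by block size, rounded up.
# $1 = file size in bytes, $2 = block size in bytes
block_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

block_count 16013958 1048576    # prints 16
```

To compare against what HDFS actually stored, running `hdfs fsck /project/wordcount/marktwain.txt -files -blocks` inside the namenode container reports the block list for the file.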
This is the official centos repository. Many Dockerfiles are built on top of it. See https://hub.docker.com/_/centos/ Docker Hub lists the size as only 75 MB (the compressed size), but after a pull the image takes around 200 MB.
Base Hadoop image with dynamic configuration based on environment variables. This makes it easy to compose Hadoop clusters with docker-compose files: every node uses the same image and is configured differently through environment variables.
It creates a generic container with Hadoop installed.
In addition, this user has other very useful resources. See https://github.com/HariSekhon/Dockerfiles
All you may have to do to fire up a new node:
The above is a shortcut for: docker run -ti -p 2181:2181 harisekhon/zookeeper
1M+ downloads. The image is over 4 GB and contains the entire Cloudera QuickStart VM contents. Last updated 3+ years ago.
systemctl stop docker
mv /var/lib/docker /raid/docker-data
ln -s /raid/docker-data /var/lib/docker
systemctl start docker
# Alternative: Update (create if not exists) /etc/docker/daemon.json.
# Not recommended since third party tools may expect /var/lib/docker.
{
"data-root": "/raid/docker-data"
}
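After either approach it is worth confirming where the Docker data really lives. A small sketch, assuming GNU readlink (for the -f option) is available; resolve_data_dir is a hypothetical helper name.

```shell
# Hypothetical helper: follow symlinks and print the real location of a
# Docker data directory (defaults to /var/lib/docker).
resolve_data_dir() {
    readlink -f "${1:-/var/lib/docker}"
}

# With the symlink approach this prints the directory the link points to.
if [ -e /var/lib/docker ]; then
    resolve_data_dir
fi

# Ask the daemon which data root it is using (needs a running daemon).
if command -v docker >/dev/null 2>&1; then
    docker info --format '{{.DockerRootDir}}' || true
fi
```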
docker container run -it --rm centos bash
# As soon as you exit the shell, the container is removed.
# docker run is the older syntax; docker container run is the newer, equivalent syntax.
Use docker attach to attach your terminal’s standard input, output, and error (or any combination of the three) to a running container using the container’s ID or name.
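The container is tty-enabled if it was started with the -t option.
To interrupt and kill the container, use Ctrl-C.
To detach and leave the container running, press Ctrl-p Ctrl-q (this works when the container was started with -i and -t).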
docker ps and docker container ls are the same. docker container ls is the newer style and is consistent with:
docker network ls
docker image ls
docker volume ls
Similarly, these commands are equivalent:
Old     New
run     container run