Notes on Docker and Kubernetes
Docker uses cgroups (control groups) to limit resource access per process.
It uses namespacing to isolate resources per process.
A container represents an application, which may include one or many services.
Docker Swarm and Kubernetes can be compared. Kubernetes is more sophisticated: it provides built-in tools for logging and monitoring, does auto-scaling, and handles rolling updates better, but it needs manual load-balancing configuration (Docker Swarm does automatic load balancing). Docker Swarm relies on third-party tools like ELK for monitoring, whereas Kubernetes has better built-in tools.
Docker does not run init, cron, or syslog processes. You may want to use phusion/baseimage or phusion/passenger-docker to avoid accumulating zombie processes. If you want to run multiple background processes in one container, you can use the phusion images or a supervisord process.
Example Dockerfile:
# syntax=docker/dockerfile:1
FROM node:12-alpine
RUN apk add --no-cache python2 g++ make
WORKDIR /app
COPY . .
RUN yarn install --production
CMD ["node", "src/index.js"]
Build and run docker image:
docker build -t getting-started . # Tag the image.
docker run -dp 3000:3000 getting-started <my_optional_cmd>
You can have multiple RUN commands executed while building the image. You can have at most one CMD, which is the default command; you can override it at invocation time.
docker-compose is useful:
- To bring up an application that depends on multiple containers in a specific order
- To create the proper networking artifacts among them; port-forwarding settings are easy to specify.
- Each container in the docker-compose.yaml file is called a service.
The service name can be used as a hostname from other containers (see the sketch below).
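A minimal docker-compose.yaml sketch (service and image names are illustrative; add a version: key if your Compose version requires one):
services:
  web:
    build: .
    ports:
      - "3000:3000"        # host:container port forwarding
    depends_on:
      - db
  db:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: example
# From the web container, the db service is reachable simply at hostname "db".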
To wait for one docker container before starting another container, you can make use of the health-check feature in docker-compose. That enforces a dependency wait before the other containers are started. See https://stackoverflow.com/questions/31746182/docker-compose-wait-for-container-x-before-starting-y
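A hedged sketch of that wait-for pattern (the image and health-check command are illustrative; condition-based depends_on requires a Compose version that supports it):
services:
  db:
    image: mysql:8
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 5s
      retries: 10
  web:
    build: .
    depends_on:
      db:
        condition: service_healthy   # web is started only after db reports healthy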
You can use Docker volumes to share a directory between the host and containers. This breaks isolation, but it works for sharing resources.
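For example (paths and image name are illustrative):
docker run -v /host/data:/app/data my-image    # bind-mount a host directory into the container
Or, in docker-compose.yaml:
services:
  web:
    volumes:
      - ./src:/app/src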
Commands:
docker version
docker ps
docker images
docker run hello-world
docker run -it ubuntu bash
docker run busybox echo hi there # we override the default command
docker run busybox ping google.com; docker ps --all
docker login
docker create hello-world
docker start
## docker run === docker create + docker start
docker-compose -v
docker system prune ## Delete unused images
docker logs <container-id>
docker stop <container-id> ## Sends SIGTERM
docker kill <container-id> ## Sends SIGKILL
docker run redis
docker exec -it <container-id> redis-cli ## Run additional command on running container.
## -it - interactive, allows stdin to be forwarded.
docker exec -it <container-id> bash ## Most common use
## Create a new image from a container after running a few commands, i.e. snapshot the container's current state.
docker commit -c 'CMD ["redis-server"]' <container-id>
docker run -p 8080:80 <image-id> ## Forward localhost:8080 to container's port 80.
## Dockerfile vs docker-compose.yml
docker-compose up ## Similar to: docker run my-image
docker-compose up --build ## Rebuild and run
docker-compose ps ## Looks for ./docker-compose.yml to list containers belonging to this.
See https://kubernetes.io/docs/concepts/overview/components/
Cluster nodes - Master Node and Worker nodes
Self-healing, i.e. reschedules and replaces failed containers.
Gives independent IP addresses and DNS resolution to containers.
Can auto-mount storage volumes from a public cloud or local storage.
Master Node Components:
- API Server:
- REST API server,
- Reads/writes into etcd.
- Can scale horizontally.
- Secondary API servers can be configured so that the primary acts as a proxy/load balancer.
- Clients interact directly with worker nodes for the main application workload; traffic need not be routed through the master. The API Server only provides the configuration entry point; DNS-based service endpoint lookup is handled by the cluster DNS add-on (e.g. CoreDNS).
- Scheduler: assigns newly created Pods to worker nodes based on resource requirements and constraints.
- etcd: distributed key/value store holding the entire cluster state.
- Controller managers: run reconciliation loops that drive the actual state toward the desired state.
  - kube-controller-manager: node, replication, endpoints, and service-account controllers, etc.
  - cloud-controller-manager: controllers that interact with the underlying cloud provider (node lifecycle, routes, load balancers).
There can be multiple master nodes for HA. etcd is a distributed key-value store, often considered an improvement over ZooKeeper. https://news.ycombinator.com/item?id=18687516
Worker Node: kubelet, kube-proxy, Pods. kubelet is the node agent that runs containers as directed by the scheduler. kube-proxy is the networking proxy agent that maintains subnets and routing and helps with load balancing. Each pod can contain many container (e.g. Docker) instances.
A Node is like an EC2 instance. A Pod is like a VM. A container is a Docker instance. Kubernetes does not manage containers directly; the smallest unit of management is a Pod. A Pod is an abstraction layer: a container could be Docker, rkt, or even a VM. A Pod is a logical host. Containers in a Pod share the same IP and port space, and they share logical volumes. Multi-container Pods may use one container as a helper, proxy, or bridge for another (e.g. an nginx reverse proxy in front of a Node.js application). We avoid bundling many processes in a single container because one process per container is easier to troubleshoot and reuse; hence a collection of containers per Pod makes more sense.
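A hedged sketch of a two-container Pod (an nginx reverse proxy in front of an app container; names and images are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: web-with-proxy
spec:
  containers:
    - name: app                 # main application container
      image: my-node-app:1.0
      ports:
        - containerPort: 3000
    - name: nginx-proxy         # helper container; shares the Pod's network namespace,
      image: nginx:1.15         # so it can proxy to localhost:3000 of the app container
      ports:
        - containerPort: 80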
You can deploy brokers to Kubernetes as StatefulSets. This ensures every broker has a unique identity and that the persistent volumes (data) of each broker remain attached to the same pod (and are never interchanged between pods).
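A minimal StatefulSet sketch for such a broker (image name, mount path, and storage size are illustrative assumptions):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: broker
spec:
  serviceName: broker-headless       # headless Service giving each pod a stable DNS name
  replicas: 3
  selector:
    matchLabels:
      app: broker
  template:
    metadata:
      labels:
        app: broker
    spec:
      containers:
        - name: broker
          image: my-broker:1.0
          volumeMounts:
            - name: data
              mountPath: /var/lib/broker
  volumeClaimTemplates:              # each replica gets its own PersistentVolumeClaim
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi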
Start minikube with a VM driver:
minikube start --driver=hyperv
# By default the docker driver is used instead of a VM driver such as hyperv.
# The docker driver may not work properly.
kubeadm tool: used for provisioning a cluster. Mainly used to automate cluster setup and testing, not so much in production:
kubeadm init --apiserver-advertise-address=192.168.3.19
--pod-network-cidr=192.168.0.0/16
# It prepares the host as the master node
# If you want to uninstall master node config: kubeadm reset
# For worker nodes to join the cluster, run following as root there:
kubeadm join 192.168.3.19:6443 --token asdfghjklll ...
File locations and files:
- clientpod.yaml: Specifies the Pod configuration:
  apiVersion: v1              # apiVersion constrains the allowed kind values.
  kind: Pod
  metadata:
    name: client-pod
  spec:
    containers:
      - name: client
        image: myuser/mysql
        ports: <exposed-ports>
- client-node-port.yaml: Specifies node port mappings:
  apiVersion: v1
  kind: Service               # In the Kubernetes context, a Service is a network service.
  metadata:
    name: client-node-port    # The Service name creates a pseudo hostname with an IP,
                              # used for inter-pod communication:
                              #   curl http://client-node-port:8080
  spec:
    type: NodePort            # NodePort | ClusterIP | LoadBalancer
    ports:
      - port: 8080            # Service port.
        targetPort: 80        # Container port. Note that every Pod has a unique IP.
        nodePort: 32100       # Applied on the top-level VM IP:
                              #   http://VM-IP:32100 is auto balanced to http://pod-IP:80
                              # Better to leave nodePort out; it is auto-assigned from 30000-32767.
                              # containerPort => Node-Port => Host-Port mapping.
                              # Host:port auto load balanced to all node ports ???
    selector:
      component: web
Config files are used to create objects of type: Pod | Service | StatefulSet | ReplicationController
/var/lib/kubelet/config.yaml
/etc/kubernetes/pki/ (certificates for API server and for healthchecks)
/etc/kubernetes/admin.conf, kubelet.conf, controller-manager.conf, scheduler.conf, manifests/kube-scheduler.yaml, etc
The control plane runs as static Pods, as defined in the manifests/*.yaml files.
To create a password (stored as a Secret):
# kubectl create secret ...
# kubectl get secrets
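# For example, a generic secret could be created like this (name and value are illustrative):
# kubectl create secret generic mysql-pass --from-literal=password=YOUR_PASSWORD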
To deploy a mysql container on your cluster, you just need a .yaml file:
# kubectl create -f \
https://k8s.io/examples/application/wordpress/mysql-deployment.yaml
# // The yaml file specifies ports and a pvc (persistent volume claim for 20GB).
# kubectl get pvc
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
hello-bzrzk 1/1 Running 0 22s 10.244.1.2 multinode-demo-m02
hello-frcvw 1/1 Running 0 22s 10.244.0.3 multinode-demo
# kubectl get services # Lists kubernetes, mysql, wordpress etc
# Service type could be: ClusterIP, NodePort, LoadBalancer, or ExternalName.
# (Ingress is a separate object for HTTP routing, not a Service type.)
# i.e. how the service endpoint works: internal / external / load-balanced.
# Exposure: ClusterIP < NodePort < LoadBalancer
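# For example (illustrative, not from these notes), a deployment can be exposed as a NodePort service:
# kubectl expose deployment wordpress --type=NodePort --port=80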
# minikube service wordpress --url
http://192.168.99.100:31536 # Gives you service URL
# kubectl get deployments
# kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 59m // Built-in Service
# minikube ip
172.31.235.139
# You can do: docker ps and docker kill to kill a container,
# but it will be auto-restarted by Kubernetes.
# Every pod gets its own IP. You never access a Pod directly by its IP;
# you always connect to a Pod using the node's IP and the forwarded port.
# kubectl describe pods ## Longer form details.
# minikube docker-env ## Displays docker env variables
# eval $(minikube docker-env) ## Makes the local docker client point to the docker daemon in the node VM.
# docker system prune -a ## Will delete all unused images, networks, etc.
# // Rollout deployment
# kubectl create deployment mynginx --image=nginx:1.15-alpine
# kubectl get deploy,rs,po ## deployments,replicasets,pods
# ## -l app=mynginx to filter deployments.
# // scale the deployment up to three replicas.
# kubectl scale deploy mynginx --replicas=3
# kubectl describe deployment
# // The image is nginx:1.15-alpine.
# // So far, we only have a single revision, called revision 1.
# kubectl rollout history deploy mynginx [ --revision=1 ]
# // Let's upgrade our image.
# kubectl set image deployment mynginx nginx=nginx:1.16-alpine
# // An implicit rollout is kicked off after setting the new image.
# kubectl rollout history deploy mynginx --revision=2
# ## it shows the 1.16-alpine image, the updated image.
# Now, let's take a quick look at our objects.
# kubectl get deploy,rs,po ## deployments,replicasets,pods
#
# // It shows the replicasets are migrated from old ones to new ones.
# // To rollback to previous version ...
# kubectl rollout undo deployment mynginx --to-revision=1
#
# Revision 3 is created, which represents the rolled-back state of revision 1.
# By default up to 10 revisions are kept, so you can perform up to 10 consecutive
# rolling updates and rollbacks.
# Finally, rolling updates and rollbacks are not specific to Deployments only;
# they are supported by other controllers as well, such as DaemonSets and StatefulSets.
To Clean up:
# kubectl delete secret mysql-pass
# kubectl delete deployment -l app=wordpress
deployment "wordpress" deleted
deployment "wordpress-mysql" deleted (Note: dependents auto deleted)
# kubectl delete service -l app=wordpress
service "wordpress" deleted
service "wordpress-mysql" deleted
# kubectl delete pvc -l app=wordpress
# kubectl delete -f mypod.yaml # Delete pod that was created using this.
# Note: A Deployment is responsible for running a set of pods.
# A Service is responsible for network access to pods.
Install minikube on Windows:
# minikube start --driver=hyperv
* minikube v1.25.1 on Microsoft Windows 10 Pro 10.0.19042 Build 19042
* Using the hyperv driver based on user configuration
* Downloading VM boot image ...
> minikube-v1.25.0.iso.sha256: 65 B / 65 B [-------------] 100.00% ? p/s 0s
> minikube-v1.25.0.iso: 226.25 MiB / 226.25 MiB 100.00% 29.49 MiB p/s 7.9s
* Starting control plane node minikube in cluster minikube
* Downloading Kubernetes v1.23.1 preload ...
> preloaded-images-k8s-v16-v1...: 504.42 MiB / 504.42 MiB 100.00% 29.30 Mi
* Creating hyperv VM (CPUs=2, Memory=6000MB, Disk=20000MB) ...
* Preparing Kubernetes v1.23.1 on Docker 20.10.12 ...
- kubelet.housekeeping-interval=5m
- Generating certificates and keys ...
- Booting up control plane ...
- Configuring RBAC rules ...
* Verifying Kubernetes components...
- Using image gcr.io/k8s-minikube/storage-provisioner:v5
* Enabled addons: storage-provisioner, default-storageclass
* kubectl not found. If you need it, try: 'minikube kubectl -- get pods -A'
* Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
# minikube kubectl -- get pods -A
> kubectl.exe.sha256: 64 B / 64 B [----------------------] 100.00% ? p/s 0s
> kubectl.exe: 45.62 MiB / 45.62 MiB [---------] 100.00% 12.56 MiB p/s 3.8s
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-fdbdj 1/1 Running 0 3m26s
kube-system etcd-minikube 1/1 Running 0 3m39s
kube-system kube-apiserver-minikube 1/1 Running 0 3m39s
kube-system kube-controller-manager-minikube 1/1 Running 0 3m39s
kube-system kube-proxy-qv5jp 1/1 Running 0 3m25s
kube-system kube-scheduler-minikube 1/1 Running 0 3m39s
kube-system storage-provisioner 1/1 Running 1 (2m55s ago) 3m37s
# minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
# minikube kubectl -- version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", ...}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", ...}
$ kubectl cluster-info
Kubernetes control plane is running at https://172.31.235.139:8443
CoreDNS is running at https://172.31.235.139:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
minikube IP vs ClusterIP:
- The minikube IP is a top-level VM IP address.
  (Typically looks like 192.168.x.y or 172.21.131.216)
- A ClusterIP is a virtual IP assigned to a Service, reachable only from within the cluster.
Kubernetes Limits:
- No more than 110 pods per node
- No more than 5000 nodes
- No more than 150000 total pods
- No more than 300000 total containers
You can execute commands on a specific pod like below:
kubectl exec mypod1 -c container1 -- /bin/cat /usr/share/nginx/html/index.html
# See for more info: https://www.mirantis.com/blog/multi-container-pods-and-container-communication-in-kubernetes/
You can build a cluster imperatively or declaratively.
Non-minikube installation order:
kubeadm init --config=configfile.yaml
kubeadm init --apiserver-advertise-address=192.168.3.19 // By default, the default network interface is used.
--pod-network-cidr=192.168.0.0/16
# Now an empty cluster is available. Worker nodes can join now.
kubeadm join 192.168.3.19:6443 --token asdfghjklll ... # Execute this from worker node.
sudo kubeadm init phase control-plane all --config=configfile.yaml
sudo kubeadm init phase etcd local --config=configfile.yaml
# you can now modify the control plane and etcd manifest files
sudo kubeadm init --skip-phases=control-plane,etcd --config=configfile.yaml
VirtualBox and HyperV Networking:
Within VirtualBox, a NAT guest always has the default ethernet interface enp0s3:
  inet 10.0.2.15  netmask 255.255.255.0  broadcast 10.0.2.255
The host-side NAT IP is also fixed: 10.0.2.2. However, it is not visible on the host when you run ipconfig or ifconfig.
If you want to network the host and the VM, create a VirtualBox Host-Only network adapter using the VirtualBox UI. See https://www.techrepublic.com/article/how-to-create-virtualbox-networks-with-the-host-network-manager/ Then, for the guest, choose the "Host-Only Adapter" networking adapter instead of NAT.
Common host-only network ranges fall within 192.168.0.0/16 (VirtualBox prefers this) or 172.16.0.0/12 (Hyper-V Kubernetes tends to use this). You should not use the loopback range (127.0.0.0/8), link-local (169.254.0.0/16), or multicast (224.0.0.0/4).
By default, minikube HostOnlyCIDR is 172.*.*.*/24
You can start minikube like below to specify the HostOnlyCIDR:
minikube start --cpus 2 \
--memory 2048 \
--disk-size 20g \
--vm-driver virtualbox \
--network-plugin flannel \
--kubernetes-version v1.12.2 \
--host-only-cidr 192.168.77.1/24
When you want MySQL running on the host to be made available to all pods, you can use a Service without a selector together with a manual Endpoints object. Here is the content of the mysql-service.yaml file:
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-service
spec:
  ports:
    - protocol: TCP
      port: 3306
      targetPort: 3306
---
apiVersion: v1
kind: Endpoints
metadata:
  name: mysql-service
subsets:
  - addresses:
      - ip: 192.168.77.1
    ports:
      - port: 3306
Example Pod config file:
apiVersion: v1                 # Hardcoded as v1 for Pod objects.
kind: Pod                      # Object type is Pod.
metadata:
  name: nginx-pod              # Uniquely identifies the Pod.
  labels:
    app: nginx                 # Belongs to a class of app=nginx. Used for filtering.
spec:                          # Pod config.
  containers:                  # List of containers.
    - name: nginx              # Container name.
      image: nginx:1.15.11     # DockerHub image.
      ports:
        - containerPort: 80    # Internal container port.
Note: A Pod does not express its desired replica count. For that, it is used with one of the controllers: Deployments, ReplicaSets, or ReplicationControllers. A Deployment .yaml file also includes the ReplicaSet specification.
A Deployment object describes Pods along with the total number of desired instances (i.e. replicas).
It has its own labels and config (spec) and also assigns labels and config to the Pods.
It can also select Pods described in another .yaml file using selectors.
Example Deployment file:
apiVersion: apps/v1              # API endpoint of the API Server.
                                 # Also matches the existing version of the Deployment object.
                                 # If the version changes, a new Deployment object will be created.
kind: Deployment                 # Object type is "Deployment".
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:                            # Configuration.
  replicas: 3                    # Total number of Pods desired.
  selector:
    matchLabels:
      app: nginx                 # Selects the Pods managed by this Deployment.
  template:                      # Pod template starts here.
    metadata:                    # Metadata for the Pod.
      labels:                    # Labels assigned to the Pod;
        app: nginx               # must match spec.selector.matchLabels above.
    spec:                        # spec.template.spec is the Pod config.
      containers:                # Containers of the Pod.
        - name: nginx
          image: nginx:1.15.11
          ports:
            - containerPort: 80  # Internal TCP port that the container binds to.
// Note: The 4 required fields are: apiVersion, kind, metadata and spec.
// The extra field "status" is populated by kubernetes and tracks the
// current state vs the desired state.
A ReplicaSet is the next-generation "ReplicationController"; it implements scaling for Pods. A ReplicaSet with replica count 3 for a specific Pod template creates identical Pods Pod-1, Pod-2 and Pod-3. Replicas are created or destroyed by continuously monitoring the desired state config.
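A minimal ReplicaSet sketch (it mirrors the Deployment spec above; in practice you usually create Deployments, which manage ReplicaSets for you):
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-rs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.15.11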
Labels are key-value pairs attached to Kubernetes objects such as Pods, Services, and Deployments.
You can select a subset of objects using label selectors.
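For example (label values are illustrative):
kubectl get pods -l app=nginx                   # equality-based selector
kubectl get pods -l 'environment in (dev,qa)'   # set-based selector
kubectl label pod nginx-pod tier=frontend       # attach a label to an existing object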
If multiple users and teams share the same Kubernetes cluster, we can partition the cluster into virtual sub-clusters using Namespaces. Object names need to be unique only within the same Namespace:
$ kubectl get namespaces # These 4 created out-of-the-box.
NAME STATUS AGE
default Active 11h # Default one for new objects.
kube-node-lease Active 11h # node heartbeat data.
kube-public Active 11h # Public info about cluster.
kube-system Active 11h # Control plane agents
Namespaces provide multi-tenancy.
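For example (the namespace name is illustrative):
kubectl create namespace dev
kubectl get pods --namespace=dev                        # or: -n dev
kubectl config set-context --current --namespace=dev    # make it the default namespace for kubectl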
Service: an abstract way to expose an application running on a set of Pods as a network service.
Kubernetes gives Pods their own IP addresses and a single DNS name for a set of Pods, and can load-balance across them.
Service definition uses a selector to filter the Pods for which the service should apply:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
This specification creates a new Service object named "my-service", which targets TCP port 9376 on any Pod with the app=MyApp label.
Kubernetes assigns this Service an IP address (sometimes called the "cluster IP"), which is used by the Service proxies.
The controller for the Service selector continuously scans for Pods that match its selector, and then POSTs any updates to an Endpoints object (via the API server), also named "my-service".
# Install Kubernetes Dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml
# Patch the dashboard to allow skipping login
kubectl patch deployment kubernetes-dashboard -n kubernetes-dashboard --type 'json' \
-p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--enable-skip-login"}]'
# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.4.2/components.yaml
# Patch the metrics server to work with insecure TLS
kubectl patch deployment metrics-server -n kube-system --type 'json' \
-p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
# Run the Kubectl proxy to allow accessing the dashboard
kubectl proxy
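# With the proxy running, the dashboard should then be reachable at a URL like the following
# (assuming the default namespace/service names created by the manifest above):
# http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/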
Cluster installation tools:
- kubeadm - Does not provision hosts, but does everything else.
- kubespray - Install on AWS, Azure, GCE, OpenStack, vSphere, or bare metal. Ansible based.
- kops, kube-aws, etc - Kubernetes incubator projects. Install on cloud using cli.
- https://github.com/kelseyhightower/kubernetes-the-hard-way
The kube-proxy is the network agent which runs on each node.
The kube-proxy is responsible for TCP, UDP, and SCTP stream forwarding or round-robin forwarding across a set of Pod backends, and it implements forwarding rules defined by users through Service API objects.
Kubernetes Networking model:
Container Network Interface:
* It is a CNCF (Cloud Native Computing Foundation) project.
* See https://github.com/containernetworking/cni/blob/master/SPEC.md
* It consists of:
- A specification and libraries for writing plugins to configure network interfaces in Linux containers.
- Also includes a number of supported plugins.
Networking challenges:
Kubernetes enables external accessibility through Services: