Usually, a Linux distro ships as a standard ISO with a package manager (yum, yast, apt) to distribute all of the individual pieces of compiled software, along with programs like netcat, httpd, etc. that often just end up being extra overhead. CoreOS differs in that it distributes an entire disk image that’s essentially a stripped-down Linux kernel with a few PaaS-like features. So now, rather than shipping with all those programs right out of the gate when you may never use them, we pull in only the ones we need and run each in an isolated, lightweight slice in the form of a Linux container (LXC, Docker). We adhere to the one process, one container principle and run these containers side by side across any number of CoreOS machines.
So within this system, we trust Docker to:
- Be responsible for the containerization of all of your services / applications
- Provide a mechanism for searching and transporting containerized services
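Containerizing a service under the one process, one container principle might look like the following Dockerfile sketch (the image contents and file names are illustrative, not from the original article):

```dockerfile
# One process per container: this image runs only nginx.
FROM nginx:latest

# Bake the service's static content and config into the image.
COPY site/ /usr/share/nginx/html/
COPY nginx.conf /etc/nginx/nginx.conf

# Document the port the service listens on, to be mapped on the host.
EXPOSE 80

# Run nginx in the foreground as the container's single process.
CMD ["nginx", "-g", "daemon off;"]
```

Building and pushing the resulting image to a registry (`docker build`, `docker push`) is what makes the service searchable and transportable across the cluster.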
Before we dive into etcd, let’s take a bird’s eye view of the system as a whole.
Prerequisites: We’re going to be revisiting some of the concepts we covered in the Intro to Systemd and Unit Files article, but in the context of this CoreOS provisioning and orchestration system. It also wouldn’t hurt to be familiar with Docker.
So let’s say you set up your CoreOS machines, and you have containers and copies of containers with all of your services running on them under load balancers. The next problem you’ll face is how to distribute configuration across all these nodes. In the past, you’d leverage a configuration management system such as Puppet, Chef, SaltStack, or Ansible. etcd (pronounced “et-see-dee”, as in a distributed “/etc”) does away with those alternatives by replicating configuration values written to any node across the entire cluster in a fault-tolerant manner, allowing you to easily recover from lost nodes. In short, etcd is a highly-available, distributed key-value store used for service discovery and configuration management in clusters.
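As a quick illustration of the key-value model, here’s what registering and discovering a service might look like with the `etcdctl` CLI against a running cluster (the key paths and addresses are made up for the example):

```sh
# Write a service's address under a well-known key prefix
$ etcdctl set /services/web/instance-1 '10.0.0.11:8080'
10.0.0.11:8080

# Any node in the cluster can read it back
$ etcdctl get /services/web/instance-1
10.0.0.11:8080

# Or list every registered instance of the service
$ etcdctl ls /services/web
/services/web/instance-1
```

Because every write is replicated across the cluster, a load balancer or sidekick process on any machine can watch these keys to keep its view of the world current.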
Some other things to point out:
- Similar to Consul and Zookeeper
- Shallower learning curve than Mesos, which requires you to understand Zookeeper for service discovery and locking and Hadoop for scheduling
- Frees the administrator from thinking about when updates will be applied to the infrastructure by leveraging Locksmith to coordinate reboots
So after addressing containerization and configuration management, the entire process still has us concerned with which individual machine runs what. Wouldn’t it be much easier to just declare what needs to be run and have a mechanism launch it from a pool of resources? fleet to the rescue. fleet is a service that uses etcd to provide a distributed init system; basically, systemd at the cluster level. It stores the definition of a service (unit file) in etcd, making it accessible to the entire cluster, so that all of the nodes can come to a consensus as to which machine should be running that service.
If any of the nodes fail, the entire cluster becomes aware that a service is no longer running and starts the service back up on one of the other available nodes. This is fault tolerance.
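A fleet unit file is just a systemd unit with an optional `[X-Fleet]` section for scheduling hints. A minimal sketch (the service and image names are illustrative):

```ini
# myapp@.service — a templated fleet unit
[Unit]
Description=My containerized app
After=docker.service
Requires=docker.service

[Service]
# Clean up any stale container, then run the app as this unit's single process
ExecStartPre=-/usr/bin/docker kill myapp-%i
ExecStartPre=-/usr/bin/docker rm myapp-%i
ExecStart=/usr/bin/docker run --name myapp-%i -p 8080:8080 example/myapp
ExecStop=/usr/bin/docker stop myapp-%i

[X-Fleet]
# Never schedule two instances of this template on the same machine
Conflicts=myapp@*.service
```

Running `fleetctl start myapp@1.service` stores the unit in etcd and lets the cluster decide which machine runs it; if that machine dies, fleet reschedules the unit on another available node.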
You should already be familiar with how port mapping works with Docker. While it’s nice to have isolated containers and this port mapping taken care of for you, it’s difficult for containers on different hosts to discover each other’s externally visible IP and port. flannel addresses this by giving each container a routable IP address, allowing it to communicate with containers on other hosts. It does this by creating a cluster-wide virtual overlay network through packet encapsulation. etcd, again, is utilized to store the mapping between virtual container addresses and host addresses. Finally, a flanneld background process runs on each host, watching etcd for changes and routing packets accordingly.
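flannel itself reads its network configuration out of etcd before bringing up the overlay; flanneld looks under the `/coreos.com/network/config` key. A typical value (the subnet here is just an example):

```json
{
  "Network": "10.1.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}
```

Each host’s flanneld then leases a smaller subnet (a /24 by default) out of that range for its local containers and records the lease in etcd, which is exactly the virtual-address-to-host mapping the daemons watch to route traffic.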