Automate your virtual Infrastructure

Christoph Stoettner

Agenda

  • Intro to Infrastructure as Code

  • Automate Virtual Machine Generation

    • Terraform

    • Packer

    • Ansible

Christoph Stoettner

  • Application Servers

  • DevOps, Docker, K8s, Private Cloud

  • Started with Linux / OSS around 1994/1995

    • Linux Kernel < 1.0

    • Slackware

  • VIM lover

My history (the history of an administrator)

  • First servers were bare metal

    • new applications needed new servers each time

    • wasted huge amounts of resources (disk, memory, CPU)

  • Then we got our first virtualization environment

    • Within months we had a large number of virtual servers

    • For the first time we had the chance to test updates beforehand

    • Web applications were split into separate VMs, so no dependencies (updates)

      • Software and library versions

  • Server provisioning and deployment still manual

More work

  • Each additional VM is another OS to patch

[Image: VMs vs. bare metal]

Server Sprawl

  • This phenomenon is called server sprawl

  • The sheer number of servers made it impossible to deploy patches on all of them

    • Updates were often only deployed for high security risks

    • or if an application needed it (Java version, PHP)

  • On the other hand, when two people install Apache httpd three times

    • the servers won’t be the same

Configuration Drift

First attempts to solve this

  • First scripts to get rid of daily clicks (Patches)

  • Test servers to create update documentation

  • Building silent installation files

  • Creating long checklists with click sequences

    • Even worse, my absolute favorite: several hundred screenshots

  • Long nights troubleshooting

    • Production servers behaved differently from test servers

      • Test servers often got all updates

      • Production got only some of the updates (cumulative), so in theory the same fixes

Configuration Drift

  • Differences can creep in over time:

  • Someone applies a quick fix to one of the database servers to solve a problem

  • A new ticket system needs a newer Java version

  • One application server gets more traffic than the others

    • someone tunes it, and now its configuration is different

    • Differences should be captured so servers are easy to reproduce and rebuild

Unmanaged variation between servers leads to snowflake servers and automation fear.

Snowflake Servers

A snowflake server is different from any other server. It’s special in ways that can’t be replicated.

  • Once again, being different isn’t bad

    • But it’s a problem when the owners don’t understand how and why

    • and wouldn’t be able to rebuild it

  • An ops team should be able

    • to confidently and quickly rebuild any server in their infrastructure.

    • Snowflake → build a new, reproducible process to rebuild the server

Everybody knows these famous servers

  • Don’t touch that server

  • Don’t point at it

  • Don’t even look at it

There is the possibly apocryphal story of the data center with a server that nobody had the login details for, and nobody was certain what the server did. Someone took the bull by the horns and unplugged the server from the network. The network failed completely, the cable was plugged back in, and nobody ever touched the server again.

Automated deployment tools have been available for ages

  • Chef

  • Puppet

  • Ansible

  • Often used to initially deploy servers

  • Administrators fear using them on servers that have been running for a long time

Automation Fear Spiral

[Image: Automation fear spiral]

Infrastructure As Code

  • WTF?

    • "Do I need to write code now? I’m an administrator!"

  • A lot of our infrastructure is already code

    • Virtual machines

    • Virtual networks

    • Virtual disks

What’s code?

  • Code is just text

def printme(message):
    # Print the passed string
    print(message)

printme("Stoeps was here!")
  • Developers have been working with code / text for ages

  • Tons of tools, editors, formats, version control systems

  • Build servers to get automatically compiled binaries

Principles of Infrastructure as Code

  • Systems can be easily reproduced

  • Systems are disposable

  • Systems are consistent

  • Processes are repeatable

  • Design is always changing

  • Software and infrastructure must be designed as simply as possible to meet current requirements

  • Change management must be able to deliver changes safely and quickly

How can this be achieved

  • Use definition files

  • Self-documented systems and services

    • Scripts generate their documentation

  • Version Control System

    • Handle definitions and scripts like source code (see the sketch below)

    • History with all changes

  • Continuously tested

  • Continuously monitored

    • Configuration automated during install
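A minimal sketch of treating these definitions like source code, assuming Git as the version control system (the file names come from the Packer, Terraform and Ansible examples later in this talk):
cd infrastructure     # directory holding centos.json, vars.tf, site.yml, ...
git init
git add .
git commit -m "Initial infrastructure definitions"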

Immutable vs Mutable Tools

  • Immutable

    • Configuration changes happen by completely replacing servers

    • Example: containerization (containers are generated from images)

  • Mutable

    • Configuration drift is still possible

      • Test systems often get all patches

      • Production gets only some of them → different behavior

Server Lifecycle

[Image: Server lifecycle]

Tools I use in this talk

I show working code, but it’s simplified!

  • VMware Workstation

  • VMware vSphere Cluster

  • Packer (build templates)

  • Terraform (deploy servers based on the templates)

  • Ansible (add users, applications to servers)

Works with VMware Fusion and VMware ESXi too!

Some thoughts before we start

  • Terraform provisioning needs VMware Tools (open-vm-tools) and Perl

  • Terraform needs to connect via SSH

  • Ansible needs Python

Security
  • Temporarily using password login for root

  • Don’t forget to remove or change weak passwords afterwards (see the sketch below)
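One way to take care of that at the end of a playbook run, as a hedged Ansible sketch using the user module (once key-based SSH access is in place):
- name: Lock the temporary root password after provisioning
  user:
    name: root
    password_lock: yes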

Packer

Packer - Builders

  • Easily build all kinds of templates automatically

  • Alicloud ECS

  • Amazon EC2

  • Azure

  • CloudStack

  • DigitalOcean

  • Docker

  • File

  • Google Cloud

  • Hetzner Cloud

  • HyperOne

  • Hyper-V

  • Linode

  • LXC

  • LXD

  • NAVER Cloud

  • Null

  • 1&1

  • OpenStack

  • Oracle

  • Parallels

  • ProfitBricks

  • QEMU

  • Scaleway

  • Tencent Cloud

  • Triton

  • Vagrant

  • VirtualBox

  • VMware

  • Yandex.Cloud

  • Custom

Packer - Provisioners

  • "Customize" your image / vm / template

  • Several options

  • Ansible Local

  • Ansible (Remote)

  • Breakpoint

  • Chef Client

  • Chef Solo

  • Converge

  • File

  • InSpec

  • PowerShell

  • Puppet Masterless

  • Puppet Server

  • Salt Masterless

  • Shell

  • Shell (Local)

  • Windows Shell

  • Windows Restart

  • Custom

Packer - Postprocessors

  • Upload or work with the final image

  • Deletes artifacts, so it cleans up your host, but you lose the local image

  • Alicloud Import

  • Amazon Import

  • Artifice

  • Compress

  • Checksum

  • DigitalOcean Import

  • Docker Import

  • Docker Push

  • Docker Save

  • Docker Tag

  • Google Compute Export

  • Google Compute Import

  • Manifest

  • Shell (Local)

  • Vagrant

  • Vagrant Cloud

  • vSphere

  • vSphere Template

Let’s build a CentOS Image - Kickstart script

kickstart-de.cfg
install
lang en_US.UTF-8
keyboard de
timezone Europe/Berlin
auth --useshadow --enablemd5
services --enabled=NetworkManager,sshd
eula --agreed
ignoredisk --only-use=sda
reboot
bootloader --location=mbr
zerombr
clearpart --all --initlabel
part swap --asprimary --fstype="swap" --size=1024
part /boot --fstype xfs --size=200
part pv.01 --size=1 --grow
volgroup rootvg01 pv.01
logvol / --fstype xfs --name=lv01 --vgname=rootvg01 --size=1 --grow
authconfig --enableshadow --passalgo=sha256
rootpw --iscrypted $5$cnxfyyiayqjelmbt$4/Lq1vPDBp2BZznXcLukwVy4n0DPp6tX.PrCz7YA62B
%packages --nobase --ignoremissing --excludedocs
@core
%end
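The rootpw line above expects a pre-crypted hash (SHA-256 crypt, recognizable by the $5$ prefix). One way to generate such a hash, assuming a reasonably recent OpenSSL:
# Print a sha256-crypt ($5$...) hash for the given password
openssl passwd -5 'my-temporary-root-password'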

Let’s build a CentOS Image - builders

centos.json - builders
{
  "builders": [
    {
      "type": "vmware-iso",
      "boot_command": [
         "<tab> text ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/kickstart-de.cfg<enter>"
      ],
      "communicator": "ssh",
      "guest_os_type": "centos7-64",
      "http_directory": "http",
      "iso_checksum_type": "sha256",
      "iso_checksum_url": "http://ftp.halifax.rwth-aachen.de/centos/7.6.1810/isos/x86_64/sha256sum.txt",
      "iso_url": "http://ftp.halifax.rwth-aachen.de/centos/7.6.1810/isos/x86_64/CentOS-7-x86_64-Minimal-1810.iso",
      "ssh_username":"root",
      "ssh_password":"password",
      "shutdown_command": "shutdown -P now",
      "version": 14
    }
  ]
}

Let’s build a CentOS Image - Provisioners

centos.json - provisioners
"provisioners": [{
        "type": "shell",
        "expect_disconnect": true,
        "execute_command": "sudo UPDATE=true KERNELUPDATE=true bash '{{ .Path  }}'",
        "scripts": [
          "script/ansible.sh", "script/vmtools.sh",                      (1)
          "script/sshd.sh", "script/reboot.sh", "script/cleanup.sh"      (2)
        ]}],
(1) Install open-vm-tools and Python (to use Ansible later)
(2) Configure sshd, reboot and clean up (remove sshd host keys, delete caches …)
Example: script/vmtools.sh
#!/usr/bin/env bash
yum -y install open-vm-tools
# vSphere provisioning needs perl
echo '==> Install perl'
yum -y install perl
echo '==> Restarting open-vm-tools'
systemctl restart vmtoolsd
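For orientation, a sketch of how such a Packer project can be laid out, so that http_directory and the provisioning scripts above line up (names follow the examples in this talk):
.
├── centos.json
├── http
│   └── kickstart-de.cfg
└── script
    ├── ansible.sh
    ├── cleanup.sh
    ├── reboot.sh
    ├── sshd.sh
    └── vmtools.sh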

Let’s build a CentOS Image - Post-Processors

centos.json - post-processors
"post-processors":[[{                     (1)
          "type": "vsphere",
          "cluster": "HVIE PWR HOSTS",
          "host": "khnum.example.com",
          "datacenter": "HVIE",
          "resource_pool": "rp_hvie_devops",
          "username": "cstoettner@example.com",
          "password": "{{user `vsphere_password`}}",
          "datastore": "devops-01_sas_7.2k_raid10",
          "vm_name": "stoeps-centos-gpn19",
          "vm_folder": "devops","vm_network": "vm-net-devops",
          "insecure": "true"
      },{
          "type": "vsphere-template","host": "khnum.example.com",
          "insecure": "true",
          "datacenter": "HVIE",
          "username": "cstoettner@example.com",
          "password": "{{user `vsphere_password`}}",
          "folder": "/devops/templates"
      }]]

Packer json file

{
  "builders": ["..."],
  "provisioners": ["..."],
  "post-processors": ["..."]   (1)
}
(1) Optional; removes intermediate artifacts (e.g. the local image after upload)
Validate
# Validate JSON
packer.io validate -var vsphere_password='my-funky-password' -var timestamp=$(date +"%Y%m%d%H%M") centos.json
Build
# Build the image, upload to vSphere
packer.io build -var vsphere_password='my-funky-password' -var timestamp=$(date +"%Y%m%d%H%M") centos.json
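The -var values passed above are declared as user variables in centos.json and referenced with {{user `vsphere_password`}}, as seen in the post-processors. A minimal sketch of that section (empty defaults force the values to be supplied on the command line):
"variables": {
  "vsphere_password": "",
  "timestamp": ""
}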

Let’s build a CentOS Image

  • That’s just the absolute minimum!

  • You can tweak nearly everything

    • Builders

      • Memory, CPU, Disk, Name, Output Directory

    • Provisioner

      • Run shell scripts to clean up, update, and add SSH keys

    • Kickstart

      • Remove unnecessary software

I install the absolute minimum for running Ansible.

Packer build image

Terraform

Terraform (2)

  • Providers

    • 160+ available providers

    • Cloud, Infrastructure Software, Network, Monitoring, Database …​

  • Provisioners

    • Additional tools to customize deployments

      • Scripts

      • Ansible

Terraform for an example Environment

  • One file for each server

├── build.tf                (1)
├── gpn19-server1.tf        (2)
├── gpn19-server2.tf
├── gpn19.terraform
├── .terraform              (3)
│   └── plugins
│       └── linux_amd64
│           ├── lock.json
│           └── terraform-provider-vsphere_v1.11.0_x4
├── terraform.tfstate       (4)
├── terraform.tfstate.backup
├── vars.tf                 (5)
└── versions.tf
(1) Environment settings (see the sketch below)
(2) Server definitions
(3) Plugins for your deployment, installed by terraform init
(4) State file
(5) Variables
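A hedged sketch of what build.tf can contain: the provider configuration plus the data sources the server files reference. Variable names follow vars.tf where shown; the others (vsphere_datastore, vsphere_resource_pool, vsphere_network, template) are assumptions:
provider "vsphere" {
  vsphere_server       = var.vsphere_server
  user                 = var.vsphere_user
  password             = var.vsphere_password
  allow_unverified_ssl = true
}

data "vsphere_datacenter" "dc" {
  name = var.vsphere_datacenter
}

data "vsphere_datastore" "datastore" {
  name          = var.vsphere_datastore
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_resource_pool" "pool" {
  name          = var.vsphere_resource_pool
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_network" "network" {
  name          = var.vsphere_network
  datacenter_id = data.vsphere_datacenter.dc.id
}

# The template built with Packer, referenced by the server resources
data "vsphere_virtual_machine" "template" {
  name          = var.template
  datacenter_id = data.vsphere_datacenter.dc.id
}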

Terraform Example - vars.tf

  • All variables used by the configuration

  • Variables can be overridden on the command line (once they are declared here)

variable "vsphere_server" {
  default = "khnum.example.com"
}

variable "vsphere_user" {
  default = "cstoettner@example.com"
}

variable "vsphere_password" {
  description = "vsphere server password for the environment"
  default     = ""
}

variable "vsphere_datacenter" {
  default = "HVIE"
...

Terraform Example - gpn19-server1.tf

resource "vsphere_virtual_machine" "gpn19-server1" {
  name             = "gpn19-server1"
  resource_pool_id = data.vsphere_resource_pool.pool.id
  datastore_id     = data.vsphere_datastore.datastore.id
  num_cpus         = 4
  memory           = 4096
  guest_id         = data.vsphere_virtual_machine.template.guest_id
  scsi_type        = data.vsphere_virtual_machine.template.scsi_type
  network_interface {
    network_id   = data.vsphere_network.network.id
    adapter_type = data.vsphere_virtual_machine.template.network_interface_types[0]
  }
  folder = "${var.pana_devops_folder}/${var.project_folder}"
  disk {
    label            = "disk0"
...

Work with servers

# Plan
terraform plan -var "vsphere_password=abc" \
               -var "template=stoeps-centos-gpn19" \
               -out rebuild.terraform
terraform apply rebuild.terraform

# Delete
terraform destroy -var "vsphere_password=abc"

# Delete one server
terraform taint vsphere_virtual_machine.stoeps-cnx-ldap
terraform plan -var "vsphere_password=abc" \
  -var "template=stoeps-centos-gpn19" \
  -out rebuild.terraform
terraform apply rebuild.terraform

Terraform plan & apply

Terraform taint

Terraform destroy

Ansible

  • License: GPLv3+

  • https://www.ansible.com

  • Installation

    • Use your package manager and just install it (examples below)

    • No native Windows version, but it should work on Windows 10 with the integrated Linux subsystem (WSL)

  • Mutable Software

  • Agentless

    • Pure SSH
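For example (package names and managers differ per distribution, so treat this as a sketch):
# Fedora / CentOS / RHEL
sudo dnf install ansible
# Debian / Ubuntu
sudo apt install ansible
# or via pip
pip install --user ansible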

Idempotent

  • Just be careful that your tasks are idempotent

    • Running them multiple times shouldn’t change the result

Not idempotent
echo "127.0.0.2 greenlight.example.com" >> /etc/hosts
Somehow better
if ! grep -q '127.0.0.2' /etc/hosts; then
  echo "127.0.0.2 greenlight.example.com" >> /etc/hosts
fi
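Ansible modules handle this for you. A minimal sketch of the same change with the lineinfile module (host entry taken from the example above):
- name: Ensure the greenlight host entry exists
  lineinfile:
    path: /etc/hosts
    line: "127.0.0.2 greenlight.example.com"
    state: present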

Ansible files

.
├── group_vars
│   └── all
│       └── main.yml
├── inventory
├── roles
│   ├── common
│   │   └── tasks
│   │       └── main.yml
│   └── ldap
│       ├── defaults
│       │   └── main.yml
│       └── tasks
│           └── main.yml
├── site.yml
└── templates
    ├── base.ldif.j2
    ├── db.ldif.j2
    ├── ldap-config.sh.j2
    ├── monitor.ldif.j2
    └── user-ldap.ldif.j2

Inventory

  • Define servers and groups

  • There is always a group all

[gpn19]
gpn19-server1
gpn19-server2

[ldap]
gpn19-server2

Playbook site.yml

  • Definition of tasks running on groups (from inventory)

  • Simple Playbook

    • Run role common on all servers

    • Install (run role ldap) OpenLDAP on gpn19-server2

---
- hosts: all
  roles:
    - common

- hosts: ldap
  roles:
    - role: ldap

Variables

  • Global for this playbook:

    • group_vars/all/main.yml

  • Only for a server group:

    • group_vars/<groupname>/main.yml

Just as an example
---
ldap:
  domain: dc=devops,dc=example,dc=com
  passwordencrypted: "{SSHA}CdGAzVNlrqgLbKo6pebBxuDBBkxokkHm"
  passwordclear: "password"

Better: store passwords in an Ansible Vault → encrypted! (see the sketch below)
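A minimal sketch of how that can look (ansible-vault prompts for a vault password; the file name follows the layout shown earlier):
# Encrypt the variables file in place
ansible-vault encrypt group_vars/all/main.yml

# Provide the vault password when running the playbook
ansible-playbook -i inventory site.yml --ask-vault-pass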

Jinja2 Templates

  • Templates for files

    • Perfect for silent install response files

  • Copied with Ansible

  • Variables replaced during deployment

ldap-config.sh.j2
#!/usr/bin/env bash

cd /etc/openldap/slapd.d
ldapmodify -Y EXTERNAL  -H ldapi:/// -f db.ldif
ldapmodify -Y EXTERNAL  -H ldapi:/// -f monitor.ldif
[...]
ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/schema/inetorgperson.ldif
sleep 5
ldapadd -x -w {{ ldap.passwordclear }} -D "cn=ldapadm,{{ ldap.domain }}" -f /etc/openldap/slapd.d/base.ldif
sleep 10
ldapadd -x -w {{ ldap.passwordclear }} -D "cn=ldapadm,{{ ldap.domain }}" -f /etc/openldap/slapd.d/user-ldap.ldif

Ansible role - common/tasks/main.yml

---
- name: Disable Firewall        (1)
  service:
    name=firewalld
    state=stopped
    enabled=no
- name: Change limits.conf      (2)
  pam_limits:
    domain: root
    limit_type: '-'
    limit_item: nproc
    value: '16384'
- name: Install unzip (for Ansible's unarchive module) and vim (3)
  package:
    name={{ item }}
    state=latest
  with_items:
    - unzip
    - vim
(1) Disable a service (stop it and set it to disabled)
(2) Change limits.conf
(3) Install additional packages (independent of the distribution)

Ansible role - ldap/tasks/main.yml

- name: Install system packages for OpenLDAP
  package:
    name={{ item }}
    state=latest
  with_items:
    - openldap-servers
    - openldap-clients

- name: Enable Slapd service
  service:
    name=slapd
    state=restarted
    enabled=yes

- name: Initial ldap config, copy templates db.ldif     (1)
  template: src=db.ldif.j2 dest=/etc/openldap/slapd.d/db.ldif
  tags: parse
(1) Use a template file: copy it and parse the variables

Ansible role - ldap/tasks/main.yml (2)

- name: Copy sample db config    (1)
  copy:
    src: "/usr/share/openldap-servers/DB_CONFIG.example"
    dest: "/var/lib/ldap/DB_CONFIG"
    remote_src: yes     (2)
    directory_mode: yes
    owner: ldap
    group: ldap

- name: Create LDAP Config Script
  template:
    src: ldap-config.sh.j2
    dest: /tmp/ldap-config.sh
    mode: 0755
  tags: parse

- name: Configure LDAP and Import Users
  shell: "/tmp/ldap-config.sh"   (3)

- name: Remove config script
  file:
    path: /tmp/ldap-config.sh
    state: absent     (4)
(1) Copy a file that already exists on the target to another folder
(2) remote_src → the source is on the target, not in the local templates directory
(3) Run the shell script
(4) Remove the file

Run Ansible
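A minimal sketch of the command behind this step, using the inventory and playbook shown above:
ansible-playbook -i inventory site.yml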

Benefits

Documentation kept to a minimum

With IaC, the code itself is the up-to-date documentation of the infrastructure. Apart from setup instructions, hardly any additional documentation is needed.

Best practices

  1. Codify everything

  2. Document as little as possible

  3. Maintain version control

  4. Continuously test, integrate, and deploy

  5. Make your infrastructure code modular

  6. Make your infrastructure immutable (when possible)

Use the cloud?

  • There are many reasons to run in-house data centers

    • Security, Cost, …​

  • Build your own cloud

[Image: Cloud computing]