Christoph Stoettner
Intro to Infrastructure as Code
Automate Virtual Machine Generation
Terraform
Packer
Ansible
Application Servers
DevOps, Docker, K8s, Private Cloud
Started with Linux / OSS around 1994/1995
Linux Kernel < 1.0
Slackware
VIM lover
First servers were bare metal
Each new application needed a new server
Wasted huge amounts of resources (disk, memory, CPU)
Then we got our first virtualization environment
Within months we had a large number of virtual servers
For the first time we could test updates in advance
Web applications split across VMs, so fewer dependencies (updates)
Software, Library version
Server provisioning and deployment still manual
Additional OS to patch
This phenomenon is called server sprawl
The sheer number of servers made it impossible to deploy patches on all of them
Updates were often only deployed for high security risks
or if an application needed them (Java version, PHP)
On the other hand, when two people install Apache httpd three times,
the servers won’t be the same
Configuration Drift |
First scripts to get rid of daily clicks (Patches)
Test servers to create update documentation
Building silent installation files
Creating long checklists with click sequences
Worse, my absolute favorite: several hundred screenshots
Long nights troubleshooting
Production servers behave differently than test servers
Test servers often got all updates
Production only some (cumulative), so in theory the same fixes
Differences can creep in over time:
Someone patches one of the database servers to fix a problem
A new ticket system needs a newer Java version
One application server gets more traffic than the others
someone tunes it, and now its configuration is different
Differences should be captured to make it easy to reproduce and to rebuild
Unmanaged variation between servers leads to snowflake servers and automation fear. |
A snowflake server is different from any other server. It’s special in ways that can’t be replicated. |
Once again, being different isn’t bad
But it’s a problem when the owners don’t understand how and why it is different
and wouldn’t be able to rebuild it
An ops team should be able
to confidently and quickly rebuild any server in their infrastructure.
Snowflake → build a new, reproducible process to build the server
Don’t touch that server
Don’t point at it
Don’t even look at it
There is the possibly apocryphal story of the data center with a server that nobody had the login details for, and nobody was certain what the server did. Someone took the bull by the horns and unplugged the server from the network. The network failed completely, the cable was plugged back in, and nobody ever touched the server again.
Chef
Puppet
Ansible
Often used to initially deploy servers
Administrators fear using them on long-running servers
WTF?
"Do I need to write code now? I’m an administrator!"
A lot of our infrastructure is already code
Virtual machine
Virtual networks
Virtual disks
Code is just text
def printme(message):
    # This prints the passed string
    print(message)

printme("Stoeps was here!")
Developers have worked with code and text for ages
Tons of tools, editors, formats, version control systems
Build-server to get automatic compiled binaries
Systems can be easily reproduced
Systems are disposable
Systems are consistent
Processes are repeatable
Design is always changing
Software and infrastructure must be designed as simply as possible to meet current requirements
Change management must be able to deliver changes safely and quickly
Use definition files
Self documented systems and services
Scripts generate their documentation
Version Control System
Handle definition and scripts like source code
History with all changes
Continuously tested
Continuously monitored
Configuration automated during install
Immutable
Configuration changes by completely replacing servers
Example: containerization (containers are created from images)
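A container image illustrates the immutable approach: configuration is baked in at build time, and every change means building and deploying a new image. A hypothetical sketch (the `httpd.conf` file is assumed to exist next to the Dockerfile):

```dockerfile
# Hypothetical example of an immutable image build.
FROM centos:7
RUN yum -y install httpd && yum clean all
# Configuration changes happen here and ship as a new image,
# never by editing files inside a running container.
COPY httpd.conf /etc/httpd/conf/httpd.conf
CMD ["httpd", "-DFOREGROUND"]
```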
Mutable
Still possibility of configuration drift
Test systems often get all patches
Production only some of them → different behavior
I show working code, but it’s simplified! |
VMware Workstation
VMware vSphere Cluster
Packer (build templates)
Terraform (deploy servers based on the templates)
Ansible (add users, applications to servers)
Works with VMware Fusion and VMware ESXi too! |
You find all scripts and files in: https://gitlab.com/stoeps/gpn19-iac
Terraform provisioning needs vmtools and Perl
Terraform needs to connect via SSH
Ansible needs Python
Temporarily using password login for root
Don’t forget to remove or change weak passwords
Just a binary
Windows | Linux | Mac
hundreds of templates on GitHub
Windows
Linux (Ubuntu, Debian, SUSE, CentOS)
Installation
Download, unzip & copy into your PATH
Easy to automate building all kinds of templates
Alicloud ECS
Amazon EC2
Azure
CloudStack
DigitalOcean
Docker
File
Google Cloud
Hetzner Cloud
HyperOne
Hyper-V
Linode
LXC
LXD
NAVER Cloud
Null
1&1
OpenStack
Oracle
Parallels
ProfitBricks
QEMU
Scaleway
Tencent Cloud
Triton
Vagrant
VirtualBox
VMware
Yandex.Cloud
Custom
"Customize" your image / vm / template
Several options
Ansible Local
Ansible (Remote)
Breakpoint
Chef Client
Chef Solo
Converge
File
InSpec
PowerShell
Puppet Masterless
Puppet Server
Salt Masterless
Shell
Shell (Local)
Windows Shell
Windows Restart
Custom
Upload or work with the final image
Deletes artifacts, so it cleans up your host, but you lose the local image
Alicloud Import
Amazon Import
Artifice
Compress
Checksum
DigitalOcean Import
Docker Import
Docker Push
Docker Save
Docker Tag
Google Compute Export
Google Compute Import
Manifest
Shell (Local)
Vagrant
Vagrant Cloud
vSphere
vSphere Template
install
lang en_US.UTF-8
keyboard de
timezone Europe/Berlin
auth --useshadow --enablemd5
services --enabled=NetworkManager,sshd
eula --agreed
ignoredisk --only-use=sda
reboot
bootloader --location=mbr
zerombr
clearpart --all --initlabel
part swap --asprimary --fstype="swap" --size=1024
part /boot --fstype xfs --size=200
part pv.01 --size=1 --grow
volgroup rootvg01 pv.01
logvol / --fstype xfs --name=lv01 --vgname=rootvg01 --size=1 --grow
authconfig --enableshadow --passalgo=sha256
rootpw --iscrypted $5$cnxfyyiayqjelmbt$4/Lq1vPDBp2BZznXcLukwVy4n0DPp6tX.PrCz7YA62B
%packages --nobase --ignoremissing --excludedocs
@core
%end
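A hash like the `rootpw --iscrypted` value in the kickstart file can be generated with OpenSSL. A sketch with placeholder values (`saltsalt` and `password` are made up here):

```shell
# Generate a SHA-256 crypt hash ($5$...) suitable for rootpw --iscrypted.
# 'saltsalt' is a placeholder salt and 'password' a placeholder password.
hash=$(openssl passwd -5 -salt saltsalt 'password')
echo "$hash"
```

The resulting string starts with `$5$`, the crypt identifier for SHA-256.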
{
"builders": [
{
"type": "vmware-iso",
"boot_command": [
"<tab> text ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/kickstart-de.cfg<enter>"
],
"communicator": "ssh",
"guest_os_type": "centos7-64",
"http_directory": "http",
"iso_checksum_type": "sha256",
"iso_checksum_url": "http://ftp.halifax.rwth-aachen.de/centos/7.6.1810/isos/x86_64/sha256sum.txt",
"iso_url": "http://ftp.halifax.rwth-aachen.de/centos/7.6.1810/isos/x86_64/CentOS-7-x86_64-Minimal-1810.iso",
"ssh_username":"root",
"ssh_password":"password",
"shutdown_command": "shutdown -P now",
"version": 14
}
]
}
"provisioners": [{
"type": "shell",
"expect_disconnect": true,
"execute_command": "sudo UPDATE=true KERNELUPDATE=true bash '{{ .Path }}'",
"scripts": [
"script/ansible.sh", "script/vmtools.sh", (1)
"script/sshd.sh", "script/reboot.sh", "script/cleanup.sh" (2)
]}],
1 | Install openvm-tools, python (to use ansible later) |
2 | Configure sshd, reboot and Cleanup (rm sshd hostkeys, delete caches …) |
script/vmtools.sh
#!/usr/bin/env bash
yum -y install open-vm-tools
# vSphere provisioning needs perl
echo '==> Install perl'
yum -y install perl
echo '==> Restarting open-vm-tools'
systemctl restart vmtoolsd
"post-processors":[[{ (1)
"type": "vsphere",
"cluster": "HVIE PWR HOSTS",
"host": "khnum.example.com",
"datacenter": "HVIE",
"resource_pool": "rp_hvie_devops",
"username": "cstoettner@example.com",
"password": "{{user `vsphere_password`}}",
"datastore": "devops-01_sas_7.2k_raid10",
"vm_name": "stoeps-centos-gpn19",
"vm_folder": "devops",
"vm_network": "vm-net-devops",
"insecure": "true"
},{
"type": "vsphere-template",
"host": "khnum.example.com",
"insecure": "true",
"datacenter": "HVIE",
"username": "cstoettner@example.com",
"password": "{{user `vsphere_password`}}",
"folder": "/devops/templates"
}]]
1 | Double [ needed: https://github.com/hashicorp/packer/issues/6790#issuecomment-447088370 |
Basic structure of a Packer JSON file:
{
  "builders": ["..."],
  "provisioners": ["..."],
  "post-processors": ["..."] (1)
}
1 | optional, removes intermediate artifacts |
# Validate JSON
packer.io validate -var vsphere_password='my-funky-password' -var timestamp=$(date +"%Y%m%d%H%M") centos.json
# Build the image, upload to vSphere
packer.io build -var vsphere_password='my-funky-password' -var timestamp=$(date +"%Y%m%d%H%M") centos.json
That’s just the absolute minimum!
You can pimp nearly everything
Builders
Memory, CPU, Disk, Name, Output Directory
Provisioner
Run Shellscripts to cleanup, update, add ssh keys
Kickstart
Remove unnecessary software
I install the absolute minimum for running Ansible. |
Hashicorp
Infrastructure as code
Provision any infrastructure
Installation
Download, unzip & copy into your PATH
Providers
160+ available providers
Cloud, Infrastructure Software, Network, Monitoring, Database …
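For the vSphere environment used in this talk, a provider block is a sketch like the following (variable names match the `vars.tf` from the demo repository; `allow_unverified_ssl` is an assumption for a lab with self-signed certificates):

```hcl
# Sketch of a provider configuration for the vSphere lab.
provider "vsphere" {
  user           = var.vsphere_user
  password       = var.vsphere_password
  vsphere_server = var.vsphere_server

  # Self-signed certificates in the lab, so skip verification.
  allow_unverified_ssl = true
}
```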
Provisioners
Additional tools to customize deployments
Scripts
Ansible
One file for each server
├── build.tf (1)
├── gpn19-server1.tf (2)
├── gpn19-server2.tf
├── gpn19.terraform
├── .terraform (3)
│ └── plugins
│ └── linux_amd64
│ ├── lock.json
│ └── terraform-provider-vsphere_v1.11.0_x4
├── terraform.tfstate (4)
├── terraform.tfstate.backup
├── vars.tf (5)
└── versions.tf
1 | Environment settings |
2 | Server definitions |
3 | Plugins for your deployment, installed by terraform init |
4 | State file |
5 | Variables |
vars.tf
all used variables
variables can be set on the command line (when defined beforehand)
variable "vsphere_server" {
default = "khnum.example.com"
}
variable "vsphere_user" {
default = "cstoettner@example.com"
}
variable "vsphere_password" {
description = "vsphere server password for the environment"
default = ""
}
variable "vsphere_datacenter" {
default = "HVIE"
...
gpn19-server1.tf
resource "vsphere_virtual_machine" "gpn19-server1" {
name = "gpn19-server1"
resource_pool_id = data.vsphere_resource_pool.pool.id
datastore_id = data.vsphere_datastore.datastore.id
num_cpus = 4
memory = 4096
guest_id = data.vsphere_virtual_machine.template.guest_id
scsi_type = data.vsphere_virtual_machine.template.scsi_type
network_interface {
network_id = data.vsphere_network.network.id
adapter_type = data.vsphere_virtual_machine.template.network_interface_types[0]
}
folder = "${var.pana_devops_folder}/${var.project_folder}"
disk {
label = "disk0"
...
# Plan
terraform plan -var "vsphere_password=abc" \
-var "template=stoeps-centos-gpn19" \
-out rebuild.terraform
terraform apply rebuild.terraform
# Delete
terraform destroy -var "vsphere_password=abc"
# Recreate one server (taint marks it for recreation on the next apply)
terraform taint vsphere_virtual_machine.stoeps-cnx-ldap
terraform plan -var "vsphere_password=abc" \
-var "template=stoeps-centos-gpn19" \
-out rebuild.terraform
terraform apply rebuild.terraform
License: GPLv3+
Installation
Use your package manager and just install it
No Windows version, but it should work on Windows 10 with WSL
Mutable infrastructure
Agentless
Pure SSH
Just be careful that tasks are idempotent
Running them multiple times shouldn’t change the result
Not idempotent (appends on every run):
echo "127.0.0.2 greenlight.example.com" >> /etc/hosts
Idempotent (appends only if the entry is missing):
if ! grep -q 127.0.0.2 /etc/hosts; then
  echo "127.0.0.2 greenlight.example.com" >> /etc/hosts
fi
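The difference can be demonstrated with a self-contained sketch that uses a temporary file instead of the real /etc/hosts (the host entry is just an example):

```shell
#!/usr/bin/env bash
# Demonstrate idempotency on a temporary file instead of /etc/hosts.
hosts_file=$(mktemp)

add_host_entry() {
  # Append the entry only if it is not already present.
  if ! grep -q '127.0.0.2' "$hosts_file"; then
    echo '127.0.0.2 greenlight.example.com' >> "$hosts_file"
  fi
}

add_host_entry
add_host_entry   # running it a second time changes nothing
count=$(grep -c '127.0.0.2' "$hosts_file")
echo "$count"    # -> 1
rm -f "$hosts_file"
```

Run it as often as you like: the entry is added exactly once.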
.
├── group_vars
│ └── all
│ └── main.yml
├── inventory
├── roles
│ ├── common
│ │ └── tasks
│ │ └── main.yml
│ └── ldap
│ ├── defaults
│ │ └── main.yml
│ └── tasks
│ └── main.yml
├── site.yml
└── templates
├── base.ldif.j2
├── db.ldif.j2
├── ldap-config.sh.j2
├── monitor.ldif.j2
└── user-ldap.ldif.j2
Define servers and groups
There is always a group all
[gpn19]
gpn19-server1
gpn19-server2
[ldap]
gpn19-server2
site.yml
Definition of tasks running on groups (from inventory)
Simple Playbook
Run role common
on all servers
Install (run role ldap) OpenLDAP on gpn19-server2
---
- hosts: all
roles:
- common
- hosts: ldap
roles:
- role: ldap
Global for this playbook:
group_vars/all/main.yml
Only for a server group:
group_vars/<groupname>/main.yml
---
ldap:
domain: dc=devops,dc=example,dc=com
passwordencrypted: "{SSHA}CdGAzVNlrqgLbKo6pebBxuDBBkxokkHm"
passwordclear: "password"
Better store passwords in an Ansible vault → encrypted! |
Templates for files
Perfect for silent install response files
Copied with Ansible
Variables replaced during deployment
#!/usr/bin/env bash
cd /etc/openldap/slapd.d
ldapmodify -Y EXTERNAL -H ldapi:/// -f db.ldif
ldapmodify -Y EXTERNAL -H ldapi:/// -f monitor.ldif
[...]
ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/schema/inetorgperson.ldif
sleep 5
ldapadd -x -w {{ ldap.passwordclear }} -D "cn=ldapadm,{{ ldap.domain }}" -f /etc/openldap/slapd.d/base.ldif
sleep 10
ldapadd -x -w {{ ldap.passwordclear }} -D "cn=ldapadm,{{ ldap.domain }}" -f /etc/openldap/slapd.d/user-ldap.ldif
common/tasks/main.yml
---
- name: Disable Firewall (1)
service:
name=firewalld
state=stopped
enabled=no
- name: Change limits.conf (2)
pam_limits:
domain: root
limit_type: '-'
limit_item: nproc
value: '16384'
- name: Install unzip to support unarchive function of ansible, add xauth (3)
package:
name={{ item }}
state=latest
with_items:
- unzip
- vim
1 | Disable a service (stop and set to disable) |
2 | Change limits.conf |
3 | Install additional packages (independent of distribution) |
ldap/tasks/main.yml
- name: Install system packages for OpenLDAP
package:
name={{ item }}
state=latest
with_items:
- openldap-servers
- openldap-clients
- name: Enable Slapd service
service:
name=slapd
state=restarted
enabled=yes
- name: Initial ldap config, copy templates db.ldif (1)
template: src=db.ldif.j2 dest=/etc/openldap/slapd.d/db.ldif
tags: parse
1 | Use template file, copy, parse variables |
ldap/tasks/main.yml
- name: Copy sample db config (1)
copy:
src: "/usr/share/openldap-servers/DB_CONFIG.example"
dest: "/var/lib/ldap/DB_CONFIG"
remote_src: yes (2)
directory_mode: yes
owner: ldap
group: ldap
- name: Create LDAP Config Script
template:
src: ldap-config.sh.j2
dest: /tmp/ldap-config.sh
mode: 0755
tags: parse
- name: Configure LDAP and Import Users
shell: "/tmp/ldap-config.sh" (3)
- name: Remove config script
file:
path: /tmp/ldap-config.sh
state: absent (4)
1 | Copy remote file on the target to other folder |
2 | Remote → not from templates |
3 | Run shell script |
4 | Remove file |
With IaC, the code itself is the up-to-date documentation of the infrastructure. Little additional documentation is needed beyond setup instructions; everything is coded, which keeps separate documentation to a minimum.
Codify everything
Document as little as possible
Maintain version control
Continuously test, integrate, and deploy
Make your infrastructure code modular
Make your infrastructure immutable (when possible)
There are many reasons to have in-house data centers
Security, Cost, …
Build your own cloud
You find all scripts and files in: https://gitlab.com/stoeps/gpn19-iac