Pets To Cattle

At Arkiom, I led the migration from managed hosting in a Utah datacenter to AWS. We were having problems with our hosting provider, which offered lackluster support and performed destructive maintenance without communicating. I was working as a developer, but I had prior server admin experience, so I proposed the move. Over the next six months, I performed cost analyses, drafted migration plans, and configured the 15 servers we’d need for all our projects. One product at a time, I brought up the new servers, migrated DNS, and shut down the old ones. The only hiccups we had were with our Federal clients, who whitelisted our IPs. With those small wrinkles ironed out, we were fully “in the cloud”.

Post-migration, I quickly realized that managing these pet servers was taking up too much of my time, and my development tasks started missing deadlines. At the same time, I was reading the amazing The Practice of Cloud System Administration. Learning the concepts of “immutable servers” and “infrastructure as code” changed my view on managing infrastructure. I set out to convert everything to be managed by software and config files. I settled on AWS OpsWorks, as it handled a lot of the concerns for me, including bootstrapping instances and marshalling the config files around. OpsWorks is built upon Chef, so all the recipes had to be written in Ruby. OpsWorks worked well for our small shop, but I could feel the limitations. Had I stayed at that job longer, I would have converted over to Salt or Ansible for config management. Due to the nature of the business, the company’s clients evaporated out from under it. I saw the writing on the wall, so I left the company on amicable terms.
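For context, a Chef recipe is just a Ruby DSL describing desired state. A minimal sketch of the kind of recipe OpsWorks runs (the package, template, and service names here are illustrative assumptions, not taken from any actual cookbook):

```ruby
# Minimal Chef recipe sketch (illustrative names, not a real cookbook).
# Install nginx from the platform's package manager.
package 'nginx'

# Render the config from an ERB template in the cookbook;
# reload the service whenever the rendered file changes.
template '/etc/nginx/nginx.conf' do
  source 'nginx.conf.erb'
  notifies :reload, 'service[nginx]'
end

# Ensure the service starts now and on boot.
service 'nginx' do
  action [:enable, :start]
end
```

Because recipes declare end state rather than steps, re-running them is safe, which is what makes them a fit for the immutable-server model.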

When I got to my current job, they had some deployment patterns around Jenkins, Ansible, and CloudFormation. My job became to adapt and extend those patterns to new services. I spent a lot of time talking to the people who built this deployment infrastructure, trying to learn the reasons behind the decisions that were made. I also learned what they’d do differently. My main focus is building components for a full deployment pipeline, eventually encompassing the build, test, and deployment aspects of software development. I interact with teams building dozens of products, all across the world. I have team members in Kiev, Ukraine who need support at all hours of the day. Technologies I work with include Jenkins, Ansible, Docker, and nearly every AWS service, chiefly IAM, CloudFormation, EC2, ECS, Lambda, S3, Route53, CodeBuild, and CodePipeline. I work primarily with Linux, but have a history with Windows.

The nature of our deployments allows us to treat our servers as cattle. When one is unhealthy, we terminate it and let the automation replace it with a new, healthy instance. This happens mostly without human intervention, using Auto Scaling Groups and health checks. Centralized monitoring and alerting allows our distributed teams to support thousands of servers. Infrastructure as code allows us to put the management of the servers in the hands of the developers. It also allows us to manage user access and permissions using the same infrastructure. This has been a wonderful experience for assisting the hundreds of developers we interact with.
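That self-healing behavior fits in a few lines of CloudFormation. A hedged sketch, assuming a launch template, target group, and subnet parameter are defined elsewhere in the stack (all resource names here are placeholders):

```yaml
# Sketch of a self-healing Auto Scaling Group (names and sizes are placeholders).
WebServerGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: '2'
    MaxSize: '6'
    DesiredCapacity: '2'
    HealthCheckType: ELB           # instance failing the load balancer health check
    HealthCheckGracePeriod: 300    # is marked unhealthy and replaced automatically
    LaunchTemplate:
      LaunchTemplateId: !Ref WebLaunchTemplate   # assumed defined elsewhere
      Version: !GetAtt WebLaunchTemplate.LatestVersionNumber
    TargetGroupARNs:
      - !Ref WebTargetGroup                      # assumed defined elsewhere
    VPCZoneIdentifier: !Ref PrivateSubnetIds     # assumed subnet list parameter
```

Setting `HealthCheckType: ELB` is the key piece: the group replaces instances that fail application-level checks, not just EC2 status checks, which is what lets a bad box get culled without anyone logging in.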

If I were to build out an environment like this from scratch, I’d probably tackle the deployment process first. If the application works well in Docker, I’d push for that to be the basis of our deployment strategy. Working with AWS ECS has been very enjoyable, so I’d definitely try that as the first platform. If not, a local Ansible run is a solid second choice. I’d also look into getting proper secret management in place, either using HashiCorp Vault or AWS Parameter Store. After deployment, I’d identify the pain points and build around them. The long-term vision would be an assembly line where a commit or tag to a Git repo produces safe, tested code running on immutable infrastructure. The only human interaction would be to commit code.
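The Parameter Store route keeps secret management simple. A sketch using the AWS CLI, with the parameter name and value as obvious placeholders:

```shell
# Store a secret as a KMS-encrypted SecureString (name/value are placeholders).
aws ssm put-parameter \
  --name /myapp/prod/db_password \
  --value 's3cr3t' \
  --type SecureString

# At deploy time, the instance or task fetches and decrypts it,
# so the secret never lives in the repo or the AMI.
aws ssm get-parameter \
  --name /myapp/prod/db_password \
  --with-decryption \
  --query Parameter.Value \
  --output text
```

Access is then just an IAM policy on the parameter path, which fits the same infrastructure-as-code model as everything else.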