Personal site and blog of Ilango Rajagopal

Terraform is an open source tool for building and managing cloud infrastructure in a safe and efficient way. It can create servers, networks and more in popular cloud service providers. But before we can dive into Terraform, we should learn what "Infrastructure as Code (IaC)" means.

What is Infrastructure as Code?

Infrastructure as Code or IaC for short, is a method to provision and manage cloud resources by writing and running code. While clicking around web consoles in AWS is fun (it's not), writing code to manage infrastructure has some key benefits.

No more clicking around: Provisioning resources in cloud providers via the web interface involves lots of clicks and often across many services like networking, compute, domains etc. IaC prevents all that and lets us write code that will automate these things.
IaC is declarative: Instead of managing every single details, IaC is declarative — we tell the program what we want instead of how to do it — which makes it human readable and avoids managing all the underlying API calls we have to make.
Reduces time, cost and risk: When doing a series of steps via the web interface, there's a chance of making mistakes like over-provisioning or doing a configuration wrong. It is also time consuming. IaC eliminates those risks, saves time and reduces cost in the long run.
Enables DevOps: Since infrastructure is written as code, it can be committed to source control and enables collaboration across teams.

Introduction to Terraform

Terraform is an IaC tool built and maintained by Hashicorp. We write human-readable code that Terraform runs and changes resources on the cloud provider(s) of our choice. It does it in a predictable way that removes mistakes and saves time. Terraform also keeps track of everything we deployed.

Let's say we already deployed an EC2 instance using Terraform. Now we want to change the name of the EC2 instance. So we update the code and run it again. Since Terraform already knows about the EC2 instance and sees that we want to change the name, it'll do the required API calls in the background to change the name of the instance.

Now, let's say we instead want to move the instance to a different availability zone and update the code to do so. But we cannot move an instance to a different AZ. We have to destroy the current one and re-create another in the given AZ. Terraform handles all that for us.

And Terraform is cloud agnostic. It works with a wide variety of cloud providers from popular ones like AWS and GCP to smaller providers like DO and Heroku. This list is also expanding as Hashicorp is adding more cloud providers. We can also deploy resources to our in-house servers or data centers (if you have one).

Terraform Concepts

Before we dive into writing Terraform code, we have to be familiar with a few Terraform concepts. Putting them together is how we work with Terraform.

Terraform State

State is how Terraform tracks and manages cloud resources. Terraform does it using a file called the state file. By default, the state file is named terraform.tfstate.

With the state file, Terraform keeps track of all the resources deployed to our cloud. It's used to calculate the deployment delta — changes in the resources. This is done so Terraform knows which resources to create from scratch, which ones to change and which ones to destroy.

The state file is a JSON dump of all the metadata related to our deployments and details of all the tracked resources. We can even open the file and inspect it. I would caution against editing this file as it might lead to unintended results in our deployments.

Because the state file is the source of truth for all our deployments, it should not be deleted. It's much harder to codify all the resources we've deployed with Terraform, if we deleted the file. The state file should also be secured from falling into the wrong hands as it contains sensitive information about our infrastructure. We can also backup our state file to a cloud storage service like S3. More details on how to set this up in the initialization section below.

Terraform Providers

Providers are the API layer for different cloud service providers like AWS, GCP etc. This is how Terraform interacts with the cloud services and provisions the required resources. Like I mentioned, we can also write our own providers to connect to in-house servers or data centers.

A sample provider code using AWS. We can also declare many required providers if there's a multi-cloud setup.

terraform {
    required_providers {
        aws = {
            source  = "hashicorp/aws"
            version = "~> 3.45"
        }
    }
}

During initialization with terraform init (which we'll talk about shortly), the provider code is downloaded from Terraform registry. Terraform can then communicate the cloud service. One important thing to note — providers are developed independently of Terraform so they have a different versioning system. A best practice when using providers is to version lock them in our code. If no version is specified, Terraform downloads the latest version of the provider which might break our code.

Resources

Defining what resources we want to change is done through resource blocks. Using the provider we chose, Terraform makes appropriate API calls at runtime to provision resources that are defined in resource blocks.

Here's a resource block that provisions an EC2 instance called "web". _Note: "web" is the local name that Terraform uses for this EC2 instance, not the name of the EC2 instance itself_.

resource "aws_instance" "web" {
    ami           = "ami-asdfghji1234"
    instance_type = "t2.micro"
}

Resource blocks can take arguments define inside {} that defines the properties of the resource. We can also refer to resources in other areas of the code. This makes working with Terraform flexible. In the above example, we can refer to the EC2 instance using aws_instance.web.

Terraform Block

The terraform {} block is a special block used to configure settings that affect Terraform's behavior. Inside this block only constants are allowed and cannot refer to other named objects like resources. This block is only used once in a Terraform project. This is where we can configure our providers, provider versions, regions, state backend etc.

Terraform Workflow

Working with Terraform has three steps: Write, Plan, Apply

Write: We can start of with a local Terraform file or a version controlled file. Version controlling Terraform files allows collaboration across teams and allows for review and iteration.
Plan: Terraform allows us to see ahead of time what our code will deploy. In this phase, we can review the changes and iterate on our cloud infrastructure till we reach the desired state.
Apply: Once the writing and planning phases are done, Terraform will apply those changes to our declared cloud provider(s). In the background, Terraform will make all the necessary API calls to create the cloud resources and keeps track of them in a state file.

Terraform Commands

There are four Terraform commands that we will use the most. init, plan, apply and destroy.

terraform init

terraform init initializes the current working directory. It is the first command we run in a Terraform project. The command does the following things in the background:

It downloads all the required ancillary components and caches them. These are the providers that we've declared in our code that we need for deployments to work. These downloadable resources contain the code to make the necessary API calls to the cloud providers. By default, Terraform will try to download from the Terraform Registry.

Terraform also preps the backend for managing state files. This could be in our local machine or is often the case, a cloud backup.

Sample code to configure an AWS S3 bucket as a remote backend for our state file.

terraform {
    backend "s3" {
        bucket = "mybucket"
        key    = "path/to/my/key"
        region = "us-east-1"
    }
}

The output of an init command looks like this:

Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 3.27"...
- Installing hashicorp/aws v3.45.0...
- Installed hashicorp/aws v3.45.0 (signed by HashiCorp)

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

terraform plan

terraform plan allows us to review the changes that Terraform is going to make on our cloud. The plan command makes API calls in the background to the providers we've mentioned but doesn't make any changes to the cloud. It allows us to review the changes to make sure that Terraform is only making the required changes.

We can also save a plan output to a file using the -out argument. Let's say we have the following Terraform code:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.27"
    }
  }

  required_version = ">= 0.14.9"
}

provider "aws" {
  profile = "default"
  region  = "us-west-2"
}

resource "aws_instance" "app_server" {
  ami           = "ami-830c94e3"
  instance_type = "t2.micro"

  tags = {
    Name = "ExampleAppServerInstance"
  }
}

Running terraform plan gives us the following output. _Note: The output is much longer but I've only shown the relevant parts here_.

Terraform used the selected providers to generate the following execution plan. Resource actions are
indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_instance.app_server will be created
  + resource "aws_instance" "app_server" {
      + ami                                  = "ami-830c94e3"
      + arn                                  = (known after apply)
      + associate_public_ip_address          = (known after apply)
      + availability_zone                    = (known after apply)
      + get_password_data                    = false
      + host_id                              = (known after apply)
      + id                                   = (known after apply)
      + instance_state                       = (known after apply)
      + instance_type                        = "t2.micro"
      + ipv6_address_count                   = (known after apply)
      + ipv6_addresses                       = (known after apply)
      + key_name                             = (known after apply)
      + primary_network_interface_id         = (known after apply)
      + private_dns                          = (known after apply)
      + private_ip                           = (known after apply)
      + public_dns                           = (known after apply)
      + public_ip                            = (known after apply)
      + security_groups                      = (known after apply)
      + source_dest_check                    = true
      + subnet_id                            = (known after apply)
      + tags                                 = {
          + "Name" = "ExampleAppServerInstance"
        }
      + tags_all                             = {
          + "Name" = "ExampleAppServerInstance"
        }
      + vpc_security_group_ids               = (known after apply)

      ...
    }

Plan: 1 to add, 0 to change, 0 to destroy.

─────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly
these actions if you run "terraform apply" now.

Notice how Terraform says "known after apply"? These are values that we can pass as arguments to our resource block above but we don't have to. Terraform will use the defaults that are available in the specified region.

For example, to provision an EC2 instance we need a VPC, a Subnet, routing table etc. AWS creates defaults for each region when we create an account, so Terraform uses those. If we had an existing VPC we can pass it's ID to the resource block. Or we can create a VPC resource block and use it's ID in our "aws_instance" resource block using the dot notation. Terraform is flexible in letting us create the infrastructure that we need.

terraform apply and terraform destroy

The terraform apply command does exactly what it says it will. It applies all the changes it showed us in the plan phase and creates/modifies the resources in our cloud environment. It is during this phase that the state file is also created. Before we execute apply we should make sure to review the plan by ourselves or with our team.

The terraform destroy command looks at the state file and destroys all the resources that are tracked in it. This process is irreversible and should be used with care.

When executing both these commands, Terraform will ask for confirmation before making changes to our cloud environment. We can use the --auto-approve flag to skip this confirmation step. Use this only when you're confident of the changes.

And that's it. These are the must-know concepts in Terraform. There are other advanced concepts that can improve our workflow, make our code modular, run custom scripts on cloud resources. We will cover these in an upcoming post. Do you think I've missed a crucial topic in this post or have a question about Terraform? Tweet me @_ilango.