Use health assessments to detect infrastructure drift

HCP Terraform's health assessments monitor your managed infrastructure to check that it still satisfies its intended configuration over its lifecycle. Over time, your resources may change outside of the Terraform workflow. This can be due to service failures or degradation, certificate expirations, or manual modification by other users. Terraform cannot prevent these changes, but health assessments help you detect them quickly so you can resolve them.

Health assessments include two types of checks:

Drift detection verifies that your actual infrastructure settings match those recorded in your workspace's state file.
Continuous validation verifies that your resources still satisfy any checks defined in your configuration.
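Continuous validation evaluates conditions that you write into your configuration. The following sketch is not part of the example repository. It shows one such condition: a check block that asserts the bastion security group used later in this tutorial never allows SSH from 0.0.0.0/0. The check name bastion_ssh_not_public is hypothetical, and check blocks require Terraform v1.5 or later.

```hcl
# Hypothetical check, not part of the example repository.
# check blocks require Terraform v1.5 or later.
check "bastion_ssh_not_public" {
  assert {
    # True only if no ingress rule on the bastion security group
    # lists 0.0.0.0/0 (any IPv4 address) as a source.
    condition = alltrue([
      for rule in aws_security_group.bastion.ingress :
      !contains(rule.cidr_blocks, "0.0.0.0/0")
    ])
    error_message = "The bastion security group allows SSH ingress from 0.0.0.0/0."
  }
}
```

Terraform reports failed check assertions as warnings rather than errors, so a failing check does not block plans or applies.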

In this tutorial, you will enable health assessments for a workspace, use an on-demand assessment to detect configuration drift, and review the options for resolving drift.

Tip

Health assessments are available in HCP Terraform Plus Edition. Refer to HCP Terraform pricing for details.

Prerequisites

This tutorial assumes that you are familiar with the Terraform and HCP Terraform workflows. If you are new to Terraform, complete the Get Started tutorials first. If you are new to HCP Terraform, complete the HCP Terraform Get Started tutorials first.

To complete this tutorial, you will need:

An HCP Terraform organization with the Plus edition.
An HCP Terraform user account with organization owner permissions, and the Terraform CLI authenticated to HCP Terraform (for example, with terraform login).
Terraform v1.4+ installed locally.
An AWS account.
An HCP Terraform variable set configured with your AWS credentials.

Clone example repository

Clone the example repository for this tutorial, which contains configuration for AWS networking components, an EC2 instance, and a security group.

$ git clone https://github.com/hashicorp-education/learn-terraform-drift-detection

Change to the repository directory.

$ cd learn-terraform-drift-detection

Open main.tf to review the example configuration. The EC2 instance acts as a bastion host: the single point of SSH ingress to other instances within the network.

The aws_security_group.bastion resource restricts SSH ingress to a single CIDR block, 192.80.0.0/16, which stands in for an organization's private network.

resource "aws_security_group" "bastion" {
  name   = "bastion_ssh"
  vpc_id = module.vpc.vpc_id

  # Allow SSH (TCP port 22) only from the organization's CIDR block.
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["192.80.0.0/16"]
  }

  # Allow all outbound traffic ("-1" means all protocols).
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Create infrastructure

First, set your HCP Terraform organization name as an environment variable to configure your HCP Terraform integration.

$ export TF_CLOUD_ORGANIZATION=
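The TF_CLOUD_ORGANIZATION variable supplies the organization for the configuration's cloud block, which tells Terraform which HCP Terraform workspace to use. The example repository is assumed to contain a block roughly like the following sketch; check its terraform block for the exact settings.

```hcl
terraform {
  # Connects this configuration to HCP Terraform (app.terraform.io by default).
  # organization is omitted, so Terraform reads it from the
  # TF_CLOUD_ORGANIZATION environment variable.
  cloud {
    workspaces {
      name = "learn-terraform-drift-detection"
    }
  }
}
```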

Initialize your configuration. As part of initialization, Terraform creates your learn-terraform-drift-detection HCP Terraform workspace.

$ terraform init
Initializing HCP Terraform...
Initializing modules...
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 3.14.0 for vpc...
- vpc in .terraform/modules/vpc

Initializing provider plugins...
- Finding hashicorp/aws versions matching ">= 3.63.0, ~> 4.10.0"...
- Installing hashicorp/aws v4.10.0...
- Installed hashicorp/aws v4.10.0 (signed by HashiCorp)

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

HCP Terraform has been successfully initialized!

You may now begin working with HCP Terraform. Try running "terraform plan" to
see any changes that are required for your infrastructure.

If you ever set or change modules or Terraform Settings, run "terraform init"
again to reinitialize your working directory.

Now, apply your configuration to create the infrastructure. Respond yes to the prompt to confirm the operation.

$ terraform apply
Running apply in HCP Terraform. Output will stream here. Pressing Ctrl-C
will cancel the remote apply if it's still pending. If the apply started it
will stop streaming the logs, but will not stop the apply running remotely.

Preparing the remote apply...

To view this run in a browser, visit:
https://app.terraform.io/app/hashicorp-training/learn-terraform-drift-detection/runs/run-ShcnayWd11dKtJAZ

Waiting for the plan to start...

Terraform v1.4.0-rc1
on linux_amd64
Initializing plugins and modules...
data.aws_availability_zones.available: Refreshing...
data.aws_ami.amazon_linux: Refreshing...
data.aws_availability_zones.available: Refresh complete after 0s [id=us-east-2]
data.aws_ami.amazon_linux: Refresh complete after 0s [id=ami-097bdef77f0f80fa6]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:
##...
Plan: 22 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + bastion_instance_id = (known after apply)
  + private_subnets     = [
      + null,
      + null,
    ]
  + public_subnets      = [
      + null,
      + null,
    ]
  + vpc_id              = (known after apply)

Do you want to perform these actions in workspace "learn-terraform-drift-detection"?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
##...

Apply complete! Resources: 22 added, 0 changed, 0 destroyed.

Outputs:
bastion_instance_id = "i-0b598e9fba4d42618"
private_subnets = [
        "subnet-065785262ade21b94",
        "subnet-00500440348ab07ee",
    ]
public_subnets = [
        "subnet-0584ae2512fe3d054",
        "subnet-01feb9bffaef17b2d",
    ]
vpc_id = "vpc-01199c33a9e9f41bc"

Introduce infrastructure drift

Although best practice is to use Terraform for all infrastructure changes to ensure consistent workflows and change visibility, your organization may occasionally need to make manual changes.

To simulate this, open the Security Groups page of the Amazon EC2 console in the AWS region where you created the infrastructure.

Find the bastion_ssh security group. Select the Inbound rules tab in the security group details, then click Edit inbound rules.

[Screenshot: Edit inbound security group]

Delete the 192.80.0.0/16 source CIDR and replace it with 0.0.0.0/0. Then click Save rules.

[Screenshot: Update security group source CIDR]

Your security group's actual configuration no longer matches the settings recorded in your workspace's state file.

Enable health assessments

HCP Terraform health assessments detect manual changes like the one you made in the previous section, so you keep visibility into your actual infrastructure settings. A workspace must meet the following requirements to run health assessments:

The workspace must use Terraform v0.15.4 or later for drift detection, or v1.3.0 or later for both drift detection and continuous validation.
The workspace must have at least one run, and its most recent run must have completed successfully.
The workspace must use remote or agent execution mode.
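HCP Terraform runs assessments with the Terraform version selected in the workspace settings. As an optional safeguard that is not part of the example repository, a version constraint in the configuration makes runs fail early if someone selects a version too old for continuous validation. This is a sketch; pick the minimum version your own configuration needs.

```hcl
terraform {
  # v1.3.0 or later supports both drift detection and continuous
  # validation; older Terraform versions refuse to run this configuration.
  required_version = ">= 1.3.0"
}
```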

Assessments use non-actionable, refresh-only plans, which compare the actual settings of your infrastructure against the resources tracked in your workspace's state file. Assessments do not update your state file or your infrastructure.

You can enable assessments on a specific workspace, or on all workspaces within your organization. Enable assessments on the workspace for this tutorial. First, navigate to your learn-terraform-drift-detection workspace. In the Health section, select Settings. Then, select Enable and click Save settings.

[Screenshot: Enable health assessments]
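If you manage HCP Terraform workspaces as code with the tfe provider, the tfe_workspace resource's assessments_enabled argument is an alternative to the UI steps above. This is a sketch under stated assumptions: the organization variable is a hypothetical input, and because terraform init created this tutorial's workspace, you would need to import the existing workspace before managing it this way. Confirm argument support in the tfe provider documentation for your provider version.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Hypothetical input: the name of your HCP Terraform organization.
variable "organization" {
  type = string
}

resource "tfe_workspace" "drift_detection" {
  name         = "learn-terraform-drift-detection"
  organization = var.organization

  # Turn on health assessments for this workspace.
  assessments_enabled = true
}
```

To enable assessments for every eligible workspace in an organization instead, the tfe_organization resource provides an assessments_enforced argument.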

Once you enable assessments for a workspace, HCP Terraform runs an assessment approximately every 24 hours. Assessments do not interfere with other Terraform operations in the workspace, and any new run in the workspace reschedules the next assessment for 24 hours later. If a run fails, HCP Terraform pauses assessments until you resolve the issue and the workspace completes a successful run.

Trigger an on-demand assessment

You can use on-demand assessments to avoid waiting for scheduled assessments and detect drift sooner.

In the Health section of your learn-terraform-drift-detection workspace, click Start health assessment.

[Screenshot: Trigger on-demand health assessment]

HCP Terraform performs the assessment and displays the results. As expected, it detects the change you made to your bastion security group's inbound rules.

The assessment also detected changes to default values for some resource fields. The example configuration does not set these fields, but AWS assigns default values when it provisions your resources. This is common, and whether it occurs depends on how the Terraform provider handles unset values.

HCP Terraform displays the assessment results on the workspace overview page.

[Screenshot: Workspace health assessment results]

Note

Drift detection only reports on changes to the resource attributes defined in your configuration. To make sure drift detection covers the attributes critical to your operations, set them explicitly in your configuration, even when you would otherwise rely on the provider's default value.
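For example, if your operations depend on the bastion host requiring IMDSv2 session tokens, set that attribute explicitly instead of relying on a default. The following fragment is a sketch: the resource name aws_instance.bastion and the instance_type value are assumptions, because this tutorial does not show the example repository's instance configuration. Merge the metadata_options block into your existing instance resource rather than copying the whole fragment.

```hcl
resource "aws_instance" "bastion" {
  # Hypothetical values; keep the arguments from the example repository.
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"

  # Set explicitly, even though a default exists, so the value is part
  # of the configuration and drift detection reports changes to it.
  metadata_options {
    http_tokens = "required"
  }
}
```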

You can also filter workspaces by assessment result on your organization's workspace landing page.

[Screenshot: HCP Terraform workspace list filtered by health assessment result]

You can also configure workspace notifications to alert you about specific assessment results, such as detected drift or failed checks.
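If you manage notifications with the tfe provider, a configuration along these lines sends assessment results to a Slack channel. This is a sketch: the workspace_id and slack_webhook_url variables are hypothetical inputs, and you should confirm the trigger names against the HCP Terraform notification documentation.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Hypothetical inputs; supply your own values.
variable "workspace_id" {
  type = string
}

variable "slack_webhook_url" {
  type      = string
  sensitive = true
}

resource "tfe_notification_configuration" "assessments" {
  name             = "drift-detection-alerts"
  enabled          = true
  destination_type = "slack"
  url              = var.slack_webhook_url
  workspace_id     = var.workspace_id

  # Notify when an assessment detects drift, finds failing checks,
  # or cannot complete.
  triggers = [
    "assessment:drifted",
    "assessment:check_failure",
    "assessment:failed",
  ]
}
```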

Reconcile drift

In more complex configurations, drift makes operations less predictable. If your team deploys a new change and discovers drift only during the Terraform run, they may need to interrupt their workflow to decide how to proceed. If the plan includes many resource changes, the operator must review it carefully to tell the drift apart from the intended changes.

Health assessments help you proactively detect drift and condition failures, so you can resolve them before they interfere with future operations. In this case, the infrastructure configuration is small, so it is easier to decide how to proceed.

When you identify infrastructure drift, you have two resolution options:

To keep the change, update your configuration to match the new setting, then run terraform apply to update your state file.
To revert the change, run terraform apply to restore the setting defined in your configuration.
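For the first option, keeping the manual change from this tutorial would mean editing the ingress rule in main.tf to match the new console setting before you apply, as in the following sketch. This version opens SSH to any IPv4 address, so it only illustrates the option.

```hcl
resource "aws_security_group" "bastion" {
  name   = "bastion_ssh"
  vpc_id = module.vpc.vpc_id

  ingress {
    from_port = 22
    to_port   = 22
    protocol  = "tcp"
    # Updated to match the manual change made in the AWS console.
    # This allows SSH from any IPv4 address.
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Unchanged: allow all outbound traffic.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```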

In this case, you should revert the manual security group change to prevent public ingress traffic to the bastion host.

In your example repository directory, run another terraform apply to update the security group to its initial configuration. Respond yes to the prompt to confirm the operation.

$ terraform apply
##...
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_security_group.bastion will be updated in-place
  ~ resource "aws_security_group" "bastion" {
        id                     = "sg-0f285079371575d16"
      ~ ingress                = [
          - {
              - cidr_blocks      = [
                  - "0.0.0.0/0",
                ]
              - description      = ""
              - from_port        = 22
              - ipv6_cidr_blocks = []
              - prefix_list_ids  = []
              - protocol         = "tcp"
              - security_groups  = []
              - self             = false
              - to_port          = 22
            },
          + {
              + cidr_blocks      = [
                  + "192.80.0.0/16",
                ]
              + description      = ""
              + from_port        = 22
              + ipv6_cidr_blocks = []
              + prefix_list_ids  = []
              + protocol         = "tcp"
              + security_groups  = []
              + self             = false
              + to_port          = 22
            },
        ]
        name                   = "bastion_ssh"
        tags                   = {}
        # (7 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Do you want to perform these actions in workspace "learn-terraform-drift-detection"?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
##...
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

Outputs:
bastion_instance_id = "i-0b598e9fba4d42618"
private_subnets = [
        "subnet-065785262ade21b94",
        "subnet-00500440348ab07ee",
    ]
public_subnets = [
        "subnet-0584ae2512fe3d054",
        "subnet-01feb9bffaef17b2d",
    ]
vpc_id = "vpc-01199c33a9e9f41bc"

This run resets the next scheduled assessment for 24 hours later. Trigger another on-demand assessment to confirm that you resolved the infrastructure drift.

This time, HCP Terraform does not detect any resource drift. You reverted your security group settings to the ones specified by your configuration, and Terraform updated the workspace state file to account for the provider-specific default and null value settings.

Destroy infrastructure

Destroy your infrastructure to avoid incurring unnecessary costs. Respond yes to the prompt to confirm the operation.

$ terraform destroy
##...
Plan: 0 to add, 0 to change, 22 to destroy.

Changes to Outputs:
  - bastion_instance_id = "i-0b598e9fba4d42618" -> null
  - private_subnets     = [
      - "subnet-065785262ade21b94",
      - "subnet-00500440348ab07ee",
    ] -> null
  - public_subnets      = [
      - "subnet-0584ae2512fe3d054",
      - "subnet-01feb9bffaef17b2d",
    ] -> null
  - vpc_id              = "vpc-01199c33a9e9f41bc" -> null

Do you really want to destroy all resources in workspace "learn-terraform-drift-detection"?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
##...
Apply complete! Resources: 0 added, 0 changed, 22 destroyed.

Optionally, delete your learn-terraform-drift-detection workspace from HCP Terraform.

Next steps

In this tutorial, you enabled health assessments for a workspace and used an on-demand assessment to detect infrastructure drift. You also learned how health assessments can preserve predictability in your infrastructure operations.

Review the following resources to learn more about how you can ensure infrastructure conformity using Terraform and HCP Terraform.

Configure and use OPA policies for infrastructure compliance, to establish guarantees around your infrastructure configuration and workflows.
Define checks and use continuous validation to verify your infrastructure health.
Use policies to control infrastructure costs.
Use the Snyk HCP Terraform run task to scan your infrastructure for security compliance.
