If it ain't broke...

In the previous post, we saw how the Terraform lifecycle rule create_before_destroy can help prevent a deadlock when renaming security groups. In this post, we will see how using the same lifecycle rule in the wrong place creates a problem.

To recap, when renaming a security group, you need to replace

resource "aws_security_group" "test_1" {
  name = "test-1-new-name"
}

with

resource "aws_security_group" "test_1" {
  name = "test-1-new-name"
  lifecycle {
    create_before_destroy = true
  }
}

This ensures the following order of operations:

Create a security group with the new name.
Associate the new security group with the instance.
Destroy the old security group.

This made me think: using a lifecycle rule seems like good practice, so why not use it for the aws_security_group_rule resource as well? That was a presumptuous mistake. Let us see why.

We will replicate the same infrastructure setup:

Create an EC2 instance (or any other resource that uses security groups).
Associate one or more security groups with the instance.

The above infrastructure can be created with update-security-group-rule/v1/main.tf.
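The contents of that file are not reproduced in this post. As a rough sketch, a v1 configuration might look like the following. It assumes the default VPC, and the region, AMI variable, instance type and security group names are placeholders that do not come from the original file; only the resource and output names match the ones used in this post.

```hcl
# Hypothetical sketch of update-security-group-rule/v1/main.tf.
# Region, AMI, instance type and group names below are placeholders.
provider "aws" {
  region = "us-east-1" # placeholder region
}

variable "ami_id" {
  type = string # hypothetical: any AMI available in the chosen region
}

resource "aws_security_group" "test_1" {
  name = "test-1" # placeholder name
}

resource "aws_security_group" "test_2" {
  name = "test-2" # placeholder name
}

resource "aws_security_group_rule" "sg_2_rule_1" {
  type              = "ingress"
  from_port         = 8080
  to_port           = 8080
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.test_2.id

  # The lifecycle rule under discussion in this post.
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_instance" "test_1" {
  ami           = var.ami_id
  instance_type = "t2.micro" # placeholder instance type

  # Attach both security groups to the instance.
  vpc_security_group_ids = [
    aws_security_group.test_1.id,
    aws_security_group.test_2.id,
  ]
}

output "aws_instance_test_1" {
  value = aws_instance.test_1.id
}

output "security_group_test_1" {
  value = aws_security_group.test_1.id
}

output "security_group_test_2" {
  value = aws_security_group.test_2.id
}
```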

$ terraform init
$ terraform apply

Sample Terraform output:

aws_instance_test_1 = i-02d50e0a62110bbc6
security_group_test_1 = sg-03cc308342b10ebe5
security_group_test_2 = sg-03c1cbe2eb0ace857

However, for the next step, instead of renaming the security group, we will add one more entry to the cidr_blocks list of our aws_security_group_rule, i.e. we will update

resource "aws_security_group_rule" "sg_2_rule_1" {
  from_port         = 8080
  protocol          = "tcp"
  to_port           = 8080
  security_group_id = aws_security_group.test_2.id

  cidr_blocks = ["0.0.0.0/0"] # this line will be changed

  lifecycle {
    create_before_destroy = true
  }
  type = "ingress"
}

to

resource "aws_security_group_rule" "sg_2_rule_1" {
  from_port         = 8080
  protocol          = "tcp"
  to_port           = 8080
  security_group_id = aws_security_group.test_2.id

  cidr_blocks = ["0.0.0.0/0", "1.1.1.1/32"]

  lifecycle {
    create_before_destroy = true
  }
  type = "ingress"
}

$ terraform apply

# aws_security_group_rule.sg_2_rule_1 must be replaced
+/- resource "aws_security_group_rule" "sg_2_rule_1" {
      ~ cidr_blocks              = [ # forces replacement
            "0.0.0.0/0",
          + "1.1.1.1/32",
        ]
        from_port                = 8080
      ~ id                       = "sgrule-1489633736" -> (known after apply)
         .
         .
   }

aws_security_group_rule.sg_2_rule_1: Creating...

Error: [WARN] A duplicate Security Group rule was found on (sg-03c1cbe2eb0ace857). This may be a side effect of a now-fixed Terraform issue causing two security groups with identical attributes but different source_security_group_ids to overwrite each other in the state. See https://github.com/hashicorp/terraform/pull/2376 for more information and instructions for recovery.

Error message: the specified rule "peer: 0.0.0.0/0, TCP, from port: 8080, to port: 8080, ALLOW" already exists

What happened?

1. Initially, the security group had the following rule associated with it:

direction | from_port | to_port | source     | rule
ingress   | 8080      | 8080    | 0.0.0.0/0  | allow

2. We tried to create a replacement rule with the following entries:

direction | from_port | to_port | source     | rule
ingress   | 8080      | 8080    | 0.0.0.0/0  | allow
ingress   | 8080      | 8080    | 1.1.1.1/32 | allow

3. Because of the create_before_destroy lifecycle rule, Terraform creates the step-2 rule before destroying the step-1 rule. The new rule contains the entry

direction | from_port | to_port | source     | rule
ingress   | 8080      | 8080    | 0.0.0.0/0  | allow

which the existing rule already has. A security group cannot have two identical entries (try creating a duplicate entry in the AWS console), so the creation fails with the error

Error message: the specified rule "peer: 0.0.0.0/0, TCP, from port: 8080, to port: 8080, ALLOW" already exists

This can be fixed by (you guessed it) removing the lifecycle block from the aws_security_group_rule resource, as in update-security-group-rule/v2/main.tf. Terraform then destroys the old rule before creating the new one:

aws_security_group_rule.sg_2_rule_1: Destroying... [id=sgrule-1489633736]
aws_security_group_rule.sg_2_rule_1: Destruction complete after 0s
aws_security_group_rule.sg_2_rule_1: Creating...
aws_security_group_rule.sg_2_rule_1: Creation complete after 1s [id=sgrule-2162410043]
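For reference, here is a sketch of the rule after the fix. It assumes v2/main.tf matches the updated rule shown earlier apart from the removed lifecycle block; the actual file may differ.

```hcl
# Sketch of the rule in update-security-group-rule/v2/main.tf (assumed to
# differ from the updated v1 rule only by the missing lifecycle block).
resource "aws_security_group_rule" "sg_2_rule_1" {
  type              = "ingress"
  from_port         = 8080
  to_port           = 8080
  protocol          = "tcp"
  security_group_id = aws_security_group.test_2.id

  cidr_blocks = ["0.0.0.0/0", "1.1.1.1/32"]

  # No lifecycle { create_before_destroy = true } block: Terraform uses its
  # default replacement order, destroying the old rule before creating the
  # new one, so the 0.0.0.0/0 entry never exists twice.
}
```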

Lessons learned:

The create_before_destroy lifecycle rule is useful in an aws_security_group block, but harmful in an aws_security_group_rule block whose replacement shares entries with the rule it replaces.
If it ain't broke, don't fix it. Again.
