If it ain't broke...

In the previous post, we saw how the Terraform lifecycle rule create_before_destroy can help prevent a deadlock when renaming security groups. In this post, we will see how using the same lifecycle rule in the wrong place will create a problem.

To recap, when renaming a security group, you need to replace

resource "aws_security_group" "test_1" {
  name = "test-1-new-name"
}

with

resource "aws_security_group" "test_1" {
  name = "test-1-new-name"
  lifecycle {
    create_before_destroy = true
  }
}

This ensures the following series of steps:

Create a security group with a new name.
Associate the new security group with the instance.
Destroy the old security group.

This made me think - using a lifecycle rule seems like a good practice. Let me use it for the aws_security_group_rule resource also. That was a presumptuous mistake. Let us see how.

We will replicate the same infrastructure setup scenario:

Create an EC2 instance (or any other resource which uses security groups).
Associate one or more security groups to the instance.

The above infra can be created through update-security-group-rule/v1/main.tf
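
The file itself is not reproduced in this post. As a rough sketch of what such a configuration might contain: the variable ami_id, the instance type and the name "test-2" are assumptions made for illustration, while the output names mirror the sample output shown in this post. Provider and region settings are assumed to come from the environment.

```hcl
# Hypothetical sketch; the real update-security-group-rule/v1/main.tf may differ.

variable "ami_id" {
  description = "AMI to launch (assumed; supply one that is valid in your region)"
  type        = string
}

resource "aws_security_group" "test_1" {
  name = "test-1-new-name"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_security_group" "test_2" {
  name = "test-2" # assumed name

  lifecycle {
    create_before_destroy = true
  }
}

# The instance that uses both security groups.
resource "aws_instance" "test_1" {
  ami           = var.ami_id
  instance_type = "t3.micro" # assumed size

  vpc_security_group_ids = [
    aws_security_group.test_1.id,
    aws_security_group.test_2.id,
  ]
}

# The ingress rule attached to test_2 is omitted from this sketch.

output "aws_instance_test_1" {
  value = aws_instance.test_1.id
}

output "security_group_test_1" {
  value = aws_security_group.test_1.id
}

output "security_group_test_2" {
  value = aws_security_group.test_2.id
}
```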

$ terraform init
$ terraform apply

Sample terraform output

aws_instance_test_1 = i-02d50e0a62110bbc6
security_group_test_1 = sg-03cc308342b10ebe5
security_group_test_2 = sg-03c1cbe2eb0ace857

However, for the next step, instead of renaming the security group, we will add one more entry to cidr_blocks in our aws_security_group_rule, i.e. we will update

resource "aws_security_group_rule" "sg_2_rule_1" {
  from_port         = 8080
  protocol          = "tcp"
  to_port           = 8080
  security_group_id = aws_security_group.test_2.id

  cidr_blocks = ["0.0.0.0/0"] # this line will be changed

  lifecycle {
    create_before_destroy = true
  }
  type = "ingress"
}

to

resource "aws_security_group_rule" "sg_2_rule_1" {
  from_port         = 8080
  protocol          = "tcp"
  to_port           = 8080
  security_group_id = aws_security_group.test_2.id

  cidr_blocks = ["0.0.0.0/0", "1.1.1.1/32"]

  lifecycle {
    create_before_destroy = true
  }
  type = "ingress"
}

$ terraform apply

# aws_security_group_rule.sg_2_rule_1 must be replaced
+/- resource "aws_security_group_rule" "sg_2_rule_1" {
      ~ cidr_blocks              = [ # forces replacement
            "0.0.0.0/0",
          + "1.1.1.1/32",
        ]
        from_port                = 8080
      ~ id                       = "sgrule-1489633736" -> (known after apply)
         .
         .
   }

aws_security_group_rule.sg_2_rule_1: Creating...

Error: [WARN] A duplicate Security Group rule was found on (sg-03c1cbe2eb0ace857). This may be a side effect of a now-fixed Terraform issue causing two security groups with identical attributes but different source_security_group_ids to overwrite each other in the state. See https://github.com/hashicorp/terraform/pull/2376 for more information and instructions for recovery.

Error message: the specified rule "peer: 0.0.0.0/0, TCP, from port: 8080, to port: 8080, ALLOW" already exists

What happened?

1. Initially, the security group had the following rule associated with it:

direction | from_port | to_port | source     | rule
ingress   | 8080      | 8080    | 0.0.0.0/0  | allow

2. We tried creating a new rule which has the following entries:

direction | from_port | to_port | source     | rule
ingress   | 8080      | 8080    | 0.0.0.0/0  | allow
ingress   | 8080      | 8080    | 1.1.1.1/32 | allow

3. Because of the lifecycle rule create_before_destroy, Terraform tries to create the step-2 rule before destroying the step-1 rule. The new rule contains an entry

direction | from_port | to_port | source     | rule
ingress   | 8080      | 8080    | 0.0.0.0/0  | allow

common to both rules. A security group cannot hold two identical entries (try creating a duplicate entry in the AWS console), so the create call fails with the error

Error message: the specified rule "peer: 0.0.0.0/0, TCP, from port: 8080, to port: 8080, ALLOW" already exists

This can be fixed by, you guessed it – removing the lifecycle rule from the security_group_rule block as per update-security-group-rule/v2/main.tf
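
That file is not reproduced here either. A minimal sketch of the rule without the lifecycle block, derived from the block shown earlier in this post (the actual v2 file may differ), could look like this:

```hcl
# Sketch only: the v1 rule with the lifecycle block removed.
resource "aws_security_group_rule" "sg_2_rule_1" {
  type              = "ingress"
  from_port         = 8080
  to_port           = 8080
  protocol          = "tcp"
  security_group_id = aws_security_group.test_2.id

  cidr_blocks = ["0.0.0.0/0", "1.1.1.1/32"]

  # No lifecycle block: on replacement, Terraform destroys the old rule
  # before creating the new one, so the 0.0.0.0/0 entry never exists twice.
}
```

With the lifecycle block gone, terraform apply falls back to the default replacement order, destroy and then create: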

aws_security_group_rule.sg_2_rule_1: Destroying... [id=sgrule-1489633736]
aws_security_group_rule.sg_2_rule_1: Destruction complete after 0s
aws_security_group_rule.sg_2_rule_1: Creating...
aws_security_group_rule.sg_2_rule_1: Creation complete after 1s [id=sgrule-2162410043]

Lessons learned:

The create_before_destroy lifecycle rule is useful in the aws_security_group block, but harmful in the aws_security_group_rule block.
If it ain't broke, don't fix it. Again.
