If it ain't broke...
A Terraform lifecycle rule in the right place can help prevent a deadlock. But the same lifecycle rule in the wrong place?
If it ain't broke...

In the previous post, we saw how the Terraform lifecycle rule create_before_destroy can help prevent a deadlock when renaming security groups. In this post, we will see how using the same lifecycle rule in the wrong place will create a problem.

To recap, when renaming a security group, you need to replace

resource "aws_security_group" "test_1" {
  name = "test-1-new-name"
}

by

resource "aws_security_group" "test_1" {
  name = "test-1-new-name"
  lifecycle {
    create_before_destroy = true
  }
}

This ensures the following series of steps:

  1. Create a security group with the new name.
  2. Destroy the old security group.
  3. Associate the new security group with the instance.

This made me think - using a lifecycle rule seems like a good practice. Let me use it for the aws_security_group_rule resource also. That was a presumptuous mistake. Let us see how.

We will replicate the same infrastructure setup scenario:

  1. Create an EC2 instance (or any other resource which uses security groups).
  2. Associate one or more security groups to the instance.

The above infra can be created through update-security-group-rule/v1/main.tf

$ terraform init
$ terraform apply

Sample terraform output

aws_instance_test_1 = i-02d50e0a62110bbc6
security_group_test_1 = sg-03cc308342b10ebe5
security_group_test_2 = sg-03c1cbe2eb0ace857

However, for the next step, instead of renaming the security group, we will add one more entry in the cidr_block in our security_group_rule i.e. we will update

resource "aws_security_group_rule" "sg_2_rule_1" {
  from_port         = 8080
  protocol          = "tcp"
  to_port           = 8080
  security_group_id = aws_security_group.test_2.id

  cidr_blocks = ["0.0.0.0/0"] # this line will be changed

  lifecycle {
    create_before_destroy = true
  }
  type = "ingress"
}

to

resource "aws_security_group_rule" "sg_2_rule_1" {
  from_port         = 8080
  protocol          = "tcp"
  to_port           = 8080
  security_group_id = aws_security_group.test_2.id

  cidr_blocks = ["0.0.0.0/0", "1.1.1.1/32"]

  lifecycle {
    create_before_destroy = true
  }
  type = "ingress"
}
$ terraform apply
# aws_security_group_rule.sg_2_rule_1 must be replaced
+/- resource "aws_security_group_rule" "sg_2_rule_1" {
      ~ cidr_blocks              = [ # forces replacement
            "0.0.0.0/0",
          + "1.1.1.1/32",
        ]
        from_port                = 8080
      ~ id                       = "sgrule-1489633736" -> (known after apply)
         .
         .
   }
aws_security_group_rule.sg_2_rule_1: Creating...

Error: [WARN] A duplicate Security Group rule was found on (sg-03c1cbe2eb0ace857). This may be a side effect of a now-fixed Terraform issue causing two security groups with identical attributes but different source_security_group_ids to overwrite each other in the state. See https://github.com/hashicorp/terraform/pull/2376 for more information and instructions for recovery.
Error message: the specified rule "peer: 0.0.0.0/0, TCP, from port: 8080, to port: 8080, ALLOW" already exists

What happened?

1. Initially, the security group had the following rule associated with it:

direction | from_port | to_port | source     | rule
ingress   | 8080      | 8080    | 0.0.0.0/0  | allow

2. We tried creating a new rule which has the following entries:

direction | from_port | to_port | source     | rule
ingress   | 8080      | 8080    | 0.0.0.0/0  | allow
ingress   | 8080      | 8080    | 1.1.1.1/32 | allow

3. Because of the lifecycle rule create_before_destroy, Terraform is creating the step-2 rule first, which is having an entry

direction | from_port | to_port | source     | rule
ingress   | 8080      | 8080    | 0.0.0.0/0  | allow

common to both rules. A security group cannot have 2 entries having the exact same rule associated with it (try creating a duplicate entry in the AWS console). Hence it fails with the error

Error message: the specified rule "peer: 0.0.0.0/0, TCP, from port: 8080, to port: 8080, ALLOW" already exists

This can be fixed by, you guessed it – removing the lifecycle rule from the security_group_rule block as per update-security-group-rule/v2/main.tf

aws_security_group_rule.sg_2_rule_1: Destroying... [id=sgrule-1489633736]
aws_security_group_rule.sg_2_rule_1: Destruction complete after 0s
aws_security_group_rule.sg_2_rule_1: Creating...
aws_security_group_rule.sg_2_rule_1: Creation complete after 1s [id=sgrule-2162410043]

Lessons learnt:

  1. lifecycle rule – create_before_destroy is useful in the aws_security_group block, but harmful in the aws_security_group_rule block.
  2. If it ain't broke, don't fix it. Again.
Share to:
Twitter
Reddit
Linkedin
#sre #devops

You might also like...

We’ve raised a $11M Series A led by Sequoia Capital India!
We’ve raised a $11M Series A led by Sequoia Capital India!

Change is the only constant in a cloud environment. The number of microservices is constantly growing and each of these is being deployed several times a day or week, all hosted on ephemeral servers. A typical customer request depends on at least 3 internal and 1 external service. It’s

Read ->
Why Service Level Objectives?
Why Service Level Objectives?

Understanding how to measure the health of your servcie, benefits of using SLOs, how to set compliances and much more...

Read ->
How to Improve On-Call Experience!
How to Improve On-Call Experience!

Better practices and tools for management of on-call practices

Read ->

SRE with Last9 is incredibly easy. But don’t just take our word for it.