For the web project I am currently working on, I started needing more disk space on my AWS instance than the default 8 GB root volume that comes with a t2 instance. Eventually I should probably move to S3 to host static assets like these, but for now I took the opportunity to learn how to attach EBS volumes to my EC2 instances.

I was surprised at how much patching needed to be done on top of Terraform to properly mount EBS volumes. The relevant resources are aws_instance, aws_ebs_volume, and aws_volume_attachment.

Basically, aws_ebs_volume is what represents an EBS volume, and aws_volume_attachment is what associates that volume with an aws_instance. This in and of itself is not hard to grasp, but there are several gotchas.

When defining the aws_instance, it is important to specify not only the region but also the specific availability zone, and to give the aws_ebs_volume the same one:

resource "aws_instance" "instance" {
  availability_zone =  "us-east-1a"
  ...
}

resource "aws_ebs_volume" "my_volume" {
  availability_zone =  "us-east-1a"
  size              = 2
  type = "gp2"
}

This is so that the EC2 instance is guaranteed to be able to attach to the EBS volume: a volume can only be attached to an instance in the same availability zone. Now, if you don't care about what's on the EBS volume and can blow it away each time the EC2 instance changes, then this is not an issue, and you can simply compute the availability zone each time:

resource "aws_instance" "instance" {
  ...
}

resource "aws_ebs_volume" "my_volume" {
  availability_zone = "${aws_instance.instance.availability_zone}"
  ...
}

This is a long way of saying that if you need an EBS volume that persists across EC2 instance re-creations, then you must specify the availability zone explicitly on both resources.
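One way to keep the two zones in sync without repeating the literal string is to pull it out into a variable. This is just a minimal sketch; the variable name az is my own illustration, not something from the original configuration:

variable "az" {
  # Hypothetical variable name; pick the zone the volume should live in.
  default = "us-east-1a"
}

resource "aws_instance" "instance" {
  availability_zone = "${var.az}"
  ...
}

resource "aws_ebs_volume" "my_volume" {
  availability_zone = "${var.az}"
  size              = 2
  type              = "gp2"
}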

Another issue is that even after attaching the EBS volume to an instance, the volume must actually be mounted before it can be used. It turns out that you must mount the drive yourself (and add it to /etc/fstab if you want the mount to survive reboots), and you can only do that after the volume attachment has finished. A remote-exec provisioner can be used here:

resource "aws_volume_attachment" "attachment" {
  device_name  = "/dev/sdh"
  skip_destroy = true
  volume_id    = "${aws_ebs_volume.my_volume.id}"
  instance_id  = "${aws_instance.instance.id}"

  provisioner "remote-exec" {
    script = "mount_drives.sh"
    connection {
      user        = "deploy_user"
      private_key = "${file("~/.ssh/id_rsa")}"
      host        = "${aws_instance.instance.public_ip}"
    }
  }
  }
}

I picked the name /dev/sdh for my volume, but Ubuntu/Debian maps this name to /dev/xvdh, and the mapping, though consistent within a given Linux distro, can differ between distros. Amazon Linux AMIs apparently create symbolic links so that the device name you chose for the volume is preserved. In any case, here is mount_drives.sh:
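What follows is a minimal sketch of the script, assuming the device shows up as /dev/xvdh, an ext4 filesystem, and that deploy_user can run sudo without a password:

#!/bin/bash
# Sketch of mount_drives.sh: format the volume on first use and mount it at /images.
set -e

DEVICE=/dev/xvdh
MNTPOINT=/images

# An unformatted volume shows up as raw "data"; only run mkfs in that case.
if sudo file -s "$DEVICE" | grep -q ": data$"; then
  sudo mkfs -t ext4 "$DEVICE"
fi

sudo mkdir -p "$MNTPOINT"

# Record the mount in /etc/fstab so it survives reboots, then mount it.
if ! grep -q "$MNTPOINT" /etc/fstab; then
  echo "$DEVICE $MNTPOINT ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
fi
sudo mount "$MNTPOINT"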

Here we take the device name corresponding to the one we gave in the volume attachment above and mount it at MNTPOINT, which for me is /images. This ensures that after this provisioner runs, we have usable space at /images.

Which brings us to the last gotcha: since Terraform doesn't know anything about mounting, we also have to unmount the volume when we destroy the instance, which means one more bit of configuration on the instance itself. If the EC2 instance gets destroyed, we make sure to unmount the volume first.

resource "aws_instance" "instance" {
  ...
  provisioner "remote-exec" {
    when    = "destroy"
    inline = ["sudo umount -d /dev/xvdh"] # see aws_volume_attachment.attachment.device_name. This gets mapped to /dev/xvdh
    connection {
      user = "deploy_user"
      private_key = "${file("~/.ssh/id_rsa")}"
    }
  }
}

I wish that Terraform supported these kinds of use cases out of the box, but fortunately it is flexible enough that the workarounds can be implemented fairly easily.