Connectionless Ansible Deployment with Terraform via SSM

Deploying Ansible playbooks & roles to EC2 instances via Terraform from anywhere, in one uninterrupted run and without exposing SSH.

TL;DR

GitHub repository with the PoC here.

Introduction

While this blog usually focuses on Red Team topics, I believe that this approach could benefit anyone using Terraform and Ansible to deploy infrastructure on AWS.

Red Team Operations often require us to spin up and destroy infrastructure on the fly. Doing this manually for every operation would be time-intensive and prone to error. Therefore, our team automates as much as possible using Infrastructure as Code tools like Terraform (deploy infra) and Ansible (configure infra). Projects like Red Baron give us great examples of how to achieve this. This results in several benefits:

  • Consistency and reliability: Anyone can deploy complex infrastructure components in the same way.

  • Version control: If mistakes happen, we can issue a new version of our Terraform script/Ansible playbook and all future deployments will benefit.

  • Speed: Setting up infra takes minutes, where manual configuration would take hours or days. This is time a modern Red Team can no longer afford to spend.

  • Scalability: It's easy to scale up. E.g., deploying additional CDN domains to rotate in front of your C2 redirector is a matter of specifying the number.

  • Cost management: We can define default instance types or usage plans in our code that fit the purpose of the infrastructure component, to control the costs of our environments. The risk of dangling/forgotten infrastructure also decreases, since we can tear down entire environments in one go.

When designing our own infrastructure automation project (Red Bastion) in 2020, I hit a major limitation: after deploying infrastructure with Terraform, SSH must be reachable to apply the Ansible playbook. This implies you either need to deploy a VPC with a VPN or jump host first, or give your instance a public IP address and expose SSH. The latter is not even an option for services that should not have a public IP.

This means I cannot simply deploy my entire Red Team VPC in one go: I first have to deploy stage 1 with a VPN/jump box, wait for that deployment to complete, configure my VPN client, connect, and only then deploy stage 2 over the VPN. Surely this frustrates other people too, so let's try to come up with an alternative.

SSM Documents: AWS-ApplyAnsiblePlaybooks

Enter AWS Systems Manager (SSM). According to the AWS documentation, Systems Manager enables you to manage your infrastructure:

Systems Manager provides a unified user interface so you can view operational data from multiple AWS services and enables you to automate operational tasks across your AWS resources.

One of its features is the ability to run "SSM Documents" (automation runbooks) on onboarded assets. AWS-ApplyAnsiblePlaybooks is such a runbook, and it fits our use case perfectly: it runs an Ansible playbook from an S3 bucket locally on your asset.
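
To get a feel for what the document does before we automate everything, a one-off manual invocation via the AWS CLI could look like the sketch below (bucket name and instance ID are placeholders; in the rest of this post we drive this through Terraform instead).

# Run the stock document once against a single instance (placeholder values)
aws ssm send-command \
  --document-name "AWS-ApplyAnsiblePlaybooks" \
  --targets "Key=InstanceIds,Values=i-0123456789abcdef0" \
  --parameters '{"SourceType":["S3"],"SourceInfo":["{\"path\":\"https://s3.amazonaws.com/my-bucket/ansible.zip\"}"],"InstallDependencies":["True"],"PlaybookFile":["main.yml"],"ExtraVariables":["SSM=True"]}' \
  --region eu-west-1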

At this point, the following questions arose:

  1. Can we automatically package our playbook via Terraform and upload it to S3?

  2. What are the instance requirements?

  3. How do we pass arguments to our Ansible playbook?

We will answer these questions and more in the following sections.

Custom SSM Documents

Before we continue: the default SSM document comes with certain limitations that you may want to overcome. For example, it only installs the ansible package, as shown below. What if our playbook depends on Ansible collections?

sudo pip3 install ansible

Instead of following the next steps, you could also copy our ready-to-go JSON file from GitHub. In that case, simply create a new SSM document and paste the JSON content there.

We can easily clone the default document via the AWS console. Go to Systems Manager > Documents and search for "AWS-ApplyAnsiblePlaybooks". Select the runbook and click Actions > Clone document.

We can name the new runbook Custom-ApplyAnsiblePlaybooksWithCollections. Target Type can be left empty.

We can add two additional parameters: InstallCollections, to indicate whether ansible-galaxy collections should be installed, and RequirementsFile, to pass our playbook's requirements.yml file when InstallCollections is set to "True".

"InstallCollections": {
  "type": "String",
  "description": "(Optional) Whether to install ansible-galaxy collections.",
  "allowedValues": [
    "True",
    "False"
  ],
  "default": "False"
},
"RequirementsFile": {
  "type": "String",
  "description": "(Optional) Path to requirements.yml file",
  "default": "requirements.yml",
  "allowedPattern": "[(a-z_A-Z0-9\\-\\.)\/]+(.yml|.yaml)$"
},

Next, the following shell script can be added to the aws:runShellScript action to check whether the requirements file should be processed with ansible-galaxy.

"if  [[ \"{{InstallCollections}}\" == True ]] ; then",
"   RequirementsFile=\"{{RequirementsFile}}\"",
"   if [ ! -f  \"${RequirementsFile}\" ] ; then",
"      echo \"The specified Requirements file doesn't exist in the downloaded bundle. Please review the relative path and file name.\" >&2",
"      exit 2",
"   fi",
"   ansible-galaxy install -r \"{{RequirementsFile}}\"",
"fi",

Congratulations! We now have our own Custom-ApplyAnsiblePlaybooksWithCollections SSM runbook. Keep in mind that this document only exists in the region where we created it; in this case, eu-west-1. Therefore, we can only apply it to instances deployed in that region.
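
If you prefer to keep this step in code as well, the same document can be managed with Terraform, which also makes the region pinning explicit. A minimal sketch, assuming the JSON content from GitHub is saved locally as files/Custom-ApplyAnsiblePlaybooksWithCollections.json (a path chosen here for illustration):

// Manage the custom SSM document itself with Terraform instead of cloning it in the console
resource "aws_ssm_document" "ansible_with_collections" {
  name            = "Custom-ApplyAnsiblePlaybooksWithCollections"
  document_type   = "Command"
  document_format = "JSON"

  // JSON exported from the cloned document (hypothetical local path)
  content = file("${path.module}/files/Custom-ApplyAnsiblePlaybooksWithCollections.json")
}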

Packaging Ansible Playbooks

Next, we must come up with an approach to package our Ansible playbooks.

Ansible folder structure

The role we are about to create can be found on GitHub.

At DXC Strikeforce, we decided to stay as close as possible to the official Ansible role structure recommendations and to centralise roles in Git repositories. This generally corresponds to the following layout:

test_role/             # Top-level directory named after the role
├── defaults/          # Default variables for the role (lowest priority)
│   └── main.yml
├── files/             # Static files to be transferred to the target hosts
│   └── example_file.txt
├── handlers/          # Handlers, triggered by tasks
│   └── main.yml
├── meta/              # Metadata about the role (dependencies, author info)
│   └── main.yml
├── tasks/             # Main list of tasks to be executed by the role
│   └── main.yml
├── templates/         # Jinja2 templates to be populated with variables
│   └── example_template.j2
├── tests/             # Test playbooks for the role
│   ├── inventory      # Inventory file for testing
│   └── test.yml       # Test playbook
├── vars/              # Variables for the role (higher priority than defaults)
│   └── main.yml
└── README.md          # Documentation for the role

This approach allows us to recursively include the roles in the playbook repositories that call them. E.g., if we want to apply multiple roles to an instance, we can simply create an Ansible playbook repository as follows:

test-playbook/               # Git repo of the playbook
├── roles/                   # Different roles to include recursively, as above
│   └── test_role/
├── main.yml                 # Main playbook, includes test_role
└── requirements.yml         # Optional requirements for ansible-galaxy roles/collections

main.yml contents:

---
- name: "Configure test"
  hosts: localhost
  roles:
    - test_role
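
The actual test_role lives in the GitHub repository; as an illustration, a minimal tasks/main.yml that would produce the /hello_world.txt result used later in this post could look like this (hypothetical sketch, not necessarily identical to the repo):

# test_role/tasks/main.yml (sketch)
# Writes /hello_world.txt using the testparameter extra var passed in via SSM
- name: Write hello world file
  ansible.builtin.copy:
    content: "hello {{ testparameter }}"
    dest: /hello_world.txt
    mode: "0644"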

Pushing to S3 via Terraform

The final Terraform code can be found on GitHub.

Now that we have our Ansible playbook structure ready, we can try to push it to S3 via Terraform. Normally, we would develop separate Terraform modules and add them to our private registry, but for this proof of concept, we will add everything in one repository.

The following Terraform code archives the ansible directory of our project and uploads it to S3 as ansible.zip.

variables.tf

variable "path_to_ansible_folder" {
  description = "Path to the ansible from which to apply the main.yml playbook."
  type        = string
  default     = "ansible/test-playbook"
}

variable "s3_data_expiration_days" {
  description = "Amount of days to keep the uploaded data in s3. Should be 1 to limit storage cost."
  type        = number
  default     = 1
}

variable "s3_zip_object_key" {
  description = "Name of the s3 bucket object key of the zip file. Normally, this should be ansible.zip."
  type        = string
  default     = "ansible.zip"
}

s3_bucket.tf

// Create S3 bucket for ansible playbook sharing
locals {
  bucket_name_base = lower(replace("${var.server_name}", "_", "-"))                                        // replace _ with - and lowercase to get a valid bucket name
  bucket_name      = substr("${local.bucket_name_base}-${random_string.bucket_randomized.result}", 0, 63)  // add some randomization to the bucket name
}

// random string to append to bucket name (prevents issues with destroy and rebuild)
resource "random_string" "bucket_randomized" {
  length  = 16
  special = false
  numeric = true
  upper   = false
}

// Create new s3 bucket
resource "aws_s3_bucket" "ansible" {
  bucket        = local.bucket_name
  force_destroy = true
}

// restrict public bucket access
resource "aws_s3_bucket_public_access_block" "ansible" {
  bucket = aws_s3_bucket.ansible.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Delete objects in bucket after 1 day
resource "aws_s3_bucket_lifecycle_configuration" "ansible" {
  bucket = aws_s3_bucket.ansible.id

  rule {
    id     = "expire-after-${tostring(var.s3_data_expiration_days)}-day"
    status = "Enabled"

    expiration {
      days = var.s3_data_expiration_days // expire bucket contents after one day
    }

    abort_incomplete_multipart_upload {
      days_after_initiation = var.s3_data_expiration_days // if upload failed, also expire data after 1 day
    }
  }
}

upload_ansible_zip.tf

// archive ansible directory
// Every time a file changes in the ansible directory, the zip will be recreated
data "archive_file" "ansible_dir_zip" {
  type        = "zip"
  source_dir  = var.path_to_ansible_folder
  output_path = "${path.module}/${local.bucket_name}.zip" // avoid collisions with same module running for different instances
}

// upload ansible directory as zip
resource "aws_s3_object" "ansible_dir_zip" {
  bucket = aws_s3_bucket.ansible.id
  key    = var.s3_zip_object_key
  source = data.archive_file.ansible_dir_zip.output_path

  etag = filemd5(data.archive_file.ansible_dir_zip.output_path)
}

If we execute the following, Terraform connects to the AWS API and deploys the resources.

terraform init
terraform plan
terraform apply

We can check the bucket content via the AWS console to confirm our zip file was indeed uploaded.

Bonus: if we want to update the playbook at any time, we can simply change its contents and run terraform apply again. The ansible.zip object in S3 will automatically be updated. This gives us a simple method to re-apply a playbook to a previously deployed machine.

EC2 Instance Deployment

At this point, we have completed the following items:

  • Successfully created a custom SSM Document automation runbook Custom-ApplyAnsiblePlaybooksWithCollections to run complex Ansible playbooks from an S3 bucket.

  • Automated uploading a local Ansible playbook to S3 via Terraform.

The next steps would be to:

  1. Deploy an EC2 instance with Terraform.

  2. Create and assign the correct EC2 instance role to:

    1. Onboard the instance to SSM.

    2. Access the created S3 bucket.

  3. Apply the Custom-ApplyAnsiblePlaybooksWithCollections document to the instance via SSM, triggering the Ansible playbook to execute.

Deploy EC2 Instance

Deploying an EC2 instance is both straightforward and well-documented, so we will spend limited time explaining the steps. For our purpose, we will use the following AMI:

ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20240701
ami-0932dacac40965a65

This default build of Ubuntu Server 22.04 comes with the Amazon SSM Agent preinstalled, saving us the hassle of baking it into the image ourselves. We can go with the cheapest t3a.nano and 8 GB of EBS storage, since we do not need much computing power for our purpose.
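
The PoC pins the AMI ID on purpose (see the comment in variables.tf below), because a changing AMI forces instance replacement. If you would rather resolve the ID per region, an alternative is a data lookup along these lines:

// Look up the latest Ubuntu 22.04 (Jammy) AMI published by Canonical
// Note: with most_recent = true, a newer build can trigger instance replacement on a later apply
data "aws_ami" "ubuntu_jammy" {
  most_recent = true
  owners      = ["099720109477"] // Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

// Then reference it with: ami = data.aws_ami.ubuntu_jammy.id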

We will also use data sources to convert VPC and subnet name variables to corresponding IDs, but you could also use the IDs directly. Most of the settings will be defined as defaults in variables.tf.

variables.tf

// EC2 instance
variable "vpc_name" {
  description = "Name of the VPC."
  type        = string
}

variable "server_name" {
  description = "Name of the server."
  type        = string
}

variable "instance_type" {
  description = "Aws EC2 instance type to use."
  default     = "t3a.nano"
  type        = string
}

variable "subnet_name" {
  description = "Name of the subnet to deploy the machine in."
  type        = string
}

variable "ebs_volume_size" {
  description = "EBS size in GB."
  type        = number
  default     = 8
}

variable "delete_ebs_on_termination" {
  description = "Whether to delete the volume on termination. True avoids costs and destroys data after tearing down the environment."
  type        = bool
  default     = true
}

variable "source_dest_check" {
  description = "Whether to set the source_dest_check to enable IP forwarding in AWS. Set to false for VPN server."
  type        = bool
  default     = true
}
variable "private_ip" {
  description = "Private IP address to set. Leave blank to let AWS decide."
  type        = string
  default     = ""
}

variable "ssh_key_local_directory" {
  description = "Directory to store the SSH key."
  type        = string
  default     = "./ssh_keys"
}

// use AMI with Amazon SSM Agent preinstalled
// e.g. Ubuntu server 22.04
variable "ami" {
  description = "AWS AMI to deploy"
  type        = string
  default     = "ami-0932dacac40965a65" // Do not update build on actively used instances or machine will be destroyed
}

main.tf

I have the habit of storing some common resources in the main.tf file. This is just a personal preference.

// fetch the vpc id based on the vpc name
data "aws_vpc" "target_vpc" {
  filter {
    name   = "tag:Name"
    values = [var.vpc_name]
  }
}

// fetch the subnet id based on the subnet name
data "aws_subnet" "target_subnet" {
  vpc_id = data.aws_vpc.target_vpc.id

  filter {
    name   = "tag:Name"
    values = [var.subnet_name]
  }
}

security_group.tf

Our security group will only contain egress rules. Inbound SSH is not required as we will use SSM to push an Ansible playbook to the instance.

// Create the security group
resource "aws_security_group" "ansible_instance" {
  name        = "${var.server_name}_security_group"
  description = "Security group created by Red Bastion"
  vpc_id      = data.aws_vpc.target_vpc.id
}

// No ingress required
// Egress can be further restricted to specific destinations;
// ansible-galaxy connections & AWS SSM API connections should be allowed
// The following allows egress to anywhere, but only on ports 53 UDP and 80 & 443 TCP
resource "aws_vpc_security_group_egress_rule" "dns" {
  security_group_id = aws_security_group.ansible_instance.id

  cidr_ipv4   = "0.0.0.0/0"
  from_port   = 53
  ip_protocol = "udp"
  to_port     = 53
}

resource "aws_vpc_security_group_egress_rule" "http" {
  security_group_id = aws_security_group.ansible_instance.id

  cidr_ipv4   = "0.0.0.0/0"
  from_port   = 80
  ip_protocol = "tcp"
  to_port     = 80
}

resource "aws_vpc_security_group_egress_rule" "https" {
  security_group_id = aws_security_group.ansible_instance.id

  cidr_ipv4   = "0.0.0.0/0"
  from_port   = 443
  ip_protocol = "tcp"
  to_port     = 443
}

ssh_keypair.tf

Just to be safe, we will create an SSH keypair for the instance. This will be our backup key in case the connection with AWS Systems Manager somehow breaks.

// Generate private key in case we want to authenticate via SSH (should not happen)
// E.g. as backup in case amazon-ssm-agent crashes
resource "tls_private_key" "ssh" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

// write the key to local disk. Can be omitted 
resource "local_file" "foo" {
  content  = tls_private_key.ssh.private_key_pem
  filename = "${var.ssh_key_local_directory}/${var.server_name}.pem"
}

// Add public key to AWS
resource "aws_key_pair" "ssh" {
  key_name   = "ssh_${var.server_name}"
  public_key = tls_private_key.ssh.public_key_openssh
}

ec2_instance.tf

// Create the EC2 instance
resource "aws_instance" "ec2_instance" {
  tags = {
    Name = "${var.server_name}"
  }

  root_block_device {
    delete_on_termination = var.delete_ebs_on_termination
    volume_size           = var.ebs_volume_size
    volume_type           = "gp2"

    tags = {
      Name = "${var.server_name}_ebs"
    }
  }

  ami                    = var.ami
  instance_type          = var.instance_type
  key_name               = aws_key_pair.ssh.key_name
  vpc_security_group_ids = [aws_security_group.ansible_instance.id]
  subnet_id              = data.aws_subnet.target_subnet.id
  
  // This will come into play after adding the instance role
  // leave it out if you would like to deploy an instance without any role
  iam_instance_profile   = aws_iam_instance_profile.ssm_s3.name // Enable SSM and s3 role for instance
  source_dest_check      = var.source_dest_check

  // optionally set a static IP
  private_ip = var.private_ip == "" ? null : var.private_ip

  // disable AWS metadata v1 (unauthenticated)
  // to improve security
  metadata_options {
    http_endpoint = "enabled"
    http_tokens   = "required"
  }
}

Execution

We can now run our terraform code to deploy our EC2 instance.

terraform init
terraform plan
terraform apply

SSM & S3 Instance Role

Next, we should create and assign an EC2 instance role to onboard the machine to SSM, which enables it to communicate with Systems Manager. This will allow us to apply SSM documents to the EC2 instance. The role should also be able to read the ansible.zip file from our automatically created S3 bucket.

iam_role.tf

The minimal privileges to onboard an instance to SSM are defined in the AWS-managed arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore IAM policy. If we attach this policy and a custom S3 policy, we fulfil the requirements.

We will also grant the s3:PutObject permission on the S3 bucket to allow the machine to upload data as well. This can be useful in case we would like to collect logs or files later on.

locals {
  iam_role_base = lower(replace("${var.server_name}", "_", "-")) // replace _ with - and lowercase for consistent resource naming
}

// Create SSM Role that allows ansible to configure the machine
// EC2 instance should be able to assume the role
resource "aws_iam_role" "ssm_s3" {
  name = substr("${local.iam_role_base}_ssm_s3_role", 0, 63) // name can only be 64 chars max

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

// attach default recommended SSM policy to new role
// Instance with this role is now onboarded to SSM
resource "aws_iam_role_policy_attachment" "ssm_role_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  role       = aws_iam_role.ssm_s3.name
}

// also allow access to s3 bucket that shares files with instance
resource "aws_iam_policy" "s3_bucket_policy" {
  name = "${local.iam_role_base}_s3_bucket_policy"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action   = ["s3:Get*", "s3:List*", "s3:PutObject"]
        Effect   = "Allow"
        Resource = "arn:aws:s3:::${local.bucket_name}/*"
      }
    ]
  })
}

// attach policy to same role
resource "aws_iam_role_policy_attachment" "s3_role_policy" {
  policy_arn = aws_iam_policy.s3_bucket_policy.arn
  role       = aws_iam_role.ssm_s3.name
}

// create profile to use to assign role to ec2 instance
resource "aws_iam_instance_profile" "ssm_s3" {
  name = "${local.iam_role_base}_ssm_profile"
  role = aws_iam_role.ssm_s3.name
}

Apply Playbook

All that is left now is to apply the Custom-ApplyAnsiblePlaybooksWithCollections document to the instance via SSM, triggering the Ansible playbook to be downloaded and executed. This should create /hello_world.txt on the target instance.

variables.tf

// push additional extra_vars as map
variable "ansible_extra_vars" {
  description = "List of key-value pairs of variables to pass to ansible"
  type        = map(string)
}

ssm_ansible_playbook.tf

locals {
  ansible_extra_vars_string = join(" ", [for k, v in var.ansible_extra_vars : "${k}=${v}"])
}

// Associate the document with the instance
// Uses a custom version of AWS-ApplyAnsiblePlaybooks (Custom-ApplyAnsiblePlaybooksWithCollections) to also install dependencies with ansible-galaxy
resource "aws_ssm_association" "ansible_playbook_association" {
  name             = "Custom-ApplyAnsiblePlaybooksWithCollections" //based on AWS-ApplyAnsiblePlaybooks
  association_name = "${var.server_name}_playbook_association"

  // targets to run the document on
  targets {
    key    = "InstanceIds"
    values = [aws_instance.ec2_instance.id]
  }

  parameters = {
    SourceType = "S3"
    SourceInfo = jsonencode({
      path = "https://s3.amazonaws.com/${aws_s3_bucket.ansible.id}/ansible.zip"
    })

    // We can use ExtraVariables to pass parameters to the playbook 
    // always include etag to retrigger playbook apply on change of playbook
    // always include s3bucket if you would like to upload custom data
    ExtraVariables      = "SSM=True ${local.ansible_extra_vars_string} s3bucket=${aws_s3_bucket.ansible.id} s3_object_etag=${aws_s3_object.ansible_dir_zip.etag}"
    InstallDependencies = "True"             // if Ansible must still be installed, should be True in most cases unles using own image with Ansible preinstalled
    InstallCollections  = "False"            // can toggle to install ansible-galaxy dependencies
    RequirementsFile    = "requirements.yml" // where to install dependencies from. Should be in the root on ansible.zip
    PlaybookFile        = "main.yml"         // should be in the root on ansible.zip
  }

  // output logs to bucket
  // This is why s3:PutObject is needed on the instance role
  output_location {
    s3_bucket_name = aws_s3_bucket.ansible.id
  }

  automation_target_parameter_name = "InstanceId"
  max_concurrency                  = "1"
  max_errors                       = "0"
}
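
Optionally, a couple of outputs make the instance and association easier to reference from the CLI later on. These are not part of the PoC code, just a small convenience:

// Optional convenience outputs
output "instance_id" {
  value = aws_instance.ec2_instance.id
}

output "ansible_association_id" {
  value = aws_ssm_association.ansible_playbook_association.association_id
}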

Execution

After expanding our Terraform project with the SSM association, we can apply the changes. We should not forget to set ansible_extra_vars (e.g. in terraform.tfvars) so we can pass parameters to our playbook.

ansible_extra_vars = {
    testparameter = "test"
}

terraform init
terraform plan
terraform apply

Next, we can validate successful application via State Manager.
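
If you prefer the CLI over the console, the association status can also be queried directly; the association ID comes from the optional output above or from the State Manager console.

# Check the association status via the CLI (placeholder association ID)
aws ssm describe-association \
  --association-id <association-id> \
  --region eu-west-1 \
  --query "AssociationDescription.Overview"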

Additionally, we can start a session on the instance via SSM Fleet Manager and check whether the file was indeed added. If we monitor the root directory, we can observe the moment /hello_world.txt is written with the value "hello test", as specified in our Ansible playbook.
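
The same check works from a terminal: with the Session Manager plugin installed for the AWS CLI, a shell on the instance is one command away, again without SSH.

# Requires the AWS CLI Session Manager plugin; instance ID is a placeholder
aws ssm start-session --target i-0123456789abcdef0 --region eu-west-1
# then, inside the session:
cat /hello_world.txt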

Conclusion

Success! We deployed a fresh Ubuntu 22.04 LTS EC2 instance and applied an Ansible role, through a playbook with parameters, without exposing SSH! This can now easily be replicated to automatically spin up entire private environments from anywhere in the world, without any direct connection.

Troubleshooting

If your machine is not showing up in SSM, I've found that it's usually one of these (a quick on-instance check is sketched after the list):

  • Egress traffic does not allow comms with AWS SSM API

  • SSM instance role not applied correctly

  • Deployed the VM in a public subnet but forgot to assign a public IP

  • Chose an AMI that does not have Amazon SSM Agent preinstalled
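
When none of these apply, it can help to log in with the backup SSH key and inspect the agent itself. On Ubuntu AMIs the agent ships as a snap (an assumption worth verifying for your image):

# Check the SSM agent from the instance itself
sudo snap services amazon-ssm-agent
# The agent log usually reveals IAM or connectivity problems
sudo tail -n 50 /var/log/amazon/ssm/amazon-ssm-agent.log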

Ansible Playbook Execution

Your ansible.zip will be expanded under /var/lib/amazon/ssm/<instanceid>/document/orchestration/<orchestrationid>/downloads. When troubleshooting playbook execution, we usually go into this directory and execute manually with the appropriate extra vars.

ansible-playbook main.yml -e "extra_vars1=value1 extravars2=value2"

Known Limitations

Passing extra vars via SSM documents can be a bit tricky, as certain characters are not allowed. A workaround could be to pass a config.json inside the files directory of the Ansible role or use the vars directory to pass parameters and large values.
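
As a sketch of that workaround (file name and variable name are hypothetical), the role can load the values itself instead of receiving them through SSM extra vars:

# Load parameters from a JSON file bundled with the role rather than via ExtraVariables
- name: Load parameters from bundled config.json
  ansible.builtin.include_vars:
    file: "{{ role_path }}/files/config.json"
    name: role_config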
