While this blog usually focuses on Red Team topics, I believe that this approach could benefit anyone using Terraform and Ansible to deploy infrastructure on AWS.
Red Team operations often require us to spin up and destroy infrastructure on the fly. Doing this manually for every operation would be a time-intensive task, prone to error. Therefore, our team attempts to automate as much as possible using Infrastructure as Code tools like Terraform (deploy infra) and Ansible (configure infra). Projects like Red Baron give us great examples of how to achieve this. This results in several benefits:
Consistency and reliability: Anyone can deploy complex infrastructure components in the same way.
Version control: If mistakes happen, we can issue a new version of our Terraform script/Ansible playbook and all future deployments will benefit.
Speed: Setting up infra is a matter of minutes, where manual configuration would take hours or days. That is time a modern Red Team can no longer afford to spend.
Scalability: It's easy to upscale. E.g. deploying additional CDN domains to rotate in front of your C2 redirector is a matter of specifying the number.
Cost management: We can define default instance types or usage plans in our code that fit the purpose of the infrastructure component to control costs of our environments. The risk of dangling/forgotten infrastructure also lowers, since we can tear down entire environments in one go.
When designing our own infrastructure automation project (Red Bastion) in 2020, I hit a major limitation: after deploying infrastructure with Terraform, SSH must be reachable to apply the Ansible playbook. This implies you either need to deploy a VPC with a VPN or jump host first, or give your instance a public IP address and expose SSH. The latter is not even an option for services that should not have a public IP.
This means I cannot simply deploy my entire Red Team VPC in one go: I first have to deploy the VPC with a VPN/jump box, wait for that deployment to complete, configure my VPN client, connect, and only then deploy stage 2 over the VPN. Surely this frustrates other people too, so let's try to come up with an alternative.
SSM Documents: AWS-ApplyAnsiblePlaybooks
Enter AWS Systems Manager (SSM). According to the AWS documentation, Systems Manager enables you to manage your infrastructure:
Systems Manager provides a unified user interface so you can view operational data from multiple AWS services and enables you to automate operational tasks across your AWS resources.
One of its features is the ability to run "SSM Documents", automation runbooks, on onboarded assets. AWS-ApplyAnsiblePlaybooks is such a runbook and fits our use case perfectly: it downloads an Ansible playbook from an S3 bucket and runs it locally on your asset.
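To get a feel for the document before we automate everything with Terraform, this is roughly what an ad-hoc invocation via the AWS CLI could look like. Treat it as a sketch: the instance ID and bucket path are placeholders, and the parameter names are the ones we will reuse later in our Terraform association.

# Run the stock AWS-ApplyAnsiblePlaybooks document against a single instance (placeholders used)
aws ssm send-command \
  --document-name "AWS-ApplyAnsiblePlaybooks" \
  --targets "Key=InstanceIds,Values=i-0123456789abcdef0" \
  --parameters '{
    "SourceType": ["S3"],
    "SourceInfo": ["{\"path\":\"https://s3.amazonaws.com/my-bucket/ansible.zip\"}"],
    "PlaybookFile": ["main.yml"],
    "InstallDependencies": ["True"],
    "ExtraVariables": ["SSM=True"]
  }' \
  --region eu-west-1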
At this point, the following questions arose:
Can we automatically package our playbook via terraform and upload it to s3?
What are the instance requirements?
How do we pass arguments to our Ansible playbook?
We will answer these questions and more in the following sections.
Custom SSM Documents
Before we continue: the default SSM document comes with certain limitations that you may want to overcome. For example, it only installs the ansible package, as shown below. What if our playbook depends on Ansible collections?
sudo pip3 install ansible
Instead of following the next steps, you could also copy our ready-to-go JSON file from GitHub. In that case, simply create a new SSM document and paste the JSON content there.
We can easily clone the default document via the AWS console. Go to Systems Manager > Documents and search for "AWS-ApplyAnsiblePlaybooks". Select the automation runbook and click Actions > Clone document.
We can name the new runbook Custom-ApplyAnsiblePlaybooksWithCollections. Target Type can be left empty.
We can add two additional parameters: InstallCollections, to indicate whether ansible-galaxy collections should be installed, and RequirementsFile, to pass our playbook's requirements.yml file when InstallCollections is set to "True".
"InstallCollections": {"type":"String","description":"(Optional) Whether to install ansible-galaxy collections.","allowedValues": ["True","False" ],"default":"False"},"RequirementsFile": {"type":"String","description":"(Optional) Path to requirements.yml file","default":"requirements.yml","allowedPattern":"[(a-z_A-Z0-9\\-\\.)\/]+(.yml|.yaml)$"},
Next, the following shell script can be added to the aws:runShellScript action. It checks that the requirements file exists and installs the collections with ansible-galaxy when InstallCollections is set to "True".
"if [[ \"{{InstallCollections}}\" == True ]] ; then"," RequirementsFile=\"{{RequirementsFile}}\""," if [ ! -f \"${RequirementsFile}\" ] ; then"," echo \"The specified Requirements file doesn't exist in the downloaded bundle. Please review the relative path and file name.\" >&2"," exit 2"," fi"," ansible-galaxy install -r \"{{RequirementsFile}}\"","fi",
Congratulations! We now have our own Custom-ApplyAnsiblePlaybooksWithCollections SSM runbook. Keep in mind that this document only exists in the region where we created it; in this case, eu-west-1. Therefore, we can only apply it to instances deployed in this region.
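If you prefer to avoid console clicks entirely, the cloned document can also be managed as code. A minimal sketch, assuming you saved the JSON content locally as ssm_documents/Custom-ApplyAnsiblePlaybooksWithCollections.json (an example path, not from the original project):

// Create the custom SSM document from the cloned JSON content
// Like the console-created version, this only exists in the provider's region
resource "aws_ssm_document" "apply_ansible_playbooks_with_collections" {
  name            = "Custom-ApplyAnsiblePlaybooksWithCollections"
  document_type   = "Command"
  document_format = "JSON"
  content         = file("${path.module}/ssm_documents/Custom-ApplyAnsiblePlaybooksWithCollections.json")
}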
Packaging Ansible Playbooks
Next, we must come up with an approach to package our Ansible playbooks.
Ansible folder structure
The role we are about to create can be found on GitHub.
At DXC Strikeforce, we decided to stay as close as possible to the official Ansible role structure recommendations and to centralise roles in Git repositories. This generally corresponds to the following:
test_role/                  # Top-level directory named after the role
├── defaults/               # Default variables for the role (lowest priority)
│   └── main.yml
├── files/                  # Static files to be transferred to the target hosts
│   └── example_file.txt
├── handlers/               # Handlers, triggered by tasks
│   └── main.yml
├── meta/                   # Metadata about the role (dependencies, author info)
│   └── main.yml
├── tasks/                  # Main list of tasks to be executed by the role
│   └── main.yml
├── templates/              # Jinja2 templates to be populated with variables
│   └── example_template.j2
├── tests/                  # Test playbooks for the role
│   ├── inventory           # Inventory file for testing
│   └── test.yml            # Test playbook
├── vars/                   # Variables for the role (higher priority than defaults)
│   └── main.yml
└── README.md               # Documentation for the role
This approach allows us to recursively include the roles in the playbook repositories that call them. For example, if we would like to apply multiple roles to an instance, we can simply create an Ansible playbook repository as follows:
test-playbook/              # Git repo of the playbook
├── roles/                  # Different roles to include recursively like above
│   └── test_role/
├── main.yml                # Main playbook, includes test_role
└── requirements.yml        # Optional requirements when using ansible-galaxy roles/collections
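To make this concrete, a minimal main.yml and role task could look like the sketch below. This is purely illustrative: the exact contents of test_role are an assumption on our part, chosen to match the hello-world file and the testparameter extra var we validate later in this post.

# main.yml - minimal playbook that applies test_role
# The SSM document runs this locally on the instance itself
- name: Apply test_role
  hosts: all
  become: true
  roles:
    - test_role

# roles/test_role/tasks/main.yml - illustrative task using an extra var (assumed content)
- name: Write hello world file
  ansible.builtin.copy:
    content: "hello {{ testparameter }}"
    dest: /hello-world.txt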
Now that we have our Ansible playbook structure ready, we can try to push it to S3 via Terraform. Normally, we would develop separate Terraform modules and add them to our private registry, but for this proof of concept, we will add everything in one repository.
The following Terraform code archives the ansible directory of our project and uploads it to S3 as ansible.zip.
variables.tf
variable "path_to_ansible_folder" { description ="Path to the ansible from which to apply the main.yml playbook." type =string default ="ansible/test-playbook"}variable "s3_data_expiration_days" { description ="Amount of days to keep the uploaded data in s3. Should be 1 to limit storage cost." type =number default =1}variable "s3_zip_object_key" { description ="Name of the s3 bucket object key of the zip file. Normally, this should be ansible.zip." type =string default ="ansible.zip"}
s3_bucket.tf
// Create S3 bucket for ansible playbook sharing
locals {
  bucket_name_base = lower(replace("${var.server_name}", "_", "-")) // replace _ with - and lowercase all to be accepted as bucket name
  bucket_name      = substr("${local.bucket_name_base}-${random_string.bucket_randomized.result}", 0, 63) // add some randomization to the bucket name
}

// Random string to append to the bucket name (prevents issues with destroy and rebuild)
resource "random_string" "bucket_randomized" {
  length  = 16
  special = false
  numeric = true
  upper   = false
}

// Create new S3 bucket
resource "aws_s3_bucket" "ansible" {
  bucket        = local.bucket_name
  force_destroy = true
}

// Restrict public bucket access
resource "aws_s3_bucket_public_access_block" "ansible" {
  bucket                  = aws_s3_bucket.ansible.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Delete objects in the bucket after 1 day
resource "aws_s3_bucket_lifecycle_configuration" "ansible" {
  bucket = aws_s3_bucket.ansible.id

  rule {
    id     = "expire-after-${tostring(var.s3_data_expiration_days)}-day"
    status = "Enabled"

    expiration {
      days = var.s3_data_expiration_days // expire bucket contents after one day
    }

    abort_incomplete_multipart_upload {
      days_after_initiation = var.s3_data_expiration_days // if an upload failed, also expire the data after 1 day
    }
  }
}
upload_ansible_zip.tf
// Archive the ansible directory
// Every time a file changes in the ansible directory, the zip will be recreated
data "archive_file" "ansible_dir_zip" {
  type        = "zip"
  source_dir  = var.path_to_ansible_folder
  output_path = "${path.module}/${local.bucket_name}.zip" // avoid collisions with the same module running for different instances
}

// Upload the ansible directory as a zip
resource "aws_s3_object" "ansible_dir_zip" {
  bucket = aws_s3_bucket.ansible.id
  key    = var.s3_zip_object_key
  source = data.archive_file.ansible_dir_zip.output_path
  etag   = filemd5(data.archive_file.ansible_dir_zip.output_path)
}
If we execute the following commands, Terraform connects to the AWS API and deploys the resources.
terraform init
terraform plan
terraform apply
We can check the bucket content via the AWS console to confirm our zip file was indeed uploaded.
Bonus: if we would like to update the playbook at any time, we can simply change its contents and run terraform apply again. The ansible.zip object in S3 will automatically be updated. This gives us a simple method to re-apply a playbook to a previously deployed machine.
EC2 Instance Deployment
At this point, we completed the following items:
Successfully created a custom SSM Document automation runbook, Custom-ApplyAnsiblePlaybooksWithCollections, to run complex Ansible playbooks from an S3 bucket.
Automated uploading a local Ansible playbook to S3 via Terraform.
The next steps would be to:
Deploy an EC2 instance with Terraform.
Create and assign the correct EC2 instance role to:
Onboard the instance to SSM.
Access the created S3 bucket.
Apply the Custom-ApplyAnsiblePlaybooksWithCollections document to the instance via SSM, triggering the Ansible playbook to execute.
Deploy EC2 Instance
Deploying an EC2 instance is both straightforward and well-documented, so we will spend limited time explaining the steps. For our purpose, we will use a default build of Ubuntu Server 22.04 (the ami variable in variables.tf below).
This image comes with the Amazon SSM Agent preinstalled, saving us the hassle of pushing the agent to it ourselves. We can go with the cheapest t3a.nano instance type and 8 GB of EBS storage, since we do not need much computing power for our purpose.
We will also use data sources to convert VPC and subnet name variables to corresponding IDs, but you could also use the IDs directly. Most of the settings will be defined as defaults in variables.tf.
variables.tf
// EC2 instance
variable "vpc_name" {
  description = "Name of the VPC."
  type        = string
}

variable "server_name" {
  description = "Name of the server."
  type        = string
}

variable "instance_type" {
  description = "AWS EC2 instance type to use."
  type        = string
  default     = "t3a.nano"
}

variable "subnet_name" {
  description = "Name of the subnet to deploy the machine in."
  type        = string
}

variable "ebs_volume_size" {
  description = "EBS size in GB."
  type        = number
  default     = 8
}

variable "delete_ebs_on_termination" {
  description = "Whether to delete the volume on termination. True avoids costs and destroys data after tearing down the environment."
  type        = bool
  default     = true
}

variable "source_dest_check" {
  description = "Whether to set the source_dest_check to enable IP forwarding in AWS. Set to false for a VPN server."
  type        = bool
  default     = true
}

variable "private_ip" {
  description = "Private IP address to set. Leave blank to let AWS decide."
  type        = string
  default     = ""
}

variable "ssh_key_local_directory" {
  description = "Directory to store the SSH key."
  type        = string
  default     = "./ssh_keys"
}

// Use an AMI with the Amazon SSM Agent preinstalled
// e.g. Ubuntu Server 22.04
variable "ami" {
  description = "AWS AMI to deploy"
  type        = string
  default     = "ami-0932dacac40965a65" // Do not update the build on actively used instances or the machine will be destroyed
}
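Note that the AMI ID is pinned on purpose: resolving a newer image later would force instance replacement. If you would rather look up the latest Canonical Ubuntu 22.04 image dynamically, for example in a throwaway environment, a data source along the following lines could be used instead (a sketch using the usual Canonical owner ID and name pattern):

// Look up the latest Canonical Ubuntu 22.04 AMI instead of hardcoding the ID
// Beware: a new image release will trigger instance replacement on the next apply
data "aws_ami" "ubuntu_2204" {
  most_recent = true
  owners      = ["099720109477"] // Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}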
main.tf
I have the habit of storing some common resources in the main.tf file. This is just a personal preference.
// Fetch the VPC id based on the VPC name
data "aws_vpc" "target_vpc" {
  filter {
    name   = "tag:Name"
    values = [var.vpc_name]
  }
}

// Fetch the subnet id based on the subnet name
data "aws_subnet" "target_subnet" {
  vpc_id = data.aws_vpc.target_vpc.id

  filter {
    name   = "tag:Name"
    values = [var.subnet_name]
  }
}
security_group.tf
Our security group will only contain egress rules. Inbound SSH is not required as we will use SSM to push an Ansible playbook to the instance.
// Create the security group
resource "aws_security_group" "ansible_instance" {
  name        = "${var.server_name}_security_group"
  description = "Security group created by Red Bastion"
  vpc_id      = data.aws_vpc.target_vpc.id
}

// No ingress required
// Egress can be further restricted;
// ansible-galaxy connections & AWS SSM API connections should be allowed
// The following allows full egress, but only on ports 53 UDP, 80 & 443 TCP
resource "aws_vpc_security_group_egress_rule" "dns" {
  security_group_id = aws_security_group.ansible_instance.id
  cidr_ipv4         = "0.0.0.0/0"
  from_port         = 53
  ip_protocol       = "udp"
  to_port           = 53
}

resource "aws_vpc_security_group_egress_rule" "http" {
  security_group_id = aws_security_group.ansible_instance.id
  cidr_ipv4         = "0.0.0.0/0"
  from_port         = 80
  ip_protocol       = "tcp"
  to_port           = 80
}

resource "aws_vpc_security_group_egress_rule" "https" {
  security_group_id = aws_security_group.ansible_instance.id
  cidr_ipv4         = "0.0.0.0/0"
  from_port         = 443
  ip_protocol       = "tcp"
  to_port           = 443
}
ssh_keypair.tf
Just to be safe, we will create an SSH keypair for the instance. This will be our backup key in case the connection with AWS Systems Manager somehow breaks.
// Generate a private key in case we want to authenticate via SSH (should not happen)
// e.g. as backup in case amazon-ssm-agent crashes
resource "tls_private_key" "ssh" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

// Write the key to local disk. Can be omitted.
resource "local_file" "foo" {
  content  = tls_private_key.ssh.private_key_pem
  filename = "${var.ssh_key_local_directory}/${var.server_name}.pem"
}

// Add the public key to AWS
resource "aws_key_pair" "ssh" {
  key_name   = "ssh_${var.server_name}"
  public_key = tls_private_key.ssh.public_key_openssh
}
ec2_instance.tf
// Create the EC2 instance
resource "aws_instance" "ec2_instance" {
  tags = {
    Name = "${var.server_name}"
  }

  root_block_device {
    delete_on_termination = var.delete_ebs_on_termination
    volume_size           = var.ebs_volume_size
    volume_type           = "gp2"
    tags = {
      Name = "${var.server_name}_ebs"
    }
  }

  ami                    = var.ami
  instance_type          = var.instance_type
  key_name               = aws_key_pair.ssh.key_name
  vpc_security_group_ids = [aws_security_group.ansible_instance.id]
  subnet_id              = data.aws_subnet.target_subnet.id

  // This will come into play after adding the instance role
  // Leave it out if you would like to deploy an instance without any role
  iam_instance_profile = aws_iam_instance_profile.ssm_s3.name // enable SSM and S3 role for the instance

  source_dest_check = var.source_dest_check

  // Optionally set a static IP
  private_ip = var.private_ip == "" ? null : var.private_ip

  // Disable AWS metadata v1 (unauthenticated) to improve security
  metadata_options {
    http_endpoint = "enabled"
    http_tokens   = "required"
  }
}
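Optionally, a couple of outputs make the verification steps later in this post a bit easier; this is purely for convenience:

// Handy outputs for the verification steps further down
output "instance_id" {
  value = aws_instance.ec2_instance.id
}

output "ansible_bucket" {
  value = aws_s3_bucket.ansible.id
}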
Execution
We can now run our Terraform code to deploy the EC2 instance.
terraform init
terraform plan
terraform apply
SSM & S3 Instance Role
Next, we should create and assign an EC2 instance role to onboard the machine to SSM, enabling it to communicate with Systems Manager. This will allow us to apply SSM documents to the EC2 instance. The role should also be able to read the ansible.zip file from our automatically created S3 bucket.
iam_role.tf
The minimal privileges to onboard an instance to SSM are defined in the AWS-managed arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore IAM policy. If we attach this policy and a custom S3 policy, we fulfil the requirements.
We will also grant the s3:PutObject permission on the S3 bucket to allow the machine to upload data as well. This can be useful in case we would like to collect logs or files later on.
locals {
  iam_role_base = lower(replace("${var.server_name}", "_", "-")) // replace _ with - and lowercase all to be accepted as a role name
}

// Create the SSM role that allows ansible to configure the machine
// The EC2 instance should be able to assume the role
resource "aws_iam_role" "ssm_s3" {
  name = substr("${local.iam_role_base}_ssm_s3_role", 0, 63) // name can only be 64 chars max

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

// Attach the default recommended SSM policy to the new role
// An instance with this role is now onboarded to SSM
resource "aws_iam_role_policy_attachment" "ssm_role_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  role       = aws_iam_role.ssm_s3.name
}

// Also allow access to the S3 bucket that shares files with the instance
resource "aws_iam_policy" "s3_bucket_policy" {
  name = "${local.iam_role_base}_s3_bucket_policy"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action   = ["s3:Get*", "s3:List*", "s3:PutObject"]
        Effect   = "Allow"
        Resource = "arn:aws:s3:::${local.bucket_name}/*"
      }
    ]
  })
}

// Attach the policy to the same role
resource "aws_iam_role_policy_attachment" "s3_role_policy" {
  policy_arn = aws_iam_policy.s3_bucket_policy.arn
  role       = aws_iam_role.ssm_s3.name
}

// Create a profile to assign the role to the EC2 instance
resource "aws_iam_instance_profile" "ssm_s3" {
  name = "${local.iam_role_base}_ssm_profile"
  role = aws_iam_role.ssm_s3.name
}
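Before applying the playbook, it is worth confirming that the instance actually registered as a managed node. Assuming the AWS CLI is configured for the same account and region, a quick check could look like this:

# List managed instances known to SSM; the new instance should appear with PingStatus "Online"
aws ssm describe-instance-information \
  --query "InstanceInformationList[].{Id:InstanceId,Ping:PingStatus,Platform:PlatformName}" \
  --output table \
  --region eu-west-1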
Apply Playbook
All that is left now is to apply the Custom-ApplyAnsiblePlaybooksWithCollections document to the instance via SSM, triggering the Ansible playbook to be downloaded and executed. This should create /hello-world.txt on the target instance.
variables.tf
// Push additional extra_vars as a map
variable "ansible_extra_vars" {
  description = "Map of key-value pairs of variables to pass to Ansible"
  type        = map(string)
}
ssm_ansible_playbook.tf
locals {
  ansible_extra_vars_string = join(" ", [for k, v in var.ansible_extra_vars : "${k}=${v}"])
}

// Associate the document with the instance
// Uses a custom version of AWS-ApplyAnsiblePlaybooks (Custom-ApplyAnsiblePlaybooksWithCollections) to also install dependencies with ansible-galaxy
resource "aws_ssm_association" "ansible_playbook_association" {
  name             = "Custom-ApplyAnsiblePlaybooksWithCollections" // based on AWS-ApplyAnsiblePlaybooks
  association_name = "${var.server_name}_playbook_association"

  // Targets to run the document on
  targets {
    key    = "InstanceIds"
    values = [aws_instance.ec2_instance.id]
  }

  parameters = {
    SourceType = "S3"
    SourceInfo = jsonencode({
      path = "https://s3.amazonaws.com/${aws_s3_bucket.ansible.id}/ansible.zip"
    })
    // We can use ExtraVariables to pass parameters to the playbook
    // Always include the etag to retrigger the playbook apply when the playbook changes
    // Always include s3bucket if you would like to upload custom data
    ExtraVariables      = "SSM=True ${local.ansible_extra_vars_string} s3bucket=${aws_s3_bucket.ansible.id} s3_object_etag=${aws_s3_object.ansible_dir_zip.etag}"
    InstallDependencies = "True"             // if Ansible must still be installed; should be True in most cases unless using your own image with Ansible preinstalled
    InstallCollections  = "False"            // can toggle to install ansible-galaxy dependencies
    RequirementsFile    = "requirements.yml" // where to install dependencies from; should be in the root of ansible.zip
    PlaybookFile        = "main.yml"         // should be in the root of ansible.zip
  }

  // Output logs to the bucket
  // This means PutObject is needed
  output_location {
    s3_bucket_name = aws_s3_bucket.ansible.id
  }

  automation_target_parameter_name = "InstanceId"
  max_concurrency                  = "1"
  max_errors                       = "0"
}
Execution
After expanding our Terraform project with the SSM association, we can apply the changes. We should not forget to set ansible_extra_vars so we can pass parameters to our playbook.
ansible_extra_vars = {
  testparameter = "test"
}
terraform init
terraform plan
terraform apply
Next, we can validate successful application via State Manager.
Additionally, we can start a session on the instance via SSM Fleet Manager and check whether the file was indeed added. If we monitor the root directory, we can observe the moment /hello_world.txt is written with the value "hello test", as specified in our Ansible playbook.
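The same validation can be done from the command line. Assuming the Session Manager plugin for the AWS CLI is installed, something along these lines should work (instance ID and server name are placeholders):

# Check the association status first (the association name comes from our Terraform code)
aws ssm list-associations \
  --association-filter-list "key=AssociationName,value=<server_name>_playbook_association" \
  --region eu-west-1

# Then open an interactive session on the instance and inspect the file
aws ssm start-session --target <instance-id> --region eu-west-1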
Conclusion
Success! We managed to deploy a fresh Ubuntu 22.04 LTS EC2 instance and apply a parameterised Ansible role through a playbook, all without exposing SSH. This can now easily be replicated to automatically spin up entire private environments from anywhere in the world, without any direct connection.
Troubleshooting
If your machine is not showing up in SSM, it is usually caused by one of the following (see the checks after this list):
Egress traffic does not allow communication with the AWS SSM API.
The SSM instance role was not applied correctly.
The VM was deployed in a public subnet but no public IP was assigned.
The chosen AMI does not have the Amazon SSM Agent preinstalled.
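If the instance never appears, the backup SSH key we generated earlier is your way in (you will temporarily need an SSH ingress rule and a network path). On the Ubuntu 22.04 image the agent ships as a snap, so a rough first check could be:

# Check whether the SSM agent is running (installed as a snap on this Ubuntu AMI)
sudo snap services amazon-ssm-agent

# Inspect the agent logs for registration or connectivity errors
sudo tail -n 50 /var/log/amazon/ssm/amazon-ssm-agent.log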
Ansible Playbook Execution
Your ansible.zip will be expanded under /var/lib/amazon/ssm/<instanceid>/document/orchestration/<orchestrationid>/downloads. When troubleshooting playbook execution, we usually go into this directory and execute the playbook manually with the appropriate extra vars.
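For reference, a manual run from that directory could look roughly like the following; the extra vars should match whatever the association passed in via ExtraVariables:

# Re-run the playbook by hand from the downloads directory, mimicking the SSM document
cd /var/lib/amazon/ssm/<instanceid>/document/orchestration/<orchestrationid>/downloads
sudo ansible-playbook -i "localhost," --connection=local \
  --extra-vars "SSM=True testparameter=test" main.yml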
Passing extra vars via SSM documents can be a bit tricky, as certain characters are not allowed. A workaround is to pass a config.json inside the files directory of the Ansible role, or to use the vars directory to pass parameters and large values.
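As an illustration of the config.json approach, the extra vars map could be rendered into the role's files directory before the playbook is zipped. This is a sketch assuming the test-playbook layout from earlier; note that the archive_file data source would additionally need a depends_on on this resource so the file exists before the zip is built:

// Render the extra vars to a config.json inside the role before zipping the playbook
// The role can then load it with include_vars instead of relying on ExtraVariables
resource "local_file" "ansible_config_json" {
  content  = jsonencode(var.ansible_extra_vars)
  filename = "${var.path_to_ansible_folder}/roles/test_role/files/config.json"
}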