Photo by Jordan Harrison on Unsplash
Building a global reverse proxy with on-demand SSL support
...or: How and why to launch your first EC2 instances when you're full-on Serverless on every other day of the year
Motivation
Who needs a reverse proxy with on-demand SSL support? Well, think about services like Hashnode (which also runs this blog), or Fathom and SimpleAnalytics. What feature do all those services have in common? They all enable their customers to bring their own custom domain names. The latter two services use them to bypass adblockers, so that customers can track all their pageviews and events, which otherwise might not be possible because the services' own domains are prone to end up on DNS block lists. Hashnode uses them to let customers host their blogs under their own domain names.
Requirements
What are the functional & non-functional requirements to build such a system? Let's try to recap:
We want to be able to use a custom ("external") domain, e.g. subdomain.customdomain.tld, to redirect to another domain, such as targetdomain.tld, via a CNAME DNS record
The custom domains need to have SSL/TLS support
The custom domains need to be configurable, without changing the underlying infrastructure
We want to make sure that the service will only create and provide the certificates for whitelisted custom domains
We want to optimize the latency of individual requests, and thus need a scalable and distributed infrastructure
We want to be as Serverless as possible
We want to optimize for infrastructure costs (variable and fixed)
We want to build this on AWS
The service needs to be deployable/updateable/removable via Infrastructure as Code (IaC) in a repeatable manner
Architectural considerations
Looking at the above requirements, what are the implications from an architectural point of view? Which tools and services are already on the market? What does AWS as Public Cloud Provider offer for our use case?
For the core requirement of a reverse proxy with automatic SSL/TLS support, Caddy seems to be an optimal candidate. As it is written in Go, it runs well in Docker containers and on numerous operating systems. This means we have the option to either run it on EC2 instances, or on ECS/Fargate if we decide to run it in containers; the latter would cater to the requirement to run as Serverless as possible. Caddy also has modules to store the on-demand generated SSL/TLS certificates in DynamoDB or S3.
Whitelisting of custom domains for those certificates is possible as well, by providing an additional backend service that Caddy can ask whether a requested custom domain is allowed or not.
A challenge is that none of those modules are contained in the official Caddy builds, meaning that we'd have to build a custom version of Caddy to be able to use those storage backends.
Regarding the requirement of global availability and short response times, AWS Global Accelerator is a viable option, as it can provide a global single static IP address endpoint for multiple, regionally distributed services. In our use case, those services would be our Caddy installations.
Running Caddy itself, as mentioned before, is possible via containers or EC2 instances / VMs. As the services will run around the clock and presumably don't need a lot of resources when not under heavy load, we assume that 1 vCPU and 1 GB of memory should be enough.
When projecting this onto the necessary infrastructure, the cost comparison between containers and VMs looks like the following (for simplification, we just compare the fixed costs, ignore variable costs such as egress traffic, and assume the us-east-1 region is used):
Containers
Fargate task with 1 vCPU and 1 GB of memory for each Caddy regional instance
- Price: ($0.04048/vCPU hour + $0.004445/GB hour) * 720 hours (30 days) = $32.35 / 30 days
ALB to make the Fargate tasks available to the outside world
- Price: ($0.0225 per ALB-hour + $0.008 per LCU-hour) * 720 hours (30 days) = $21.96 / 30 days
In combination, it would cost $54.31 to run this setup for 30 days.
EC2 instances / VMs
A t2.micro instance with 1 vCPU and 1 GB of memory for each Caddy regional instance
- Price: $0.0116/hour on-demand * 720 hours (30 days) = $8.35
There's no need for a Load Balancer in front of the EC2 instances, as Global Accelerator can directly use them
- Price: $0
In combination, it would cost $8.35 to run this setup for 30 days.
Additional costs are:
AWS Global Accelerator
- Price: $0.025 / hour * 720 hours (30 days) = $18
DynamoDB table
- Price: (5000 reads/day * $0.25/million + 50 writes/day * $1.25/million) * 30 days = $0.0375 reads + $0.001875 writes ≈ $0.04
Lambda function (128 MB / 0.25 sec avg. duration / 5000 req./day)
- Price: ($0.0000166667 / GB-second + $0.20 per 1M req.) = basically $0
Resulting architecture
Based on the calculated fixed costs, we decided to use EC2 instances instead of Fargate tasks, which will save us a decent amount of money even for one Caddy instance. The estimated costs for running this architecture for 30 days are $26.39.
As one of our requirements was that we can roll out this infrastructure potentially on a global scale, we need to be able to deploy the EC2 instances with the Caddy servers in different AWS regions, as well as having multiple instances in the same region.
Furthermore, we could use DynamoDB Global Tables to achieve a global distribution of the certificates to get faster response times, but deem it as out of scope for this article.
The final architecture:
Implementation
To implement the described architecture, we need to take several steps. First of all, we have to build a custom version of Caddy that includes the DynamoDB module, which then enables us to use DynamoDB as a certificate store.
Custom Caddy build
This can be achieved via a custom build process leveraging Docker images of Amazon Linux 2, as found at tobilg/aws-caddy-build.
build.sh (parametrized custom build of Caddy via Docker)
#!/usr/bin/env bash
set -e
# Set OS (first script argument)
OS=${1:-linux}
# Set Caddy version (second script argument)
CADDY_VERSION=${2:-v2.6.2}
# Create release folders
mkdir -p $PWD/releases $PWD/temp_release
# Run build
docker build --build-arg OS=$OS --build-arg CADDY_VERSION=$CADDY_VERSION -t custom-caddy-build .
# Copy release from image to temporary folder
docker run -v $PWD/temp_release:/opt/mount --rm -ti custom-caddy-build bash -c "cp /tmp/caddy-build/* /opt/mount/"
# Copy release to releases
cp $PWD/temp_release/* $PWD/releases/
# Cleanup
rm -rf $PWD/temp_release
Dockerfile (will build Caddy with the DynamoDB and S3 modules)
FROM amazonlinux:2
ARG CADDY_VERSION=v2.6.2
ARG OS=linux
# Install dependencies
RUN yum update -y && \
    yum install golang -y
RUN GOBIN=/usr/local/bin/ go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest
RUN mkdir -p /tmp/caddy-build && \
    GOOS=${OS} xcaddy build ${CADDY_VERSION} --with github.com/ss098/certmagic-s3 --with github.com/silinternational/certmagic-storage-dynamodb/v3 --output /tmp/caddy-build/aws_caddy_${CADDY_VERSION}_${OS}
That's it for the custom Caddy build. You don't need to build this yourself, as the following steps use the release I built and uploaded to GitHub.
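If you do want to build it yourself, calling the build script could look like this (the two arguments default to the shown values anyway, and Docker needs to be available locally):
# Build Caddy v2.6.2 for Linux; the binary ends up in ./releases/aws_caddy_v2.6.2_linux
./build.sh linux v2.6.2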
Reverse Proxy Service
The implementation of the reverse proxy service can be found at tobilg/global-reverse-proxy.
Just clone it via git clone https://github.com/tobilg/global-reverse-proxy.git to your local machine, and configure it as described below.
Prerequisites
Serverless Framework
You need to have a recent (>=3.1.2) version of the Serverless Framework installed globally on your machine. If you haven't, you can run npm i -g serverless to install it.
Valid AWS credentials
The Serverless Framework relies on already configured AWS credentials. Please refer to the docs to learn how to set them up on your local machine.
EC2 key already configured
If you want to interact with the deployed EC2 instance(s), you need to add your existing public SSH key or create a new one. Please have a look at the AWS docs to learn how you can do that.
Please also note the name you have given to the newly created key, as you will have to update the configuration of the proxy server(s) stack.
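As a sketch, importing an existing public key via the AWS CLI could look like the following (the key name caddy-proxy-key is just an example):
# Import an existing public SSH key as an EC2 key pair
aws ec2 import-key-pair --key-name caddy-proxy-key --public-key-material fileb://~/.ssh/id_rsa.pub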
Infrastructure as Code overview
The infrastructure consists of three different stacks:
A stack for the domain whitelisting service, and the certificate table in DynamoDB
A stack for the proxy server(s) itself, which can be deployed multiple times if you want high (global) availability and fast latencies
A stack for the Global Accelerator, and the according DNS records
Most important parts
The main functionality, the reverse proxy based on Caddy, is deployed via an EC2 instance. Its configuration, the so-called Caddyfile, is, together with the CloudFormation resource for the EC2 instance, the most important part.
This configuration enables the reverse proxy, the on-demand TLS feature, and DynamoDB storage for certificates. It's automatically parametrized via the generated /etc/caddy/environment file (see ec2.yml below). A systemd service for Caddy is generated as well, based on our configuration derived from the serverless.yml.
{
  admin off
  on_demand_tls {
    ask {$DOMAIN_SERVICE_ENDPOINT}
  }
  storage dynamodb {$TABLE_NAME} {
    aws_region {$TABLE_REGION}
  }
}

:80 {
  respond /health "Im healthy" 200
}

:443 {
  tls {$LETSENCRYPT_EMAIL_ADDRESS} {
    on_demand
  }
  reverse_proxy https://{$TARGET_DOMAIN} {
    header_up Host {$TARGET_DOMAIN}
    header_up User-Custom-Domain {host}
    header_up X-Forwarded-Port {server_port}
    health_timeout 5s
  }
}
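For context: with on-demand TLS, Caddy calls the configured ask endpoint with the requested hostname as a domain query parameter before issuing a certificate, and only proceeds if it gets a 200 response. A rough way to test this and the health endpoint manually (the URLs are placeholders):
# Simulate Caddy's "ask" request against the domain whitelisting service
curl -i "https://<your-function-url>.lambda-url.us-east-1.on.aws/?domain=test.myexistingdomain.com"
# Check the plain-HTTP health endpoint that Global Accelerator uses
curl -i "http://<ec2-public-ip>/health"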
ec2.yml (extract)
The interesting part is the UserData script, which is run automatically when the EC2 instance starts. It does the following:
Download the custom Caddy build with DynamoDB support
Prepare a group and a user for Caddy
Create the caddy.service file for systemd
Create the Caddyfile (as outlined above)
Create the environment file (/etc/caddy/environment)
Reload systemd, then enable & start the caddy service
Resources:
  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: '${self:custom.ec2.instanceType}'
      KeyName: '${self:custom.ec2.keyName}'
      SecurityGroups:
        - !Ref 'InstanceSecurityGroup'
      ImageId: 'ami-0b5eea76982371e91' # Amazon Linux 2 AMI
      IamInstanceProfile: !Ref 'InstanceProfile'
      UserData: !Base64
        'Fn::Join':
          - ''
          - - |
              #!/bin/bash -xe
            - |
              sudo wget -O /usr/bin/caddy "https://github.com/tobilg/aws-caddy-build/raw/main/releases/aws_caddy_v2.6.2_linux"
            - |
              sudo chmod +x /usr/bin/caddy
            - |
              sudo groupadd --system caddy
            - |
              sudo useradd --system --gid caddy --create-home --home-dir /var/lib/caddy --shell /usr/sbin/nologin --comment "Caddy web server" caddy
            - |
              sudo mkdir -p /etc/caddy
            - |
              sudo echo -e '${file(./configs.js):caddyService}' | sudo tee /etc/systemd/system/caddy.service
            - |
              sudo printf '${file(./configs.js):caddyFile}' | sudo tee /etc/caddy/Caddyfile
            - |
              sudo echo -e "TABLE_REGION=${self:custom.caddy.dynamoDBTableRegion}\nTABLE_NAME=${self:custom.caddy.dynamoDBTableName}\nDOMAIN_SERVICE_ENDPOINT=${self:custom.caddy.domainServiceEndpoint}\nLETSENCRYPT_EMAIL_ADDRESS=${self:custom.caddy.letsEncryptEmailAddress}\nTARGET_DOMAIN=${self:custom.caddy.targetDomainName}" | sudo tee /etc/caddy/environment
            - |
              sudo systemctl daemon-reload
            - |
              sudo systemctl enable caddy
            - |
              sudo systemctl start caddy
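Once an instance is up, you can verify that the Caddy service is running, e.g. via SSH (assuming you enabled SSH access as described further below):
# Connect to the instance (Amazon Linux 2 uses the ec2-user account) and check the service
ssh ec2-user@<ec2-public-ip>
sudo systemctl status caddy
sudo journalctl -u caddy --no-pager -n 50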
The Global Accelerator CloudFormation resources wire the EC2 instance(s) to what is essentially a global load balancer. This is then referenced by the dns-record.yml, which assigns the configured domain name to the Global Accelerator.
Resources:
  Accelerator:
    Type: AWS::GlobalAccelerator::Accelerator
    Properties:
      Name: 'External-Accelerator'
      Enabled: true
  Listener:
    Type: AWS::GlobalAccelerator::Listener
    Properties:
      AcceleratorArn:
        Ref: Accelerator
      Protocol: TCP
      ClientAffinity: NONE
      PortRanges:
        - FromPort: 443
          ToPort: 443
  EndpointGroup1:
    Type: AWS::GlobalAccelerator::EndpointGroup
    Properties:
      EndpointConfigurations:
        - EndpointId: '${self:custom.ec2.instance1.id}'
          Weight: 1
      EndpointGroupRegion: '${self:custom.ec2.instance1.region}'
      HealthCheckIntervalSeconds: 30
      HealthCheckPath: '/health'
      HealthCheckPort: 80
      HealthCheckProtocol: 'HTTP'
      ListenerArn: !Ref 'Listener'
      ThresholdCount: 3
Detailed configuration
Stack configurations
Please configure the following values for the different stacks:
The target domain name where you want your reverse proxy to send the requests to (targetDomainName)
The email address to use for automatic certificate generation via LetsEncrypt (letsEncryptEmailAddress)
The domain name of the proxy service itself, which is then used by GlobalAccelerator (domain)
Optionally: The current IP address from which you want to access the EC2 instance(s) via SSH (sshClientIPAddress). If you want to use SSH, you'll need to uncomment the respective SecurityGroup settings
Whitelisted domain configuration
You need to make sure that not everyone can use your reverse proxy with every domain. Therefore, you need to configure the whitelist of domains that may be used by Caddy's on-demand TLS feature.
This is done with the Domain Verifier Lambda function, which is deployed at a Function URL endpoint.
The configuration can be changed here before deploying the service.
HINT: To use this dynamically, as you'd probably wish in a production setting, you could rewrite the Lambda function to read the custom domains from a DynamoDB table, and have another Lambda function run recurrently to issue DNS checks for the CNAME entries the customers would need to make (see below).
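As a purely hypothetical sketch of that approach, whitelisting a new customer domain would then boil down to writing an item to such a table (the table name and attributes below are made up):
# Hypothetical: add a custom domain to a DynamoDB table the Domain Verifier Lambda could read from
aws dynamodb put-item \
  --table-name custom-domains \
  --item '{"domain": {"S": "test.myexistingdomain.com"}, "status": {"S": "pending"}}'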
DNS / Nameserver configurations
If you use an external domain provider, such as Namecheap or GoDaddy, make sure that you point the nameserver settings in your domain's configuration to the ones assigned to your HostedZone by Amazon. You can look these up in the AWS Console or via the AWS CLI, as shown below.
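# List the nameservers Amazon assigned to your HostedZone
aws route53 list-hosted-zones-by-name --dns-name mygreatproxyservice.com
aws route53 get-hosted-zone --id <hosted-zone-id> --query 'DelegationSet.NameServers'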
CNAME configuration for proxying
You also need to add CNAME records to the domains you want to proxy for, e.g. if your proxy service domain is external.mygreatproxyservice.com, you need to add a CNAME record to your existing domain (e.g. test.myexistingdomain.com) to redirect to the proxy service domain:
CNAME test.myexistingdomain.com external.mygreatproxyservice.com
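You can then verify that the record has propagated, e.g. with dig:
# Check that the CNAME resolves to the proxy service domain
dig +short CNAME test.myexistingdomain.com
# Expected output: external.mygreatproxyservice.com.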
Passing options during deployment
When running sls deploy for each stack, you can specify the following options to customize the deployments (see the example below):
--stage: This will configure the so-called stage, which is part of the stack name (default: prd)
--region: This will configure the AWS region where the stack is deployed (default: us-east-1)
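For example, a deployment to a non-default stage and region might look like this:
# Deploy a stack to the "dev" stage in eu-central-1 instead of the defaults
sls deploy --stage dev --region eu-central-1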
Deployment
You need to follow a specific deployment order to be able to run the overall service:
Domain whitelisting service:
cd domain-service-stack && sls deploy && cd ..
Proxy server(s):
cd proxy-server-stack && sls deploy && cd ..
Global Accelerator & HostedZone / DNS :
cd accelerator-stack && sls deploy && cd ..
Removal
To remove the individual stacks, you can run sls remove in the respective subfolders.
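A reasonable approach is to remove them in reverse deployment order, e.g.:
cd accelerator-stack && sls remove && cd ..
cd proxy-server-stack && sls remove && cd ..
cd domain-service-stack && sls remove && cd ..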
Wrapping up
We were able to build a POC for a (potentially) globally distributed reverse proxy service with on-demand TLS support. We decided against Fargate and for EC2 for cost reasons, thereby prioritizing costs over running as Serverless as possible. It's possible that, in another setting or environment, or with a different background, you might come to another conclusion, which is completely fine.
For a more production-like setup, you'd probably need to amend the Domain Verifier Lambda function, so that it dynamically looks up the custom domains that are configured e.g. by your customers via a UI, and stored in another DynamoDB table via another Lambda function. Deleting or updating those custom domains should probably be possible, too.
Furthermore, you should then write an additional Lambda function that recurrently checks, for each stored custom domain, whether (see the sketch below):
The CNAME record points to your external.$YOUR_DOMAIN_NAME.tld, and updates the status accordingly
An actual redirect from the custom domain to your domain is possible via HTTPS
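A minimal sketch of such a check, written here as a shell script rather than a Lambda function (the domain names are placeholders):
#!/usr/bin/env bash
# Hypothetical recurring check for a single stored custom domain
DOMAIN="test.myexistingdomain.com"
PROXY_DOMAIN="external.mygreatproxyservice.com"
# 1) Does the CNAME point to the proxy service domain?
if dig +short CNAME "$DOMAIN" | grep -qxF "${PROXY_DOMAIN}."; then
  echo "CNAME OK"
else
  echo "CNAME missing or incorrect"
fi
# 2) Is the custom domain actually reachable via HTTPS through the proxy?
if curl -sf -o /dev/null "https://${DOMAIN}/"; then
  echo "HTTPS OK"
else
  echo "HTTPS check failed"
fi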