Deployment Architecture
Softly runs on AWS for backend services and Cloudflare for DNS, SSL, and static site hosting. All AWS resources are managed by Terraform.
This guide explains every piece of the infrastructure in plain language so you can understand how the whole system fits together, even if you have never deployed anything before.
1. The Big Picture
Here is the full path a request takes from a user's phone to the database and back:
              User's Phone/Browser
                       |
                       v
                +--------------+
                |  Cloudflare  |  <-- DNS + SSL + CDN + DDoS protection
                | (Front Door) |
                +------+-------+
                       |
         +-------------+-------------+
         |                           |
         v                           v
    +----------+             +--------------+
    |  Pages   |             | AWS (London) |
    | (Static) |             |              |
    +----------+             | +--------+   |
    | Landing  |             | |  ALB   |   |
    | Docs     |             | +---+----+   |
    | Web App  |             |     |        |
    +----------+             | +---v----+   |
                             | |  ECS   |   |
                             | | (Rails)|   |
                             | +---+----+   |
                             |     |        |
                             | +---v------+ |
                             | |   RDS    | |
                             | |(Postgres)| |
                             | +---+------+ |
                             |     |        |
                             | +---v----+   |
                             | | Redis  |   |
                             | +--------+   |
                             +--------------+

Two paths exist depending on what the user is requesting:
- Static content (landing page, docs, web app shell) goes to Cloudflare Pages. These are pre-built HTML/CSS/JS files served instantly from Cloudflare's edge network, which has servers all around the world.
- API requests (login, fetch data, upload files) go to AWS in London, where the Rails backend lives.
2. What Is Each Service?
Cloudflare -- The Security Guard and Receptionist
What it is: A service that sits between the internet and our servers, handling security and performance.
Why we need it: Without Cloudflare, our servers would be directly exposed to the internet. Anyone could attack them, and users far from London would experience slow load times.
Our setup: Every request to mysoftly.app first hits Cloudflare. It handles DNS (translating "api.mysoftly.app" into an IP address), SSL (encrypting traffic so nobody can snoop on it), CDN (caching content closer to users), and DDoS protection (blocking floods of fake traffic).
Analogy: Think of a nightclub bouncer who also gives directions. They check IDs (SSL), turn away troublemakers (DDoS protection), and point guests to the right room (DNS routing).
VPC (Virtual Private Cloud) -- The Private Office Building
What it is: A private, isolated network within AWS where all our resources live.
Why we need it: Without a VPC, our database and servers would be on the open internet. A VPC is like putting a fence around everything so only authorized traffic can get in.
Our setup: Our VPC uses the IP range 10.0.0.0/16, which gives us 65,536 private IP addresses to work with. Everything inside this range is "inside the building."
Analogy: Imagine renting an entire office building. The public cannot wander in freely. You decide which rooms have windows facing the street (public subnets) and which rooms are interior-only (private subnets).
Subnets -- Rooms Inside the Building
What they are: Subdivisions of the VPC. Each subnet is a smaller network within the larger VPC.
Why we have 4 (2 public + 2 private):
We have two of each because they sit in different Availability Zones (AZs). An AZ is one or more physically separate data centres with their own power and networking. If one AZ loses power or catches fire, the other keeps running.
- Public subnets (10.0.1.0/24 and 10.0.2.0/24) are like rooms with windows facing the street. They can receive traffic directly from the internet. The ALB (load balancer) lives here.
- Private subnets (10.0.3.0/24 and 10.0.4.0/24) are like secure interior rooms with no windows. The database, Redis, and application containers live here. Nobody from the internet can reach them directly.
Why this matters: Your database should never be directly accessible from the internet. Period. Putting it in a private subnet guarantees that.
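The address counts above follow from the prefix length: a /N network leaves 32 - N bits for addresses. Here is a minimal sketch of that arithmetic in shell. The function names are illustrative and not part of the Softly tooling.

```shell
#!/bin/sh
# Number of IPv4 addresses in a CIDR block with the given prefix length.
addresses_in_prefix() {
  echo $(( 1 << (32 - $1) ))
}

# How many subnets of prefix $2 fit inside one network of prefix $1.
subnets_in_network() {
  echo $(( 1 << ($2 - $1) ))
}

addresses_in_prefix 16     # the VPC, 10.0.0.0/16      -> 65536
addresses_in_prefix 24     # one subnet, 10.0.1.0/24   -> 256
subnets_in_network 16 24   # /24 subnets in the /16    -> 256
```

AWS reserves five addresses in every subnet, so each /24 offers 251 usable addresses. The four subnets use only a small slice of the VPC range, which leaves plenty of room to add more later.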
ALB (Application Load Balancer) -- The Receptionist
What it is: A service that receives incoming web requests and forwards them to healthy application containers.
Why we need it: If we have multiple copies of the app running (for reliability or to handle more traffic), something needs to decide which copy handles each request. The ALB also checks if containers are healthy and stops sending traffic to broken ones.
Our setup: The ALB accepts HTTP traffic only (not HTTPS) because Cloudflare already handled the SSL encryption upstream. It sits in the public subnets and forwards requests to ECS containers in the private subnets.
Analogy: A receptionist at a company's front desk. When a visitor arrives, the receptionist checks which employees are available and sends the visitor to the right desk. If an employee is sick (unhealthy container), the receptionist stops sending visitors to them.
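To see which containers the receptionist currently considers healthy, you can ask the ALB directly. This is a sketch, assuming the AWS CLI is installed and you know the ARN of the target group (it is not listed in this guide, so look it up in the AWS console or the Terraform output). The function name is made up for illustration.

```shell
#!/bin/sh
# Print the health state of every target registered with a target group.
# $1 is the target group ARN (not documented here; supply your own).
show_target_health() {
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --region eu-west-2 \
    --query 'TargetHealthDescriptions[].{target:Target.Id,state:TargetHealth.State}' \
    --output table
}

# Usage: show_target_health "$TARGET_GROUP_ARN"
```

A target in the `healthy` state receives traffic. A target in `unhealthy` or `draining` does not.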
ECS Fargate -- The Self-Managing Office Worker
What it is: The service that runs our Rails application inside Docker containers, without us having to manage any servers.
Why we need it: Our Rails app needs to run somewhere. Fargate means AWS handles the underlying computer (the server hardware, the operating system, security patches). We just say "run this container" and AWS figures out the rest.
Key concepts:
- Docker container -- A lightweight, self-contained package that includes the app code, Ruby, Rails, and everything needed to run. Think of it as a lunchbox with everything inside, so the worker does not need to visit the cafeteria.
- Docker image -- The blueprint for creating a container. Stored in ECR (see below). Like the recipe card used to pack the lunchbox.
- Fargate vs EC2 -- With EC2, you rent a computer and manage it yourself (install updates, monitor disk space, fix crashes). With Fargate, AWS manages the computer. You just define what to run and how much CPU/memory it needs.
Our setup: 512 CPU units (half a vCPU) and 1024 MiB memory per task. Auto-scales from 1 to 3 containers.
Analogy: Instead of hiring a full-time office manager and buying desks, you use a co-working space. You show up, do your work, and the co-working space handles the building maintenance, cleaning, and internet.
Auto-scaling -- Hiring Temp Workers When It Gets Busy
What it is: A rule that automatically adds or removes containers based on how busy the app is.
Why we need it: Traffic is unpredictable. At 3 AM, one container is plenty. During peak hours, we might need three. Auto-scaling handles this without human intervention.
Our setup:
- Minimum: 1 container (always at least one running)
- Maximum: 3 containers
- Scale-up trigger: CPU usage exceeds 70%
- Scale-down: When load drops, extra containers are terminated
Analogy: A coffee shop with one barista during quiet mornings. When the lunch rush hits and the line gets long (CPU > 70%), the manager calls in a second barista. If the rush continues, a third is called in. When things calm down, the extra baristas go home.
RDS PostgreSQL -- The Filing Cabinet
What it is: A managed relational database where all application data is stored (users, sessions, documents, etc.).
Why we need it: Every app needs to store data somewhere persistent. "Managed" means AWS handles backups, security patches, hardware failures, and software updates. We just read and write data.
Our setup: PostgreSQL 16 on a db.t3.micro instance (the smallest/cheapest option). Encrypted at rest (data on disk is scrambled so even if someone stole the hard drive, they could not read it). Automated daily backups with 1-day retention.
Analogy: A filing cabinet in the secure interior room of the office. Only the office workers (ECS containers) can access it. The building management (AWS) handles fireproofing, making backup copies, and replacing rusty drawers.
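The settings described above can be checked against what is actually running. A sketch, assuming the AWS CLI and read access to RDS in eu-west-2; the function name is illustrative.

```shell
#!/bin/sh
# Summarise instance class, engine version, encryption, and backup
# retention for every RDS instance in the region.
describe_database() {
  aws rds describe-db-instances \
    --region eu-west-2 \
    --query 'DBInstances[].{id:DBInstanceIdentifier,class:DBInstanceClass,engine:EngineVersion,encrypted:StorageEncrypted,backupDays:BackupRetentionPeriod,multiAZ:MultiAZ}' \
    --output table
}

# Usage: describe_database
```

If the output matches this guide, you should see `db.t3.micro`, a 16.x engine version, `encrypted` set to true, and `backupDays` set to 1.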
ElastiCache Redis -- The Sticky Note Board
What it is: A fast, in-memory data store used for temporary data that needs to be accessed quickly.
Why we need it: Two main uses:
- Background jobs (Solid Queue) -- When the app needs to do something slow (send an email, process a file), it puts a job on the Redis queue and a background worker picks it up. This way the user does not wait.
- Caching -- Storing frequently accessed data in memory so we do not hit the database every time.
Our setup: Redis 7 on a cache.t3.micro instance.
Analogy: A whiteboard or sticky note board in the office. Need to remember something quickly? Stick a note on it. Need to tell a coworker to do something later? Leave them a note. Much faster than opening the filing cabinet (database) every time.
S3 -- The Warehouse
What it is: Object storage for files. Virtually unlimited capacity at very low cost.
Why we need it: Users upload documents (passport photos, ID scans, etc.) via Active Storage. These files need to be stored somewhere durable and cheap. S3 stores them reliably with 99.999999999% durability (that is eleven 9s).
Our setup: Bucket name: softly-storage-production
Analogy: A warehouse with infinite shelves. You label each box (file) and can retrieve it anytime. The warehouse company guarantees they will never lose your stuff.
ECR (Elastic Container Registry) -- The App Locker
What it is: A private Docker image registry where we store built versions of our app.
Why we need it: When we build a new version of the Rails app, we package it as a Docker image and push it to ECR. When ECS needs to run the app, it pulls the latest image from ECR.
Our setup: Registry: 239732221658.dkr.ecr.eu-west-2.amazonaws.com/softly-app
Analogy: A locker at the office entrance. Every time you update the instruction manual (new app version), you put the latest copy in the locker. When a new worker starts (new container), they grab the manual from the locker.
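To check what is currently in the locker, you can list the tags on the most recently pushed image. A sketch, assuming the AWS CLI and that the repository is named softly-app (taken from the registry URL above); the function name is illustrative.

```shell
#!/bin/sh
# Print the tags of the most recently pushed image in an ECR repository.
# $1 is the repository name; defaults to softly-app.
latest_image_tags() {
  aws ecr describe-images \
    --repository-name "${1:-softly-app}" \
    --region eu-west-2 \
    --query 'sort_by(imageDetails,&imagePushedAt)[-1].imageTags' \
    --output text
}

# Usage: latest_image_tags
```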
SSM Parameter Store -- The Safe for Passwords
What it is: A secure place to store configuration values and secrets (database passwords, API keys, encryption keys).
Why we need it: Hardcoding secrets in code is a security disaster. If someone reads the code, they get all your passwords. SSM stores them encrypted and only authorized services can read them at runtime.
Our setup: All secrets live under the /softly/production/ path. ECS containers read them when they start up.
Analogy: A locked safe in the office. Only employees with the right key (IAM role) can open it. The database password, API keys, and other sensitive info live inside.
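You can list what is in the safe without opening any of the envelopes. This sketch prints parameter names only. Because it omits `--with-decryption` and queries just the `Name` field, no secret values are shown. It assumes the AWS CLI and an IAM identity allowed to read the path; the function name is illustrative.

```shell
#!/bin/sh
# List the names (not the values) of all parameters under a path.
# $1 is the path prefix; defaults to the production path used in this guide.
list_parameter_names() {
  aws ssm get-parameters-by-path \
    --path "${1:-/softly/production/}" \
    --recursive \
    --region eu-west-2 \
    --query 'Parameters[].Name' \
    --output text
}

# Usage: list_parameter_names
```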
SES (Simple Email Service) -- The Post Office
What it is: AWS's email sending service for transactional emails (password resets, verification emails, etc.).
Why we need it: The app needs to send emails. SES handles deliverability, DKIM signing (proving emails really come from us), and compliance.
Our setup: DKIM-configured for mysoftly.app. Currently in sandbox mode (can only send to verified addresses until AWS approves production access).
Analogy: A post office that handles all outgoing mail. It stamps each letter with an official seal (DKIM) so the recipient knows it is not forged.
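To find out whether the account has left sandbox mode, you can ask SES itself. A sketch, assuming the AWS CLI and that SES is set up in eu-west-2 (this guide does not state the SES region, so pass a different one if needed). The function name is illustrative.

```shell
#!/bin/sh
# Report whether SES production access is enabled for the account.
# $1 is the region; the eu-west-2 default is an assumption.
ses_production_access() {
  aws sesv2 get-account \
    --region "${1:-eu-west-2}" \
    --query 'ProductionAccessEnabled' \
    --output text
}

# Usage: ses_production_access
```

While the account is in the sandbox, the command reports that production access is not enabled.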
NAT Gateway -- The Mailroom
What it is: A network device that allows resources in private subnets to access the internet (to download updates, call external APIs) without being accessible from the internet.
Why we need it: ECS containers in private subnets sometimes need to reach the internet (pull Docker images, call third-party APIs like Anthropic or Deepgram). But we do not want the internet to reach them directly. The NAT Gateway solves this one-way problem.
How it works: Outbound traffic from private subnets goes through the NAT Gateway (which lives in a public subnet). The NAT rewrites the source IP so responses come back to the NAT, which forwards them to the original sender in the private subnet.
Analogy: A mailroom in a secure building. People inside can send letters out, and replies come back through the mailroom. But someone on the street cannot just walk into the building through the mailroom.
IAM Roles -- ID Badges
What it is: AWS's permission system. Each service gets a "role" that defines exactly what it is allowed to do.
Why we need it: Without IAM, every service could access everything. That is dangerous. IAM enforces the principle of least privilege: each service only gets the permissions it actually needs.
Our setup:
- ECS Execution Role: Can pull images from ECR and read secrets from SSM. Nothing else.
- ECS Task Role: Can read/write files in S3 and send emails via SES. Nothing else.
Analogy: ID badges in an office building. The delivery driver's badge opens the loading dock but not the executive floor. The accountant's badge opens the finance room but not the server room. Everyone gets exactly the access they need, no more.
CloudWatch -- The Security Cameras
What it is: AWS's monitoring and logging service. It records everything that happens in our infrastructure.
Why we need it: When something breaks at 2 AM, you need logs to figure out what went wrong. CloudWatch stores container logs, metrics (CPU usage, memory), and can trigger alarms.
Our setup: ECS container logs are sent to CloudWatch automatically. You can view them in the AWS console or via the CLI.
Analogy: Security cameras in every room of the building. They record everything. When something goes wrong, you rewind the tape to see what happened.
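Rewinding the tape from the CLI looks roughly like this. The sketch assumes AWS CLI v2 (which provides `aws logs tail`). The log group name `/ecs/softly-production` is a guess, not something this guide documents, so check the real name in the CloudWatch console first.

```shell
#!/bin/sh
# Stream recent container logs and keep following new lines.
# $1 is the log group name; the default below is a hypothetical placeholder.
tail_app_logs() {
  aws logs tail "${1:-/ecs/softly-production}" \
    --follow \
    --since 15m \
    --region eu-west-2
}

# Usage: tail_app_logs "/your/actual/log-group"
```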
3. Network Architecture
Here is how the VPC is laid out with its subnets across two Availability Zones:
    +-- VPC (10.0.0.0/16) ------------------------------------------+
    |                                                               |
    |   Availability Zone A          Availability Zone B            |
    |   +------------------+         +------------------+           |
    |   | Public Subnet    |         | Public Subnet    |           |
    |   | (10.0.1.0/24)    |         | (10.0.2.0/24)    |           |
    |   |                  |         |                  |           |
    |   | ALB <------------+---------+--> ALB           |           |
    |   | NAT Gateway      |         |                  |           |
    |   +------------------+         +------------------+           |
    |   +------------------+         +------------------+           |
    |   | Private Subnet   |         | Private Subnet   |           |
    |   | (10.0.3.0/24)    |         | (10.0.4.0/24)    |           |
    |   |                  |         |                  |           |
    |   | ECS Tasks        |         | ECS Tasks        |           |
    |   | RDS (primary)    |         | RDS (standby)    |           |
    |   | Redis            |         |                  |           |
    |   +------------------+         +------------------+           |
    |                                                               |
    |   Internet Gateway (front door to the internet)               |
    +---------------------------------------------------------------+

Why two Availability Zones?
Each AZ is a physically separate data centre in the London region. If one data centre has a power outage, hardware failure, or even a natural disaster, the other AZ keeps running. Your app stays online.
Why public vs private subnets?
The rule is simple: anything that needs to receive traffic directly from the internet goes in a public subnet. Everything else goes in a private subnet.
- Public subnets hold the ALB (which receives requests from Cloudflare) and the NAT Gateway (which handles outbound internet access for private resources).
- Private subnets hold the app containers, database, and Redis. These should never be directly reachable from the internet.
How traffic flows
    Internet --> Cloudflare --> ALB (public subnet) --> ECS (private subnet) --> RDS (private subnet)
                                                                             --> Redis (private subnet)

Each arrow is a security boundary. Traffic can only move forward through authorized paths.
4. Request Flow (Step by Step)
Here is exactly what happens when a user opens the Softly app on their phone:
    1. Phone          "GET api.mysoftly.app/v1/me"
         |
    2. DNS Lookup     Cloudflare resolves api.mysoftly.app
         |            Returns a Cloudflare proxy IP (not AWS directly)
         |
    3. Cloudflare     Terminates SSL (decrypts HTTPS)
         |            Checks for DDoS / malicious traffic
         |            Forwards plain HTTP to the ALB in AWS
         |
    4. ALB            Receives the request in the public subnet
         |            Checks which ECS containers are healthy
         |            Forwards to a healthy container
         |
    5. ECS (Rails)    Container in the private subnet processes the request
         |            Authenticates the user (checks JWT token)
         |            Queries the database, reads from cache
         |
    6. RDS + Redis    Database returns user data
         |            Redis returns cached values or queues background jobs
         |
    7. Response       Rails builds JSON response
         |            Response flows back: ECS --> ALB --> Cloudflare --> Phone

The whole round trip typically takes 50-200ms depending on what the request does.
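To see where the time goes in a real round trip, curl can report how long each phase took. A sketch, assuming curl is installed; the function name is illustrative. The times curl prints are cumulative from the start of the request, so `first_byte` includes DNS and TLS.

```shell
#!/bin/sh
# Print cumulative timings for one HTTPS request, discarding the body.
# $1 is the URL to request.
time_request() {
  curl -s -o /dev/null \
    -w 'dns=%{time_namelookup}s tls=%{time_appconnect}s first_byte=%{time_starttransfer}s total=%{time_total}s\n' \
    "$1"
}

# Usage: time_request https://api.mysoftly.app/v1/me
```

The endpoint expects a JWT, so an unauthenticated call will likely be rejected, but the timings for steps 1 to 4 are still meaningful.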
5. Scaling Explained
Auto-scaling adjusts the number of running containers based on CPU usage:
    Normal traffic:       [ Container 1 ]                                       CPU: 30%
                                 |
                         (traffic increases)
                                 |
    CPU hits 70%:         [ Container 1 ] [ Container 2 ]                       CPU: 45% each
                                 |
                         (even more traffic)
                                 |
    CPU hits 70% again:   [ Container 1 ] [ Container 2 ] [ Container 3 ]       (max)
                                 |
                           (traffic drops)
                                 |
    Back to normal:       [ Container 1 ]                                       CPU: 25%

Key points:
- Minimum 1 container is always running (the app is never fully off)
- Maximum 3 containers (cost control -- we cap the upper limit)
- Scale-up happens when average CPU across all containers exceeds 70%
- Scale-down happens gradually as load decreases (AWS waits to make sure traffic really dropped before terminating containers)
- Each new container takes about 1-2 minutes to start up and begin serving traffic
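The rules above can be written down as a small decision function. This is a toy model to make the limits concrete, not how AWS implements scaling (the real service uses CloudWatch metrics and cooldown periods). The 40% scale-down threshold is an assumed value for illustration; this guide only documents the 70% scale-up trigger.

```shell
#!/bin/sh
MIN_TASKS=1
MAX_TASKS=3
SCALE_UP_CPU=70
SCALE_DOWN_CPU=40   # assumed value, not from the Softly configuration

# desired_tasks CURRENT_TASKS AVG_CPU_PERCENT -> prints the next task count
desired_tasks() {
  current=$1
  cpu=$2
  if [ "$cpu" -gt "$SCALE_UP_CPU" ] && [ "$current" -lt "$MAX_TASKS" ]; then
    echo $(( current + 1 ))
  elif [ "$cpu" -lt "$SCALE_DOWN_CPU" ] && [ "$current" -gt "$MIN_TASKS" ]; then
    echo $(( current - 1 ))
  else
    echo "$current"
  fi
}

desired_tasks 1 75   # prints 2
desired_tasks 3 90   # prints 3 (already at the maximum)
desired_tasks 2 25   # prints 1
desired_tasks 1 10   # prints 1 (never below the minimum)
```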
6. Security Layers
Security is applied in layers, like concentric walls around a castle. An attacker would need to breach every layer to reach the data:
    Layer 1: Cloudflare        DDoS protection, WAF, SSL termination
    Layer 2: ALB SG            Security group only accepts traffic from Cloudflare IPs
    Layer 3: ECS SG            Security group only accepts traffic from the ALB
    Layer 4: RDS SG            Security group only accepts traffic from ECS
    Layer 5: Private subnet    No direct internet access to database or app
    Layer 6: SSM               Secrets encrypted at rest, accessed via IAM roles
    Layer 7: IAM               Least privilege -- each service has minimum permissions

"SG" stands for Security Group, which is like a firewall rule attached to a specific resource. Each security group says "only accept connections from this source."
The chain works like this: the database only accepts connections from ECS. ECS only accepts connections from the ALB. The ALB only accepts connections from Cloudflare. If an attacker gets past one layer, the next layer still blocks them.
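The security group chain amounts to a short allow-list of hops, with everything else denied. Here is a toy model of that idea in shell. It is not real AWS configuration, and the `ecs>redis` rule is inferred from the traffic flow diagram rather than from the layer list.

```shell
#!/bin/sh
# allowed SOURCE DESTINATION -> exit status 0 if the hop is permitted.
allowed() {
  case "$1>$2" in
    "cloudflare>alb") return 0 ;;
    "alb>ecs")        return 0 ;;
    "ecs>rds")        return 0 ;;
    "ecs>redis")      return 0 ;;   # inferred from the traffic flow diagram
    *)                return 1 ;;   # default deny
  esac
}

# Print a one-line verdict for a hop.
check() {
  if allowed "$1" "$2"; then
    echo "$1 -> $2: allowed"
  else
    echo "$1 -> $2: blocked"
  fi
}

check cloudflare alb   # prints "cloudflare -> alb: allowed"
check internet rds     # prints "internet -> rds: blocked"
check alb rds          # prints "alb -> rds: blocked"
```

Even a component that is already inside the network, such as the ALB, cannot skip a step and reach the database directly.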
7. Infrastructure Overview
| Layer | Provider | Purpose |
|---|---|---|
| Backend API | AWS ECS Fargate (eu-west-2) | Rails 8.1 container |
| Database | AWS RDS PostgreSQL 16 | Primary data store |
| Cache / Queues | AWS ElastiCache Redis 7 | Solid Queue + Rails cache |
| File Storage | AWS S3 | Active Storage (document vault) |
| Email | AWS SES | Transactional emails (password reset, DKIM) |
| Secrets | AWS SSM Parameter Store | All production secrets |
| DNS / SSL | Cloudflare | Zone management, Full SSL mode |
| Static Sites | Cloudflare Pages | Landing, web app, docs |
| IaC | Terraform | All AWS resource definitions |
| Domain | GoDaddy (registered) | mysoftly.app, DNS delegated to Cloudflare |
Domain Mapping
| URL | Service | Hosting |
|---|---|---|
| mysoftly.app | Landing page (Astro) | Cloudflare Pages |
| api.mysoftly.app | Rails API | AWS ECS Fargate via ALB |
| app.mysoftly.app | Web app (Next.js) | Cloudflare Pages (static only for now) |
| docs.mysoftly.app | Documentation (VitePress) | Cloudflare Pages |
AWS Resources (Terraform-managed)
Compute
- ECS Fargate cluster: softly-production
  - 512 CPU units, 1024 MiB memory
  - Auto-scaling: 1-3 tasks
  - Container image from ECR
- ECR registry: 239732221658.dkr.ecr.eu-west-2.amazonaws.com/softly-app
Data
- RDS PostgreSQL 16: db.t3.micro, encrypted at rest, 1-day automated backups
- ElastiCache Redis 7: cache.t3.micro, used by Solid Queue and Rails cache
- S3: softly-storage-production bucket for Active Storage file uploads
Networking
- VPC: 2 public + 2 private subnets across 2 Availability Zones
- NAT Gateway: single NAT for private subnet internet access
- ALB: HTTP only (Cloudflare terminates SSL upstream)
Security / IAM
- ECS Execution Role: ECR pull + SSM parameter read
- ECS Task Role: S3 access + SES send
- SES: DKIM-configured for mysoftly.app (sandbox mode -- needs production access request)
Observability
- CloudWatch: ECS container logs
Cloudflare Setup
- DNS: mysoftly.app zone managed by Cloudflare
- SSL: Full mode -- Cloudflare terminates SSL for all subdomains
- Pages: Landing page, web app, and docs hosted on Cloudflare Pages (free tier)
- Proxy: All traffic proxied through Cloudflare (DDoS protection, CDN)
Terraform State
| Resource | Value |
|---|---|
| S3 bucket | softly-terraform-state |
| DynamoDB lock table | softly-terraform-locks |
| State file | production/terraform.tfstate |
CI/CD Pipelines
- Backend: Push to main -> GitHub Actions -> Build Docker image -> Push to ECR -> Update ECS service
- Infrastructure: Push to main (infrastructure changes) -> GitHub Actions -> Terraform plan/apply
- Static sites: Manual deploy via wrangler pages deploy
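The "Update ECS service" step returns as soon as the deployment has been requested, not when the new containers are healthy. One way to make a script wait for the rollout is sketched below. It assumes the AWS CLI and uses the cluster and service name softly-production from the deploy commands in this guide. The function name is illustrative, and the actual GitHub Actions workflow may handle this differently.

```shell
#!/bin/sh
# Force a new deployment, then block until the service reports stable.
# $1 is the cluster name, $2 the service name.
deploy_and_wait() {
  cluster=${1:-softly-production}
  service=${2:-softly-production}
  aws ecs update-service --cluster "$cluster" --service "$service" \
    --force-new-deployment --region eu-west-2 >/dev/null &&
  aws ecs wait services-stable --cluster "$cluster" --services "$service" \
    --region eu-west-2
}

# Usage: deploy_and_wait
```

If the new containers never become healthy, the wait command eventually gives up with a non-zero exit status, which makes a failed rollout visible in the pipeline.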
How to Deploy
Backend (Rails API)
    cd backend
    docker build --platform linux/amd64 -t 239732221658.dkr.ecr.eu-west-2.amazonaws.com/softly-app:latest .
    aws ecr get-login-password --region eu-west-2 | docker login --username AWS --password-stdin 239732221658.dkr.ecr.eu-west-2.amazonaws.com
    docker push 239732221658.dkr.ecr.eu-west-2.amazonaws.com/softly-app:latest
    aws ecs update-service --cluster softly-production --service softly-production --force-new-deployment --region eu-west-2

Landing Page (Astro)

    cd landing && npm run build && wrangler pages deploy dist --project-name=softly-landing --commit-dirty=true --branch=main

Documentation (VitePress)

    cd docs && npm run build && wrangler pages deploy .vitepress/dist --project-name=softly-docs --commit-dirty=true --branch=main

Infrastructure (Terraform)

    cd infrastructure && terraform plan && terraform apply

Environment Variables (SSM Parameter Store)
| Parameter | Purpose |
|---|---|
| /softly/production/database_url | PostgreSQL connection string |
| /softly/production/rails_master_key | Rails credentials decryption |
| /softly/production/anthropic_api_key | Claude API access |
| /softly/production/deepgram_api_key | Speech-to-text |
| /softly/production/revenuecat_api_key | Subscription management (placeholder) |
| /softly/production/new_relic_license_key | APM monitoring (placeholder) |
Credentials and Auth
| Tool | Location | Details |
|---|---|---|
| AWS | ~/.aws/credentials | IAM user: softly-deploy, account 239732221658 |
| Cloudflare API | ~/.cloudflare/token | Custom API token with DNS/Zone/Pages edit |
| Wrangler CLI | ~/.wrangler | OAuth token for Cloudflare Pages deploys |
| Docker/ECR | Login before push | aws ecr get-login-password (see deploy commands above) |
Estimated Monthly Cost
| Service | Cost |
|---|---|
| ECS Fargate | ~$10-15 |
| RDS db.t3.micro | ~$15 |
| ElastiCache cache.t3.micro | ~$13 |
| NAT Gateway | ~$4 + data transfer |
| S3 | <$1 |
| ALB | ~$5 |
| Cloudflare (Pages, DNS, SSL) | Free |
| Total | ~$50-60/month |