/infra/
├── main.tf                      # Root module: VPC, ECS, RDS, ElastiCache, S3, Secrets, CloudWatch
├── variables.tf                 # Input variables with validation
├── outputs.tf                   # Output values (endpoints, ARNs, URLs)
├── modules/
│   ├── vpc/main.tf              # VPC, subnets, IGW, NAT GW, security groups
│   ├── ecs/main.tf              # ECS cluster, task definitions, services, ALB, auto-scaling
│   ├── rds/main.tf              # RDS PostgreSQL with automated backups
│   ├── elasticache/main.tf      # ElastiCache Redis with replication
│   ├── s3/main.tf               # S3 buckets: state, artifacts, logs
│   ├── secrets/main.tf          # AWS Secrets Manager
│   └── cloudwatch/main.tf       # Dashboards, alarms, notifications
├── environments/
│   ├── staging/main.tf          # Staging environment config
│   └── production/main.tf       # Production environment config
└── scripts/
    ├── rollback.sh              # ECS service rollback (AWS)
    ├── rollback-compose.sh      # Docker Compose rollback (local/staging)
    └── rollback-migration.sh    # Database migration rollback

## Quick Start

### Prerequisites

- Terraform >= 1.5.0
- AWS CLI configured with appropriate credentials
- AWS account with ECS, RDS, ElastiCache permissions

### Initialize

```bash
cd infra/environments/staging
terraform init
terraform plan -var-file=terraform.tfvars.example
terraform apply -var-file=terraform.tfvars.example
```

### Deploy via CI/CD

- Push to `main` → deploys to staging
- Create a release → deploys to production
- Health check failure → automatic rollback

## Architecture

### Networking

- VPC with public/private subnets across multiple AZs
- NAT Gateway for outbound traffic from private subnets
- Security groups: ECS → RDS (5432), ECS → ElastiCache (6379)

### Compute

- ECS Fargate for serverless container orchestration
- Application Load Balancer with health checks
- Auto-scaling: CPU-based scaling (70% target)
- Production: 3 replicas per service, min 2, max 10

### Data

- RDS PostgreSQL 16.2 with Multi-AZ (production)
- Automated daily backups, 7-14 day retention
- ElastiCache Redis 7.0 with replication
- S3 with versioning and lifecycle policies

### Secrets

- AWS Secrets Manager for all credentials
- ECS task execution role with SecretsManagerReadOnly
- DB credentials auto-rotated via RDS integration

### Monitoring

- CloudWatch dashboards: CPU, memory, ALB metrics
- Alarms: CPU >80%, memory >85%, 5xx >10/min, RDS storage <500MB
- Container Insights enabled for ECS
- Logs: 30-day retention (production), 7-day (staging)

### Backup Strategy

- RDS: automated snapshots every 24h, 7-14 day retention
- RDS: Multi-AZ for automatic failover (production)
- ElastiCache: daily snapshots, 1-7 day retention
- S3: versioning enabled, non-current versions expire after 30 days
- Terraform state: S3 with versioning + DynamoDB locking

## Rollback

See **[ROLLBACK.md](./ROLLBACK.md)** for the complete rollback runbook, including:

- ECS service rollback (automated + manual)
- Docker Compose rollback (local / staging)
- Database migration rollback (Drizzle)
- Blue-green deployment rollback
- RDS point-in-time recovery
- Automated rollback triggers and health checks
- Emergency rollback runbook
- Testing checklist

### Quick Reference

```bash
# ECS service rollback (AWS)
./infra/scripts/rollback.sh [--verify]

# Docker Compose rollback (local/staging)
./infra/scripts/rollback-compose.sh

# Database migration rollback
./infra/scripts/rollback-migration.sh [--migration ]
```

## GitHub Secrets Required

| Secret | Description |
|--------|-------------|
| AWS_ACCESS_KEY_ID | IAM user with ECS, RDS, ElastiCache permissions |
| AWS_SECRET_ACCESS_KEY | IAM secret key |
| HIBP_API_KEY | Have I Been Pwned API key |
| RESEND_API_KEY | Resend email API key |
| SENTRY_DSN | Sentry error tracking DSN |
| DATADOG_API_KEY | Datadog monitoring API key |
| GITHUB_TOKEN | Auto-provided, needs write:packages scope |
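
A pipeline that starts with one of the secrets listed above unset tends to fail late, often partway through a deploy. The sketch below shows one way to check for them up front. It is not part of this repository: the function name `check_secrets` and the variable `REQUIRED_SECRETS` are hypothetical, and it assumes each secret is exposed to the job as an environment variable of the same name.

```bash
# Hypothetical pre-flight check (illustrative only, not shipped in /infra/scripts).
# Assumes every secret from the table is exported as an environment variable.
REQUIRED_SECRETS="AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY HIBP_API_KEY RESEND_API_KEY SENTRY_DSN DATADOG_API_KEY GITHUB_TOKEN"

# Prints "missing: NAME" to stderr for each unset or empty variable.
# Returns 0 when all are present, 1 otherwise.
check_secrets() {
  missing=0
  for name in $REQUIRED_SECRETS; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

One possible use is to source the file in an early CI step and run `check_secrets || exit 1`, so the job stops before any Terraform or ECS command runs. The check only confirms that a value is present; it does not validate the credentials themselves.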