/infra/
├── main.tf                     # Root module: VPC, ECS, RDS, ElastiCache, S3, Secrets, CloudWatch
├── variables.tf                # Input variables with validation
├── outputs.tf                  # Output values (endpoints, ARNs, URLs)
├── modules/
│   ├── vpc/main.tf             # VPC, subnets, IGW, NAT GW, security groups
│   ├── ecs/main.tf             # ECS cluster, task definitions, services, ALB, auto-scaling
│   ├── rds/main.tf             # RDS PostgreSQL with automated backups
│   ├── elasticache/main.tf     # ElastiCache Redis with replication
│   ├── s3/main.tf              # S3 buckets: state, artifacts, logs
│   ├── secrets/main.tf         # AWS Secrets Manager
│   └── cloudwatch/main.tf      # Dashboards, alarms, notifications
├── environments/
│   ├── staging/main.tf         # Staging environment config
│   └── production/main.tf      # Production environment config
└── scripts/
    ├── rollback.sh             # ECS service rollback (AWS)
    ├── rollback-compose.sh     # Docker Compose rollback (local/staging)
    └── rollback-migration.sh   # Database migration rollback

## Quick Start

### Prerequisites
- Terraform >= 1.5.0
- AWS CLI configured with appropriate credentials
- AWS account with ECS, RDS, ElastiCache permissions

### Initialize
```bash
cd infra/environments/staging

# Download providers and modules, and configure the state backend
terraform init

# Preview the changes before applying them
terraform plan -var-file=terraform.tfvars.example

# Apply the changes
terraform apply -var-file=terraform.tfvars.example
```

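One common refinement, sketched here rather than taken from this project's documented workflow, is to save the plan to a file and then apply exactly that file, so that what was reviewed is what gets applied. The function name `plan_and_apply` and the default plan file name `tfplan` are illustrative assumptions.

```bash
# Sketch: plan to a file, then apply exactly that saved plan.
# `plan_and_apply` and the default plan file name are hypothetical.
plan_and_apply() {
  local env_dir="$1"               # e.g. infra/environments/staging
  local var_file="$2"              # e.g. terraform.tfvars.example
  local plan_file="${3:-tfplan}"

  (
    # Run in a subshell so the caller's working directory is untouched
    cd "$env_dir" || exit 1
    terraform init -input=false &&
      terraform plan -input=false -var-file="$var_file" -out="$plan_file" &&
      terraform apply -input=false "$plan_file"
  )
}

# Example (not executed here):
#   plan_and_apply infra/environments/staging terraform.tfvars.example
```

Each step runs only if the previous one succeeded, so a failed plan never reaches `apply`.
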
### Deploy via CI/CD
- Push to `main` → deploys to staging
- Create a release → deploys to production
- Health check failure → automatic rollback

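The health-check-then-rollback step can be pictured roughly as follows. This is a minimal sketch, not the pipeline's actual implementation: the function name `wait_for_healthy`, the `/health` URL, and the retry values are all assumptions.

```bash
# Sketch: poll a health endpoint; a non-zero exit status means "roll back".
# Function name, URL, and retry values are illustrative assumptions.
wait_for_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-6}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -sS: quiet but show errors,
    # --max-time: bound each request to 5 seconds
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "unhealthy after ${attempts} attempts" >&2
  return 1
}

# Example wiring (not executed here; the URL is a placeholder):
#   wait_for_healthy "https://staging.example.com/health" ||
#     ./infra/scripts/rollback.sh staging all --verify
```
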
## Architecture

### Networking
- VPC with public/private subnets across multiple AZs
- NAT Gateway for outbound traffic from private subnets
- Security groups: ECS → RDS (5432), ECS → ElastiCache (6379)

### Compute
- ECS Fargate for serverless container orchestration
- Application Load Balancer with health checks
- Auto-scaling: CPU-based, targeting 70% utilization
- Production: 3 replicas per service (auto-scaling between 2 and 10)

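In this repository the scaling settings live in Terraform (`modules/ecs/main.tf`). To make the parameters concrete, the sketch below expresses a comparable setup as AWS CLI calls, assuming the 70% target is implemented as a target-tracking policy. The cluster and service names, the policy name `cpu-target-tracking`, and both function names are placeholders.

```bash
# Sketch: CPU target tracking for an ECS service via the AWS CLI.
# Names are placeholders; the project itself defines this in Terraform.
target_tracking_config() {
  local target="${1:-70}"
  cat <<EOF
{
  "TargetValue": ${target},
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
  }
}
EOF
}

configure_autoscaling() {
  local cluster="$1" service="$2"
  local resource_id="service/${cluster}/${service}"

  # Register the service's desired count as scalable between 2 and 10 tasks
  aws application-autoscaling register-scalable-target \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "$resource_id" \
    --min-capacity 2 --max-capacity 10

  # Attach a target-tracking policy that aims for 70% average CPU
  aws application-autoscaling put-scaling-policy \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "$resource_id" \
    --policy-name cpu-target-tracking \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration "$(target_tracking_config 70)"
}
```
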
### Data
- RDS PostgreSQL 16.2 with Multi-AZ (production)
- Automated daily backups, 7-14 day retention
- ElastiCache Redis 7.0 with replication
- S3 with versioning and lifecycle policies

### Secrets
- AWS Secrets Manager for all credentials
- ECS task execution role with SecretsManagerReadOnly
- DB credentials auto-rotated via RDS integration

### Monitoring
- CloudWatch dashboards: CPU, memory, ALB metrics
- Alarms: CPU > 80%, memory > 85%, 5xx errors > 10/min, RDS free storage < 500 MB
- Container Insights enabled for ECS
- Logs: 30-day retention (production), 7-day (staging)

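As an illustration of what one of these alarms looks like, here is the CPU alarm written as an AWS CLI call. The project defines its alarms in Terraform (`modules/cloudwatch/main.tf`), so treat this as a sketch: the alarm name, the 60-second period, the three evaluation periods, and the function name `create_cpu_alarm` are assumptions, and only the 80% threshold comes from the list above.

```bash
# Sketch: ECS service CPU alarm at 80%, notifying an SNS topic.
# Alarm name, period, and evaluation periods are illustrative assumptions.
create_cpu_alarm() {
  local cluster="$1" service="$2" sns_topic_arn="$3"

  aws cloudwatch put-metric-alarm \
    --alarm-name "${service}-cpu-high" \
    --namespace AWS/ECS \
    --metric-name CPUUtilization \
    --dimensions "Name=ClusterName,Value=${cluster}" "Name=ServiceName,Value=${service}" \
    --statistic Average \
    --period 60 \
    --evaluation-periods 3 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$sns_topic_arn"
}

# Example (not executed here; all values are placeholders):
#   create_cpu_alarm prod-cluster api arn:aws:sns:us-east-1:123456789012:alerts
```
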
### Backup Strategy
- RDS: automated snapshots every 24h, 7-14 day retention
- RDS: Multi-AZ for automatic failover (production)
- ElastiCache: daily snapshots, 1-7 day retention
- S3: versioning enabled, non-current versions expire after 30 days
- Terraform state: S3 with versioning + DynamoDB locking

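The RDS automated backups are what make point-in-time recovery possible within the retention window. A minimal sketch of such a restore with the AWS CLI follows; the instance identifiers and the function name `restore_to_point_in_time` are placeholders. Note that RDS restores into a new instance rather than overwriting the source, so the application has to be pointed at the restored instance afterwards.

```bash
# Sketch: restore an RDS instance to a point in time (creates a NEW instance).
# Identifiers are placeholders; omit the timestamp to use the latest restorable time.
restore_to_point_in_time() {
  local source_id="$1" target_id="$2" restore_time="${3:-}"

  if [ -n "$restore_time" ]; then
    # restore_time is a UTC timestamp, e.g. 2024-01-15T03:00:00Z
    aws rds restore-db-instance-to-point-in-time \
      --source-db-instance-identifier "$source_id" \
      --target-db-instance-identifier "$target_id" \
      --restore-time "$restore_time"
  else
    aws rds restore-db-instance-to-point-in-time \
      --source-db-instance-identifier "$source_id" \
      --target-db-instance-identifier "$target_id" \
      --use-latest-restorable-time
  fi
}
```
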
## Rollback

See **[ROLLBACK.md](./ROLLBACK.md)** for the complete rollback runbook, including:

- ECS service rollback (automated + manual)
- Docker Compose rollback (local / staging)
- Database migration rollback (Drizzle)
- Blue-green deployment rollback
- RDS point-in-time recovery
- Automated rollback triggers and health checks
- Emergency rollback runbook
- Testing checklist

### Quick Reference

```bash
# ECS service rollback (AWS)
./infra/scripts/rollback.sh <environment> <service|all> [--verify]

# Docker Compose rollback (local/staging)
./infra/scripts/rollback-compose.sh <previous_tag>

# Database migration rollback
./infra/scripts/rollback-migration.sh <environment> [--migration <name>]
```

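For intuition about what an ECS service rollback involves, the sketch below points a service back at the previous task definition revision. It is not a description of what `rollback.sh` actually does: both function names are hypothetical, and it assumes the previous revision number is the one to return to, which fails if that revision was deregistered or was never deployed.

```bash
# Sketch: roll an ECS service back by one task definition revision.
# Function names are hypothetical; rollback.sh may work differently.

# Given a task definition ARN or "family:revision", print the previous revision.
previous_task_definition() {
  local current="${1##*/}"          # strip any ARN prefix -> family:revision
  local family="${current%:*}"
  local revision="${current##*:}"

  case "$revision" in
    '' | *[!0-9]*) echo "invalid revision in '${current}'" >&2; return 1 ;;
  esac
  if [ "$revision" -le 1 ]; then
    echo "no previous revision for '${current}'" >&2
    return 1
  fi
  echo "${family}:$((revision - 1))"
}

rollback_service() {
  local cluster="$1" service="$2" current previous

  current="$(aws ecs describe-services --cluster "$cluster" --services "$service" \
    --query 'services[0].taskDefinition' --output text)" || return 1
  previous="$(previous_task_definition "$current")" || return 1

  aws ecs update-service --cluster "$cluster" --service "$service" \
    --task-definition "$previous" >/dev/null || return 1
  # Block until the deployment settles (or the waiter times out)
  aws ecs wait services-stable --cluster "$cluster" --services "$service"
}
```
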
## GitHub Secrets Required

| Secret | Description |
|--------|-------------|
| AWS_ACCESS_KEY_ID | Access key ID of an IAM user with ECS, RDS, ElastiCache permissions |
| AWS_SECRET_ACCESS_KEY | Secret access key for the same IAM user |
| HIBP_API_KEY | Have I Been Pwned API key |
| RESEND_API_KEY | Resend email API key |
| SENTRY_DSN | Sentry error tracking DSN |
| DATADOG_API_KEY | Datadog monitoring API key |
| GITHUB_TOKEN | Auto-provided by GitHub Actions; needs `packages: write` permission |
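
Before a first deploy it can help to check that every secret is available. The sketch below assumes the values are exported in the local environment under the same names as in the table and pushes them with the GitHub CLI. `REQUIRED_SECRETS`, `missing_secrets`, and `push_secrets` are hypothetical names, and `GITHUB_TOKEN` is left out because GitHub Actions provides it automatically.

```bash
# Sketch: verify the required secrets are exported locally, then upload them.
# Names below are hypothetical helpers, not part of this repository.
REQUIRED_SECRETS="AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY HIBP_API_KEY RESEND_API_KEY SENTRY_DSN DATADOG_API_KEY"

# Print the name of every required secret that is unset or empty.
missing_secrets() {
  local name
  for name in $REQUIRED_SECRETS; do
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
    fi
  done
}

# Upload each secret to the current repository with the GitHub CLI.
push_secrets() {
  local name
  for name in $REQUIRED_SECRETS; do
    gh secret set "$name" --body "$(printenv "$name")" || return 1
  done
}

# Example (not executed here):
#   [ -z "$(missing_secrets)" ] && push_secrets
```
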