ShieldAI/infra
Senior Engineer a0799c0647 Add Terraform AWS infrastructure and enhanced CI/CD pipeline (FRE-4574)
- Terraform modules: VPC, ECS Fargate, RDS PostgreSQL, ElastiCache Redis, S3, Secrets Manager, CloudWatch
- Multi-environment support: staging and production configs
- ECS auto-scaling: CPU-based scaling with configurable min/max
- CI/CD: pnpm caching, Docker Buildx, Trivy security scanning, Terraform plan on PR
- Deploy: ECS service updates with automatic rollback on health check failure
- Backup: automated RDS snapshots, S3 versioning, ElastiCache snapshots
- Monitoring: CloudWatch dashboards, CPU/memory/5xx alarms
- Rollback script for manual service rollback
- Infrastructure documentation with architecture overview
2026-05-08 02:54:39 -04:00

/infra/
├── main.tf                    # Root module: VPC, ECS, RDS, ElastiCache, S3, Secrets, CloudWatch
├── variables.tf               # Input variables with validation
├── outputs.tf                 # Output values (endpoints, ARNs, URLs)
├── modules/
│   ├── vpc/main.tf            # VPC, subnets, IGW, NAT GW, security groups
│   ├── ecs/main.tf            # ECS cluster, task definitions, services, ALB, auto-scaling
│   ├── rds/main.tf            # RDS PostgreSQL with automated backups
│   ├── elasticache/main.tf    # ElastiCache Redis with replication
│   ├── s3/main.tf             # S3 buckets: state, artifacts, logs
│   ├── secrets/main.tf        # AWS Secrets Manager
│   └── cloudwatch/main.tf     # Dashboards, alarms, notifications
├── environments/
│   ├── staging/main.tf        # Staging environment config
│   └── production/main.tf     # Production environment config
└── scripts/
    └── rollback.sh            # Manual rollback script

Quick Start

Prerequisites

  • Terraform >= 1.5.0
  • AWS CLI configured with appropriate credentials
  • AWS account with ECS, RDS, ElastiCache permissions
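
The Terraform version requirement above can be checked before running any commands. The following is a minimal sketch: `version_ge` is a hypothetical helper (not part of this repository) that compares two dotted version strings numerically.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is >= version $2.
# Compares up to three dot-separated numeric components (major.minor.patch).
version_ge() {
  local IFS=.
  local -a have=($1) want=($2)
  local i h w
  for i in 0 1 2; do
    h=${have[i]:-0}
    w=${want[i]:-0}
    if [ "$h" -gt "$w" ]; then return 0; fi
    if [ "$h" -lt "$w" ]; then return 1; fi
  done
  # All components are equal.
  return 0
}
```

One way to use it, assuming `jq` is installed, is `version_ge "$(terraform version -json | jq -r .terraform_version)" 1.5.0 || echo "Terraform too old"`.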

Initialize

cd infra/environments/staging
terraform init
terraform plan -var-file=terraform.tfvars.example
terraform apply -var-file=terraform.tfvars.example

Deploy via CI/CD

  • Push to main → deploys to staging
  • Create a release → deploys to production
  • Health check failure → automatic rollback
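
The trigger rules above could be expressed as a small shell function inside a workflow step. This is only a sketch: `target_env` is a hypothetical helper, and it assumes the workflow passes the GitHub event name and Git ref as arguments. The actual workflow may select the environment differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a GitHub event to a deployment environment.
# Arguments: $1 = event name (e.g. push, release), $2 = Git ref.
target_env() {
  local event=$1 ref=$2
  case "$event" in
    push)
      # Only pushes to main deploy to staging.
      if [ "$ref" = "refs/heads/main" ]; then
        echo "staging"
      else
        echo "none"
      fi
      ;;
    release)
      # Creating a release deploys to production.
      echo "production"
      ;;
    *)
      echo "none"
      ;;
  esac
}
```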

Architecture

Networking

  • VPC with public/private subnets across multiple AZs
  • NAT Gateway for outbound traffic from private subnets
  • Security groups: ECS → RDS (5432), ECS → ElastiCache (6379)

Compute

  • ECS Fargate for serverless container orchestration
  • Application Load Balancer with health checks
  • Auto-scaling: CPU-based scaling (70% target)
  • Production: desired count of 3 tasks per service; auto-scaling bounds of min 2, max 10
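
To give a feel for how a 70% CPU target interacts with the min/max bounds, here is a simplified proportional model. It is an illustration only, not the exact algorithm AWS target tracking uses (which also applies cooldowns and scales in conservatively). `desired_count` and the variable names are hypothetical; the numbers mirror the production values listed above.

```shell
#!/usr/bin/env bash
# Simplified model of CPU target tracking (not the exact AWS algorithm).
# Arguments: $1 = current task count, $2 = average CPU utilization in percent.
TARGET=70      # CPU target from the list above
MIN_TASKS=2    # production minimum
MAX_TASKS=10   # production maximum

desired_count() {
  local current=$1 cpu=$2
  # Integer ceiling of current * cpu / TARGET.
  local desired=$(( (current * cpu + TARGET - 1) / TARGET ))
  # Clamp the result to the configured bounds.
  if [ "$desired" -lt "$MIN_TASKS" ]; then desired=$MIN_TASKS; fi
  if [ "$desired" -gt "$MAX_TASKS" ]; then desired=$MAX_TASKS; fi
  echo "$desired"
}
```

Under this model, `desired_count 3 90` prints 4, and `desired_count 3 10` prints 2 because the minimum bound applies.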

Data

  • RDS PostgreSQL 16.2 with Multi-AZ (production)
  • Automated daily backups, 7-14 day retention
  • ElastiCache Redis 7.0 with replication
  • S3 with versioning and lifecycle policies

Secrets

  • AWS Secrets Manager for all credentials
  • ECS task execution role with SecretsManagerReadOnly
  • DB credentials auto-rotated via RDS integration

Monitoring

  • CloudWatch dashboards: CPU, memory, ALB metrics
  • Alarms: CPU >80%, memory >85%, 5xx >10/min, RDS storage <500MB
  • Container Insights enabled for ECS
  • Logs: 30-day retention (production), 7-day (staging)

Backup Strategy

  • RDS: automated snapshots every 24h, 7-14 day retention
  • RDS: Multi-AZ for automatic failover (production)
  • ElastiCache: daily snapshots, 1-7 day retention
  • S3: versioning enabled, non-current versions expire after 30 days
  • Terraform state: S3 with versioning + DynamoDB locking

Rollback

Automatic (CI/CD)

The deploy workflow triggers automatic rollback when health checks fail:

deploy-ecs → health-check (failure) → rollback
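
The sequence above can be sketched as a shell function. This is an assumption about the control flow, not the actual workflow definition: `deploy_with_rollback` is hypothetical, and the three steps are passed in as command names so the logic stays independent of any particular AWS call.

```shell
#!/usr/bin/env bash
# Sketch of the deploy -> health-check -> rollback sequence.
# Arguments are command names supplied by the caller (all hypothetical):
#   $1 = deploy command, $2 = health-check command, $3 = rollback command.
deploy_with_rollback() {
  local deploy_cmd=$1 health_cmd=$2 rollback_cmd=$3

  # Abort early if the deployment itself fails.
  "$deploy_cmd" || { echo "deploy failed"; return 1; }

  if "$health_cmd"; then
    echo "healthy"
    return 0
  fi

  # Health check failed: restore the previous version and report failure.
  "$rollback_cmd"
  echo "rolled back"
  return 1
}
```

In a CI job, the health-check step might wrap `aws ecs wait services-stable` or a `curl` request against the ALB health endpoint. Returning a non-zero status after a rollback keeps the pipeline run marked as failed.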

Manual

# Roll back a specific service
cd infra/scripts
./rollback.sh staging api

# Roll back all services
./rollback.sh staging all
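
The contents of rollback.sh are not shown here. One common approach to an ECS rollback is to point the service back at the previous task definition revision; the sketch below illustrates only the revision arithmetic. `previous_task_definition` is a hypothetical helper, and it assumes the previous revision number is the last known-good one and has not been deregistered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a task definition ARN or "family:revision",
# print "family:revision-1". Fails if there is no earlier revision.
previous_task_definition() {
  local current=${1##*/}          # strip any ARN prefix, leaving family:revision
  local family=${current%:*}
  local revision=${current##*:}
  if [ "$revision" -le 1 ]; then
    echo "no previous revision for $family" >&2
    return 1
  fi
  echo "$family:$((revision - 1))"
}

# Possible usage (cluster and service names are placeholders):
#   current=$(aws ecs describe-services --cluster <cluster> --services <service> \
#     --query 'services[0].taskDefinition' --output text)
#   aws ecs update-service --cluster <cluster> --service <service> \
#     --task-definition "$(previous_task_definition "$current")"
```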

Database Migration Rollback

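# Build DATABASE_URL from the staging database secret.
# --output text returns the raw secret string so jq can parse it as JSON.
# Assumes the secret JSON contains username, password, host, port, and dbname keys.
export DATABASE_URL=$(aws secretsmanager get-secret-value \
  --secret-id shieldai-staging-db-password \
  --query 'SecretString' --output text |
  jq -r '"postgresql://\(.username):\(.password | @uri)@\(.host):\(.port)/\(.dbname)"')

# Record the migration as applied, then apply any pending migrations
npx prisma migrate resolve --applied <migration_name>
npx prisma migrate deploy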

GitHub Secrets Required

Secret                  Description
AWS_ACCESS_KEY_ID       Access key ID of an IAM user with ECS, RDS, ElastiCache permissions
AWS_SECRET_ACCESS_KEY   Secret access key for the same IAM user
HIBP_API_KEY            Have I Been Pwned API key
RESEND_API_KEY          Resend email API key
SENTRY_DSN              Sentry error tracking DSN
DATADOG_API_KEY         Datadog monitoring API key
GITHUB_TOKEN            Auto-provided by GitHub Actions; the workflow needs the packages: write permission