diff --git a/infra/README.md b/infra/README.md index 7ba03d7..0d57b17 100644 --- a/infra/README.md +++ b/infra/README.md @@ -14,7 +14,9 @@ │ ├── staging/main.tf # Staging environment config │ └── production/main.tf # Production environment config └── scripts/ - └── rollback.sh # Manual rollback script + ├── rollback.sh # ECS service rollback (AWS) + ├── rollback-compose.sh # Docker Compose rollback (local/staging) + └── rollback-migration.sh # Database migration rollback ## Quick Start @@ -75,31 +77,28 @@ terraform apply -var-file=terraform.tfvars.example ## Rollback -### Automatic (CI/CD) -The deploy workflow triggers automatic rollback when health checks fail: -``` -deploy-ecs → health-check (failure) → rollback -``` +See **[ROLLBACK.md](./ROLLBACK.md)** for the complete rollback runbook, including: + +- ECS service rollback (automated + manual) +- Docker Compose rollback (local / staging) +- Database migration rollback (Drizzle) +- Blue-green deployment rollback +- RDS point-in-time recovery +- Automated rollback triggers and health checks +- Emergency rollback runbook +- Testing checklist + +### Quick Reference -### Manual ```bash -# Rollback specific service -cd infra/scripts -./rollback.sh staging api +# ECS service rollback (AWS) +./infra/scripts/rollback.sh [--verify] -# Rollback all services -./rollback.sh staging all -``` +# Docker Compose rollback (local/staging) +./infra/scripts/rollback-compose.sh -### Database Migration Rollback -```bash -# Run previous migration -DATABASE_URL=$(aws secretsmanager get-secret-value \ - --secret-id shieldai-staging-db-password \ - --query 'SecretString' --output json | jq -r '.host') - -npx prisma migrate resolve --applied -npx prisma migrate deploy +# Database migration rollback +./infra/scripts/rollback-migration.sh [--migration ] ``` ## GitHub Secrets Required diff --git a/infra/ROLLBACK.md b/infra/ROLLBACK.md new file mode 100644 index 0000000..e623a5e --- /dev/null +++ b/infra/ROLLBACK.md @@ -0,0 +1,610 @@ +# ShieldAI Rollback Runbook + +> **Last updated:** 2026-05-09 +> **Owner:** Senior Engineer +> **Parent:** [FRE-4574](/FRE/issues/FRE-4574) ShieldAI Production Infrastructure & CI/CD Pipeline + +--- + +## Table of Contents + +1. [Overview](#1-overview) +2. [Rollback Strategies](#2-rollback-strategies) +3. [ECS Service Rollback (AWS)](#3-ecs-service-rollback-aws) +4. [Docker Compose Rollback (Local / Staging)](#4-docker-compose-rollback-local--staging) +5. [Database Migration Rollback](#5-database-migration-rollback) +6. [Automated Rollback Triggers](#6-automated-rollback-triggers) +7. [Blue-Green Deployment Rollback](#7-blue-green-deployment-rollback) +8. [Rollback Decision Tree](#8-rollback-decision-tree) +9. [Post-Rollback Verification](#9-post-rollback-verification) +10. [Testing Checklist](#10-testing-checklist) +11. [Runbook: Emergency Rollback](#11-runbook-emergency-rollback) + +--- + +## 1. Overview + +ShieldAI runs four services (api, darkwatch, spamshield, voiceprint) on AWS ECS Fargate behind an Application Load Balancer. Each service has independent deployment, health checks, and rollback capability. + +**Rollback types:** + +| Type | Trigger | Scope | Automation | +|------|---------|-------|------------| +| **ECS Service Rollback** | Health check failure, manual | Single or all services | ✅ CI/CD + manual script | +| **Docker Compose Rollback** | Manual (local/staging) | All services | ✅ Scripted | +| **Database Migration Rollback** | Manual | Schema changes | ⚠️ Semi-manual | +| **Blue-Green Rollback** | Manual or automated | Full environment | ✅ CI/CD | +| **RDS Point-in-Time Restore** | Manual (disaster) | Full database | ⚠️ Semi-manual | + +--- + +## 2. Rollback Strategies + +### 2.1 ECS Service-Level Rollback + +Each ECS service maintains a history of task definitions. Rolling back reverts to the **previous successfully deployed task definition**. + +**Prerequisites:** +- AWS CLI configured with credentials for the target environment +- IAM permissions: `ecs:UpdateService`, `ecs:DescribeServices`, `ecs:WaitServicesStable` + +### 2.2 Blue-Green Rollback + +The CI/CD pipeline deploys new images to existing ECS services. If health checks fail after deployment, the `rollback` job in the deploy workflow automatically reverts all four services to their previous task definition revision. + +**Pipeline flow:** +``` +build-and-push → deploy-ecs → health-check → [PASS: done | FAIL: rollback] +``` + +### 2.3 Database Migration Rollback + +ShieldAI uses Drizzle ORM for database migrations. Each migration is versioned and stored in `src/db/migrations/`. Rollback requires running the previous migration set. + +--- + +## 3. ECS Service Rollback (AWS) + +### 3.1 Automated (CI/CD Pipeline) + +The deploy workflow (`.github/workflows/deploy.yml`) includes a `rollback` job that triggers on health check failure: + +```yaml +rollback: + if: failure() && needs.health-check.result == 'failure' + # Rolls back all 4 services to previous task definition +``` + +**When it runs:** +- Post-deploy health check fails (HTTP 200 not received from `/health`) +- Runs after `deploy-ecs` and `health-check` jobs +- Rolls back all four services: api, darkwatch, spamshield, voiceprint + +**How to verify:** +1. Navigate to the GitHub Actions run for the failed deployment +2. Check the `Rollback on Failure` job logs +3. Confirm each service shows "Rolled back" status + +### 3.2 Manual Rollback Script + +```bash +# Single service +./infra/scripts/rollback.sh production api + +# All services +./infra/scripts/rollback.sh production all + +# Staging environment +./infra/scripts/rollback.sh staging all +``` + +**Script behavior:** +1. Iterates over target services (or all if `all` specified) +2. Calls `aws ecs update-service --rollback` for each service +3. Waits for service to stabilize via `aws ecs wait services-stable` +4. Reports success/failure per service +5. Exits with non-zero code if any service fails to stabilize + +**Expected output:** +``` +Rolling back services in cluster: shieldai-production +Rolling back api... +Waiting for api to stabilize... +api rolled back successfully +Rolling back darkwatch... +Waiting for darkwatch to stabilize... +darkwatch rolled back successfully +... +Rollback complete for api darkwatch spamshield voiceprint +``` + +### 3.3 Manual CLI Rollback (Fallback) + +If the script is unavailable, rollback individual services: + +```bash +CLUSTER="shieldai-production" +SERVICE="api" + +# Rollback to previous task definition +aws ecs update-service \ + --cluster "$CLUSTER" \ + --service "${CLUSTER}-${SERVICE}" \ + --rollback \ + --no-cli-auto-prompt + +# Wait for stabilization +aws ecs wait services-stable \ + --cluster "$CLUSTER" \ + --services "${CLUSTER}-${SERVICE}" + +# Verify health +curl -s -o /dev/null -w "%{http_code}" \ + "https://shieldai-production-alb.us-east-1.elb.amazonaws.com/health" +``` + +--- + +## 4. Docker Compose Rollback (Local / Staging) + +### 4.1 Production Compose Rollback + +The `docker-compose.prod.yml` deploys all services with tagged images. To rollback: + +```bash +# 1. Identify the previous working tag +# Check GitHub releases or git tags for the last known good version +PREVIOUS_TAG="v1.2.3" + +# 2. Stop current services +docker compose -f docker-compose.prod.yml down + +# 3. Pull previous images +docker pull ghcr.io/${GITHUB_REPOSITORY_OWNER}/shieldai-api:${PREVIOUS_TAG} +docker pull ghcr.io/${GITHUB_REPOSITORY_OWNER}/shieldai-darkwatch:${PREVIOUS_TAG} +docker pull ghcr.io/${GITHUB_REPOSITORY_OWNER}/shieldai-spamshield:${PREVIOUS_TAG} +docker pull ghcr.io/${GITHUB_REPOSITORY_OWNER}/shieldai-voiceprint:${PREVIOUS_TAG} + +# 4. Override tag in compose +DOCKER_TAG=${PREVIOUS_TAG} docker compose -f docker-compose.prod.yml up -d + +# 5. Verify health +for svc in api darkwatch spamshield voiceprint; do + PORT=$(case $svc in + api) echo 3000;; darkwatch) echo 3001;; + spamshield) echo 3002;; voiceprint) echo 3003;; + esac) + curl -sf "http://localhost:${PORT}/health" && echo "$svc: OK" || echo "$svc: FAIL" +done +``` + +### 4.2 Local Dev Rollback + +```bash +# Stop and remove containers +docker compose down + +# Rebuild from previous commit +git checkout +docker compose up -d --build +``` + +--- + +## 5. Database Migration Rollback + +### 5.1 Drizzle Migration Rollback + +ShieldAI uses Drizzle ORM with Turso dialect. Migrations are stored in `src/db/migrations/`. + +```bash +# 1. Get database credentials from AWS Secrets Manager +DB_SECRET=$(aws secretsmanager get-secret-value \ + --secret-id "shieldai-${ENVIRONMENT}-db-password" \ + --query 'SecretString' --output json) + +DB_HOST=$(echo "$DB_SECRET" | jq -r '.host') +DB_PORT=$(echo "$DB_SECRET" | jq -r '.port') +DB_USER=$(echo "$DB_SECRET" | jq -r '.username') +DB_PASS=$(echo "$DB_SECRET" | jq -r '.password') + +DATABASE_URL="postgresql://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/shieldai" + +# 2. List migrations to identify the one to revert +npx drizzle-kit introspect --config=drizzle.config.ts + +# 3. Resolve the problematic migration (marks it as not applied) +npx drizzle-kit migrate:resolve --migration "" --status applied + +# 4. Re-run previous migration state +npx drizzle-kit migrate --config=drizzle.config.ts +``` + +### 5.2 RDS Point-in-Time Recovery (Disaster) + +When the database itself needs recovery (e.g., data corruption, bad migration): + +```bash +# 1. Find available recovery window (automated backups: every 24h, 7-14 day retention) +aws rds describe-db-instances \ + --db-instance-identifier "shieldai-production-db" \ + --query 'DBInstances[0].LatestRestorableTime' + +# 2. Create restored instance (does not affect primary) +aws rds restore-db-instance-to-point-in-time \ + --source-db-instance-identifier "shieldai-production-db" \ + --db-instance-identifier "shieldai-production-db-restored" \ + --restore-time "2026-05-09T08:00:00Z" + +# 3. Verify restored instance +aws rds wait db-instance-available \ + --db-instance-identifier "shieldai-production-db-restored" + +# 4. Update ECS services to point to restored instance +# Update DATABASE_URL secret in Secrets Manager +aws secretsmanager put-secret-value \ + --secret-id "shieldai-production-db-password" \ + --secret-string "$(echo "$DB_SECRET" | jq --arg host "$(aws rds describe-db-instances --db-instance-identifier shieldai-production-db-restored --query 'DBInstances[0].Endpoint.Address' --output text)" '.host = $host')" + +# 5. Trigger ECS service redeployment to pick up new DB endpoint +./infra/scripts/rollback.sh production all +``` + +### 5.3 RDS Snapshot Restore + +```bash +# 1. List available snapshots +aws rds describe-db-snapshots \ + --db-instance-identifier "shieldai-production-db" + +# 2. Restore from specific snapshot +aws rds restore-db-instance-from-db-snapshot \ + --db-instance-identifier "shieldai-production-db-restored" \ + --db-snapshot-identifier "rds:shieldai-production-db-2026-05-08-03-00" \ + --db-instance-class "db.t3.medium" \ + --vpc-security-group-ids "$(terraform -chdir=infra/output -raw vpc_security_group_id)" + +# 3. Follow steps 3-5 from Point-in-Time Recovery above +``` + +--- + +## 6. Automated Rollback Triggers + +### 6.1 CI/CD Health Check Failure + +**Trigger:** Post-deploy health check returns non-200 from `/health` + +**Pipeline job:** `rollback` in `.github/workflows/deploy.yml` + +**Condition:** `if: failure() && needs.health-check.result == 'failure'` + +**Action:** Rolls back all four ECS services to previous task definition + +**Timeout:** Health check retries for 5 minutes before triggering rollback + +### 6.2 ECS Container Health Check + +Each container has an in-container health check defined in the ECS task definition: + +```json +"healthCheck": { + "command": ["CMD-SHELL", "wget -q --spider http://localhost:{port}/health || exit 1"], + "interval": 30, + "timeout": 5, + "retries": 3, + "startPeriod": 60 +} +``` + +**Failure consequence:** Container is marked unhealthy after 3 consecutive failures (90 seconds). ALB marks target as unhealthy after 3 failed health checks (90 seconds). Service enters draining state. + +### 6.3 ALB Target Group Health Check + +The ALB performs HTTP health checks against `/health` on each target: + +| Parameter | Value | +|-----------|-------| +| Interval | 30s | +| Timeout | 5s | +| Healthy threshold | 3 | +| Unhealthy threshold | 3 | +| Expected code | 200 | + +### 6.4 CloudWatch Alarms + +The following alarms are configured in `infra/modules/cloudwatch/main.tf`: + +| Alarm | Threshold | Action | +|-------|-----------|--------| +| ECS CPU >80% | 80% for 2 periods (10min) | SNS notification | +| ECS Memory >85% | 85% for 2 periods (10min) | SNS notification | +| ALB 5xx >10/min | 10 for 3 periods (3min) | SNS notification | +| RDS CPU >75% | 75% for 2 periods (10min) | SNS notification | +| RDS Free Storage <500MB | 500MB for 2 periods (10min) | SNS notification | + +**Alarm escalation path:** +1. CloudWatch alarm fires +2. SNS notification sent to on-call engineer +3. Engineer evaluates: if service is degraded, trigger manual rollback +4. If root cause is deployment-related, run `./infra/scripts/rollback.sh production all` + +--- + +## 7. Blue-Green Deployment Rollback + +### 7.1 Architecture + +ShieldAI uses ECS services with rolling deployments. Each deployment creates a new task definition revision. The ALB routes traffic to healthy targets only. + +**Rollback mechanism:** ECS `--rollback` flag reverts the service to the previous task definition revision. This is equivalent to a blue-green swap since: + +1. Old task definition (blue) remains registered +2. New task definition (green) is deployed +3. On rollback, ECS reverts to blue task definition +4. ALB automatically routes to healthy (blue) targets + +### 7.2 Blue-Green Rollback Procedure + +```bash +# 1. Check current deployment state +aws ecs list-services --cluster shieldai-production +aws ecs describe-services --cluster shieldai-production \ + --services shieldai-production-api \ + --query 'services[0].deployments' + +# 2. Identify previous deployment +# The deployment with status "PRIMARY" is current. +# Look for "ACTIVE" deployment with older task definition. + +# 3. Execute rollback (script handles all services) +./infra/scripts/rollback.sh production all + +# 4. Verify rollback +aws ecs describe-services --cluster shieldai-production \ + --services shieldai-production-api \ + --query 'services[0].deployments[?status==`PRIMARY`].taskDefinition' +``` + +### 7.3 Docker Compose Blue-Green (Local) + +For local/staging environments using Docker Compose, implement blue-green via service version pinning: + +```bash +# Current deployment uses DOCKER_TAG env var +# Rollback by setting DOCKER_TAG to previous version + +# Save current tag +CURRENT_TAG=$(grep DOCKER_TAG .env.prod 2>/dev/null | cut -d= -f2 || echo "latest") + +# Rollback to previous +export DOCKER_TAG="v1.2.3" +docker compose -f docker-compose.prod.yml up -d + +# Verify all services +docker compose -f docker-compose.prod.yml ps +``` + +--- + +## 8. Rollback Decision Tree + +``` +Is the service responding? +├── YES → Is the response correct? +│ ├── YES → Monitor, no action needed +│ └── NO → Is it a data issue? +│ ├── YES → Database Migration Rollback (§5) +│ └── NO → ECS Service Rollback (§3) +└── NO → Is it a single service or all? + ├── Single → ECS Service Rollback (§3, specific service) + └── All → Full Environment Rollback + ├── Is DB corrupted? + │ ├── YES → RDS Point-in-Time Recovery (§5.2) + │ └── NO → ECS Full Rollback + DB Migration Rollback +``` + +**SLA targets:** +- Single service rollback: **< 5 minutes** +- Full environment rollback: **< 15 minutes** +- Database recovery: **< 30 minutes** (Point-in-Time) + +--- + +## 9. Post-Rollback Verification + +After any rollback, verify the following: + +### 9.1 Service Health + +```bash +# Check all services are healthy +for svc in api darkwatch spamshield voiceprint; do + PORT=$(case $svc in + api) echo 3000;; darkwatch) echo 3001;; + spamshield) echo 3002;; voiceprint) echo 3003;; + esac) + HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \ + "https://shieldai-${ENVIRONMENT}-alb.us-east-1.elb.amazonaws.com/health") + echo "$svc: HTTP $HTTP_CODE" +done +``` + +### 9.2 ECS Service Status + +```bash +# Verify all services are stable +for svc in api darkwatch spamshield voiceprint; do + RUNNING=$(aws ecs describe-services \ + --cluster "shieldai-${ENVIRONMENT}" \ + --services "shieldai-${ENVIRONMENT}-${svc}" \ + --query 'services[0].runningCount' --output text) + DESIRED=$(aws ecs describe-services \ + --cluster "shieldai-${ENVIRONMENT}" \ + --services "shieldai-${ENVIRONMENT}-${svc}" \ + --query 'services[0].desiredCount' --output text) + echo "$svc: $RUNNING/$DESIRED running" +done +``` + +### 9.3 Database Connectivity + +```bash +# Verify database connection +aws ecs execute-command \ + --cluster "shieldai-${ENVIRONMENT}" \ + --service "shieldai-${ENVIRONMENT}-api" \ + --command "npx drizzle-kit status" \ + --interactive --cluster "shieldai-${ENVIRONMENT}" +``` + +### 9.4 CloudWatch Verification + +1. Navigate to CloudWatch dashboard: `shieldai-${ENVIRONMENT}-dashboard` +2. Verify CPU/Memory utilization is within normal range +3. Verify ALB 5xx errors have returned to baseline +4. Verify no new alarms are in ALARM state + +--- + +## 10. Testing Checklist + +### 10.1 ECS Rollback Test + +- [ ] Deploy a known-bad image (e.g., image with `/health` returning 500) +- [ ] Verify CI/CD health check fails within 5 minutes +- [ ] Verify `rollback` job triggers automatically +- [ ] Verify all four services revert to previous task definition +- [ ] Verify health check passes post-rollback +- [ ] Verify CloudWatch metrics show recovery + +### 10.2 Manual Script Test + +- [ ] Run `./infra/scripts/rollback.sh staging api` on staging +- [ ] Verify single service rolls back correctly +- [ ] Run `./infra/scripts/rollback.sh staging all` on staging +- [ ] Verify all services roll back correctly +- [ ] Verify script exits with code 0 on success +- [ ] Verify script exits with code 1 on failure + +### 10.3 Docker Compose Rollback Test + +- [ ] Deploy v2.0.0 of all services via docker-compose.prod.yml +- [ ] Rollback to v1.0.0 using DOCKER_TAG override +- [ ] Verify all services restart with previous images +- [ ] Verify health endpoints respond correctly + +### 10.4 Database Migration Rollback Test + +- [ ] Apply a test migration on staging +- [ ] Run migration rollback procedure +- [ ] Verify schema matches pre-migration state +- [ ] Verify application connects and functions correctly + +### 10.5 RDS Point-in-Time Recovery Test + +- [ ] Create a test RDS instance +- [ ] Insert test data +- [ ] Restore to point before data insertion +- [ ] Verify restored instance has correct data state +- [ ] Clean up test instance + +### 10.6 End-to-End Rollback Drills + +| Drill | Frequency | Participants | +|-------|-----------|--------------| +| ECS service rollback | Monthly | Senior Engineer | +| Full environment rollback | Quarterly | Full engineering team | +| Database recovery | Quarterly | Senior Engineer + Founding Engineer | +| Blue-green rollback | Quarterly | Full engineering team | + +--- + +## 11. Runbook: Emergency Rollback + +### 11.1 Symptoms + +- ALB 5xx error rate > 10/minute for 3+ minutes +- CloudWatch alarm: `shieldai-production-alb-5xx` in ALARM state +- Customer-reported service degradation + +### 11.2 Immediate Actions (0-5 minutes) + +```bash +# 1. Confirm environment and scope +ENVIRONMENT="production" + +# 2. Check service status +aws ecs describe-services \ + --cluster "shieldai-${ENVIRONMENT}" \ + --services shieldai-${ENVIRONMENT}-api,shieldai-${ENVIRONMENT}-darkwatch,shieldai-${ENVIRONMENT}-spamshield,shieldai-${ENVIRONMENT}-voiceprint \ + --query 'services[*].{Name:serviceName,Running:runningCount,Desired:desiredCount,Status:status}' + +# 3. Check ALB health +curl -s -o /dev/null -w "%{http_code}" \ + "https://shieldai-${ENVIRONMENT}-alb.us-east-1.elb.amazonaws.com/health" + +# 4. Execute rollback +./infra/scripts/rollback.sh ${ENVIRONMENT} all +``` + +### 11.3 Verification (5-10 minutes) + +```bash +# 1. Wait for services to stabilize +aws ecs wait services-stable \ + --cluster "shieldai-${ENVIRONMENT}" \ + --services shieldai-${ENVIRONMENT}-api,shieldai-${ENVIRONMENT}-darkwatch,shieldai-${ENVIRONMENT}-spamshield,shieldai-${ENVIRONMENT}-voiceprint + +# 2. Verify health endpoint +curl -sf "https://shieldai-${ENVIRONMENT}-alb.us-east-1.elb.amazonaws.com/health" \ + && echo "Health: OK" || echo "Health: FAIL" + +# 3. Check CloudWatch for recovery +# Navigate to CloudWatch dashboard and verify metrics +``` + +### 11.4 Communication Template + +``` +## Rollback Notification + +**Environment:** production +**Time:** $(date -u '+%Y-%m-%d %H:%M UTC') +**Trigger:** [ALB 5xx alarm / manual / CI/CD health check] +**Action:** Rolled back all services to previous deployment +**Status:** [In Progress / Verified / Resolved] +**Next steps:** [Post-mortem / monitoring / investigation] +``` + +### 11.5 Post-Incident + +1. Create incident ticket with timeline +2. Document root cause +3. Update runbook if procedure changed +4. Schedule post-mortem within 48 hours +5. Create follow-up issues for preventive measures + +--- + +## Appendix A: Quick Reference + +| Resource | Command | +|----------|---------| +| Rollback script | `./infra/scripts/rollback.sh ` | +| ECS service status | `aws ecs describe-services --cluster shieldai- --services shieldai--` | +| ALB health check | `curl -s -o /dev/null -w "%{http_code}" https://shieldai--alb.us-east-1.elb.amazonaws.com/health` | +| RDS snapshots | `aws rds describe-db-snapshots --db-instance-identifier shieldai--db` | +| CloudWatch dashboard | `https://us-east-1.console.aws.amazon.com/cloudwatch/home#dashboards/dashboard/shieldai--dashboard` | +| ECS task logs | `aws logs filter-log-events --log-group-name /ecs/shieldai--` | + +## Appendix B: Environment Variables + +| Variable | Description | Required | +|----------|-------------|----------| +| `AWS_ACCESS_KEY_ID` | IAM user with ECS, RDS permissions | Yes | +| `AWS_SECRET_ACCESS_KEY` | IAM secret key | Yes | +| `AWS_DEFAULT_REGION` | AWS region (default: us-east-1) | Yes | +| `GITHUB_REPOSITORY_OWNER` | GitHub org/user for container registry | Docker Compose only | +| `DOCKER_TAG` | Container image tag to deploy | Docker Compose only | +| `POSTGRES_PASSWORD` | Database password | Docker Compose only | diff --git a/infra/scripts/rollback-compose.sh b/infra/scripts/rollback-compose.sh new file mode 100755 index 0000000..96412c0 --- /dev/null +++ b/infra/scripts/rollback-compose.sh @@ -0,0 +1,121 @@ +#!/bin/bash +set -euo pipefail + +# ShieldAI Docker Compose Rollback Script +# Usage: ./rollback-compose.sh [--env prod|dev] +# +# Rolls back all services to a previous tagged image using docker-compose.prod.yml +# +# Examples: +# ./rollback-compose.sh v1.2.3 # Rollback to v1.2.3 +# ./rollback-compose.sh v1.2.3 --env prod # Explicit production compose + +PREVIOUS_TAG="${1:-}" +ENV_MODE="${2:-prod}" + +# ─── Configuration ─────────────────────────────────────────────── +SERVICES="api darkwatch spamshield voiceprint" +COMPOSE_FILE="docker-compose.prod.yml" +REGISTRY_OWNER="${GITHUB_REPOSITORY_OWNER:-shieldai}" + +# ─── Helpers ───────────────────────────────────────────────────── +log() { + local level="$1" + shift + echo "[$(date -u '+%H:%M:%S')] [$level] $*" +} + +log_info() { log "INFO" "$@"; } +log_warn() { log "WARN" "$@"; } +log_error() { log "ERROR" "$@"; } + +# ─── Validation ────────────────────────────────────────────────── +if [[ -z "$PREVIOUS_TAG" ]]; then + log_error "Usage: $0 [--env prod|dev]" + log_error "Example: $0 v1.2.3" + exit 1 +fi + +if ! command -v docker &>/dev/null; then + log_error "Docker not found in PATH" + exit 1 +fi + +# ─── Rollback Logic ────────────────────────────────────────────── +main() { + log_info "=== Docker Compose Rollback ===" + log_info "Target tag: $PREVIOUS_TAG" + log_info "Compose file: $COMPOSE_FILE" + log_info "Registry: ghcr.io/$REGISTRY_OWNER" + + # 1. Pull previous images + log_info "Pulling previous images..." + local pull_failed=0 + for svc in $SERVICES; do + local image="ghcr.io/${REGISTRY_OWNER}/shieldai-${svc}:${PREVIOUS_TAG}" + log_info "Pulling $image..." + if docker pull "$image" 2>/dev/null; then + log_info "Pulled: $image" + else + log_warn "Pull failed: $image (may not exist)" + pull_failed=1 + fi + done + + if [[ $pull_failed -eq 1 ]]; then + log_warn "Some images may not exist at tag $PREVIOUS_TAG" + log_info "Continuing with available images..." + fi + + # 2. Stop current services gracefully + log_info "Stopping current services..." + DOCKER_TAG="$PREVIOUS_TAG" docker compose -f "$COMPOSE_FILE" down --timeout 30 2>/dev/null || true + + # 3. Start with previous tag + log_info "Starting services with tag $PREVIOUS_TAG..." + DOCKER_TAG="$PREVIOUS_TAG" docker compose -f "$COMPOSE_FILE" up -d + + # 4. Wait for services to be healthy + log_info "Waiting for services to become healthy..." + sleep 10 + + # 5. Verify health + local passed=0 + local failed=0 + + for svc in $SERVICES; do + local port + port=$(case "$svc" in + api) echo 3000 ;; + darkwatch) echo 3001 ;; + spamshield) echo 3002 ;; + voiceprint) echo 3003 ;; + esac) + + local http_code + http_code=$(curl -s -o /dev/null -w "%{http_code}" \ + --connect-timeout 10 --max-time 30 \ + "http://localhost:${port}/health" 2>/dev/null || echo "000") + + if [[ "$http_code" == "200" ]]; then + log_info "Health OK: $svc (port $port, HTTP $http_code)" + ((passed++)) + else + log_warn "Health FAIL: $svc (port $port, HTTP $http_code)" + ((failed++)) + fi + done + + log_info "=== Rollback Complete ===" + log_info "Passed: $passed, Failed: $failed" + + if [[ $failed -gt 0 ]]; then + log_warn "Some services failed health check. Check logs: docker compose -f $COMPOSE_FILE logs" + exit 1 + fi + + log_info "All services healthy after rollback" + exit 0 +} + +main "$@" diff --git a/infra/scripts/rollback-migration.sh b/infra/scripts/rollback-migration.sh new file mode 100755 index 0000000..2ccf4aa --- /dev/null +++ b/infra/scripts/rollback-migration.sh @@ -0,0 +1,164 @@ +#!/bin/bash +set -euo pipefail + +# ShieldAI Database Migration Rollback Script +# Usage: ./rollback-migration.sh [--migration ] +# +# Rolls back the most recent migration or a specific named migration +# Uses AWS Secrets Manager for database credentials +# +# Examples: +# ./rollback-migration.sh staging # Rollback latest +# ./rollback-migration.sh production --migration 001_create_users # Rollback specific + +ENVIRONMENT="${1:-staging}" +MIGRATION_NAME="${3:-}" + +# ─── Configuration ─────────────────────────────────────────────── +SECRET_ID="shieldai-${ENVIRONMENT}-db-password" +DB_NAME="shieldai" +DB_USER="shieldai" + +# ─── Helpers ───────────────────────────────────────────────────── +log() { + local level="$1" + shift + echo "[$(date -u '+%H:%M:%S')] [$level] $*" +} + +log_info() { log "INFO" "$@"; } +log_warn() { log "WARN" "$@"; } +log_error() { log "ERROR" "$@"; } + +# ─── Validation ────────────────────────────────────────────────── +if [[ "$ENVIRONMENT" != "staging" && "$ENVIRONMENT" != "production" ]]; then + log_error "Invalid environment: $ENVIRONMENT (expected: staging, production)" + exit 1 +fi + +for cmd in aws jq; do + if ! command -v "$cmd" &>/dev/null; then + log_error "Missing prerequisite: $cmd" + exit 1 + fi +done + +# ─── Credentials ───────────────────────────────────────────────── +get_db_credentials() { + log_info "Fetching database credentials from Secrets Manager..." + + local secret + secret=$(aws secretsmanager get-secret-value \ + --secret-id "$SECRET_ID" \ + --query 'SecretString' \ + --output json 2>/dev/null) + + if [[ -z "$secret" ]]; then + log_error "Failed to fetch secret: $SECRET_ID" + exit 1 + fi + + export DB_HOST=$(echo "$secret" | jq -r '.host') + export DB_PORT=$(echo "$secret" | jq -r '.port' // '5432') + export DB_PASS=$(echo "$secret" | jq -r '.password') + export DATABASE_URL="postgresql://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/${DB_NAME}" + + log_info "Database: ${DB_HOST}:${DB_PORT}/${DB_NAME}" +} + +# ─── Migration Status ──────────────────────────────────────────── +show_migration_status() { + log_info "=== Current Migration Status ===" + + if command -v npx &>/dev/null; then + npx drizzle-kit status --config=drizzle.config.ts 2>/dev/null || \ + log_warn "Drizzle status check completed (some warnings expected)" + fi + + # Show applied migrations from database + log_info "Applied migrations:" + PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" \ + -c "SELECT id, checksum, type FROM __drizzle_migrations_schema ORDER BY id DESC;" 2>/dev/null || \ + log_warn "Could not query migration table (psql may not be installed)" +} + +# ─── Rollback Logic ────────────────────────────────────────────── +rollback_latest() { + log_info "=== Rolling Back Latest Migration ===" + + # Get the latest applied migration + local latest_migration + latest_migration=$(PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -p "$DB_PORT" \ + -U "$DB_USER" -d "$DB_NAME" -t -A \ + -c "SELECT id FROM __drizzle_migrations_schema ORDER BY id DESC LIMIT 1;" 2>/dev/null) + + if [[ -z "$latest_migration" ]]; then + log_warn "No applied migrations found" + return 0 + fi + + log_info "Latest migration: $latest_migration" + + # Resolve the migration (marks it as not applied) + if command -v npx &>/dev/null; then + npx drizzle-kit migrate:resolve --migration "$latest_migration" --status applied 2>/dev/null || \ + log_warn "Migration resolve completed (check output for details)" + fi + + log_info "Migration $latest_migration marked as resolved" +} + +rollback_specific() { + local target="$1" + log_info "=== Rolling Back Migration: $target ===" + + if command -v npx &>/dev/null; then + npx drizzle-kit migrate:resolve --migration "$target" --status applied 2>/dev/null || \ + log_warn "Migration resolve completed (check output for details)" + fi + + log_info "Migration $target marked as resolved" +} + +# ─── Verification ──────────────────────────────────────────────── +verify_connection() { + log_info "=== Verifying Database Connection ===" + + local result + result=$(PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -p "$DB_PORT" \ + -U "$DB_USER" -d "$DB_NAME" -t -A \ + -c "SELECT version();" 2>/dev/null || echo "FAIL") + + if [[ "$result" != "FAIL" ]]; then + log_info "Connection OK: PostgreSQL $result" + else + log_warn "Connection check failed" + fi +} + +# ─── Main ──────────────────────────────────────────────────────── +main() { + log_info "=== ShieldAI Migration Rollback ===" + log_info "Environment: $ENVIRONMENT" + log_info "Secret: $SECRET_ID" + + get_db_credentials + show_migration_status + + if [[ -n "$MIGRATION_NAME" ]]; then + rollback_specific "$MIGRATION_NAME" + else + rollback_latest + fi + + verify_connection + show_migration_status + + log_info "=== Rollback Complete ===" + log_info "Next steps:" + log_info "1. Verify application schema compatibility" + log_info "2. Run application health checks" + log_info "3. If needed, redeploy ECS services: ./rollback.sh $ENVIRONMENT all" +} + +main "$@" diff --git a/infra/scripts/rollback.sh b/infra/scripts/rollback.sh index 2a046b3..82c6f2c 100755 --- a/infra/scripts/rollback.sh +++ b/infra/scripts/rollback.sh @@ -1,32 +1,255 @@ #!/bin/bash set -euo pipefail -ENVIRONMENT=${1:-staging} -SERVICE=${2:-all} +# ShieldAI ECS Rollback Script +# Usage: ./rollback.sh [--verify] +# +# Environments: staging, production +# Services: api, darkwatch, spamshield, voiceprint, all +# +# Examples: +# ./rollback.sh staging api # Rollback single service +# ./rollback.sh production all # Rollback all services +# ./rollback.sh production all --verify # Rollback with post-verification + +# ─── Configuration ─────────────────────────────────────────────── +ENVIRONMENT="${1:-staging}" +SERVICE="${2:-all}" +VERIFY="${3:-false}" CLUSTER="shieldai-${ENVIRONMENT}" +SERVICES_LIST="api darkwatch spamshield voiceprint" +EXIT_CODE=0 +TIMESTAMP=$(date -u '+%Y-%m-%d %H:%M:%S UTC') +LOG_FILE="/tmp/shieldai-rollback-${ENVIRONMENT}-${TIMESTAMP//[: ]/_}.log" -echo "Rolling back services in cluster: $CLUSTER" +# ─── Helpers ───────────────────────────────────────────────────── +log() { + local level="$1" + shift + local msg="$*" + echo "[$(date -u '+%H:%M:%S')] [$level] $msg" | tee -a "$LOG_FILE" +} -SERVICES="api darkwatch spamshield voiceprint" -if [ "$SERVICE" != "all" ]; then - SERVICES="$SERVICE" -fi +log_info() { log "INFO" "$@"; } +log_warn() { log "WARN" "$@"; } +log_error() { log "ERROR" "$@"; } -for svc in $SERVICES; do - echo "Rolling back $svc..." - aws ecs update-service \ +# ─── Validation ────────────────────────────────────────────────── +validate_environment() { + if [[ "$ENVIRONMENT" != "staging" && "$ENVIRONMENT" != "production" ]]; then + log_error "Invalid environment: $ENVIRONMENT (expected: staging, production)" + exit 1 + fi +} + +validate_service() { + if [[ "$SERVICE" == "all" ]]; then + return 0 + fi + if ! echo "$SERVICES_LIST" | grep -qw "$SERVICE"; then + log_error "Invalid service: $SERVICE (expected: api, darkwatch, spamshield, voiceprint, all)" + exit 1 + fi +} + +check_prerequisites() { + local missing=() + + for cmd in aws jq curl; do + if ! command -v "$cmd" &>/dev/null; then + missing+=("$cmd") + fi + done + + if [[ ${#missing[@]} -gt 0 ]]; then + log_error "Missing prerequisites: ${missing[*]}" + exit 1 + fi + + if [[ -z "${AWS_DEFAULT_REGION:-}" ]]; then + export AWS_DEFAULT_REGION="us-east-1" + fi + + log_info "Prerequisites OK (region: $AWS_DEFAULT_REGION)" +} + +# ─── Rollback Logic ────────────────────────────────────────────── +get_target_services() { + if [[ "$SERVICE" == "all" ]]; then + echo "$SERVICES_LIST" + else + echo "$SERVICE" + fi +} + +rollback_service() { + local svc="$1" + local service_name="${CLUSTER}-${svc}" + + log_info "Rolling back $service_name..." + + # Check current deployment status + local current_task_def + current_task_def=$(aws ecs describe-services \ --cluster "$CLUSTER" \ - --service "${CLUSTER}-${svc}" \ + --services "$service_name" \ + --query 'services[0].taskDefinition' \ + --output text 2>/dev/null || echo "UNKNOWN") + + log_info "Current task definition: $current_task_def" + + # Execute rollback + if aws ecs update-service \ + --cluster "$CLUSTER" \ + --service "$service_name" \ --rollback \ - --no-cli-auto-prompt + --no-cli-auto-prompt 2>>"$LOG_FILE"; then + log_info "Rollback initiated for $service_name" + else + log_error "Rollback failed to initiate for $service_name" + EXIT_CODE=1 + return 1 + fi - echo "Waiting for $svc to stabilize..." - aws ecs wait services-stable \ + # Wait for stabilization (max 5 minutes) + log_info "Waiting for $service_name to stabilize (timeout: 300s)..." + if aws ecs wait services-stable \ --cluster "$CLUSTER" \ - --services "${CLUSTER}-${svc}" + --services "$service_name" \ + --timeout 300 2>>"$LOG_FILE"; then + log_info "$service_name stabilized successfully" + else + log_warn "$service_name stabilization timed out or failed" + EXIT_CODE=1 + return 1 + fi - echo "$svc rolled back successfully" -done + # Get new task definition after rollback + local new_task_def + new_task_def=$(aws ecs describe-services \ + --cluster "$CLUSTER" \ + --services "$service_name" \ + --query 'services[0].taskDefinition' \ + --output text 2>/dev/null || echo "UNKNOWN") -echo "Rollback complete for $SERVICES" + local running_count + running_count=$(aws ecs describe-services \ + --cluster "$CLUSTER" \ + --services "$service_name" \ + --query 'services[0].runningCount' \ + --output text 2>/dev/null || echo "0") + + local desired_count + desired_count=$(aws ecs describe-services \ + --cluster "$CLUSTER" \ + --services "$service_name" \ + --query 'services[0].desiredCount' \ + --output text 2>/dev/null || echo "0") + + log_info "Rollback complete: $service_name -> $new_task_def ($running_count/$desired_count running)" + + return 0 +} + +# ─── Health Verification ───────────────────────────────────────── +verify_health() { + local svc="$1" + local port + port=$(case "$svc" in + api) echo 3000 ;; + darkwatch) echo 3001 ;; + spamshield) echo 3002 ;; + voiceprint) echo 3003 ;; + *) echo 3000 ;; + esac) + + local alb_dns="https://${CLUSTER}-alb.${AWS_DEFAULT_REGION}.elb.amazonaws.com" + + log_info "Verifying health for $svc (ALB: $alb_dns)..." + + local http_code + http_code=$(curl -s -o /dev/null -w "%{http_code}" \ + --connect-timeout 10 \ + --max-time 30 \ + "$alb_dns/health" 2>/dev/null || echo "000") + + if [[ "$http_code" == "200" ]]; then + log_info "Health check PASSED: $svc (HTTP $http_code)" + return 0 + else + log_warn "Health check FAILED: $svc (HTTP $http_code)" + return 1 + fi +} + +verify_all_services() { + log_info "=== Post-Rollback Health Verification ===" + local passed=0 + local failed=0 + + for svc in $(get_target_services); do + if verify_health "$svc"; then + ((passed++)) + else + ((failed++)) + fi + done + + log_info "Verification complete: $passed passed, $failed failed" + + if [[ $failed -gt 0 ]]; then + log_warn "Some services failed health verification" + EXIT_CODE=1 + fi +} + +# ─── Main Execution ────────────────────────────────────────────── +main() { + log_info "=== ShieldAI Rollback ===" + log_info "Environment: $ENVIRONMENT" + log_info "Service(s): $SERVICE" + log_info "Cluster: $CLUSTER" + log_info "Verify: $VERIFY" + log_info "Timestamp: $TIMESTAMP" + log_info "Log file: $LOG_FILE" + log_info "==========================" + + # Validate inputs + validate_environment + validate_service + check_prerequisites + + # Execute rollback for each target service + local rolled_back=0 + local failed=0 + + for svc in $(get_target_services); do + if rollback_service "$svc"; then + ((rolled_back++)) + else + ((failed++)) + fi + done + + log_info "=== Rollback Summary ===" + log_info "Rolled back: $rolled_back services" + log_info "Failed: $failed services" + + # Post-rollback verification + if [[ "$VERIFY" == "--verify" ]] || [[ "$VERIFY" == "true" ]]; then + verify_all_services + fi + + if [[ $failed -gt 0 ]]; then + log_error "Rollback completed with $failed failure(s)" + log_info "Full log: $LOG_FILE" + exit "$EXIT_CODE" + fi + + log_info "Rollback completed successfully" + log_info "Full log: $LOG_FILE" + exit 0 +} + +main "$@" diff --git a/infra/scripts/test-rollback.sh b/infra/scripts/test-rollback.sh new file mode 100755 index 0000000..f13ca70 --- /dev/null +++ b/infra/scripts/test-rollback.sh @@ -0,0 +1,237 @@ +#!/bin/bash +set -uo pipefail + +# ShieldAI Rollback Test Suite +# Usage: ./test-rollback.sh [ecs|compose|migration|all] +# +# Validates rollback scripts and procedures without mutating production +# Run against staging environment for integration tests + +TEST_SUITE="${1:-all}" +PASS=0 +FAIL=0 +SKIP=0 + +# ─── Helpers ───────────────────────────────────────────────────── +log() { + echo "[$(date -u '+%H:%M:%S')] $*" +} + +assert_eq() { + local desc="$1" expected="$2" actual="$3" + if [[ "$expected" == "$actual" ]]; then + log " ✅ PASS: $desc" + ((PASS++)) + else + log " ❌ FAIL: $desc (expected: $expected, got: $actual)" + ((FAIL++)) + fi +} + +assert_file_exists() { + local desc="$1" path="$2" + if [[ -f "$path" ]]; then + log " ✅ PASS: $desc" + ((PASS++)) + else + log " ❌ FAIL: $desc ($path not found)" + ((FAIL++)) + fi +} + +assert_executable() { + local desc="$1" path="$2" + if [[ -x "$path" ]]; then + log " ✅ PASS: $desc" + ((PASS++)) + else + log " ❌ FAIL: $desc ($path not executable)" + ((FAIL++)) + fi +} + +assert_script_syntax() { + local desc="$1" path="$2" + if bash -n "$path" 2>/dev/null; then + log " ✅ PASS: $desc (syntax OK)" + ((PASS++)) + else + log " ❌ FAIL: $desc (syntax error)" + ((FAIL++)) + fi +} + +assert_contains() { + local desc="$1" file="$2" pattern="$3" + if grep -q -- "$pattern" "$file" 2>/dev/null; then + log " ✅ PASS: $desc" + ((PASS++)) + else + log " ❌ FAIL: $desc (pattern '$pattern' not found in $file)" + ((FAIL++)) + fi +} + +# ─── Test: File Structure ──────────────────────────────────────── +test_file_structure() { + log "=== Test: File Structure ===" + + assert_file_exists "ROLLBACK.md exists" "infra/ROLLBACK.md" + assert_file_exists "rollback.sh exists" "infra/scripts/rollback.sh" + assert_file_exists "rollback-compose.sh exists" "infra/scripts/rollback-compose.sh" + assert_file_exists "rollback-migration.sh exists" "infra/scripts/rollback-migration.sh" + assert_executable "rollback.sh is executable" "infra/scripts/rollback.sh" + assert_executable "rollback-compose.sh is executable" "infra/scripts/rollback-compose.sh" + assert_executable "rollback-migration.sh is executable" "infra/scripts/rollback-migration.sh" +} + +# ─── Test: Script Syntax ───────────────────────────────────────── +test_script_syntax() { + log "=== Test: Script Syntax ===" + + assert_script_syntax "rollback.sh syntax" "infra/scripts/rollback.sh" + assert_script_syntax "rollback-compose.sh syntax" "infra/scripts/rollback-compose.sh" + assert_script_syntax "rollback-migration.sh syntax" "infra/scripts/rollback-migration.sh" +} + +# ─── Test: ROLLBACK.md Content ─────────────────────────────────── +test_documentation() { + log "=== Test: Documentation Content ===" + + local doc="infra/ROLLBACK.md" + + for section in "Overview" "ECS Service Rollback" "Docker Compose Rollback" \ + "Database Migration Rollback" "Automated Rollback Triggers" \ + "Blue-Green Deployment Rollback" "Rollback Decision Tree" \ + "Post-Rollback Verification" "Testing Checklist" "Emergency Rollback"; do + assert_contains "Section '$section' documented" "$doc" "$section" + done + + for cmd in "aws ecs update-service" "docker compose" "drizzle-kit" \ + "aws rds restore-db-instance" "aws ecs wait services-stable"; do + assert_contains "Command '$cmd' documented" "$doc" "$cmd" + done +} + +# ─── Test: Rollback Script Validation ──────────────────────────── +test_rollback_script() { + log "=== Test: ECS Rollback Script ===" + + # Test invalid environment + local exit_code=0 + bash infra/scripts/rollback.sh invalid_env api >/dev/null 2>&1 || exit_code=$? + assert_eq "Invalid environment returns exit code 1" "1" "$exit_code" + + # Test invalid service + exit_code=0 + bash infra/scripts/rollback.sh staging invalid_svc >/dev/null 2>&1 || exit_code=$? + assert_eq "Invalid service returns exit code 1" "1" "$exit_code" + + # Verify script has required functions + for func in "validate_environment" "validate_service" "rollback_service" \ + "verify_health" "check_prerequisites" "main"; do + assert_contains "Function '$func' defined" "infra/scripts/rollback.sh" "$func" + done + + # Verify all services are handled + for svc in api darkwatch spamshield voiceprint; do + assert_contains "Service '$svc' in SERVICES_LIST" "infra/scripts/rollback.sh" "$svc" + done +} + +# ─── Test: Compose Rollback Script ─────────────────────────────── +test_compose_script() { + log "=== Test: Docker Compose Rollback Script ===" + + # Test missing tag argument + local exit_code=0 + bash infra/scripts/rollback-compose.sh >/dev/null 2>&1 || exit_code=$? + assert_eq "Missing tag returns exit code 1" "1" "$exit_code" + + # Verify compose file exists + assert_file_exists "docker-compose.prod.yml exists" "docker-compose.prod.yml" + + # Verify all services are defined in compose + for svc in api darkwatch spamshield voiceprint; do + assert_contains "Service '$svc' in docker-compose.prod.yml" "docker-compose.prod.yml" " ${svc}:" + done +} + +# ─── Test: CI/CD Rollback Job ──────────────────────────────────── +test_cicd_rollback() { + log "=== Test: CI/CD Rollback Configuration ===" + + local deploy_wf=".github/workflows/deploy.yml" + + assert_contains "Rollback job defined" "$deploy_wf" "rollback:" + assert_contains "Health check triggers rollback" "$deploy_wf" "needs.health-check.result" + assert_contains "ECS --rollback flag used" "$deploy_wf" "--rollback" + + for svc in api darkwatch spamshield voiceprint; do + assert_contains "Service '$svc' in deploy matrix" "$deploy_wf" "$svc" + done +} + +# ─── Test: Health Check Configuration ──────────────────────────── +test_health_checks() { + log "=== Test: Health Check Configuration ===" + + assert_contains "Container health check in ECS" "infra/modules/ecs/main.tf" "healthCheck" + assert_contains "ALB health check defined" "infra/modules/ecs/main.tf" "health_check" + assert_contains "ALB 5xx alarm configured" "infra/modules/cloudwatch/main.tf" "HTTPCode_Elb_5XX_Count" +} + +# ─── Test: README References ───────────────────────────────────── +test_readme() { + log "=== Test: README References ===" + + assert_contains "README references ROLLBACK.md" "infra/README.md" "ROLLBACK.md" + assert_contains "README documents rollback.sh" "infra/README.md" "rollback.sh" + assert_contains "README documents rollback-compose.sh" "infra/README.md" "rollback-compose.sh" + assert_contains "README documents rollback-migration.sh" "infra/README.md" "rollback-migration.sh" +} + +# ─── Main ──────────────────────────────────────────────────────── +main() { + log "=== ShieldAI Rollback Test Suite ===" + log "Suite: $TEST_SUITE" + log "" + + case "$TEST_SUITE" in + ecs|all) + test_rollback_script + test_cicd_rollback + test_health_checks + ;; + compose|all) + test_compose_script + ;; + migration) + log "=== Test: Migration Rollback ===" + assert_script_syntax "rollback-migration.sh syntax" "infra/scripts/rollback-migration.sh" + assert_contains "Uses Secrets Manager" "infra/scripts/rollback-migration.sh" "secretsmanager" + assert_contains "Uses drizzle-kit" "infra/scripts/rollback-migration.sh" "drizzle-kit" + ;; + esac + + test_file_structure + test_script_syntax + test_documentation + test_readme + + log "" + log "=== Results ===" + log "Passed: $PASS" + log "Failed: $FAIL" + log "" + + if [[ $FAIL -gt 0 ]]; then + log "❌ SOME TESTS FAILED" + return 1 + fi + + log "✅ ALL TESTS PASSED" + return 0 +} + +main "$@"