feat: establish unified project foundation with root config cleanup

- Archive legacy packages/, services/, server/ directories
- Update pnpm workspace to web + browser-ext
- Simplify root package.json scripts to delegate to web/
- Update turbo.json for new workspace structure
- Remove obsolete root config files (vite, tsconfig, etc.)
- Add .nvmrc, .editorconfig for consistent dev environment
- Update CI workflow to remove references to deleted packages
- Add missing dependencies (@tailwindcss/vite, tailwindcss) to web
- Add test and lint scripts to web package
- Verify pnpm install, build, and dev work correctly
9
infra/.gitignore
vendored
@@ -1,9 +0,0 @@
.terraform/
*.tfstate
*.tfstate.backup
*.tfvars
.terraform.lock.hcl
override.tf
override.tf.json
*_override.tf
*_override.tf.json
113
infra/README.md
@@ -1,113 +0,0 @@
/infra/
├── main.tf                     # Root module: VPC, ECS, RDS, ElastiCache, S3, Secrets, CloudWatch
├── variables.tf                # Input variables with validation
├── outputs.tf                  # Output values (endpoints, ARNs, URLs)
├── modules/
│   ├── vpc/main.tf             # VPC, subnets, IGW, NAT GW, security groups
│   ├── ecs/main.tf             # ECS cluster, task definitions, services, ALB, auto-scaling
│   ├── rds/main.tf             # RDS PostgreSQL with automated backups
│   ├── elasticache/main.tf     # ElastiCache Redis with replication
│   ├── s3/main.tf              # S3 buckets: state, artifacts, logs
│   ├── secrets/main.tf         # AWS Secrets Manager
│   └── cloudwatch/main.tf      # Dashboards, alarms, notifications
├── environments/
│   ├── staging/main.tf         # Staging environment config
│   └── production/main.tf      # Production environment config
└── scripts/
    ├── rollback.sh             # ECS service rollback (AWS)
    ├── rollback-compose.sh     # Docker Compose rollback (local/staging)
    └── rollback-migration.sh   # Database migration rollback

## Quick Start

### Prerequisites
- Terraform >= 1.5.0
- AWS CLI configured with appropriate credentials
- AWS account with ECS, RDS, ElastiCache permissions

### Initialize
```bash
cd infra/environments/staging
terraform init
terraform plan -var-file=terraform.tfvars.example
terraform apply -var-file=terraform.tfvars.example
```

### Deploy via CI/CD
- Push to `main` → deploys to staging
- Create a release → deploys to production
- Health check failure → automatic rollback

## Architecture

### Networking
- VPC with public/private subnets across multiple AZs
- NAT Gateway for outbound traffic from private subnets
- Security groups: ECS → RDS (5432), ECS → ElastiCache (6379)

### Compute
- ECS Fargate for serverless container orchestration
- Application Load Balancer with health checks
- Auto-scaling: CPU-based scaling (70% target)
- Production: 3 replicas per service, min 2, max 10

### Data
- RDS PostgreSQL 16.2 with Multi-AZ (production)
- Automated daily backups, 7-14 day retention
- ElastiCache Redis 7.0 with replication
- S3 with versioning and lifecycle policies

### Secrets
- AWS Secrets Manager for all credentials
- ECS task execution role with SecretsManagerReadOnly
- DB credentials auto-rotated via RDS integration

### Monitoring
- CloudWatch dashboards: CPU, memory, ALB metrics
- Alarms: CPU >80%, memory >85%, 5xx >10/min, RDS storage <500MB
- Container Insights enabled for ECS
- Logs: 30-day retention (production), 7-day (staging)

### Backup Strategy
- RDS: automated snapshots every 24h, 7-14 day retention
- RDS: Multi-AZ for automatic failover (production)
- ElastiCache: daily snapshots, 1-7 day retention
- S3: versioning enabled, non-current versions expire after 30 days
- Terraform state: S3 with versioning + DynamoDB locking

## Rollback

See **[ROLLBACK.md](./ROLLBACK.md)** for the complete rollback runbook, including:

- ECS service rollback (automated + manual)
- Docker Compose rollback (local / staging)
- Database migration rollback (Drizzle)
- Blue-green deployment rollback
- RDS point-in-time recovery
- Automated rollback triggers and health checks
- Emergency rollback runbook
- Testing checklist

### Quick Reference

```bash
# ECS service rollback (AWS)
./infra/scripts/rollback.sh <environment> <service|all> [--verify]

# Docker Compose rollback (local/staging)
./infra/scripts/rollback-compose.sh <previous_tag>

# Database migration rollback
./infra/scripts/rollback-migration.sh <environment> [--migration <name>]
```

## GitHub Secrets Required
| Secret | Description |
|--------|-------------|
| AWS_ACCESS_KEY_ID | IAM user with ECS, RDS, ElastiCache permissions |
| AWS_SECRET_ACCESS_KEY | IAM secret key |
| HIBP_API_KEY | Have I Been Pwned API key |
| RESEND_API_KEY | Resend email API key |
| SENTRY_DSN | Sentry error tracking DSN |
| DATADOG_API_KEY | Datadog monitoring API key |
| GITHUB_TOKEN | Auto-provided, needs write:packages scope |
@@ -1,611 +0,0 @@
# ShieldAI Rollback Runbook

> **Last updated:** 2026-05-12
> **Owner:** Senior Engineer
> **Parent:** [FRE-4574](/FRE/issues/FRE-4574) ShieldAI Production Infrastructure & CI/CD Pipeline
> **Reviewed by:** Code Reviewer (FRE-4808) on 2026-05-12

---

## Table of Contents

1. [Overview](#1-overview)
2. [Rollback Strategies](#2-rollback-strategies)
3. [ECS Service Rollback (AWS)](#3-ecs-service-rollback-aws)
4. [Docker Compose Rollback (Local / Staging)](#4-docker-compose-rollback-local--staging)
5. [Database Migration Rollback](#5-database-migration-rollback)
6. [Automated Rollback Triggers](#6-automated-rollback-triggers)
7. [Blue-Green Deployment Rollback](#7-blue-green-deployment-rollback)
8. [Rollback Decision Tree](#8-rollback-decision-tree)
9. [Post-Rollback Verification](#9-post-rollback-verification)
10. [Testing Checklist](#10-testing-checklist)
11. [Runbook: Emergency Rollback](#11-runbook-emergency-rollback)

---

## 1. Overview

ShieldAI runs four services (api, darkwatch, spamshield, voiceprint) on AWS ECS Fargate behind an Application Load Balancer. Each service has independent deployment, health checks, and rollback capability.

**Rollback types:**

| Type | Trigger | Scope | Automation |
|------|---------|-------|------------|
| **ECS Service Rollback** | Health check failure, manual | Single or all services | ✅ CI/CD + manual script |
| **Docker Compose Rollback** | Manual (local/staging) | All services | ✅ Scripted |
| **Database Migration Rollback** | Manual | Schema changes | ⚠️ Semi-manual |
| **Blue-Green Rollback** | Manual or automated | Full environment | ✅ CI/CD |
| **RDS Point-in-Time Restore** | Manual (disaster) | Full database | ⚠️ Semi-manual |

---

## 2. Rollback Strategies

### 2.1 ECS Service-Level Rollback

Each ECS service maintains a history of task definitions. Rolling back reverts to the **previous successfully deployed task definition**.

**Prerequisites:**
- AWS CLI configured with credentials for the target environment
- IAM permissions: `ecs:UpdateService`, `ecs:DescribeServices` (the `services-stable` waiter polls `DescribeServices`; there is no separate `ecs:WaitServicesStable` action)

### 2.2 Blue-Green Rollback

The CI/CD pipeline deploys new images to existing ECS services. If health checks fail after deployment, the `rollback` job in the deploy workflow automatically reverts all four services to their previous task definition revision.

**Pipeline flow:**
```
build-and-push → deploy-ecs → health-check → [PASS: done | FAIL: rollback]
```

### 2.3 Database Migration Rollback

ShieldAI uses Drizzle ORM for database migrations. Each migration is versioned and stored in `src/db/migrations/`. Rollback requires running the previous migration set.

---

## 3. ECS Service Rollback (AWS)

### 3.1 Automated (CI/CD Pipeline)

The deploy workflow (`.github/workflows/deploy.yml`) includes a `rollback` job that triggers on health check failure:

```yaml
rollback:
  if: failure() && needs.health-check.result == 'failure'
  # Rolls back all 4 services to previous task definition
```

**When it runs:**
- Post-deploy health check fails (HTTP 200 not received from `/health`)
- Runs after `deploy-ecs` and `health-check` jobs
- Rolls back all four services: api, darkwatch, spamshield, voiceprint

**How to verify:**
1. Navigate to the GitHub Actions run for the failed deployment
2. Check the `Rollback on Failure` job logs
3. Confirm each service shows "Rolled back" status

### 3.2 Manual Rollback Script

```bash
# Single service
./infra/scripts/rollback.sh production api

# All services
./infra/scripts/rollback.sh production all

# Staging environment
./infra/scripts/rollback.sh staging all
```

**Script behavior:**
1. Iterates over target services (or all if `all` specified)
2. Calls `aws ecs update-service --rollback` for each service
3. Waits for service to stabilize via `aws ecs wait services-stable`
4. Reports success/failure per service
5. Exits with non-zero code if any service fails to stabilize
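
The five steps above can be sketched as a loop. `rollback.sh` itself is not part of this diff, so this is a hypothetical outline rather than the script's actual code: the names `resolve_services`, `rollback_all`, and the `ROLLBACK_CMD` hook are assumptions, and the AWS calls are replaced by an injectable command so the loop can be dry-run without credentials.

```bash
#!/usr/bin/env bash
# Hypothetical outline of the loop described above; not the real rollback.sh.

# Expand "all" into the four services named in this runbook.
resolve_services() {
  if [ "$1" = "all" ]; then
    echo "api darkwatch spamshield voiceprint"
  else
    echo "$1"
  fi
}

# ROLLBACK_CMD is an assumed hook: a command that receives <cluster> <service>
# and performs the rollback plus the stabilization wait. Defaults to a dry run.
ROLLBACK_CMD="${ROLLBACK_CMD:-echo DRY-RUN}"

rollback_all() {
  local target="$2" cluster="shieldai-$1" failed=0 svc
  for svc in $(resolve_services "$target"); do
    echo "Rolling back ${svc}..."
    if $ROLLBACK_CMD "$cluster" "${cluster}-${svc}"; then
      echo "${svc} rolled back successfully"
    else
      echo "${svc} failed to stabilize" >&2
      failed=1
    fi
  done
  return "$failed"   # non-zero if any service failed (step 5)
}

rollback_all staging api
```

Keeping the per-service command behind a hook means the iteration and exit-code logic can be exercised on a laptop before it is pointed at a real cluster.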

**Expected output:**
```
Rolling back services in cluster: shieldai-production
Rolling back api...
Waiting for api to stabilize...
api rolled back successfully
Rolling back darkwatch...
Waiting for darkwatch to stabilize...
darkwatch rolled back successfully
...
Rollback complete for api darkwatch spamshield voiceprint
```

### 3.3 Manual CLI Rollback (Fallback)

If the script is unavailable, roll back individual services:

```bash
CLUSTER="shieldai-production"
SERVICE="api"

# Rollback to previous task definition
aws ecs update-service \
  --cluster "$CLUSTER" \
  --service "${CLUSTER}-${SERVICE}" \
  --rollback \
  --no-cli-auto-prompt

# Wait for stabilization
aws ecs wait services-stable \
  --cluster "$CLUSTER" \
  --services "${CLUSTER}-${SERVICE}"

# Verify health
curl -s -o /dev/null -w "%{http_code}" \
  "https://shieldai-production-alb.us-east-1.elb.amazonaws.com/health"
```
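
If a service has to be pinned to an explicit revision instead, the previous revision can be derived from the current task definition ARN. The sketch below is an alternative under stated assumptions, not the runbook's procedure: `previous_task_def` is a hypothetical helper, the ARN is made up, and it assumes the previous revision is the current number minus one and is still registered.

```bash
# Hypothetical helper: derive "family:revision-1" from a task definition ARN.
# Assumes ARNs of the form
#   arn:aws:ecs:<region>:<account>:task-definition/<family>:<revision>
previous_task_def() {
  local arn="$1"
  local family_rev="${arn##*/}"       # e.g. shieldai-production-api:42
  local family="${family_rev%:*}"     # e.g. shieldai-production-api
  local revision="${family_rev##*:}"  # e.g. 42
  if [ "$revision" -le 1 ]; then
    echo "no earlier revision for ${family}" >&2
    return 1
  fi
  echo "${family}:$((revision - 1))"
}

# Example with a placeholder account ID and revision:
previous_task_def "arn:aws:ecs:us-east-1:123456789012:task-definition/shieldai-production-api:42"
# The result could then be passed to
#   aws ecs update-service --cluster "$CLUSTER" --service "${CLUSTER}-${SERVICE}" --task-definition <family:revision>
```

Pinning by `family:revision` makes the target of the rollback visible in the command itself, which can help when writing up the incident afterwards.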

---

## 4. Docker Compose Rollback (Local / Staging)

### 4.1 Production Compose Rollback

The `docker-compose.prod.yml` deploys all services with tagged images. To roll back:

```bash
# 1. Identify the previous working tag
#    Check GitHub releases or git tags for the last known good version
PREVIOUS_TAG="v1.2.3"

# 2. Stop current services
docker compose -f docker-compose.prod.yml down

# 3. Pull previous images
docker pull ghcr.io/${GITHUB_REPOSITORY_OWNER}/shieldai-api:${PREVIOUS_TAG}
docker pull ghcr.io/${GITHUB_REPOSITORY_OWNER}/shieldai-darkwatch:${PREVIOUS_TAG}
docker pull ghcr.io/${GITHUB_REPOSITORY_OWNER}/shieldai-spamshield:${PREVIOUS_TAG}
docker pull ghcr.io/${GITHUB_REPOSITORY_OWNER}/shieldai-voiceprint:${PREVIOUS_TAG}

# 4. Override tag in compose
DOCKER_TAG=${PREVIOUS_TAG} docker compose -f docker-compose.prod.yml up -d

# 5. Verify health
for svc in api darkwatch spamshield voiceprint; do
  PORT=$(case $svc in
    api) echo 3000;; darkwatch) echo 3001;;
    spamshield) echo 3002;; voiceprint) echo 3003;;
  esac)
  curl -sf "http://localhost:${PORT}/health" && echo "$svc: OK" || echo "$svc: FAIL"
done
```
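
The inline `case` in step 5 can be factored into a small lookup function so the same mapping is reusable elsewhere. This is only a sketch: `port_for` is a hypothetical name, and the ports are the ones used in the loop above.

```bash
# Hypothetical helper: map a service name to its local port
# (values taken from the verification loop above).
port_for() {
  case "$1" in
    api)        echo 3000 ;;
    darkwatch)  echo 3001 ;;
    spamshield) echo 3002 ;;
    voiceprint) echo 3003 ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

for svc in api darkwatch spamshield voiceprint; do
  echo "$svc -> http://localhost:$(port_for "$svc")/health"
done
```

Unlike the inline form, an unknown service name fails loudly here instead of producing an empty port.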

### 4.2 Local Dev Rollback

```bash
# Stop and remove containers
docker compose down

# Rebuild from previous commit
git checkout <previous-commit>
docker compose up -d --build
```

---

## 5. Database Migration Rollback

### 5.1 Drizzle Migration Rollback

ShieldAI uses Drizzle ORM with Turso dialect. Migrations are stored in `src/db/migrations/`.

```bash
# 1. Get database credentials from AWS Secrets Manager
#    (--output text returns the raw SecretString so jq can parse it as JSON)
DB_SECRET=$(aws secretsmanager get-secret-value \
  --secret-id "shieldai-${ENVIRONMENT}-db-password" \
  --query 'SecretString' --output text)

DB_HOST=$(echo "$DB_SECRET" | jq -r '.host')
DB_PORT=$(echo "$DB_SECRET" | jq -r '.port')
DB_USER=$(echo "$DB_SECRET" | jq -r '.username')
DB_PASS=$(echo "$DB_SECRET" | jq -r '.password')

DATABASE_URL="postgresql://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/shieldai"

# 2. List migrations to identify the one to revert
npx drizzle-kit introspect --config=drizzle.config.ts

# 3. Resolve the problematic migration (marks it as not applied)
npx drizzle-kit migrate:resolve --migration "<migration_name>" --status applied

# 4. Re-run previous migration state
npx drizzle-kit migrate --config=drizzle.config.ts
```

### 5.2 RDS Point-in-Time Recovery (Disaster)

When the database itself needs recovery (e.g., data corruption, bad migration):

```bash
# 1. Find available recovery window (automated backups: every 24h, 7-14 day retention)
aws rds describe-db-instances \
  --db-instance-identifier "shieldai-production-db" \
  --query 'DBInstances[0].LatestRestorableTime'

# 2. Create restored instance (does not affect primary)
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier "shieldai-production-db" \
  --db-instance-identifier "shieldai-production-db-restored" \
  --restore-time "2026-05-09T08:00:00Z"

# 3. Verify restored instance
aws rds wait db-instance-available \
  --db-instance-identifier "shieldai-production-db-restored"

# 4. Update ECS services to point to restored instance
#    Update DATABASE_URL secret in Secrets Manager
#    ($DB_SECRET is the secret JSON fetched in step 1 of section 5.1)
RESTORED_HOST=$(aws rds describe-db-instances \
  --db-instance-identifier "shieldai-production-db-restored" \
  --query 'DBInstances[0].Endpoint.Address' --output text)
aws secretsmanager put-secret-value \
  --secret-id "shieldai-production-db-password" \
  --secret-string "$(echo "$DB_SECRET" | jq --arg host "$RESTORED_HOST" '.host = $host')"

# 5. Trigger ECS service redeployment to pick up new DB endpoint
./infra/scripts/rollback.sh production all
```

### 5.3 RDS Snapshot Restore

```bash
# 1. List available snapshots
aws rds describe-db-snapshots \
  --db-instance-identifier "shieldai-production-db"

# 2. Restore from specific snapshot
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier "shieldai-production-db-restored" \
  --db-snapshot-identifier "rds:shieldai-production-db-2026-05-08-03-00" \
  --db-instance-class "db.t3.medium" \
  --vpc-security-group-ids "$(terraform -chdir=infra output -raw vpc_security_group_id)"

# 3. Follow steps 3-5 from Point-in-Time Recovery above
```

---

## 6. Automated Rollback Triggers

### 6.1 CI/CD Health Check Failure

**Trigger:** Post-deploy health check returns non-200 from `/health`

**Pipeline job:** `rollback` in `.github/workflows/deploy.yml`

**Condition:** `if: failure() && needs.health-check.result == 'failure'`

**Action:** Rolls back all four ECS services to previous task definition

**Timeout:** Health check retries for 5 minutes before triggering rollback

### 6.2 ECS Container Health Check

Each container has an in-container health check defined in the ECS task definition:

```json
"healthCheck": {
  "command": ["CMD-SHELL", "wget -q --spider http://localhost:{port}/health || exit 1"],
  "interval": 30,
  "timeout": 5,
  "retries": 3,
  "startPeriod": 60
}
```

**Failure consequence:** Container is marked unhealthy after 3 consecutive failures (90 seconds). ALB marks target as unhealthy after 3 failed health checks (90 seconds). Service enters draining state.
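
The 90-second figure comes from multiplying the check interval by the number of consecutive failures. A small sketch of that arithmetic follows; `seconds_to_unhealthy` is a hypothetical helper name, and because it ignores `startPeriod` and the per-check timeout it is a rough estimate rather than an exact ECS or ALB guarantee.

```bash
# Hypothetical helper: approximate time until a target is marked unhealthy.
seconds_to_unhealthy() {
  local interval="$1" failures="$2"
  echo $(( interval * failures ))
}

seconds_to_unhealthy 30 3   # container check: interval=30, retries=3
```

The same call with the ALB's interval and unhealthy threshold gives the second 90-second figure quoted above.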

### 6.3 ALB Target Group Health Check

The ALB performs HTTP health checks against `/health` on each target:

| Parameter | Value |
|-----------|-------|
| Interval | 30s |
| Timeout | 5s |
| Healthy threshold | 3 |
| Unhealthy threshold | 3 |
| Expected code | 200 |

### 6.4 CloudWatch Alarms

The following alarms are configured in `infra/modules/cloudwatch/main.tf`:

| Alarm | Threshold | Action |
|-------|-----------|--------|
| ECS CPU >80% | 80% for 2 periods (10min) | SNS notification |
| ECS Memory >85% | 85% for 2 periods (10min) | SNS notification |
| ALB 5xx >10/min | 10 for 3 periods (3min) | SNS notification |
| RDS CPU >75% | 75% for 2 periods (10min) | SNS notification |
| RDS Free Storage <500MB | 500MB for 2 periods (10min) | SNS notification |

**Alarm escalation path:**
1. CloudWatch alarm fires
2. SNS notification sent to on-call engineer
3. Engineer evaluates: if service is degraded, trigger manual rollback
4. If root cause is deployment-related, run `./infra/scripts/rollback.sh production all`

---

## 7. Blue-Green Deployment Rollback

### 7.1 Architecture

ShieldAI uses ECS services with rolling deployments. Each deployment creates a new task definition revision. The ALB routes traffic to healthy targets only.

**Rollback mechanism:** ECS `--rollback` flag reverts the service to the previous task definition revision. This is equivalent to a blue-green swap since:

1. Old task definition (blue) remains registered
2. New task definition (green) is deployed
3. On rollback, ECS reverts to blue task definition
4. ALB automatically routes to healthy (blue) targets

### 7.2 Blue-Green Rollback Procedure

```bash
# 1. Check current deployment state
aws ecs list-services --cluster shieldai-production
aws ecs describe-services --cluster shieldai-production \
  --services shieldai-production-api \
  --query 'services[0].deployments'

# 2. Identify previous deployment
#    The deployment with status "PRIMARY" is current.
#    Look for "ACTIVE" deployment with older task definition.

# 3. Execute rollback (script handles all services)
./infra/scripts/rollback.sh production all

# 4. Verify rollback
aws ecs describe-services --cluster shieldai-production \
  --services shieldai-production-api \
  --query 'services[0].deployments[?status==`PRIMARY`].taskDefinition'
```

### 7.3 Docker Compose Blue-Green (Local)

For local/staging environments using Docker Compose, implement blue-green via service version pinning:

```bash
# Current deployment uses DOCKER_TAG env var
# Rollback by setting DOCKER_TAG to previous version

# Save current tag (falls back to "latest" if DOCKER_TAG is not set in .env.prod)
CURRENT_TAG=$(grep '^DOCKER_TAG=' .env.prod 2>/dev/null | cut -d= -f2)
CURRENT_TAG="${CURRENT_TAG:-latest}"

# Rollback to previous
export DOCKER_TAG="v1.2.3"
docker compose -f docker-compose.prod.yml up -d

# Verify all services
docker compose -f docker-compose.prod.yml ps
```

---

## 8. Rollback Decision Tree

```
Is the service responding?
├── YES → Is the response correct?
│   ├── YES → Monitor, no action needed
│   └── NO → Is it a data issue?
│       ├── YES → Database Migration Rollback (§5)
│       └── NO → ECS Service Rollback (§3)
└── NO → Is it a single service or all?
    ├── Single → ECS Service Rollback (§3, specific service)
    └── All → Full Environment Rollback
        ├── Is DB corrupted?
        │   ├── YES → RDS Point-in-Time Recovery (§5.2)
        │   └── NO → ECS Full Rollback + DB Migration Rollback
```
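
For drills, the tree above can be encoded as a small function so the expected path for a given set of answers is unambiguous. This is a hypothetical encoding: `choose_rollback`, its argument order, and the returned labels are all assumptions made for illustration.

```bash
# Hypothetical encoding of the decision tree above.
# Arguments: responding correct data_issue scope db_corrupted
#   yes/no answers, except scope, which is "single" or "all".
#   Use "-" for answers that are not reached on a given path.
choose_rollback() {
  local responding="$1" correct="$2" data_issue="$3" scope="$4" db_corrupted="$5"
  if [ "$responding" = "yes" ]; then
    if [ "$correct" = "yes" ]; then
      echo "monitor"
    elif [ "$data_issue" = "yes" ]; then
      echo "db-migration-rollback"          # section 5
    else
      echo "ecs-service-rollback"           # section 3
    fi
  elif [ "$scope" = "single" ]; then
    echo "ecs-service-rollback"             # section 3, specific service
  elif [ "$db_corrupted" = "yes" ]; then
    echo "rds-point-in-time-recovery"       # section 5.2
  else
    echo "ecs-full-rollback+db-migration-rollback"
  fi
}

# All services down and the database is corrupted:
choose_rollback no - - all yes
```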

**SLA targets:**
- Single service rollback: **< 5 minutes**
- Full environment rollback: **< 15 minutes**
- Database recovery: **< 30 minutes** (Point-in-Time)

---

## 9. Post-Rollback Verification

After any rollback, verify the following:

### 9.1 Service Health

```bash
# Check all services are healthy
# NOTE: PORT is computed but not used below; every iteration queries the same ALB /health URL.
for svc in api darkwatch spamshield voiceprint; do
  PORT=$(case $svc in
    api) echo 3000;; darkwatch) echo 3001;;
    spamshield) echo 3002;; voiceprint) echo 3003;;
  esac)
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
    "https://shieldai-${ENVIRONMENT}-alb.us-east-1.elb.amazonaws.com/health")
  echo "$svc: HTTP $HTTP_CODE"
done
```

### 9.2 ECS Service Status

```bash
# Verify all services are stable
for svc in api darkwatch spamshield voiceprint; do
  RUNNING=$(aws ecs describe-services \
    --cluster "shieldai-${ENVIRONMENT}" \
    --services "shieldai-${ENVIRONMENT}-${svc}" \
    --query 'services[0].runningCount' --output text)
  DESIRED=$(aws ecs describe-services \
    --cluster "shieldai-${ENVIRONMENT}" \
    --services "shieldai-${ENVIRONMENT}-${svc}" \
    --query 'services[0].desiredCount' --output text)
  echo "$svc: $RUNNING/$DESIRED running"
done
```

### 9.3 Database Connectivity

```bash
# Verify database connection
# execute-command targets a task, so look up a running task for the service first
TASK_ARN=$(aws ecs list-tasks \
  --cluster "shieldai-${ENVIRONMENT}" \
  --service-name "shieldai-${ENVIRONMENT}-api" \
  --query 'taskArns[0]' --output text)
aws ecs execute-command \
  --cluster "shieldai-${ENVIRONMENT}" \
  --task "$TASK_ARN" \
  --command "npx drizzle-kit status" \
  --interactive
```

### 9.4 CloudWatch Verification

1. Navigate to CloudWatch dashboard: `shieldai-${ENVIRONMENT}-dashboard`
2. Verify CPU/Memory utilization is within normal range
3. Verify ALB 5xx errors have returned to baseline
4. Verify no new alarms are in ALARM state

---

## 10. Testing Checklist

### 10.1 ECS Rollback Test

- [ ] Deploy a known-bad image (e.g., image with `/health` returning 500)
- [ ] Verify CI/CD health check fails within 5 minutes
- [ ] Verify `rollback` job triggers automatically
- [ ] Verify all four services revert to previous task definition
- [ ] Verify health check passes post-rollback
- [ ] Verify CloudWatch metrics show recovery

### 10.2 Manual Script Test

- [ ] Run `./infra/scripts/rollback.sh staging api` on staging
- [ ] Verify single service rolls back correctly
- [ ] Run `./infra/scripts/rollback.sh staging all` on staging
- [ ] Verify all services roll back correctly
- [ ] Verify script exits with code 0 on success
- [ ] Verify script exits with code 1 on failure

### 10.3 Docker Compose Rollback Test

- [ ] Deploy v2.0.0 of all services via docker-compose.prod.yml
- [ ] Rollback to v1.0.0 using DOCKER_TAG override
- [ ] Verify all services restart with previous images
- [ ] Verify health endpoints respond correctly

### 10.4 Database Migration Rollback Test

- [ ] Apply a test migration on staging
- [ ] Run migration rollback procedure
- [ ] Verify schema matches pre-migration state
- [ ] Verify application connects and functions correctly

### 10.5 RDS Point-in-Time Recovery Test

- [ ] Create a test RDS instance
- [ ] Insert test data
- [ ] Restore to point before data insertion
- [ ] Verify restored instance has correct data state
- [ ] Clean up test instance

### 10.6 End-to-End Rollback Drills

| Drill | Frequency | Participants |
|-------|-----------|--------------|
| ECS service rollback | Monthly | Senior Engineer |
| Full environment rollback | Quarterly | Full engineering team |
| Database recovery | Quarterly | Senior Engineer + Founding Engineer |
| Blue-green rollback | Quarterly | Full engineering team |

---

## 11. Runbook: Emergency Rollback

### 11.1 Symptoms

- ALB 5xx error rate > 10/minute for 3+ minutes
- CloudWatch alarm: `shieldai-production-alb-5xx` in ALARM state
- Customer-reported service degradation

### 11.2 Immediate Actions (0-5 minutes)

```bash
# 1. Confirm environment and scope
ENVIRONMENT="production"

# 2. Check service status (--services takes a space-separated list)
aws ecs describe-services \
  --cluster "shieldai-${ENVIRONMENT}" \
  --services \
    "shieldai-${ENVIRONMENT}-api" \
    "shieldai-${ENVIRONMENT}-darkwatch" \
    "shieldai-${ENVIRONMENT}-spamshield" \
    "shieldai-${ENVIRONMENT}-voiceprint" \
  --query 'services[*].{Name:serviceName,Running:runningCount,Desired:desiredCount,Status:status}'

# 3. Check ALB health
curl -s -o /dev/null -w "%{http_code}" \
  "https://shieldai-${ENVIRONMENT}-alb.us-east-1.elb.amazonaws.com/health"

# 4. Execute rollback
./infra/scripts/rollback.sh "${ENVIRONMENT}" all
```

### 11.3 Verification (5-10 minutes)

```bash
# 1. Wait for services to stabilize (--services takes a space-separated list)
aws ecs wait services-stable \
  --cluster "shieldai-${ENVIRONMENT}" \
  --services \
    "shieldai-${ENVIRONMENT}-api" \
    "shieldai-${ENVIRONMENT}-darkwatch" \
    "shieldai-${ENVIRONMENT}-spamshield" \
    "shieldai-${ENVIRONMENT}-voiceprint"

# 2. Verify health endpoint
curl -sf "https://shieldai-${ENVIRONMENT}-alb.us-east-1.elb.amazonaws.com/health" \
  && echo "Health: OK" || echo "Health: FAIL"

# 3. Check CloudWatch for recovery
#    Navigate to CloudWatch dashboard and verify metrics
```

### 11.4 Communication Template

```
## Rollback Notification

**Environment:** production
**Time:** $(date -u '+%Y-%m-%d %H:%M UTC')
**Trigger:** [ALB 5xx alarm / manual / CI/CD health check]
**Action:** Rolled back all services to previous deployment
**Status:** [In Progress / Verified / Resolved]
**Next steps:** [Post-mortem / monitoring / investigation]
```
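
The template contains a `$(date ...)` substitution, which only expands if the text is produced by a shell. One way to do that is sketched below; `render_notification` and its four arguments are hypothetical and not part of the runbook.

```bash
# Hypothetical helper: fill in the notification template above.
# The heredoc is unquoted so that $(date ...) and the variables expand.
render_notification() {
  local environment="$1" trigger="$2" status="$3" next_steps="$4"
  cat <<EOF
## Rollback Notification

**Environment:** ${environment}
**Time:** $(date -u '+%Y-%m-%d %H:%M UTC')
**Trigger:** ${trigger}
**Action:** Rolled back all services to previous deployment
**Status:** ${status}
**Next steps:** ${next_steps}
EOF
}

render_notification production "ALB 5xx alarm" "In Progress" "monitoring"
```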

### 11.5 Post-Incident

1. Create incident ticket with timeline
2. Document root cause
3. Update runbook if procedure changed
4. Schedule post-mortem within 48 hours
5. Create follow-up issues for preventive measures

---

## Appendix A: Quick Reference

| Resource | Command |
|----------|---------|
| Rollback script | `./infra/scripts/rollback.sh <env> <service\|all>` |
| ECS service status | `aws ecs describe-services --cluster shieldai-<env> --services shieldai-<env>-<svc>` |
| ALB health check | `curl -s -o /dev/null -w "%{http_code}" https://shieldai-<env>-alb.us-east-1.elb.amazonaws.com/health` |
| RDS snapshots | `aws rds describe-db-snapshots --db-instance-identifier shieldai-<env>-db` |
| CloudWatch dashboard | `https://us-east-1.console.aws.amazon.com/cloudwatch/home#dashboards/dashboard/shieldai-<env>-dashboard` |
| ECS task logs | `aws logs filter-log-events --log-group-name /ecs/shieldai-<env>-<svc>` |

## Appendix B: Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `AWS_ACCESS_KEY_ID` | IAM user with ECS, RDS permissions | Yes |
| `AWS_SECRET_ACCESS_KEY` | IAM secret key | Yes |
| `AWS_DEFAULT_REGION` | AWS region (default: us-east-1) | Yes |
| `GITHUB_REPOSITORY_OWNER` | GitHub org/user for container registry | Docker Compose only |
| `DOCKER_TAG` | Container image tag to deploy | Docker Compose only |
| `POSTGRES_PASSWORD` | Database password | Docker Compose only |
@@ -1,57 +0,0 @@
terraform {
  backend "s3" {
    bucket         = "shieldai-production-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "shieldai-terraform-locks"
  }
}

module "shieldai" {
  source = "../.."

  environment  = "production"
  aws_region   = "us-east-1"
  project_name = "shieldai"
  vpc_cidr     = "10.1.0.0/16"
  az_count     = 3

  db_instance_class   = "db.r6g.large"
  db_multi_az         = true
  db_backup_retention = 14

  elasticache_node_type = "cache.r6g.large"
  elasticache_num_nodes = 3

  secrets = {
    HIBP_API_KEY    = var.hibp_api_key
    RESEND_API_KEY  = var.resend_api_key
    SENTRY_DSN      = var.sentry_dsn
    DATADOG_API_KEY = var.datadog_api_key
  }
}

variable "hibp_api_key" {
  description = "Have I Been Pwned API key"
  type        = string
  sensitive   = true
}

variable "resend_api_key" {
  description = "Resend API key"
  type        = string
  sensitive   = true
}

variable "sentry_dsn" {
  description = "Sentry DSN"
  type        = string
  sensitive   = true
}

variable "datadog_api_key" {
  description = "Datadog API key"
  type        = string
  sensitive   = true
}
@@ -1,4 +0,0 @@
hibp_api_key = "YOUR_HIBP_API_KEY"
resend_api_key = "YOUR_RESEND_API_KEY"
sentry_dsn = "YOUR_SENTRY_DSN"
datadog_api_key = "YOUR_DATADOG_API_KEY"
@@ -1,57 +0,0 @@
terraform {
  backend "s3" {
    bucket         = "shieldai-staging-terraform-state"
    key            = "staging/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "shieldai-terraform-locks"
  }
}

module "shieldai" {
  source = "../.."

  environment  = "staging"
  aws_region   = "us-east-1"
  project_name = "shieldai"
  vpc_cidr     = "10.0.0.0/16"
  az_count     = 2

  db_instance_class   = "db.t3.medium"
  db_multi_az         = false
  db_backup_retention = 3

  elasticache_node_type = "cache.t3.small"
  elasticache_num_nodes = 1

  secrets = {
    HIBP_API_KEY    = var.hibp_api_key
    RESEND_API_KEY  = var.resend_api_key
    SENTRY_DSN      = var.sentry_dsn
    DATADOG_API_KEY = var.datadog_api_key
  }
}

variable "hibp_api_key" {
  description = "Have I Been Pwned API key"
  type        = string
  sensitive   = true
}

variable "resend_api_key" {
  description = "Resend API key"
  type        = string
  sensitive   = true
}

variable "sentry_dsn" {
  description = "Sentry DSN"
  type        = string
  sensitive   = true
}

variable "datadog_api_key" {
  description = "Datadog API key"
  type        = string
  sensitive   = true
}
@@ -1,4 +0,0 @@
|
||||
hibp_api_key = "YOUR_HIBP_API_KEY"
|
||||
resend_api_key = "YOUR_RESEND_API_KEY"
|
||||
sentry_dsn = "YOUR_SENTRY_DSN"
|
||||
datadog_api_key = "YOUR_DATADOG_API_KEY"
|
||||
@@ -1,61 +0,0 @@
|
||||
# ShieldAI Load Tests

k6 load testing suite for ShieldAI services.

## Prerequisites

- k6 v0.45+ installed
- Target services running on staging environment
- Authentication tokens for API access

## Running Tests

### Local Execution

```bash
# Run against local development environment
k6 run --env BASE_URL=http://localhost:3000 --env AUTH_TOKEN=dev-token src/darkwatch.js

# Run with results output
k6 run --out json=results.json src/darkwatch.js
```

### CI/CD Execution

```bash
# Run on staging environment
k6 run --env BASE_URL=https://staging-api.freno.me --env AUTH_TOKEN=$STAGING_AUTH_TOKEN src/darkwatch.js
```

## Test Configuration

Each test script includes:

- **Stages**: Ramp-up, sustained load, ramp-down
- **Thresholds**: P99 latency and error rate limits
- **Metrics**: Custom metrics for error tracking
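
As a rough illustration of the stage profile, the sketch below sums the stage durations used in `src/darkwatch.js`. It is plain JavaScript rather than a k6 script, and the helper names `toSeconds` and `totalSeconds` are hypothetical, not part of k6.

```javascript
// Stage values copied from src/darkwatch.js; targets are virtual users.
const stages = [
  { duration: '30s', target: 100 },
  { duration: '2m', target: 500 },
  { duration: '3m', target: 500 },
  { duration: '30s', target: 0 },
];

// Convert a simple duration such as '30s' or '2m' to seconds.
// Assumption: only single-unit values (s, m, h) are handled in this sketch.
function toSeconds(duration) {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) throw new Error(`unsupported duration: ${duration}`);
  const factor = { s: 1, m: 60, h: 3600 }[match[2]];
  return Number(match[1]) * factor;
}

// Sum the durations of all stages to get the length of one full run.
function totalSeconds(stageList) {
  return stageList.reduce((sum, stage) => sum + toSeconds(stage.duration), 0);
}

console.log(totalSeconds(stages)); // 360, i.e. six minutes end to end
```

Knowing the total run length up front helps when sizing CI job timeouts around a test run.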

### Current Thresholds

| Service | P99 Latency | Error Rate |
|---------|-------------|------------|
| Darkwatch | < 200ms | < 1% |
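
To make the threshold semantics concrete, here is a plain JavaScript sketch that applies the same two limits to a list of sample durations. The nearest-rank percentile and the function names are assumptions for illustration; k6 computes its own percentiles and may interpolate differently.

```javascript
// Nearest-rank percentile over a list of numbers (illustrative, not k6's algorithm).
function percentile(values, p) {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Apply the Darkwatch limits: P99 latency < 200 ms and error rate < 1%.
function meetsThresholds(durationsMs, failedCount) {
  const p99 = percentile(durationsMs, 99);
  const errorRate = failedCount / durationsMs.length;
  return { p99, errorRate, pass: p99 < 200 && errorRate < 0.01 };
}

// 100 hypothetical requests taking 1..100 ms, none of them failing.
const samples = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(meetsThresholds(samples, 0).pass); // true
```

Both limits are strict: one failure in 100 requests is an error rate of exactly 1% and therefore does not pass.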

## Metrics Collection

Run with output options:

```bash
# JSON output for analysis
k6 run --out json=darkwatch-results.json src/darkwatch.js

# InfluxDB for visualization
k6 run --out influxdb=http://influxdb:8086/k6 src/darkwatch.js
```

## Next Steps

1. Create load test scripts for Spamshield and Voiceprint
2. Integrate with GitHub Actions CI pipeline
3. Set up metrics visualization dashboard
4. Configure alerting on threshold breaches
@@ -1,99 +0,0 @@
|
||||
import http from 'k6/http';
|
||||
import { check, group } from 'k6';
|
||||
import { Rate } from 'k6/metrics';
|
||||
|
||||
// Test configuration
|
||||
export const options = {
|
||||
stages: [
|
||||
{ duration: '30s', target: 100 }, // Ramp up to 100 users
|
||||
{ duration: '2m', target: 500 }, // Ramp up to 500 virtual users
|
||||
{ duration: '3m', target: 500 }, // Hold at 500 virtual users for 3 minutes
|
||||
{ duration: '30s', target: 0 }, // Ramp down to 0
|
||||
],
|
||||
thresholds: {
|
||||
http_req_duration: ['p(99)<200'], // P99 latency < 200ms
|
||||
http_req_failed: ['rate<0.01'], // Error rate < 1% (built-in metric; no custom 'errors' metric is defined in this script)
|
||||
},
|
||||
};
|
||||
|
||||
const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
|
||||
|
||||
export default function () {
|
||||
group('Watchlist Operations', function () {
|
||||
// GET /watchlist
|
||||
const watchlistRes = http.get(`${BASE_URL}/watchlist`, {
|
||||
headers: { 'Authorization': `Bearer ${getAuthToken()}` },
|
||||
});
|
||||
|
||||
check(watchlistRes, {
|
||||
'watchlist GET status is 200': (r) => r.status === 200,
|
||||
'watchlist GET duration < 100ms': (r) => r.timings.duration < 100,
|
||||
});
|
||||
|
||||
// POST /watchlist
|
||||
const newItemRes = http.post(
|
||||
`${BASE_URL}/watchlist`,
|
||||
JSON.stringify({ type: 'email', value: `test${Date.now()}@example.com` }),
|
||||
{
|
||||
headers: {
|
||||
'Authorization': `Bearer ${getAuthToken()}`,
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
}
|
||||
);
|
||||
|
||||
check(newItemRes, {
|
||||
'watchlist POST status is 201': (r) => r.status === 201,
|
||||
'watchlist POST duration < 200ms': (r) => r.timings.duration < 200,
|
||||
});
|
||||
|
||||
// POST /scan
|
||||
const scanRes = http.post(
|
||||
`${BASE_URL}/scan`,
|
||||
{},
|
||||
{
|
||||
headers: { 'Authorization': `Bearer ${getAuthToken()}` },
|
||||
}
|
||||
);
|
||||
|
||||
check(scanRes, {
|
||||
'scan POST status is 200': (r) => r.status === 200,
|
||||
'scan POST duration < 150ms': (r) => r.timings.duration < 150,
|
||||
});
|
||||
|
||||
// GET /scan/schedule
|
||||
const scheduleRes = http.get(`${BASE_URL}/scan/schedule`, {
|
||||
headers: { 'Authorization': `Bearer ${getAuthToken()}` },
|
||||
});
|
||||
|
||||
check(scheduleRes, {
|
||||
'schedule GET status is 200': (r) => r.status === 200,
|
||||
'schedule GET duration < 100ms': (r) => r.timings.duration < 100,
|
||||
});
|
||||
|
||||
// GET /exposures
|
||||
const exposuresRes = http.get(`${BASE_URL}/exposures`, {
|
||||
headers: { 'Authorization': `Bearer ${getAuthToken()}` },
|
||||
});
|
||||
|
||||
check(exposuresRes, {
|
||||
'exposures GET status is 200': (r) => r.status === 200,
|
||||
'exposures GET duration < 150ms': (r) => r.timings.duration < 150,
|
||||
});
|
||||
|
||||
// GET /alerts
|
||||
const alertsRes = http.get(`${BASE_URL}/alerts`, {
|
||||
headers: { 'Authorization': `Bearer ${getAuthToken()}` },
|
||||
});
|
||||
|
||||
check(alertsRes, {
|
||||
'alerts GET status is 200': (r) => r.status === 200,
|
||||
'alerts GET duration < 150ms': (r) => r.timings.duration < 150,
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
// Helper function to get auth token (replace with actual token retrieval)
|
||||
function getAuthToken() {
|
||||
return __ENV.AUTH_TOKEN || 'test-token';
|
||||
}
|
||||
113
infra/main.tf
@@ -1,113 +0,0 @@
|
||||
terraform {
|
||||
required_version = ">= 1.5.0"
|
||||
|
||||
required_providers {
|
||||
aws = {
|
||||
source = "hashicorp/aws"
|
||||
version = "~> 5.30"
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
backend "s3" {
|
||||
bucket = "shieldai-terraform-state"
|
||||
key = "global/terraform.tfstate"
|
||||
region = "us-east-1"
|
||||
encrypt = true
|
||||
dynamodb_table = "shieldai-terraform-locks"
|
||||
}
|
||||
}
|
||||
|
||||
provider "aws" {
|
||||
region = var.aws_region
|
||||
|
||||
default_tags {
|
||||
tags = {
|
||||
Project = "ShieldAI"
|
||||
ManagedBy = "terraform"
|
||||
Environment = var.environment
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
module "vpc" {
|
||||
source = "./modules/vpc"
|
||||
|
||||
environment = var.environment
|
||||
vpc_cidr = var.vpc_cidr
|
||||
az_count = var.az_count
|
||||
project_name = var.project_name
|
||||
kms_key_arn = module.ecs.kms_key_arn
|
||||
}
|
||||
|
||||
module "ecs" {
|
||||
source = "./modules/ecs"
|
||||
|
||||
environment = var.environment
|
||||
cluster_name = "${var.project_name}-${var.environment}"
|
||||
vpc_id = module.vpc.vpc_id
|
||||
subnet_ids = module.vpc.private_subnet_ids
|
||||
public_subnet_ids = module.vpc.public_subnet_ids
|
||||
security_group_ids = [module.vpc.ecs_security_group_id]
|
||||
alb_security_group_id = module.vpc.alb_security_group_id
|
||||
services = var.services
|
||||
container_images = var.container_images
|
||||
secrets_arn = module.secrets.secrets_manager_arn
|
||||
cache_cluster_arn = module.elasticache.replication_group_arn
|
||||
domain_name = var.domain_name
|
||||
}
|
||||
|
||||
module "rds" {
|
||||
source = "./modules/rds"
|
||||
|
||||
environment = var.environment
|
||||
vpc_id = module.vpc.vpc_id
|
||||
subnet_ids = module.vpc.private_subnet_ids
|
||||
security_group_id = module.vpc.rds_security_group_id
|
||||
db_name = var.db_name
|
||||
db_instance_class = var.db_instance_class
|
||||
multi_az = var.db_multi_az
|
||||
backup_retention = var.db_backup_retention
|
||||
project_name = var.project_name
|
||||
}
|
||||
|
||||
module "elasticache" {
|
||||
source = "./modules/elasticache"
|
||||
|
||||
environment = var.environment
|
||||
vpc_id = module.vpc.vpc_id
|
||||
subnet_ids = module.vpc.private_subnet_ids
|
||||
security_group_id = module.vpc.elasticache_security_group_id
|
||||
node_type = var.elasticache_node_type
|
||||
num_nodes = var.elasticache_num_nodes
|
||||
project_name = var.project_name
|
||||
}
|
||||
|
||||
module "s3" {
|
||||
source = "./modules/s3"
|
||||
|
||||
environment = var.environment
|
||||
project_name = var.project_name
|
||||
}
|
||||
|
||||
module "secrets" {
|
||||
source = "./modules/secrets"
|
||||
|
||||
environment = var.environment
|
||||
project_name = var.project_name
|
||||
rds_endpoint = module.rds.db_endpoint
|
||||
db_password = module.rds.db_password
|
||||
elasticache_endpoint = module.elasticache.cache_endpoint
|
||||
redis_auth_token = module.elasticache.auth_token
|
||||
secrets = var.secrets
|
||||
}
|
||||
|
||||
module "cloudwatch" {
|
||||
source = "./modules/cloudwatch"
|
||||
|
||||
environment = var.environment
|
||||
cluster_name = "${var.project_name}-${var.environment}"
|
||||
project_name = var.project_name
|
||||
rds_identifier = module.rds.db_instance_identifier
|
||||
cache_endpoint = module.elasticache.cache_endpoint
|
||||
}
|
||||
@@ -1,464 +0,0 @@
|
||||
variable "environment" {
|
||||
description = "Deployment environment"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "cluster_name" {
|
||||
description = "ECS cluster name"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "project_name" {
|
||||
description = "Project name"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "rds_identifier" {
|
||||
description = "RDS instance identifier"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "cache_endpoint" {
|
||||
description = "ElastiCache endpoint"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "alert_email" {
|
||||
description = "Email address for alert notifications"
|
||||
type = string
|
||||
default = "ops@shieldai.com"
|
||||
}
|
||||
|
||||
resource "aws_sns_topic" "alerts" {
|
||||
name = "${var.project_name}-${var.environment}-alerts"
|
||||
|
||||
tags = {
|
||||
Environment = var.environment
|
||||
Project = var.project_name
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_sns_topic_subscription" "alerts_email" {
|
||||
topic_arn = aws_sns_topic.alerts.arn
|
||||
protocol = "email"
|
||||
endpoint = var.alert_email
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_dashboard" "main" {
|
||||
dashboard_name = "${var.project_name}-${var.environment}-dashboard"
|
||||
|
||||
dashboard_body = jsonencode({
|
||||
widgets = [
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "ECS CPU Utilization"
|
||||
metrics = [
|
||||
["AWS/ECS", "CPUUtilization", "ClusterName", var.cluster_name]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 300
|
||||
}
|
||||
},
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "ECS Memory Utilization"
|
||||
metrics = [
|
||||
["AWS/ECS", "MemoryUtilization", "ClusterName", var.cluster_name]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 300
|
||||
}
|
||||
},
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "RDS CPU Utilization"
|
||||
metrics = [
|
||||
["AWS/RDS", "CPUUtilization", "DBInstanceIdentifier", var.rds_identifier]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 300
|
||||
}
|
||||
},
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "ALB Request Count"
|
||||
metrics = [
|
||||
["AWS/ApplicationELB", "RequestCount", "LoadBalancer", "${var.cluster_name}-alb"]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 60
|
||||
}
|
||||
},
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "ALB 5xx Errors"
|
||||
metrics = [
|
||||
["AWS/ApplicationELB", "HTTPCode_ELB_5XX_Count", "LoadBalancer", "${var.cluster_name}-alb"]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 60
|
||||
}
|
||||
},
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "P99 Latency (Target Group)"
|
||||
metrics = [
|
||||
["AWS/ApplicationELB", "TargetResponseTime", "LoadBalancer", "${var.cluster_name}-alb", { stat = "p99" }],
|
||||
["AWS/ApplicationELB", "TargetResponseTime", "LoadBalancer", "${var.cluster_name}-alb", { stat = "p95" }]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 60
|
||||
}
|
||||
},
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "ELB 5xx and 4xx Error Counts"
|
||||
metrics = [
|
||||
["AWS/ApplicationELB", "HTTPCode_ELB_5XX_Count", "LoadBalancer", "${var.cluster_name}-alb"],
|
||||
["AWS/ApplicationELB", "HTTPCode_ELB_4XX_Count", "LoadBalancer", "${var.cluster_name}-alb"]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 60
|
||||
}
|
||||
},
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "Throughput (Request Count)"
|
||||
metrics = [
|
||||
["AWS/ApplicationELB", "RequestCount", "LoadBalancer", "${var.cluster_name}-alb"]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 60
|
||||
yAxis = {
|
||||
left = {
|
||||
label = "Requests/min"
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "API Latency Percentiles"
|
||||
metrics = [
|
||||
["ShieldAI", "api_latency", "service", "api", "percentile", "p99", { stat = "Average" }],
|
||||
["ShieldAI", "api_latency", "service", "api", "percentile", "p95", { stat = "Average" }],
|
||||
["ShieldAI", "api_latency", "service", "api", "percentile", "p50", { stat = "Average" }]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 60
|
||||
}
|
||||
},
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "API Error Rate"
|
||||
metrics = [
|
||||
["ShieldAI", "api_errors", "service", "api", { stat = "Sum" }]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 60
|
||||
}
|
||||
},
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "API Throughput"
|
||||
metrics = [
|
||||
["ShieldAI", "api_requests", "service", "api", { stat = "Sum" }]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 60
|
||||
}
|
||||
},
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "ECS Running Tasks"
|
||||
metrics = [
|
||||
["AWS/ECS", "RunningTaskCount", "ClusterName", var.cluster_name]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 60
|
||||
}
|
||||
},
|
||||
{
|
||||
type = "metric"
|
||||
properties = {
|
||||
title = "RDS Read/Write IOPS"
|
||||
metrics = [
|
||||
["AWS/RDS", "ReadIOPS", "DBInstanceIdentifier", var.rds_identifier],
|
||||
["AWS/RDS", "WriteIOPS", "DBInstanceIdentifier", var.rds_identifier]
|
||||
]
|
||||
view = "timeSeries"
|
||||
stacked = false
|
||||
region = "us-east-1"
|
||||
period = 60
|
||||
}
|
||||
}
|
||||
]
|
||||
})
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "ecs_cpu_high" {
|
||||
alarm_name = "${var.project_name}-${var.environment}-ecs-cpu-high"
|
||||
comparison_operator = "GreaterThanThreshold"
|
||||
evaluation_periods = 2
|
||||
metric_name = "CPUUtilization"
|
||||
namespace = "AWS/ECS"
|
||||
period = 300
|
||||
statistic = "Average"
|
||||
threshold = 80
|
||||
alarm_description = "ECS CPU utilization above 80%"
|
||||
alarm_actions = [aws_sns_topic.alerts.arn]
|
||||
|
||||
dimensions = {
|
||||
ClusterName = var.cluster_name
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "ecs_memory_high" {
|
||||
alarm_name = "${var.project_name}-${var.environment}-ecs-memory-high"
|
||||
comparison_operator = "GreaterThanThreshold"
|
||||
evaluation_periods = 2
|
||||
metric_name = "MemoryUtilization"
|
||||
namespace = "AWS/ECS"
|
||||
period = 300
|
||||
statistic = "Average"
|
||||
threshold = 85
|
||||
alarm_description = "ECS memory utilization above 85%"
|
||||
alarm_actions = [aws_sns_topic.alerts.arn]
|
||||
|
||||
dimensions = {
|
||||
ClusterName = var.cluster_name
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "alb_5xx" {
|
||||
alarm_name = "${var.project_name}-${var.environment}-alb-5xx"
|
||||
comparison_operator = "GreaterThanThreshold"
|
||||
evaluation_periods = 3
|
||||
metric_name = "HTTPCode_ELB_5XX_Count"
|
||||
namespace = "AWS/ApplicationELB"
|
||||
period = 60
|
||||
statistic = "Sum"
|
||||
threshold = 10
|
||||
alarm_description = "ALB 5xx errors above 10 per minute"
|
||||
alarm_actions = [aws_sns_topic.alerts.arn]
|
||||
|
||||
dimensions = {
|
||||
LoadBalancer = "${var.cluster_name}-alb"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "rds_cpu_high" {
|
||||
alarm_name = "${var.project_name}-${var.environment}-rds-cpu-high"
|
||||
comparison_operator = "GreaterThanThreshold"
|
||||
evaluation_periods = 2
|
||||
metric_name = "CPUUtilization"
|
||||
namespace = "AWS/RDS"
|
||||
period = 300
|
||||
statistic = "Average"
|
||||
threshold = 75
|
||||
alarm_description = "RDS CPU utilization above 75%"
|
||||
alarm_actions = [aws_sns_topic.alerts.arn]
|
||||
|
||||
dimensions = {
|
||||
DBInstanceIdentifier = var.rds_identifier
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "rds_free_storage" {
|
||||
alarm_name = "${var.project_name}-${var.environment}-rds-free-storage"
|
||||
comparison_operator = "LessThanThreshold"
|
||||
evaluation_periods = 2
|
||||
metric_name = "FreeStorageSpace"
|
||||
namespace = "AWS/RDS"
|
||||
period = 300
|
||||
statistic = "Average"
|
||||
threshold = 524288000
|
||||
alarm_description = "RDS free storage below 500MB"
|
||||
alarm_actions = [aws_sns_topic.alerts.arn]
|
||||
|
||||
dimensions = {
|
||||
DBInstanceIdentifier = var.rds_identifier
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "p99_latency_high" {
|
||||
alarm_name = "${var.project_name}-${var.environment}-p99-latency-high"
|
||||
comparison_operator = "GreaterThanThreshold"
|
||||
evaluation_periods = 3
|
||||
metric_name = "TargetResponseTime"
|
||||
namespace = "AWS/ApplicationELB"
|
||||
period = 60
|
||||
extended_statistic = "p99"
|
||||
threshold = 2
|
||||
alarm_description = "P99 latency above 2 seconds"
|
||||
alarm_actions = [aws_sns_topic.alerts.arn]
|
||||
|
||||
dimensions = {
|
||||
LoadBalancer = "${var.cluster_name}-alb"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "error_rate_high" {
|
||||
alarm_name = "${var.project_name}-${var.environment}-error-rate-high"
|
||||
comparison_operator = "GreaterThanThreshold"
|
||||
evaluation_periods = 3
|
||||
metric_name = "HTTPCode_ELB_5XX_Count"
|
||||
namespace = "AWS/ApplicationELB"
|
||||
period = 60
|
||||
statistic = "Sum"
|
||||
threshold = 5
|
||||
alarm_description = "Error rate above 5 errors per minute"
|
||||
alarm_actions = [aws_sns_topic.alerts.arn]
|
||||
|
||||
dimensions = {
|
||||
LoadBalancer = "${var.cluster_name}-alb"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "throughput_low" {
|
||||
alarm_name = "${var.project_name}-${var.environment}-throughput-low"
|
||||
comparison_operator = "LessThanThreshold"
|
||||
evaluation_periods = 5
|
||||
metric_name = "RequestCount"
|
||||
namespace = "AWS/ApplicationELB"
|
||||
period = 60
|
||||
statistic = "Sum"
|
||||
threshold = 10
|
||||
alarm_description = "Throughput below 10 requests per minute"
|
||||
alarm_actions = [aws_sns_topic.alerts.arn]
|
||||
|
||||
dimensions = {
|
||||
LoadBalancer = "${var.cluster_name}-alb"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_log_group" "api" {
|
||||
name = "/${var.project_name}/${var.environment}/api"
|
||||
retention_in_days = 30
|
||||
|
||||
tags = {
|
||||
Environment = var.environment
|
||||
Project = var.project_name
|
||||
Service = "api"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_log_group" "datadog" {
|
||||
name = "/${var.project_name}/${var.environment}/datadog"
|
||||
retention_in_days = 30
|
||||
|
||||
tags = {
|
||||
Environment = var.environment
|
||||
Project = var.project_name
|
||||
Service = "datadog"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_log_group" "sentry" {
|
||||
name = "/${var.project_name}/${var.environment}/sentry"
|
||||
retention_in_days = 30
|
||||
|
||||
tags = {
|
||||
Environment = var.environment
|
||||
Project = var.project_name
|
||||
Service = "sentry"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "app_p99_latency_high" {
|
||||
alarm_name = "${var.project_name}-${var.environment}-app-p99-latency-high"
|
||||
comparison_operator = "GreaterThanThreshold"
|
||||
evaluation_periods = 3
|
||||
metric_name = "api_latency"
|
||||
namespace = "ShieldAI"
|
||||
period = 60
|
||||
statistic = "Average"
|
||||
threshold = 2000
|
||||
alarm_description = "Application P99 latency above 2000ms"
|
||||
alarm_actions = [aws_sns_topic.alerts.arn]
|
||||
|
||||
dimensions = {
|
||||
service = "api"
|
||||
percentile = "p99"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "app_error_rate_high" {
|
||||
alarm_name = "${var.project_name}-${var.environment}-app-error-rate-high"
|
||||
comparison_operator = "GreaterThanThreshold"
|
||||
evaluation_periods = 3
|
||||
metric_name = "api_errors"
|
||||
namespace = "ShieldAI"
|
||||
period = 60
|
||||
statistic = "Sum"
|
||||
threshold = 10
|
||||
alarm_description = "Application error count above 10 per minute"
|
||||
alarm_actions = [aws_sns_topic.alerts.arn]
|
||||
|
||||
dimensions = {
|
||||
service = "api"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_metric_alarm" "app_throughput_low" {
|
||||
alarm_name = "${var.project_name}-${var.environment}-app-throughput-low"
|
||||
comparison_operator = "LessThanThreshold"
|
||||
evaluation_periods = 5
|
||||
metric_name = "api_requests"
|
||||
namespace = "ShieldAI"
|
||||
period = 60
|
||||
statistic = "Sum"
|
||||
threshold = 10
|
||||
alarm_description = "Application throughput below 10 requests per minute"
|
||||
alarm_actions = [aws_sns_topic.alerts.arn]
|
||||
|
||||
dimensions = {
|
||||
service = "api"
|
||||
}
|
||||
}
|
||||
|
||||
output "dashboard_url" {
|
||||
description = "CloudWatch dashboard URL"
|
||||
value = "https://us-east-1.console.aws.amazon.com/cloudwatch/home#dashboards/dashboard/${var.project_name}-${var.environment}-dashboard"
|
||||
}
|
||||
|
||||
output "sns_topic_arn" {
|
||||
description = "SNS topic ARN for alerts"
|
||||
value = aws_sns_topic.alerts.arn
|
||||
}
|
||||
@@ -1,519 +0,0 @@
|
||||
variable "environment" {
|
||||
description = "Deployment environment"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "cluster_name" {
|
||||
description = "ECS cluster name"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "vpc_id" {
|
||||
description = "VPC ID"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "subnet_ids" {
|
||||
description = "Private subnet IDs for ECS tasks"
|
||||
type = list(string)
|
||||
}
|
||||
|
||||
variable "public_subnet_ids" {
|
||||
description = "Public subnet IDs for ALB"
|
||||
type = list(string)
|
||||
}
|
||||
|
||||
variable "security_group_ids" {
|
||||
description = "Security group IDs"
|
||||
type = list(string)
|
||||
}
|
||||
|
||||
variable "alb_security_group_id" {
|
||||
description = "ALB security group ID"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "services" {
|
||||
description = "ECS services to deploy"
|
||||
type = map(object({
|
||||
cpu = number
|
||||
memory = number
|
||||
port = number
|
||||
}))
|
||||
}
|
||||
|
||||
variable "container_images" {
|
||||
description = "Container image tags"
|
||||
type = map(string)
|
||||
}
|
||||
|
||||
variable "secrets_arn" {
|
||||
description = "Secrets Manager ARN"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "cache_cluster_arn" {
|
||||
description = "ElastiCache replication group ARN"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "domain_name" {
|
||||
description = "Route53 hosted zone domain for ACM cert validation"
|
||||
type = string
|
||||
default = "shieldai.app"
|
||||
}
|
||||
|
||||
resource "aws_ecs_cluster" "main" {
|
||||
name = var.cluster_name
|
||||
|
||||
settings {
|
||||
name = "containerInsights"
|
||||
value = "enabled"
|
||||
}
|
||||
|
||||
tags = {
|
||||
Name = var.cluster_name
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_ecs_cluster_capacity_providers" "main" {
|
||||
cluster_name = aws_ecs_cluster.main.name
|
||||
|
||||
capacity_providers = ["FARGATE"]
|
||||
|
||||
default_capacity_provider_strategy {
|
||||
base = 1
|
||||
weight = 100
|
||||
capacity_provider = "FARGATE"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_ecs_task_definition" "services" {
|
||||
for_each = var.services
|
||||
|
||||
family = "${var.cluster_name}-${each.key}"
|
||||
|
||||
container_definitions = jsonencode([
|
||||
{
|
||||
name = each.key
|
||||
image = "ghcr.io/shieldai/shieldai-${each.key}:${var.container_images[each.key]}"
|
||||
cpu = each.value.cpu
|
||||
memory = each.value.memory
|
||||
essential = true
|
||||
|
||||
portMappings = [
|
||||
{
|
||||
containerPort = each.value.port
|
||||
hostPort = each.value.port
|
||||
protocol = "tcp"
|
||||
}
|
||||
]
|
||||
|
||||
environment = [
|
||||
{
|
||||
name = "NODE_ENV"
|
||||
value = var.environment
|
||||
},
|
||||
{
|
||||
name = "PORT"
|
||||
value = tostring(each.value.port)
|
||||
},
|
||||
{
|
||||
name = "DD_ENV"
|
||||
value = var.environment
|
||||
},
|
||||
{
|
||||
name = "DD_SERVICE"
|
||||
value = "${var.cluster_name}-${each.key}"
|
||||
},
|
||||
{
|
||||
name = "DD_VERSION"
|
||||
value = var.container_images[each.key]
|
||||
},
|
||||
{
|
||||
name = "DD_TRACE_ENABLED"
|
||||
value = "true"
|
||||
},
|
||||
{
|
||||
name = "DD_LOGS_INJECTION"
|
||||
value = "true"
|
||||
},
|
||||
{
|
||||
name = "DD_AGENT_HOST"
|
||||
value = "localhost"
|
||||
},
|
||||
{
|
||||
name = "DD_AGENT_PORT"
|
||||
value = "8126"
|
||||
},
|
||||
{
|
||||
name = "SENTRY_ENVIRONMENT"
|
||||
value = var.environment
|
||||
},
|
||||
{
|
||||
name = "SENTRY_RELEASE"
|
||||
value = var.container_images[each.key]
|
||||
},
|
||||
{
|
||||
name = "AWS_REGION"
|
||||
value = "us-east-1"
|
||||
},
|
||||
{
|
||||
name = "DD_SITE"
|
||||
value = "datadoghq.com"
|
||||
}
|
||||
]
|
||||
|
||||
secrets = [
|
||||
{
|
||||
name = "DATABASE_URL"
|
||||
valueFrom = "${var.secrets_arn}:DATABASE_URL::"
|
||||
},
|
||||
{
|
||||
name = "REDIS_URL"
|
||||
valueFrom = "${var.secrets_arn}:REDIS_URL::"
|
||||
},
|
||||
{
|
||||
name = "HIBP_API_KEY"
|
||||
valueFrom = "${var.secrets_arn}:HIBP_API_KEY::"
|
||||
},
|
||||
{
|
||||
name = "RESEND_API_KEY"
|
||||
valueFrom = "${var.secrets_arn}:RESEND_API_KEY::"
|
||||
},
|
||||
{
|
||||
name = "SENTRY_DSN"
|
||||
valueFrom = "${var.secrets_arn}:SENTRY_DSN::"
|
||||
},
|
||||
{
|
||||
name = "DD_API_KEY"
|
||||
valueFrom = "${var.secrets_arn}:DD_API_KEY::"
|
||||
}
|
||||
]
|
||||
|
||||
logConfiguration = {
|
||||
logDriver = "awslogs"
|
||||
options = {
|
||||
"awslogs-group" = "/ecs/${var.cluster_name}-${each.key}"
|
||||
"awslogs-region" = "us-east-1"
|
||||
"awslogs-stream-prefix" = each.key
|
||||
}
|
||||
}
|
||||
|
||||
healthCheck = {
|
||||
command = ["CMD-SHELL", "curl -f http://localhost:${each.value.port}/health || exit 1"]
|
||||
interval = 30
|
||||
timeout = 5
|
||||
retries = 3
|
||||
startPeriod = 60
|
||||
}
|
||||
}
|
||||
])
|
||||
|
||||
network_mode = "awsvpc"
|
||||
memory = each.value.memory
|
||||
cpu = each.value.cpu
|
||||
requires_compatibilities = ["FARGATE"]
|
||||
|
||||
execution_role_arn = aws_iam_role.execution[each.key].arn
|
||||
task_role_arn = aws_iam_role.task[each.key].arn
|
||||
|
||||
tags = {
|
||||
Name = "${var.cluster_name}-${each.key}"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_iam_role" "execution" {
|
||||
for_each = var.services
|
||||
|
||||
name = "${var.cluster_name}-${each.key}-execution"
|
||||
|
||||
assume_role_policy = jsonencode({
|
||||
Version = "2012-10-17"
|
||||
Statement = [
|
||||
{
|
||||
Action = "sts:AssumeRole"
|
||||
Effect = "Allow"
|
||||
Principal = {
|
||||
Service = "ecs-tasks.amazonaws.com"
|
||||
}
|
||||
}
|
||||
]
|
||||
})
|
||||
|
||||
managed_policy_arns = [
|
||||
"arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
|
||||
]
|
||||
}
|
||||
|
||||
resource "aws_iam_role" "task" {
|
||||
for_each = var.services
|
||||
|
||||
name = "${var.cluster_name}-${each.key}-task"
|
||||
|
||||
assume_role_policy = jsonencode({
|
||||
Version = "2012-10-17"
|
||||
Statement = [
|
||||
{
|
||||
Action = "sts:AssumeRole"
|
||||
Effect = "Allow"
|
||||
Principal = {
|
||||
Service = "ecs-tasks.amazonaws.com"
|
||||
}
|
||||
}
|
||||
]
|
||||
})
|
||||
|
||||
inline_policy {
|
||||
name = "secrets-manager-access"
|
||||
policy = jsonencode({
|
||||
Version = "2012-10-17"
|
||||
Statement = [
|
||||
{
|
||||
Effect = "Allow"
|
||||
Action = [
|
||||
"secretsmanager:GetSecretValue",
|
||||
"secretsmanager:DescribeSecret"
|
||||
]
|
||||
Resource = var.secrets_arn
|
||||
}
|
||||
]
|
||||
})
|
||||
}
|
||||
|
||||
inline_policy {
|
||||
name = "elasticache-access"
|
||||
policy = jsonencode({
|
||||
Version = "2012-10-17"
|
||||
Statement = [
|
||||
{
|
||||
Effect = "Allow"
|
||||
Action = [
|
||||
"elasticache:DescribeCacheClusters",
|
||||
"elasticache:DescribeCacheSubnetGroups"
|
||||
]
|
||||
Resource = var.cache_cluster_arn
|
||||
}
|
||||
]
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_ecs_service" "services" {
|
||||
for_each = var.services
|
||||
|
||||
name = "${var.cluster_name}-${each.key}"
|
||||
cluster = aws_ecs_cluster.main.id
|
||||
task_definition = aws_ecs_task_definition.services[each.key].arn
|
||||
desired_count = var.environment == "production" ? 3 : 1
|
||||
|
||||
launch_type = "FARGATE"
|
||||
|
||||
network_configuration {
|
||||
subnets = var.subnet_ids
|
||||
security_groups = var.security_group_ids
|
||||
assign_public_ip = false
|
||||
}
|
||||
|
||||
load_balancer {
|
||||
target_group_arn = aws_lb_target_group.services[each.key].arn
|
||||
container_name = each.key
|
||||
container_port = each.value.port
|
||||
}
|
||||
|
||||
# aws_ecs_service has no auto_scaling block. Scaling bounds belong in a separate
# aws_appautoscaling_target resource (production: min 2, max 10; otherwise: min 1, max 3).
|
||||
|
||||
tags = {
|
||||
Name = "${var.cluster_name}-${each.key}"
|
||||
Service = each.key
|
||||
}
|
||||
|
||||
depends_on = [
|
||||
aws_lb_listener.https
|
||||
]
|
||||
}
|
||||
|
||||
resource "aws_lb" "main" {
|
||||
name = "${var.cluster_name}-alb"
|
||||
internal = false
|
||||
load_balancer_type = "application"
|
||||
security_groups = [var.alb_security_group_id]
|
||||
subnets = var.public_subnet_ids
|
||||
|
||||
tags = {
|
||||
Name = "${var.cluster_name}-alb"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_acm_certificate" "main" {
|
||||
domain_name = "${var.cluster_name}.${var.environment}.${var.domain_name}"
|
||||
validation_method = "DNS"
|
||||
|
||||
tags = {
|
||||
Name = "${var.cluster_name}-cert"
|
||||
}
|
||||
}
|
||||
|
||||
data "aws_route53_zone" "main" {
|
||||
name = var.domain_name
|
||||
}
|
||||
|
||||
resource "aws_route53_record" "acm_validation" {
|
||||
for_each = {
|
||||
for rv in aws_acm_certificate.main.domain_validation_options : rv.domain_name => rv
|
||||
if rv.resource_record_name != null
|
||||
}
|
||||
|
||||
zone_id = data.aws_route53_zone.main.zone_id
|
||||
name = each.value.resource_record_name
|
||||
type = each.value.resource_record_type
|
||||
ttl = 60
|
||||
records = [each.value.resource_record_value]
|
||||
}
|
||||
|
||||
resource "aws_acm_certificate_validation" "main" {
|
||||
certificate_arn = aws_acm_certificate.main.arn
|
||||
validation_record_fqdns = [for record in aws_route53_record.acm_validation : record.fqdn]
|
||||
}
|
||||
|
||||
resource "aws_lb_target_group" "services" {
|
||||
for_each = var.services
|
||||
|
||||
name = "${var.cluster_name}-${each.key}-tg"
|
||||
port = each.value.port
|
||||
protocol = "HTTP"
|
||||
vpc_id = var.vpc_id
target_type = "ip" # Fargate tasks use awsvpc networking and register by IP
|
||||
|
||||
health_check {
|
||||
enabled = true
|
||||
healthy_threshold = 3
|
||||
interval = 30
|
||||
matcher = "200"
|
||||
path = "/health"
|
||||
port = "traffic-port"
|
||||
protocol = "HTTP"
|
||||
timeout = 5
|
||||
unhealthy_threshold = 3
|
||||
}
|
||||
|
||||
stickiness {
|
||||
type = "lb_cookie"
|
||||
cookie_duration = 86400
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_lb_listener" "https" {
|
||||
load_balancer_arn = aws_lb.main.arn
|
||||
port = 443
|
||||
protocol = "HTTPS"
|
||||
certificate_arn = aws_acm_certificate_validation.main.certificate_arn
|
||||
|
||||
default_action {
|
||||
type = "forward"
|
||||
target_group_arn = aws_lb_target_group.services["api"].arn
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_lb_listener_rule" "services" {
|
||||
for_each = { for k, v in var.services : k => v if k != "api" }
|
||||
|
||||
listener_arn = aws_lb_listener.https.arn
|
||||
action {
|
||||
type = "forward"
|
||||
target_group_arn = aws_lb_target_group.services[each.key].arn
|
||||
}
|
||||
|
||||
condition {
|
||||
path_pattern {
|
||||
values = ["/${each.key}/*", "/${each.key}"]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_lb_listener" "http_redirect" {
|
||||
load_balancer_arn = aws_lb.main.arn
|
||||
port = 80
|
||||
protocol = "HTTP"
|
||||
|
||||
default_action {
|
||||
type = "redirect"
|
||||
|
||||
redirect {
|
||||
port = "443"
|
||||
protocol = "HTTPS"
|
||||
status_code = "HTTP_301"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_appautoscaling_target" "services" {
|
||||
for_each = var.services
|
||||
|
||||
service_namespace = "ecs"
|
||||
resource_id = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.services[each.key].name}"
|
||||
scalable_dimension = "ecs:service:DesiredCount"
|
||||
min_capacity = var.environment == "production" ? 2 : 1
|
||||
max_capacity = var.environment == "production" ? 10 : 3
|
||||
}
|
||||
|
||||
resource "aws_appautoscaling_policy" "cpu" {
|
||||
for_each = var.services
|
||||
|
||||
name = "${var.cluster_name}-${each.key}-cpu-scaling"
|
||||
service_namespace = "ecs"
|
||||
resource_id = aws_appautoscaling_target.services[each.key].resource_id
|
||||
scalable_dimension = "ecs:service:DesiredCount"
policy_type = "TargetTrackingScaling"
|
||||
|
||||
target_tracking_scaling_policy_configuration {
|
||||
target_value = 70.0
|
||||
scale_in_cooldown = 60
|
||||
scale_out_cooldown = 30
|
||||
|
||||
customized_metric_specification {
|
||||
metric_name = "CPUUtilization"
|
||||
namespace = "AWS/ECS"
|
||||
statistic = "Average"
|
||||
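dimensions {
name = "ClusterName"
value = aws_ecs_cluster.main.name
}
# ServiceName scopes the metric to this service rather than the whole cluster
dimensions {
name = "ServiceName"
value = aws_ecs_service.services[each.key].name
}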
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_kms_key" "logs" {
|
||||
description = "${var.cluster_name} logs encryption key"
|
||||
deletion_window_in_days = 7
|
||||
enable_key_rotation = true
|
||||
|
||||
tags = {
|
||||
Name = "${var.cluster_name}-logs-kms"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_log_group" "services" {
|
||||
for_each = var.services
|
||||
|
||||
name = "/ecs/${var.cluster_name}-${each.key}"
|
||||
retention_in_days = var.environment == "production" ? 30 : 7
|
||||
kms_key_id = aws_kms_key.logs.arn
|
||||
|
||||
tags = {
|
||||
Name = "${var.cluster_name}-${each.key}-logs"
|
||||
}
|
||||
}
|
||||
|
||||
output "cluster_arn" {
|
||||
description = "ECS cluster ARN"
|
||||
value = aws_ecs_cluster.main.arn
|
||||
}
|
||||
|
||||
output "alb_dns_name" {
|
||||
description = "ALB DNS name"
|
||||
value = aws_lb.main.dns_name
|
||||
}
|
||||
|
||||
output "kms_key_arn" {
|
||||
description = "KMS key ARN for log encryption"
|
||||
value = aws_kms_key.logs.arn
|
||||
}
|
||||
@@ -1,102 +0,0 @@
|
||||
variable "environment" {
|
||||
description = "Deployment environment"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "vpc_id" {
|
||||
description = "VPC ID"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "subnet_ids" {
|
||||
description = "Private subnet IDs"
|
||||
type = list(string)
|
||||
}
|
||||
|
||||
variable "security_group_id" {
|
||||
description = "ElastiCache security group ID"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "node_type" {
|
||||
description = "Cache node type"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "num_nodes" {
|
||||
description = "Number of cache nodes"
|
||||
type = number
|
||||
}
|
||||
|
||||
variable "project_name" {
|
||||
description = "Project name"
|
||||
type = string
|
||||
}
|
||||
|
||||
resource "aws_elasticache_subnet_group" "main" {
|
||||
name = "${var.project_name}-${var.environment}-redis-subnet"
|
||||
subnet_ids = var.subnet_ids
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-redis-subnet"
|
||||
}
|
||||
}
|
||||
|
||||
resource "random_password" "redis_auth" {
|
||||
length = 32
|
||||
special = false
|
||||
|
||||
keepers = {
|
||||
environment = var.environment
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_elasticache_replication_group" "main" {
|
||||
replication_group_id = "${var.project_name}-${var.environment}-redis"
|
||||
description = "${var.project_name} Redis cluster (${var.environment})"
|
||||
|
||||
node_type = var.node_type
|
||||
num_cache_clusters = var.num_nodes
|
||||
engine = "redis"
|
||||
engine_version = "7.0"
|
||||
|
||||
auth_token = random_password.redis_auth.result
|
||||
|
||||
transit_encryption_enabled = true
|
||||
at_rest_encryption_enabled = true
|
||||
|
||||
port = 6379
|
||||
|
||||
subnet_group_name = aws_elasticache_subnet_group.main.name
|
||||
security_group_ids = [var.security_group_id]
|
||||
|
||||
automatic_failover_enabled = var.environment == "production"
|
||||
|
||||
snapshot_retention_limit = var.environment == "production" ? 7 : 1
|
||||
snapshot_window = "03:00-04:00"
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-redis"
|
||||
}
|
||||
}
|
||||
|
||||
output "cache_endpoint" {
|
||||
description = "ElastiCache primary endpoint"
|
||||
value = aws_elasticache_replication_group.main.primary_endpoint_address
|
||||
}
|
||||
|
||||
output "reader_endpoint" {
|
||||
description = "ElastiCache reader endpoint"
|
||||
value = aws_elasticache_replication_group.main.reader_endpoint_address
|
||||
}
|
||||
|
||||
output "auth_token" {
|
||||
description = "Redis auth token"
|
||||
value = random_password.redis_auth.result
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
output "replication_group_arn" {
|
||||
description = "ElastiCache replication group ARN"
|
||||
value = aws_elasticache_replication_group.main.arn
|
||||
}
|
||||
@@ -1,138 +0,0 @@
|
||||
variable "environment" {
|
||||
description = "Deployment environment"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "vpc_id" {
|
||||
description = "VPC ID"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "subnet_ids" {
|
||||
description = "Private subnet IDs"
|
||||
type = list(string)
|
||||
}
|
||||
|
||||
variable "security_group_id" {
|
||||
description = "RDS security group ID"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "db_name" {
|
||||
description = "Database name"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "db_instance_class" {
|
||||
description = "RDS instance class"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "multi_az" {
|
||||
description = "Multi-AZ deployment"
|
||||
type = bool
|
||||
}
|
||||
|
||||
variable "backup_retention" {
|
||||
description = "Backup retention days"
|
||||
type = number
|
||||
}
|
||||
|
||||
variable "project_name" {
|
||||
description = "Project name"
|
||||
type = string
|
||||
}
|
||||
|
||||
resource "aws_db_subnet_group" "main" {
|
||||
name = "${var.project_name}-${var.environment}-db-subnet"
|
||||
subnet_ids = var.subnet_ids
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-db-subnet"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_db_instance" "main" {
|
||||
identifier = "${var.project_name}-${var.environment}-db"
|
||||
|
||||
engine = "postgres"
|
||||
engine_version = "16.2"
|
||||
instance_class = var.db_instance_class
|
||||
allocated_storage = var.environment == "production" ? 100 : 20
|
||||
|
||||
db_name = var.db_name
|
||||
username = "shieldai"
|
||||
password = random_password.db_password.result
|
||||
|
||||
multi_az = var.multi_az
|
||||
db_subnet_group_name = aws_db_subnet_group.main.name
|
||||
vpc_security_group_ids = [var.security_group_id]
|
||||
|
||||
backup_retention_period = var.backup_retention
|
||||
backup_window = "03:00-04:00"
|
||||
maintenance_window = "sun:04:00-sun:05:00"
|
||||
|
||||
skip_final_snapshot = var.environment != "production"
|
||||
final_snapshot_identifier = "${var.project_name}-${var.environment}-final"
|
||||
|
||||
storage_encrypted = true
|
||||
storage_type = "gp3"
|
||||
# iops is left unset: gp3 volumes at these sizes (20/100 GiB) use the included 3,000 IOPS baseline
|
||||
|
||||
deletion_protection = var.environment == "production"
|
||||
copy_tags_to_snapshot = true
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-db"
|
||||
}
|
||||
}
|
||||
|
||||
resource "random_password" "db_password" {
|
||||
length = 16
|
||||
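special = true
override_special = "!#$%&*()-_=+[]{}<>:?" # RDS rejects '/', '@', '"' and spaces in the master password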
|
||||
|
||||
keepers = {
|
||||
environment = var.environment
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_secretsmanager_secret_version" "db_password" {
|
||||
secret_id = aws_secretsmanager_secret.db_password.id
|
||||
secret_string = jsonencode({
|
||||
username = "shieldai"
|
||||
password = random_password.db_password.result
|
||||
engine = "postgres"
|
||||
host = aws_db_instance.main.address
|
||||
port = aws_db_instance.main.port
|
||||
})
|
||||
}
|
||||
|
||||
resource "aws_secretsmanager_secret" "db_password" {
|
||||
name = "${var.project_name}-${var.environment}-db-password"
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-db-password"
|
||||
}
|
||||
}
|
||||
|
||||
output "db_endpoint" {
|
||||
description = "RDS endpoint"
|
||||
value = aws_db_instance.main.endpoint
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
output "db_instance_identifier" {
|
||||
description = "RDS instance identifier"
|
||||
value = aws_db_instance.main.identifier
|
||||
}
|
||||
|
||||
output "db_password_secret_arn" {
|
||||
description = "DB password secret ARN"
|
||||
value = aws_secretsmanager_secret.db_password.arn
|
||||
}
|
||||
|
||||
output "db_password" {
|
||||
description = "Generated DB password"
|
||||
value = random_password.db_password.result
|
||||
sensitive = true
|
||||
}
|
||||
@@ -1,145 +0,0 @@
|
||||
variable "environment" {
|
||||
description = "Deployment environment"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "project_name" {
|
||||
description = "Project name"
|
||||
type = string
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket" "terraform_state" {
|
||||
bucket = "${var.project_name}-${var.environment}-terraform-state"
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-terraform-state"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_public_access_block" "terraform_state" {
|
||||
bucket = aws_s3_bucket.terraform_state.id
|
||||
|
||||
block_public_acls = true
|
||||
block_public_policy = true
|
||||
ignore_public_acls = true
|
||||
restrict_public_buckets = true
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_versioning" "terraform_state" {
|
||||
bucket = aws_s3_bucket.terraform_state.id
|
||||
versioning_configuration {
|
||||
status = "Enabled"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
|
||||
bucket = aws_s3_bucket.terraform_state.id
|
||||
|
||||
rule {
|
||||
apply_server_side_encryption_by_default {
|
||||
sse_algorithm = "aws:kms"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_lifecycle_configuration" "terraform_state" {
|
||||
bucket = aws_s3_bucket.terraform_state.id
|
||||
|
||||
rule {
|
||||
id = "expire-noncurrent"
|
||||
status = "Enabled"
|
||||
|
||||
noncurrent_version_expiration {
|
||||
noncurrent_days = 30
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket" "artifacts" {
|
||||
bucket = "${var.project_name}-${var.environment}-artifacts"
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-artifacts"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_public_access_block" "artifacts" {
|
||||
bucket = aws_s3_bucket.artifacts.id
|
||||
|
||||
block_public_acls = true
|
||||
block_public_policy = true
|
||||
ignore_public_acls = true
|
||||
restrict_public_buckets = true
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_versioning" "artifacts" {
|
||||
bucket = aws_s3_bucket.artifacts.id
|
||||
versioning_configuration {
|
||||
status = "Enabled"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_server_side_encryption_configuration" "artifacts" {
|
||||
bucket = aws_s3_bucket.artifacts.id
|
||||
|
||||
rule {
|
||||
apply_server_side_encryption_by_default {
|
||||
sse_algorithm = "aws:kms"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket" "logs" {
|
||||
bucket = "${var.project_name}-${var.environment}-logs"
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-logs"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_public_access_block" "logs" {
|
||||
bucket = aws_s3_bucket.logs.id
|
||||
|
||||
block_public_acls = true
|
||||
block_public_policy = true
|
||||
ignore_public_acls = true
|
||||
restrict_public_buckets = true
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_server_side_encryption_configuration" "logs" {
|
||||
bucket = aws_s3_bucket.logs.id
|
||||
|
||||
rule {
|
||||
apply_server_side_encryption_by_default {
|
||||
sse_algorithm = "aws:kms"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_lifecycle_configuration" "logs" {
|
||||
bucket = aws_s3_bucket.logs.id
|
||||
|
||||
rule {
|
||||
id = "expire-old-logs"
|
||||
status = "Enabled"
|
||||
|
||||
expiration {
|
||||
days = 90
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
output "bucket_name" {
|
||||
description = "Terraform state S3 bucket name"
|
||||
value = aws_s3_bucket.terraform_state.id
|
||||
}
|
||||
|
||||
output "artifacts_bucket_name" {
|
||||
description = "Artifacts S3 bucket name"
|
||||
value = aws_s3_bucket.artifacts.id
|
||||
}
|
||||
|
||||
output "logs_bucket_name" {
|
||||
description = "Logs S3 bucket name"
|
||||
value = aws_s3_bucket.logs.id
|
||||
}
|
||||
@@ -1,69 +0,0 @@
|
||||
variable "environment" {
|
||||
description = "Deployment environment"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "project_name" {
|
||||
description = "Project name"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "rds_endpoint" {
|
||||
description = "RDS instance endpoint"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "db_password" {
|
||||
description = "Generated RDS password"
|
||||
type = string
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
variable "elasticache_endpoint" {
|
||||
description = "ElastiCache primary endpoint"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "redis_auth_token" {
|
||||
description = "ElastiCache auth token"
|
||||
type = string
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
variable "secrets" {
|
||||
description = "Secrets to store"
|
||||
type = map(string)
|
||||
default = {}
|
||||
}
|
||||
|
||||
resource "aws_secretsmanager_secret" "main" {
|
||||
name = "${var.project_name}-${var.environment}-app-secrets"
|
||||
|
||||
description = "Application secrets for ${var.project_name} (${var.environment})"
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-app-secrets"
|
||||
Environment = var.environment
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_secretsmanager_secret_version" "main" {
|
||||
secret_id = aws_secretsmanager_secret.main.id
|
||||
|
||||
secret_string = jsonencode(merge({
|
||||
DATABASE_URL = "postgresql://shieldai:${var.db_password}@${var.rds_endpoint}:5432/shieldai"
|
||||
REDIS_URL = "rediss://:${var.redis_auth_token}@${var.elasticache_endpoint}:6379" # rediss:// because transit encryption is enabled
|
||||
NODE_ENV = var.environment
|
||||
LOG_LEVEL = var.environment == "production" ? "info" : "debug"
|
||||
}, var.secrets))
|
||||
}
|
||||
|
||||
output "secrets_manager_arn" {
|
||||
description = "Secrets Manager ARN"
|
||||
value = aws_secretsmanager_secret.main.arn
|
||||
}
|
||||
|
||||
output "secrets_manager_name" {
|
||||
description = "Secrets Manager secret name"
|
||||
value = aws_secretsmanager_secret.main.name
|
||||
}
|
||||
@@ -1,338 +0,0 @@
|
||||
variable "environment" {
|
||||
description = "Deployment environment"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "vpc_cidr" {
|
||||
description = "CIDR block for VPC"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "az_count" {
|
||||
description = "Number of availability zones"
|
||||
type = number
|
||||
}
|
||||
|
||||
variable "project_name" {
|
||||
description = "Project name"
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "kms_key_arn" {
|
||||
description = "KMS key ARN for log encryption"
|
||||
type = string
|
||||
default = ""
|
||||
}
|
||||
|
||||
resource "aws_vpc" "main" {
|
||||
cidr_block = var.vpc_cidr
|
||||
enable_dns_support = true
|
||||
enable_dns_hostnames = true
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-vpc"
|
||||
}
|
||||
}
|
||||
|
||||
data "aws_availability_zones" "available" {
|
||||
state = "available"
|
||||
}
|
||||
|
||||
resource "aws_subnet" "public" {
|
||||
count = var.az_count
|
||||
|
||||
vpc_id = aws_vpc.main.id
|
||||
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
|
||||
availability_zone = data.aws_availability_zones.available.names[count.index]
|
||||
map_public_ip_on_launch = false
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-public-${data.aws_availability_zones.available.names[count.index]}"
|
||||
"kubernetes.io/role/elb" = "1"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_subnet" "private" {
|
||||
count = var.az_count
|
||||
|
||||
vpc_id = aws_vpc.main.id
|
||||
cidr_block = cidrsubnet(var.vpc_cidr, 8, var.az_count + count.index)
|
||||
availability_zone = data.aws_availability_zones.available.names[count.index]
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-private-${data.aws_availability_zones.available.names[count.index]}"
|
||||
"kubernetes.io/role/internal-elb" = "1"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_internet_gateway" "main" {
|
||||
vpc_id = aws_vpc.main.id
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-igw"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_eip" "nat" {
|
||||
count = var.az_count
|
||||
|
||||
domain = "vpc"
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-nat-${count.index}"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_nat_gateway" "main" {
|
||||
count = var.az_count
|
||||
|
||||
allocation_id = aws_eip.nat[count.index].id
|
||||
subnet_id = aws_subnet.public[count.index].id
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-nat-${count.index}"
|
||||
}
|
||||
|
||||
depends_on = [aws_internet_gateway.main]
|
||||
}
|
||||
|
||||
resource "aws_route_table" "public" {
|
||||
vpc_id = aws_vpc.main.id
|
||||
|
||||
route {
|
||||
cidr_block = "0.0.0.0/0"
|
||||
gateway_id = aws_internet_gateway.main.id
|
||||
}
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-public-rt"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_route_table" "private" {
|
||||
count = var.az_count
|
||||
|
||||
vpc_id = aws_vpc.main.id
|
||||
|
||||
route {
|
||||
cidr_block = "0.0.0.0/0"
|
||||
nat_gateway_id = aws_nat_gateway.main[count.index].id
|
||||
}
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-private-rt-${count.index}"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_route_table_association" "public" {
|
||||
count = var.az_count
|
||||
|
||||
subnet_id = aws_subnet.public[count.index].id
|
||||
route_table_id = aws_route_table.public.id
|
||||
}
|
||||
|
||||
resource "aws_route_table_association" "private" {
|
||||
count = var.az_count
|
||||
|
||||
subnet_id = aws_subnet.private[count.index].id
|
||||
route_table_id = aws_route_table.private[count.index].id
|
||||
}
|
||||
|
||||
resource "aws_security_group" "alb" {
|
||||
name_prefix = "${var.project_name}-${var.environment}-alb"
|
||||
vpc_id = aws_vpc.main.id
|
||||
|
||||
ingress {
|
||||
from_port = 443
|
||||
to_port = 443
|
||||
protocol = "tcp"
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
description = "HTTPS from internet"
|
||||
}
|
||||
|
||||
ingress {
|
||||
from_port = 80
|
||||
to_port = 80
|
||||
protocol = "tcp"
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
description = "HTTP from internet (redirect)"
|
||||
}
|
||||
|
||||
egress {
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
protocol = "-1"
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-alb-sg"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_security_group" "ecs" {
|
||||
name_prefix = "${var.project_name}-${var.environment}-ecs"
|
||||
vpc_id = aws_vpc.main.id
|
||||
|
||||
ingress {
|
||||
from_port = 3000
|
||||
to_port = 3003
|
||||
protocol = "tcp"
|
||||
security_groups = [aws_security_group.alb.id]
|
||||
description = "Service ports from ALB only"
|
||||
}
|
||||
|
||||
egress {
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
protocol = "-1"
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-ecs-sg"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_security_group" "rds" {
|
||||
name_prefix = "${var.project_name}-${var.environment}-rds"
|
||||
vpc_id = aws_vpc.main.id
|
||||
|
||||
ingress {
|
||||
from_port = 5432
|
||||
to_port = 5432
|
||||
protocol = "tcp"
|
||||
security_groups = [aws_security_group.ecs.id]
|
||||
description = "PostgreSQL from ECS"
|
||||
}
|
||||
|
||||
egress {
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
protocol = "-1"
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-rds-sg"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_security_group" "elasticache" {
|
||||
name_prefix = "${var.project_name}-${var.environment}-elasticache"
|
||||
vpc_id = aws_vpc.main.id
|
||||
|
||||
ingress {
|
||||
from_port = 6379
|
||||
to_port = 6379
|
||||
protocol = "tcp"
|
||||
security_groups = [aws_security_group.ecs.id]
|
||||
description = "Redis from ECS"
|
||||
}
|
||||
|
||||
egress {
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
protocol = "-1"
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-elasticache-sg"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_flow_log" "main" {
|
||||
iam_role_arn = aws_iam_role.flow_log.arn
|
||||
log_destination = aws_cloudwatch_log_group.flow_log.arn
|
||||
vpc_id = aws_vpc.main.id
|
||||
traffic_type = "ALL"
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-flow-log"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_iam_role" "flow_log" {
|
||||
name = "${var.project_name}-${var.environment}-flow-log-role"
|
||||
|
||||
assume_role_policy = jsonencode({
|
||||
Version = "2012-10-17"
|
||||
Statement = [
|
||||
{
|
||||
Action = "sts:AssumeRole"
|
||||
Effect = "Allow"
|
||||
Principal = {
|
||||
Service = "vpc-flow-logs.amazonaws.com"
|
||||
}
|
||||
}
|
||||
]
|
||||
})
|
||||
}
|
||||
|
||||
resource "aws_iam_role_policy" "flow_log" {
|
||||
name = "${var.project_name}-${var.environment}-flow-log-policy"
|
||||
role = aws_iam_role.flow_log.id
|
||||
|
||||
policy = jsonencode({
|
||||
Version = "2012-10-17"
|
||||
Statement = [
|
||||
{
|
||||
Action = [
|
||||
"logs:CreateLogGroup",
|
||||
"logs:CreateLogStream",
|
||||
"logs:PutLogEvents",
|
||||
"logs:DescribeLogGroups",
|
||||
"logs:DescribeLogStreams"
|
||||
]
|
||||
Effect = "Allow"
|
||||
Resource = [aws_cloudwatch_log_group.flow_log.arn, "${aws_cloudwatch_log_group.flow_log.arn}:*"]
|
||||
}
|
||||
]
|
||||
})
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_log_group" "flow_log" {
|
||||
name = "/${var.project_name}/${var.environment}/vpc-flow-log"
|
||||
retention_in_days = var.environment == "production" ? 30 : 7
|
||||
kms_key_id = var.kms_key_arn != "" ? var.kms_key_arn : null
|
||||
|
||||
tags = {
|
||||
Name = "${var.project_name}-${var.environment}-flow-log"
|
||||
}
|
||||
}
|
||||
|
||||
output "vpc_id" {
|
||||
description = "VPC ID"
|
||||
value = aws_vpc.main.id
|
||||
}
|
||||
|
||||
output "private_subnet_ids" {
|
||||
description = "Private subnet IDs"
|
||||
value = aws_subnet.private[*].id
|
||||
}
|
||||
|
||||
output "public_subnet_ids" {
|
||||
description = "Public subnet IDs"
|
||||
value = aws_subnet.public[*].id
|
||||
}
|
||||
|
||||
output "alb_security_group_id" {
|
||||
description = "ALB security group ID"
|
||||
value = aws_security_group.alb.id
|
||||
}
|
||||
|
||||
output "ecs_security_group_id" {
|
||||
description = "ECS security group ID"
|
||||
value = aws_security_group.ecs.id
|
||||
}
|
||||
|
||||
output "rds_security_group_id" {
|
||||
description = "RDS security group ID"
|
||||
value = aws_security_group.rds.id
|
||||
}
|
||||
|
||||
output "elasticache_security_group_id" {
|
||||
description = "ElastiCache security group ID"
|
||||
value = aws_security_group.elasticache.id
|
||||
}
|
||||
@@ -1,35 +0,0 @@
|
||||
output "vpc_id" {
|
||||
description = "VPC ID"
|
||||
value = module.vpc.vpc_id
|
||||
}
|
||||
|
||||
output "cluster_name" {
|
||||
description = "ECS cluster name"
|
||||
value = "${var.project_name}-${var.environment}"
|
||||
}
|
||||
|
||||
output "rds_endpoint" {
|
||||
description = "RDS endpoint"
|
||||
value = module.rds.db_endpoint
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
output "elasticache_endpoint" {
|
||||
description = "ElastiCache primary endpoint"
|
||||
value = module.elasticache.cache_endpoint
|
||||
}
|
||||
|
||||
output "s3_bucket_name" {
|
||||
description = "S3 bucket name"
|
||||
value = module.s3.bucket_name
|
||||
}
|
||||
|
||||
output "secrets_manager_arn" {
|
||||
description = "Secrets Manager ARN"
|
||||
value = module.secrets.secrets_manager_arn
|
||||
}
|
||||
|
||||
output "cloudwatch_dashboard_url" {
|
||||
description = "CloudWatch dashboard URL"
|
||||
value = module.cloudwatch.dashboard_url
|
||||
}
|
||||
@@ -1,121 +0,0 @@
|
||||
#!/bin/bash
|
||||
set -euo pipefail
|
||||
|
||||
# ShieldAI Docker Compose Rollback Script
|
||||
# Usage: ./rollback-compose.sh <previous_tag> [--env prod|dev]
|
||||
#
|
||||
# Rolls back all services to a previous tagged image using docker-compose.prod.yml
|
||||
#
|
||||
# Examples:
|
||||
# ./rollback-compose.sh v1.2.3 # Rollback to v1.2.3
|
||||
# ./rollback-compose.sh v1.2.3 --env prod # Explicit production compose
|
||||
|
||||
PREVIOUS_TAG="${1:-}"
|
||||
ENV_MODE="${3:-prod}" # value after the optional --env flag ($2 is the flag itself)
|
||||
|
||||
# ─── Configuration ───────────────────────────────────────────────
|
||||
SERVICES="api darkwatch spamshield voiceprint"
|
||||
COMPOSE_FILE="docker-compose.prod.yml"
|
||||
REGISTRY_OWNER="${GITHUB_REPOSITORY_OWNER:-shieldai}"
|
||||
|
||||
# ─── Helpers ─────────────────────────────────────────────────────
|
||||
log() {
|
||||
local level="$1"
|
||||
shift
|
||||
echo "[$(date -u '+%H:%M:%S')] [$level] $*"
|
||||
}
|
||||
|
||||
log_info() { log "INFO" "$@"; }
|
||||
log_warn() { log "WARN" "$@"; }
|
||||
log_error() { log "ERROR" "$@"; }
|
||||
|
||||
# ─── Validation ──────────────────────────────────────────────────
|
||||
if [[ -z "$PREVIOUS_TAG" ]]; then
|
||||
log_error "Usage: $0 <previous_tag> [--env prod|dev]"
|
||||
log_error "Example: $0 v1.2.3"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if ! command -v docker &>/dev/null; then
|
||||
log_error "Docker not found in PATH"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# ─── Rollback Logic ──────────────────────────────────────────────
|
||||
main() {
|
||||
log_info "=== Docker Compose Rollback ==="
|
||||
log_info "Target tag: $PREVIOUS_TAG"
|
||||
log_info "Compose file: $COMPOSE_FILE"
|
||||
log_info "Registry: ghcr.io/$REGISTRY_OWNER"
|
||||
|
||||
# 1. Pull previous images
|
||||
log_info "Pulling previous images..."
|
||||
local pull_failed=0
|
||||
for svc in $SERVICES; do
|
||||
local image="ghcr.io/${REGISTRY_OWNER}/shieldai-${svc}:${PREVIOUS_TAG}"
|
||||
log_info "Pulling $image..."
|
||||
if docker pull "$image" 2>/dev/null; then
|
||||
log_info "Pulled: $image"
|
||||
else
|
||||
log_warn "Pull failed: $image (may not exist)"
|
||||
pull_failed=1
|
||||
fi
|
||||
done
|
||||
|
||||
if [[ $pull_failed -eq 1 ]]; then
|
||||
log_warn "Some images may not exist at tag $PREVIOUS_TAG"
|
||||
log_info "Continuing with available images..."
|
||||
fi
|
||||
|
||||
# 2. Stop current services gracefully
|
||||
log_info "Stopping current services..."
|
||||
DOCKER_TAG="$PREVIOUS_TAG" docker compose -f "$COMPOSE_FILE" down --timeout 30 2>/dev/null || true
|
||||
|
||||
# 3. Start with previous tag
|
||||
log_info "Starting services with tag $PREVIOUS_TAG..."
|
||||
DOCKER_TAG="$PREVIOUS_TAG" docker compose -f "$COMPOSE_FILE" up -d
|
||||
|
||||
# 4. Wait for services to be healthy
|
||||
log_info "Waiting for services to become healthy..."
|
||||
sleep 10
|
||||
|
||||
# 5. Verify health
|
||||
local passed=0
|
||||
local failed=0
|
||||
|
||||
for svc in $SERVICES; do
|
||||
local port
|
||||
port=$(case "$svc" in
|
||||
api) echo 3000 ;;
|
||||
darkwatch) echo 3001 ;;
|
||||
spamshield) echo 3002 ;;
|
||||
voiceprint) echo 3003 ;;
|
||||
esac)
|
||||
|
||||
local http_code
|
||||
http_code=$(curl -s -o /dev/null -w "%{http_code}" \
|
||||
--connect-timeout 10 --max-time 30 \
|
||||
"http://localhost:${port}/health" 2>/dev/null || echo "000")
|
||||
|
||||
if [[ "$http_code" == "200" ]]; then
|
||||
log_info "Health OK: $svc (port $port, HTTP $http_code)"
|
||||
passed=$((passed + 1)) # ((passed++)) returns status 1 when passed is 0, which aborts under set -e
|
||||
else
|
||||
log_warn "Health FAIL: $svc (port $port, HTTP $http_code)"
|
||||
failed=$((failed + 1)) # ((failed++)) returns status 1 when failed is 0, which aborts under set -e
|
||||
fi
|
||||
done
|
||||
|
||||
log_info "=== Rollback Complete ==="
|
||||
log_info "Passed: $passed, Failed: $failed"
|
||||
|
||||
if [[ $failed -gt 0 ]]; then
|
||||
log_warn "Some services failed health check. Check logs: docker compose -f $COMPOSE_FILE logs"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
log_info "All services healthy after rollback"
|
||||
exit 0
|
||||
}
|
||||
|
||||
main "$@"
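The script above waits a fixed `sleep 10` before probing each `/health` endpoint once. A minimal sketch of a polling alternative follows, assuming the same ports and endpoint. The helper names `wait_until` and `health_ok` are hypothetical and not part of the original script.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not in the original script): run a check command
# until it succeeds or the attempt budget is used up.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Hypothetical check: succeeds when the service's /health endpoint returns 200.
health_ok() {
  local port="$1" code
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    --connect-timeout 2 --max-time 5 \
    "http://localhost:${port}/health" 2>/dev/null) || code="000"
  [ "$code" = "200" ]
}

# Example call (commented out so the sketch has no side effects):
# wait_until 30 2 health_ok 3000 || echo "api did not become healthy"
```

With this shape, a service that starts quickly is reported healthy after the first successful probe, and a slow one gets up to `max_attempts` probes before it counts as failed.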
|
||||
@@ -1,164 +0,0 @@
#!/bin/bash
set -euo pipefail

# ShieldAI Database Migration Rollback Script
# Usage: ./rollback-migration.sh <environment> [--migration <name>]
#
# Rolls back the most recent migration or a specific named migration
# Uses AWS Secrets Manager for database credentials
#
# Examples:
#   ./rollback-migration.sh staging                                   # Rollback latest
#   ./rollback-migration.sh production --migration 001_create_users   # Rollback specific

ENVIRONMENT="${1:-staging}"
# $2 is the literal --migration flag, so the migration name is read from $3
MIGRATION_NAME="${3:-}"

# ─── Configuration ───────────────────────────────────────────────
SECRET_ID="shieldai-${ENVIRONMENT}-db-password"
DB_NAME="shieldai"
DB_USER="shieldai"

# ─── Helpers ─────────────────────────────────────────────────────
log() {
  local level="$1"
  shift
  echo "[$(date -u '+%H:%M:%S')] [$level] $*"
}

log_info() { log "INFO" "$@"; }
log_warn() { log "WARN" "$@"; }
log_error() { log "ERROR" "$@"; }

# ─── Validation ──────────────────────────────────────────────────
if [[ "$ENVIRONMENT" != "staging" && "$ENVIRONMENT" != "production" ]]; then
  log_error "Invalid environment: $ENVIRONMENT (expected: staging, production)"
  exit 1
fi

for cmd in aws jq; do
  if ! command -v "$cmd" &>/dev/null; then
    log_error "Missing prerequisite: $cmd"
    exit 1
  fi
done

# ─── Credentials ─────────────────────────────────────────────────
get_db_credentials() {
  log_info "Fetching database credentials from Secrets Manager..."

  local secret
  # --output text yields the raw SecretString JSON so jq can parse it;
  # || true lets the emptiness check below report the failure under set -e
  secret=$(aws secretsmanager get-secret-value \
    --secret-id "$SECRET_ID" \
    --query 'SecretString' \
    --output text 2>/dev/null || true)

  if [[ -z "$secret" ]]; then
    log_error "Failed to fetch secret: $SECRET_ID"
    exit 1
  fi

  export DB_HOST=$(echo "$secret" | jq -r '.host')
  # The // fallback must sit inside the jq filter, not outside the quotes
  export DB_PORT=$(echo "$secret" | jq -r '.port // 5432')
  export DB_PASS=$(echo "$secret" | jq -r '.password')
  export DATABASE_URL="postgresql://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/${DB_NAME}"

  log_info "Database: ${DB_HOST}:${DB_PORT}/${DB_NAME}"
}

# ─── Migration Status ────────────────────────────────────────────
show_migration_status() {
  log_info "=== Current Migration Status ==="

  if command -v npx &>/dev/null; then
    npx drizzle-kit status --config=drizzle.config.ts 2>/dev/null || \
      log_warn "Drizzle status check completed (some warnings expected)"
  fi

  # Show applied migrations from database
  log_info "Applied migrations:"
  PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" \
    -c "SELECT id, checksum, type FROM __drizzle_migrations_schema ORDER BY id DESC;" 2>/dev/null || \
    log_warn "Could not query migration table (psql may not be installed)"
}

# ─── Rollback Logic ──────────────────────────────────────────────
rollback_latest() {
  log_info "=== Rolling Back Latest Migration ==="

  # Get the latest applied migration; || true keeps set -e from aborting
  # before the emptiness check below can report the problem
  local latest_migration
  latest_migration=$(PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -p "$DB_PORT" \
    -U "$DB_USER" -d "$DB_NAME" -t -A \
    -c "SELECT id FROM __drizzle_migrations_schema ORDER BY id DESC LIMIT 1;" 2>/dev/null || true)

  if [[ -z "$latest_migration" ]]; then
    log_warn "No applied migrations found"
    return 0
  fi

  log_info "Latest migration: $latest_migration"

  # Resolve the migration record (assumes the installed drizzle-kit provides
  # migrate:resolve); report success only when the command actually succeeds
  if command -v npx &>/dev/null && \
    npx drizzle-kit migrate:resolve --migration "$latest_migration" --status applied 2>/dev/null; then
    log_info "Migration $latest_migration marked as resolved"
  else
    log_warn "Migration resolve did not succeed for $latest_migration; verify migration state manually"
  fi
}

rollback_specific() {
  local target="$1"
  log_info "=== Rolling Back Migration: $target ==="

  if command -v npx &>/dev/null && \
    npx drizzle-kit migrate:resolve --migration "$target" --status applied 2>/dev/null; then
    log_info "Migration $target marked as resolved"
  else
    log_warn "Migration resolve did not succeed for $target; verify migration state manually"
  fi
}

# ─── Verification ────────────────────────────────────────────────
verify_connection() {
  log_info "=== Verifying Database Connection ==="

  local result
  result=$(PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -p "$DB_PORT" \
    -U "$DB_USER" -d "$DB_NAME" -t -A \
    -c "SELECT version();" 2>/dev/null || echo "FAIL")

  if [[ "$result" != "FAIL" ]]; then
    # version() output already begins with "PostgreSQL"
    log_info "Connection OK: $result"
  else
    log_warn "Connection check failed"
  fi
}

# ─── Main ────────────────────────────────────────────────────────
main() {
  log_info "=== ShieldAI Migration Rollback ==="
  log_info "Environment: $ENVIRONMENT"
  log_info "Secret: $SECRET_ID"

  get_db_credentials
  show_migration_status

  if [[ -n "$MIGRATION_NAME" ]]; then
    rollback_specific "$MIGRATION_NAME"
  else
    rollback_latest
  fi

  verify_connection
  show_migration_status

  log_info "=== Rollback Complete ==="
  log_info "Next steps:"
  log_info "1. Verify application schema compatibility"
  log_info "2. Run application health checks"
  log_info "3. If needed, redeploy ECS services: ./rollback.sh $ENVIRONMENT all"
}

main "$@"
@@ -1,255 +0,0 @@
#!/bin/bash
set -euo pipefail

# ShieldAI ECS Rollback Script
# Usage: ./rollback.sh <environment> <service|all> [--verify]
#
# Environments: staging, production
# Services: api, darkwatch, spamshield, voiceprint, all
#
# Examples:
#   ./rollback.sh staging api               # Rollback single service
#   ./rollback.sh production all            # Rollback all services
#   ./rollback.sh production all --verify   # Rollback with post-verification

# ─── Configuration ───────────────────────────────────────────────
ENVIRONMENT="${1:-staging}"
SERVICE="${2:-all}"
VERIFY="${3:-false}"

CLUSTER="shieldai-${ENVIRONMENT}"
SERVICES_LIST="api darkwatch spamshield voiceprint"
EXIT_CODE=0
TIMESTAMP=$(date -u '+%Y-%m-%d %H:%M:%S UTC')
LOG_FILE="/tmp/shieldai-rollback-${ENVIRONMENT}-${TIMESTAMP//[: ]/_}.log"

# ─── Helpers ─────────────────────────────────────────────────────
log() {
  local level="$1"
  shift
  local msg="$*"
  echo "[$(date -u '+%H:%M:%S')] [$level] $msg" | tee -a "$LOG_FILE"
}

log_info() { log "INFO" "$@"; }
log_warn() { log "WARN" "$@"; }
log_error() { log "ERROR" "$@"; }

# ─── Validation ──────────────────────────────────────────────────
validate_environment() {
  if [[ "$ENVIRONMENT" != "staging" && "$ENVIRONMENT" != "production" ]]; then
    log_error "Invalid environment: $ENVIRONMENT (expected: staging, production)"
    exit 1
  fi
}

validate_service() {
  if [[ "$SERVICE" == "all" ]]; then
    return 0
  fi
  if ! echo "$SERVICES_LIST" | grep -qw "$SERVICE"; then
    log_error "Invalid service: $SERVICE (expected: api, darkwatch, spamshield, voiceprint, all)"
    exit 1
  fi
}

check_prerequisites() {
  local missing=()

  for cmd in aws jq curl; do
    if ! command -v "$cmd" &>/dev/null; then
      missing+=("$cmd")
    fi
  done

  if [[ ${#missing[@]} -gt 0 ]]; then
    log_error "Missing prerequisites: ${missing[*]}"
    exit 1
  fi

  if [[ -z "${AWS_DEFAULT_REGION:-}" ]]; then
    export AWS_DEFAULT_REGION="us-east-1"
  fi

  log_info "Prerequisites OK (region: $AWS_DEFAULT_REGION)"
}

# ─── Rollback Logic ──────────────────────────────────────────────
get_target_services() {
  if [[ "$SERVICE" == "all" ]]; then
    echo "$SERVICES_LIST"
  else
    echo "$SERVICE"
  fi
}

rollback_service() {
  local svc="$1"
  local service_name="${CLUSTER}-${svc}"

  log_info "Rolling back $service_name..."

  # Check current deployment status
  local current_task_def
  current_task_def=$(aws ecs describe-services \
    --cluster "$CLUSTER" \
    --services "$service_name" \
    --query 'services[0].taskDefinition' \
    --output text 2>/dev/null || echo "UNKNOWN")

  log_info "Current task definition: $current_task_def"

  # Execute rollback (confirm that the installed AWS CLI accepts --rollback for update-service)
  if aws ecs update-service \
    --cluster "$CLUSTER" \
    --service "$service_name" \
    --rollback \
    --no-cli-auto-prompt 2>>"$LOG_FILE"; then
    log_info "Rollback initiated for $service_name"
  else
    log_error "Rollback failed to initiate for $service_name"
    EXIT_CODE=1
    return 1
  fi

  # Wait for stabilization. The services-stable waiter takes no --timeout option;
  # it polls every 15 seconds and gives up after 40 checks (about 10 minutes).
  log_info "Waiting for $service_name to stabilize (waiter limit: about 600s)..."
  if aws ecs wait services-stable \
    --cluster "$CLUSTER" \
    --services "$service_name" 2>>"$LOG_FILE"; then
    log_info "$service_name stabilized successfully"
  else
    log_warn "$service_name stabilization timed out or failed"
    EXIT_CODE=1
    return 1
  fi

  # Get new task definition after rollback
  local new_task_def
  new_task_def=$(aws ecs describe-services \
    --cluster "$CLUSTER" \
    --services "$service_name" \
    --query 'services[0].taskDefinition' \
    --output text 2>/dev/null || echo "UNKNOWN")

  local running_count
  running_count=$(aws ecs describe-services \
    --cluster "$CLUSTER" \
    --services "$service_name" \
    --query 'services[0].runningCount' \
    --output text 2>/dev/null || echo "0")

  local desired_count
  desired_count=$(aws ecs describe-services \
    --cluster "$CLUSTER" \
    --services "$service_name" \
    --query 'services[0].desiredCount' \
    --output text 2>/dev/null || echo "0")

  log_info "Rollback complete: $service_name -> $new_task_def ($running_count/$desired_count running)"

  return 0
}

# ─── Health Verification ─────────────────────────────────────────
verify_health() {
  local svc="$1"
  local port
  port=$(case "$svc" in
    api) echo 3000 ;;
    darkwatch) echo 3001 ;;
    spamshield) echo 3002 ;;
    voiceprint) echo 3003 ;;
    *) echo 3000 ;;
  esac)

  local alb_dns="https://${CLUSTER}-alb.${AWS_DEFAULT_REGION}.elb.amazonaws.com"

  log_info "Verifying health for $svc (ALB: $alb_dns)..."

  local http_code
  http_code=$(curl -s -o /dev/null -w "%{http_code}" \
    --connect-timeout 10 \
    --max-time 30 \
    "$alb_dns/health" 2>/dev/null || echo "000")

  if [[ "$http_code" == "200" ]]; then
    log_info "Health check PASSED: $svc (HTTP $http_code)"
    return 0
  else
    log_warn "Health check FAILED: $svc (HTTP $http_code)"
    return 1
  fi
}

verify_all_services() {
  log_info "=== Post-Rollback Health Verification ==="
  local passed=0
  local failed=0

  for svc in $(get_target_services); do
    # ((var++)) returns status 1 when var is 0, which aborts under set -e
    if verify_health "$svc"; then
      passed=$((passed + 1))
    else
      failed=$((failed + 1))
    fi
  done

  log_info "Verification complete: $passed passed, $failed failed"

  if [[ $failed -gt 0 ]]; then
    log_warn "Some services failed health verification"
    EXIT_CODE=1
  fi
}

# ─── Main Execution ──────────────────────────────────────────────
main() {
  log_info "=== ShieldAI Rollback ==="
  log_info "Environment: $ENVIRONMENT"
  log_info "Service(s): $SERVICE"
  log_info "Cluster: $CLUSTER"
  log_info "Verify: $VERIFY"
  log_info "Timestamp: $TIMESTAMP"
  log_info "Log file: $LOG_FILE"
  log_info "=========================="

  # Validate inputs
  validate_environment
  validate_service
  check_prerequisites

  # Execute rollback for each target service
  local rolled_back=0
  local failed=0

  for svc in $(get_target_services); do
    # ((var++)) returns status 1 when var is 0, which aborts under set -e
    if rollback_service "$svc"; then
      rolled_back=$((rolled_back + 1))
    else
      failed=$((failed + 1))
    fi
  done

  log_info "=== Rollback Summary ==="
  log_info "Rolled back: $rolled_back services"
  log_info "Failed: $failed services"

  # Post-rollback verification
  if [[ "$VERIFY" == "--verify" ]] || [[ "$VERIFY" == "true" ]]; then
    verify_all_services
  fi

  # A failed health verification sets EXIT_CODE without incrementing $failed
  if [[ $failed -gt 0 || $EXIT_CODE -ne 0 ]]; then
    log_error "Rollback completed with $failed rollback failure(s); exit code $EXIT_CODE"
    log_info "Full log: $LOG_FILE"
    exit "$EXIT_CODE"
  fi

  log_info "Rollback completed successfully"
  log_info "Full log: $LOG_FILE"
  exit 0
}

main "$@"
@@ -1,237 +0,0 @@
#!/bin/bash
set -uo pipefail

# ShieldAI Rollback Test Suite
# Usage: ./test-rollback.sh [ecs|compose|migration|all]
#
# Validates rollback scripts and procedures without mutating production
# Run against staging environment for integration tests

TEST_SUITE="${1:-all}"
PASS=0
FAIL=0
SKIP=0

# ─── Helpers ─────────────────────────────────────────────────────
log() {
  echo "[$(date -u '+%H:%M:%S')] $*"
}

assert_eq() {
  local desc="$1" expected="$2" actual="$3"
  if [[ "$expected" == "$actual" ]]; then
    log " ✅ PASS: $desc"
    ((PASS++))
  else
    log " ❌ FAIL: $desc (expected: $expected, got: $actual)"
    ((FAIL++))
  fi
}

assert_file_exists() {
  local desc="$1" path="$2"
  if [[ -f "$path" ]]; then
    log " ✅ PASS: $desc"
    ((PASS++))
  else
    log " ❌ FAIL: $desc ($path not found)"
    ((FAIL++))
  fi
}

assert_executable() {
  local desc="$1" path="$2"
  if [[ -x "$path" ]]; then
    log " ✅ PASS: $desc"
    ((PASS++))
  else
    log " ❌ FAIL: $desc ($path not executable)"
    ((FAIL++))
  fi
}

assert_script_syntax() {
  local desc="$1" path="$2"
  if bash -n "$path" 2>/dev/null; then
    log " ✅ PASS: $desc (syntax OK)"
    ((PASS++))
  else
    log " ❌ FAIL: $desc (syntax error)"
    ((FAIL++))
  fi
}

assert_contains() {
  local desc="$1" file="$2" pattern="$3"
  if grep -q -- "$pattern" "$file" 2>/dev/null; then
    log " ✅ PASS: $desc"
    ((PASS++))
  else
    log " ❌ FAIL: $desc (pattern '$pattern' not found in $file)"
    ((FAIL++))
  fi
}

# ─── Test: File Structure ────────────────────────────────────────
test_file_structure() {
  log "=== Test: File Structure ==="

  assert_file_exists "ROLLBACK.md exists" "infra/ROLLBACK.md"
  assert_file_exists "rollback.sh exists" "infra/scripts/rollback.sh"
  assert_file_exists "rollback-compose.sh exists" "infra/scripts/rollback-compose.sh"
  assert_file_exists "rollback-migration.sh exists" "infra/scripts/rollback-migration.sh"
  assert_executable "rollback.sh is executable" "infra/scripts/rollback.sh"
  assert_executable "rollback-compose.sh is executable" "infra/scripts/rollback-compose.sh"
  assert_executable "rollback-migration.sh is executable" "infra/scripts/rollback-migration.sh"
}

# ─── Test: Script Syntax ─────────────────────────────────────────
test_script_syntax() {
  log "=== Test: Script Syntax ==="

  assert_script_syntax "rollback.sh syntax" "infra/scripts/rollback.sh"
  assert_script_syntax "rollback-compose.sh syntax" "infra/scripts/rollback-compose.sh"
  assert_script_syntax "rollback-migration.sh syntax" "infra/scripts/rollback-migration.sh"
}

# ─── Test: ROLLBACK.md Content ───────────────────────────────────
test_documentation() {
  log "=== Test: Documentation Content ==="

  local doc="infra/ROLLBACK.md"

  for section in "Overview" "ECS Service Rollback" "Docker Compose Rollback" \
    "Database Migration Rollback" "Automated Rollback Triggers" \
    "Blue-Green Deployment Rollback" "Rollback Decision Tree" \
    "Post-Rollback Verification" "Testing Checklist" "Emergency Rollback"; do
    assert_contains "Section '$section' documented" "$doc" "$section"
  done

  for cmd in "aws ecs update-service" "docker compose" "drizzle-kit" \
    "aws rds restore-db-instance" "aws ecs wait services-stable"; do
    assert_contains "Command '$cmd' documented" "$doc" "$cmd"
  done
}

# ─── Test: Rollback Script Validation ────────────────────────────
test_rollback_script() {
  log "=== Test: ECS Rollback Script ==="

  # Test invalid environment
  local exit_code=0
  bash infra/scripts/rollback.sh invalid_env api >/dev/null 2>&1 || exit_code=$?
  assert_eq "Invalid environment returns exit code 1" "1" "$exit_code"

  # Test invalid service
  exit_code=0
  bash infra/scripts/rollback.sh staging invalid_svc >/dev/null 2>&1 || exit_code=$?
  assert_eq "Invalid service returns exit code 1" "1" "$exit_code"

  # Verify script has required functions
  for func in "validate_environment" "validate_service" "rollback_service" \
    "verify_health" "check_prerequisites" "main"; do
    assert_contains "Function '$func' defined" "infra/scripts/rollback.sh" "$func"
  done

  # Verify all services are handled
  for svc in api darkwatch spamshield voiceprint; do
    assert_contains "Service '$svc' in SERVICES_LIST" "infra/scripts/rollback.sh" "$svc"
  done
}

# ─── Test: Compose Rollback Script ───────────────────────────────
test_compose_script() {
  log "=== Test: Docker Compose Rollback Script ==="

  # Test missing tag argument
  local exit_code=0
  bash infra/scripts/rollback-compose.sh >/dev/null 2>&1 || exit_code=$?
  assert_eq "Missing tag returns exit code 1" "1" "$exit_code"

  # Verify compose file exists
  assert_file_exists "docker-compose.prod.yml exists" "docker-compose.prod.yml"

  # Verify all services are defined in compose
  for svc in api darkwatch spamshield voiceprint; do
    assert_contains "Service '$svc' in docker-compose.prod.yml" "docker-compose.prod.yml" " ${svc}:"
  done
}

# ─── Test: CI/CD Rollback Job ────────────────────────────────────
test_cicd_rollback() {
  log "=== Test: CI/CD Rollback Configuration ==="

  local deploy_wf=".github/workflows/deploy.yml"

  assert_contains "Rollback job defined" "$deploy_wf" "rollback:"
  assert_contains "Health check triggers rollback" "$deploy_wf" "needs.health-check.result"
  assert_contains "ECS --rollback flag used" "$deploy_wf" "--rollback"

  for svc in api darkwatch spamshield voiceprint; do
    assert_contains "Service '$svc' in deploy matrix" "$deploy_wf" "$svc"
  done
}

# ─── Test: Health Check Configuration ────────────────────────────
test_health_checks() {
  log "=== Test: Health Check Configuration ==="

  assert_contains "Container health check in ECS" "infra/modules/ecs/main.tf" "healthCheck"
  assert_contains "ALB health check defined" "infra/modules/ecs/main.tf" "health_check"
  assert_contains "ALB 5xx alarm configured" "infra/modules/cloudwatch/main.tf" "HTTPCode_Elb_5XX_Count"
}

# ─── Test: README References ─────────────────────────────────────
test_readme() {
  log "=== Test: README References ==="

  assert_contains "README references ROLLBACK.md" "infra/README.md" "ROLLBACK.md"
  assert_contains "README documents rollback.sh" "infra/README.md" "rollback.sh"
  assert_contains "README documents rollback-compose.sh" "infra/README.md" "rollback-compose.sh"
  assert_contains "README documents rollback-migration.sh" "infra/README.md" "rollback-migration.sh"
}

# ─── Main ────────────────────────────────────────────────────────
main() {
  log "=== ShieldAI Rollback Test Suite ==="
  log "Suite: $TEST_SUITE"
  log ""

  # Separate if blocks so that "all" runs every suite; a case statement
  # stops at the first matching pattern and would skip compose and migration
  if [[ "$TEST_SUITE" == "ecs" || "$TEST_SUITE" == "all" ]]; then
    test_rollback_script
    test_cicd_rollback
    test_health_checks
  fi

  if [[ "$TEST_SUITE" == "compose" || "$TEST_SUITE" == "all" ]]; then
    test_compose_script
  fi

  if [[ "$TEST_SUITE" == "migration" || "$TEST_SUITE" == "all" ]]; then
    log "=== Test: Migration Rollback ==="
    assert_script_syntax "rollback-migration.sh syntax" "infra/scripts/rollback-migration.sh"
    assert_contains "Uses Secrets Manager" "infra/scripts/rollback-migration.sh" "secretsmanager"
    assert_contains "Uses drizzle-kit" "infra/scripts/rollback-migration.sh" "drizzle-kit"
  fi

  test_file_structure
  test_script_syntax
  test_documentation
  test_readme

  log ""
  log "=== Results ==="
  log "Passed: $PASS"
  log "Failed: $FAIL"
  log ""

  if [[ $FAIL -gt 0 ]]; then
    log "❌ SOME TESTS FAILED"
    return 1
  fi

  log "✅ ALL TESTS PASSED"
  return 0
}

main "$@"
@@ -1,122 +0,0 @@
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Deployment environment"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be one of: dev, staging, production."
  }
}

variable "project_name" {
  description = "Project name for resource naming"
  type        = string
  default     = "shieldai"
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "az_count" {
  description = "Number of availability zones"
  type        = number
  default     = 2
}

variable "db_name" {
  description = "RDS database name"
  type        = string
  default     = "shieldai"
}

variable "db_instance_class" {
  description = "RDS instance class"
  type        = string
  default     = "db.t3.medium"
}

variable "db_multi_az" {
  description = "Enable Multi-AZ deployment"
  type        = bool
  default     = true
}

variable "db_backup_retention" {
  description = "RDS backup retention period in days"
  type        = number
  default     = 7
}

variable "elasticache_node_type" {
  description = "ElastiCache node type"
  type        = string
  default     = "cache.t3.medium"
}

variable "elasticache_num_nodes" {
  description = "Number of ElastiCache nodes"
  type        = number
  default     = 2
}

variable "services" {
  description = "ECS services to deploy"
  type = map(object({
    cpu    = number
    memory = number
    port   = number
  }))
  default = {
    api = {
      cpu    = 512
      memory = 1024
      port   = 3000
    }
    darkwatch = {
      cpu    = 256
      memory = 512
      port   = 3001
    }
    spamshield = {
      cpu    = 256
      memory = 512
      port   = 3002
    }
    voiceprint = {
      cpu    = 512
      memory = 1024
      port   = 3003
    }
  }
}

variable "container_images" {
  description = "Container image tags per service"
  type        = map(string)
  default = {
    api        = "latest"
    darkwatch  = "latest"
    spamshield = "latest"
    voiceprint = "latest"
  }
}

variable "secrets" {
  description = "Secrets to store in AWS Secrets Manager"
  type        = map(string)
  default     = {}
}

variable "domain_name" {
  description = "Route53 hosted zone domain for ACM cert validation"
  type        = string
  default     = "shieldai.app"
}