Kordant/docs/BACKUPS.md
2026-05-27 10:30:23 -04:00

Backup Strategy

Database Backups

Automated Backups

  • Frequency: Daily at 03:00 UTC
  • Retention: 7 daily, 4 weekly, and 12 monthly backups
  • Storage: Encrypted S3 bucket in separate region
  • Type: Full backup + WAL archiving for point-in-time recovery
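
The retention schedule above can be expressed as a pruning rule. The following is a minimal sketch, not the project's actual tooling: the function name `backups_to_keep` is hypothetical, and it assumes that the newest backup in each ISO week and calendar month is the one promoted to the weekly and monthly tiers, which the policy does not specify.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily=7, weekly=4, monthly=12):
    """Return the set of backup dates retained under a tiered policy.

    Assumption: the newest backup of each ISO week / calendar month
    represents that week / month.
    """
    # Newest first, ignoring duplicates and anything dated in the future.
    dates = sorted({d for d in backup_dates if d <= today}, reverse=True)
    keep = set()

    # Daily tier: the most recent `daily` backups.
    keep.update(dates[:daily])

    # Weekly tier: newest backup in each of the most recent `weekly` ISO weeks.
    weeks = {}
    for d in dates:
        weeks.setdefault(tuple(d.isocalendar())[:2], d)  # (ISO year, ISO week)
    keep.update(list(weeks.values())[:weekly])

    # Monthly tier: newest backup in each of the most recent `monthly` months.
    months = {}
    for d in dates:
        months.setdefault((d.year, d.month), d)
    keep.update(list(months.values())[:monthly])

    return keep


if __name__ == "__main__":
    today = date(2026, 5, 27)
    history = [today - timedelta(days=i) for i in range(400)]
    retained = backups_to_keep(history, today)
    expired = sorted(set(history) - retained)
    print(f"retain {len(retained)} backups, delete {len(expired)}")
```

Because the tiers overlap (the newest daily backup is also the newest backup of its week and month), the retained set is at most 23 backups and usually fewer.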

Point-in-Time Recovery

  • RPO (recovery point objective): < 15 minutes
  • RTO (recovery time objective): < 1 hour
  • Method: Restore the latest full backup, then replay archived WAL up to the target timestamp
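
WAL replay has to start from a full backup taken before the recovery target. The sketch below shows that selection step only; `pick_base_backup` is a hypothetical helper, and it assumes backup completion times are tracked as timezone-aware UTC datetimes. If the database is PostgreSQL (an assumption, the document does not name it), the chosen timestamp would then be supplied through the `recovery_target_time` setting.

```python
from datetime import datetime, timezone


def pick_base_backup(backup_times, target):
    """Return the latest full backup completed at or before `target`."""
    candidates = [t for t in backup_times if t <= target]
    if not candidates:
        raise ValueError("no full backup precedes the requested recovery target")
    return max(candidates)


if __name__ == "__main__":
    # Daily full backups at 03:00 UTC, as described above.
    backups = [datetime(2026, 5, d, 3, 0, tzinfo=timezone.utc) for d in (25, 26, 27)]
    target = datetime(2026, 5, 26, 14, 30, tzinfo=timezone.utc)
    print(pick_base_backup(backups, target))  # 2026-05-26 03:00:00+00:00
```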

Backup Verification

  • Monthly restore test to staging environment
  • Automated integrity checks on backup files
  • Alert raised within 5 minutes of a backup failure
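
One common form of automated integrity check is comparing a checksum recorded at backup time with one computed from the stored file. The sketch below assumes a SHA-256 digest is recorded alongside each backup; the document does not say which check is actually used, and both function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large backup files never sit in memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path, expected_hex):
    """Return True if the file's digest matches the recorded one."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A checksum match only shows the file was stored unchanged; it does not prove the backup restores cleanly, which is why the monthly restore test remains necessary.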

Redis Backups

Configuration

  • RDB snapshots: Every 6 hours
  • AOF persistence: Enabled, so writes made between snapshots can be recovered
  • Storage: Backed up to S3 daily

Recovery

  • Restore from the latest RDB snapshot
  • Replay the AOF to recover writes made after the snapshot
  • Verify data integrity after the restore

Backup Monitoring

Alerts

  • Backup failure → Immediate PagerDuty alert
  • Backup size anomaly → Slack notification
  • Restore test failure → Jira ticket creation
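
The routing above amounts to a lookup from event type to destination. This is a minimal sketch of that mapping only: the event names and the `route_alert` function are hypothetical, and no real PagerDuty, Slack, or Jira API is called.

```python
# Hypothetical event names mapped to the channels listed above.
ROUTES = {
    "backup_failure": "pagerduty",
    "backup_size_anomaly": "slack",
    "restore_test_failure": "jira",
}


def route_alert(event_type):
    """Return the destination channel for a backup event."""
    try:
        return ROUTES[event_type]
    except KeyError:
        # Fail loudly: an unrouted event would otherwise be dropped silently.
        raise ValueError(f"unknown backup event: {event_type!r}") from None
```

Raising on an unknown event type, rather than falling back to a default channel, is a design choice made here so that a misnamed event is noticed immediately.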

Metrics

  • Backup duration
  • Backup size
  • Restore time
  • Data loss window (measured against the RPO)
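
Two of these metrics lend themselves to simple checks. The sketch below assumes the 15-minute RPO stated earlier and, for size anomalies, a 50% deviation from the median of recent backup sizes; that threshold and all function names are illustrative assumptions, not values taken from this document.

```python
from datetime import datetime, timedelta, timezone
from statistics import median

RPO = timedelta(minutes=15)  # objective stated under Point-in-Time Recovery


def data_loss_window(last_recoverable, incident_time):
    """Time between the last recoverable point and the incident."""
    return incident_time - last_recoverable


def within_rpo(last_recoverable, incident_time, rpo=RPO):
    """Return True if the measured data loss window meets the RPO."""
    return data_loss_window(last_recoverable, incident_time) < rpo


def size_is_anomalous(new_size, recent_sizes, tolerance=0.5):
    """Flag a backup whose size deviates from the recent median by more
    than `tolerance` (a fraction of the median; 0.5 is an assumed value)."""
    baseline = median(recent_sizes)
    return abs(new_size - baseline) > tolerance * baseline


if __name__ == "__main__":
    last_wal = datetime(2026, 5, 27, 9, 50, tzinfo=timezone.utc)
    incident = datetime(2026, 5, 27, 10, 0, tzinfo=timezone.utc)
    print(within_rpo(last_wal, incident))  # True: 10 minutes of exposure
```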

Emergency Procedures

Complete Data Loss

  1. Activate disaster recovery plan
  2. Restore from latest backup
  3. Replay WAL (database) and AOF (Redis) to recover changes made after the backup
  4. Verify data integrity
  5. Resume operations

Partial Data Corruption

  1. Identify affected data
  2. Restore the affected tables from backup, typically by restoring to a separate instance and copying the tables across
  3. Verify data consistency
  4. Resume operations