Kordant/tasks/web-production/31-db-backup.md
2026-05-26 16:06:34 -04:00

31. Backup Strategy & Point-in-Time Recovery

meta:

  • id: web-production-31
  • feature: web-production
  • priority: P1
  • depends_on: []
  • tags: [database, reliability, production]

objective:

  • Implement automated database backups with point-in-time recovery capability

deliverables:

  • Automated daily backups
  • Point-in-time recovery setup
  • Backup testing and verification
  • Retention policy

steps:

  1. Set up automated backups:
    • If PostgreSQL: configure pg_dump cron job or managed backups (RDS, Cloud SQL)
    • If SQLite/Turso: configure Turso database branching/backups
    • Daily full backups at off-peak hours (3 AM UTC)
    • Incremental coverage between full backups (continuous WAL archiving for Postgres; hourly shipping alone would miss the 15-minute RPO target)
  2. Configure backup storage:
    • Store in separate region/cloud provider (S3, GCS, R2)
    • Encrypt backups at rest
    • Enable object versioning (protects against accidental deletion or overwrite)
  3. Implement point-in-time recovery:
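    • WAL archiving for PostgreSQL, layered on a physical base backup (pg_basebackup or a managed snapshot); a pg_dump file cannot be rolled forward with WAL
    • Transaction log backups at least every 15 minutes (on Postgres, archive_timeout forces a segment switch even when the database is quiet)
    • Test recovery to a specific timestamp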
  4. Add backup monitoring:
    • Alert on backup failure
    • Track backup size and duration
    • Verify backup integrity (checksum)
  5. Test restore procedures:
    • Monthly restore test to staging environment
    • Document step-by-step restore process
    • Measure RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
    • Target: RTO < 1 hour, RPO < 15 minutes
  6. Document retention:
    • Daily backups: 7 days
    • Weekly backups: 4 weeks
    • Monthly backups: 12 months
    • Annual backups: 7 years (compliance)
  7. Add Redis backup:
    • RDB snapshots every 6 hours
    • AOF persistence for finer-grained recovery between snapshots
    • Backup to S3/GCS
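
The retention tiers in step 6 can be sketched as a selection function that decides which backups to keep. This is a minimal sketch, not a prescribed implementation: the function names are hypothetical, and the day counts used for "4 weeks", "12 months", and "7 years" (28, 365, and 2555 days) are simplifying assumptions.

```python
from datetime import date, timedelta

# Assumed tier windows in days; the 365 and 2555 values approximate
# "12 months" and "7 years" and ignore leap days.
DAILY_DAYS = 7
WEEKLY_DAYS = 28
MONTHLY_DAYS = 365
ANNUAL_DAYS = 7 * 365


def _newest_per_bucket(dates, today, max_age_days, bucket):
    """Keep the newest backup in each bucket (week, month, year) inside the window."""
    newest = {}
    for d in dates:
        if 0 <= (today - d).days < max_age_days:
            key = bucket(d)
            if key not in newest or d > newest[key]:
                newest[key] = d
    return set(newest.values())


def backups_to_keep(backup_dates, today):
    """Return the set of backup dates retained under the tiered policy."""
    dates = set(backup_dates)
    keep = {d for d in dates if 0 <= (today - d).days < DAILY_DAYS}
    keep |= _newest_per_bucket(dates, today, WEEKLY_DAYS, lambda d: tuple(d.isocalendar()[:2]))
    keep |= _newest_per_bucket(dates, today, MONTHLY_DAYS, lambda d: (d.year, d.month))
    keep |= _newest_per_bucket(dates, today, ANNUAL_DAYS, lambda d: d.year)
    return keep


def backups_to_purge(backup_dates, today):
    """Everything not retained by any tier is eligible for deletion."""
    return set(backup_dates) - backups_to_keep(backup_dates, today)
```

A purge job would call `backups_to_purge` with the dates found in backup storage and delete only those objects. Bucket lifecycle rules on the storage provider are an alternative that avoids custom code.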

tests:

  • Integration: Test backup creation
  • Recovery: Test restore to staging
  • Monitoring: Verify backup alerts
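
The integration test for backup creation could assert the checksum verification listed in step 4. The sketch below assumes a sidecar-file convention (`<backup>.sha256` next to the backup file); that convention and the function names are illustrative, not part of this task's requirements.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _sidecar_for(backup_path):
    # Hypothetical convention: the checksum lives in "<backup name>.sha256".
    backup_path = Path(backup_path)
    return backup_path.with_name(backup_path.name + ".sha256")


def write_checksum(backup_path):
    """Record the checksum right after the backup is created."""
    sidecar = _sidecar_for(backup_path)
    sidecar.write_text(sha256_of(backup_path) + "\n")
    return sidecar


def verify_checksum(backup_path):
    """Return True if the backup still matches its recorded checksum."""
    expected = _sidecar_for(backup_path).read_text().strip()
    return sha256_of(backup_path) == expected
```

Running `verify_checksum` again after download, before a restore, catches corruption in transit or at rest. It does not prove the dump is restorable, which is what the staging restore test covers.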

acceptance_criteria:

  • Daily automated backups running successfully
  • Backups stored in separate region with encryption
  • Point-in-time recovery tested and working
  • Backup failures trigger alerts within 5 minutes
  • Monthly restore test completed and documented
  • RTO < 1 hour, RPO < 15 minutes
  • Retention policy enforced automatically
  • Redis backups included in strategy
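
To make the RTO and RPO criterion checkable, a restore drill can record three timestamps and compare the resulting intervals with the targets. A minimal sketch, assuming the drill log provides those timestamps; the function and field names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RTO_TARGET = timedelta(hours=1)      # from the acceptance criteria
RPO_TARGET = timedelta(minutes=15)   # from the acceptance criteria


def measure_drill(failure_at, last_recoverable_at, service_restored_at):
    """Compute RPO and RTO for one restore drill.

    failure_at:           when the (simulated) failure happened
    last_recoverable_at:  newest point in time the restore could reach
    service_restored_at:  when the restored database was serving again
    """
    rpo = failure_at - last_recoverable_at   # data written in this window is lost
    rto = service_restored_at - failure_at   # how long the service was down
    return {
        "rpo": rpo,
        "rto": rto,
        "rpo_ok": rpo < RPO_TARGET,
        "rto_ok": rto < RTO_TARGET,
    }


if __name__ == "__main__":
    failure = datetime(2026, 5, 26, 12, 0, tzinfo=timezone.utc)
    result = measure_drill(
        failure_at=failure,
        last_recoverable_at=failure - timedelta(minutes=8),
        service_restored_at=failure + timedelta(minutes=41),
    )
    print(result["rpo"], result["rto"], result["rpo_ok"], result["rto_ok"])
```

Recording these values for each monthly drill gives a trend to compare against the targets, not just a single pass or fail.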

validation:

  • Check backup storage → daily backups present
  • Trigger restore test → staging database restored successfully
  • Simulate backup failure → alert received
  • Check retention → old backups purged per policy
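
The first and third checks above can be automated with a missing-backup detector that alerts when an expected daily backup never appears, covering the case where the backup job dies without reporting a failure. This is a sketch under stated assumptions: the 3 AM UTC run hour comes from step 1, while the one-hour grace period, the 7-day lookback, and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def missing_daily_backups(backup_times, now, days=7, run_hour_utc=3,
                          grace=timedelta(hours=1)):
    """Return UTC dates in the lookback window whose daily backup is absent.

    backup_times: timezone-aware datetimes of backups found in storage
    now:          timezone-aware current time
    A day is only reported once its scheduled run time plus grace has passed.
    """
    now_utc = now.astimezone(timezone.utc)
    present = {t.astimezone(timezone.utc).date() for t in backup_times}
    missing = []
    for i in range(days):
        day = (now_utc - timedelta(days=i)).date()
        due = datetime(day.year, day.month, day.day, run_hour_utc,
                       tzinfo=timezone.utc) + grace
        if now_utc >= due and day not in present:
            missing.append(day)
    return sorted(missing)
```

A monitoring job could run this on a schedule and raise an alert whenever the returned list is non-empty. Simulating a failure then amounts to skipping or deleting one day's backup in a test bucket and confirming the alert fires.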

notes:

  • Turso offers automatic backups for SQLite — verify configuration
  • RDS automated backups are the lowest-effort option for PostgreSQL on AWS; they include point-in-time restore within the configured retention window
  • Test restores are critical — untested backups are useless
  • Document restore process for on-call engineers