31. Backup Strategy & Point-in-Time Recovery
meta:
  id: web-production-31
  feature: web-production
  priority: P1
  depends_on: []
  tags: [database, reliability, production]
objective:
- Implement automated database backups with point-in-time recovery capability
deliverables:
- Automated daily backups
- Point-in-time recovery setup
- Backup testing and verification
- Retention policy
steps:
- Set up automated backups:
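- If PostgreSQL: configure a pg_dump cron job or managed backups (RDS, Cloud SQL); note that point-in-time recovery needs a physical base backup (e.g. pg_basebackup) plus archived WAL, which a pg_dump alone cannot provide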
- If SQLite/Turso: configure Turso database branching/backups
- Daily full backups at off-peak hours (3 AM UTC)
- Continuous or at least hourly incremental backups (WAL archiving for Postgres)
- Configure backup storage:
- Store in separate region/cloud provider (S3, GCS, R2)
- Encrypt backups at rest
- Enable object versioning (protects against accidental deletion or overwrite)
- Implement point-in-time recovery:
- WAL archiving for PostgreSQL
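- Transaction log backups more often than every 15 minutes, so the RPO target of under 15 minutes holds (for Postgres, archive_timeout bounds how long a WAL segment waits before archiving)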
- Test recovery to specific timestamp
- Add backup monitoring:
- Alert on backup failure
- Track backup size and duration
- Verify backup integrity (checksum)
- Test restore procedures:
- Monthly restore test to staging environment
- Document step-by-step restore process
- Measure RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
- Target: RTO < 1 hour, RPO < 15 minutes
- Document retention:
- Daily backups: 7 days
- Weekly backups: 4 weeks
- Monthly backups: 12 months
- Annual backups: 7 years (compliance)
- Add Redis backup:
- RDB snapshots every 6 hours
- AOF persistence for point-in-time recovery
- Backup to S3/GCS
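For the PostgreSQL dump and integrity-check steps above, a minimal sketch might look like the following. It assumes a self-managed database backed up with `pg_dump` in custom format; the function names, file naming scheme, and output directory are hypothetical, and the upload to S3/GCS/R2 is left out.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def build_pg_dump_command(dbname: str, out_dir: Path, now: datetime) -> list[str]:
    """Return the pg_dump argv for a UTC-timestamped custom-format dump.

    The naming scheme <dbname>-<timestamp>.dump is an assumption, not part of the spec.
    """
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_file = out_dir / f"{dbname}-{stamp}.dump"
    return ["pg_dump", "--format=custom", f"--file={out_file}", f"--dbname={dbname}"]


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so large dumps are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path: Path, expected_sha256: str) -> bool:
    """Compare the stored checksum with one recomputed from the file."""
    return sha256_of_file(path) == expected_sha256
```

The command list can be passed to `subprocess.run(cmd, check=True)` from a cron-driven script. Recording the checksum next to the uploaded object lets the monthly restore test confirm the file is byte-identical before restoring it.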
tests:
- Integration: Test backup creation
- Recovery: Test restore to staging
- Monitoring: Verify backup alerts
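For the recovery test, restoring to a specific timestamp on self-managed PostgreSQL 12 or later means placing recovery settings in `postgresql.conf` and creating an empty `recovery.signal` file in the data directory. The sketch below only renders those settings. The helper name and the `cp`-based `restore_command` are assumptions for a locally mounted WAL archive; an archiving tool would normally supply its own restore command.

```python
from datetime import datetime, timezone


def render_recovery_settings(target: datetime, archive_dir: str) -> str:
    """Render postgresql.conf lines for recovery to a target timestamp.

    archive_dir is a hypothetical path where archived WAL segments are mounted.
    """
    if target.tzinfo is None:
        raise ValueError("target must be timezone-aware")
    ts = target.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S+00")
    lines = [
        # %f is the WAL file name, %p the path PostgreSQL wants it copied to
        f"restore_command = 'cp {archive_dir}/%f %p'",
        f"recovery_target_time = '{ts}'",
        # leave recovery and accept writes once the target is reached
        "recovery_target_action = 'promote'",
    ]
    return "\n".join(lines) + "\n"
```

Requiring a timezone-aware target avoids restoring to the wrong instant when the operator's local time differs from the server's.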
acceptance_criteria:
- Daily automated backups running successfully
- Backups stored in separate region with encryption
- Point-in-time recovery tested and working
- Backup failures trigger alerts within 5 minutes
- Monthly restore test completed and documented
- RTO < 1 hour, RPO < 15 minutes
- Retention policy enforced automatically
- Redis backups included in strategy
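The numeric criteria above (RTO, RPO, and the 5-minute alert window) can be checked mechanically after each restore test. The following is a sketch under the assumption that the test run records the relevant timestamps; all function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RTO_TARGET = timedelta(hours=1)       # restore must finish in under 1 hour
RPO_TARGET = timedelta(minutes=15)    # data loss window must be under 15 minutes
ALERT_WINDOW = timedelta(minutes=5)   # failure alerts must arrive within 5 minutes


def meets_rto(restore_started: datetime, restore_finished: datetime) -> bool:
    """True if the measured restore duration is strictly below the RTO target."""
    return restore_finished - restore_started < RTO_TARGET


def meets_rpo(incident_time: datetime, last_recoverable_time: datetime) -> bool:
    """True if the gap between the incident and the newest recoverable point is below the RPO target."""
    return incident_time - last_recoverable_time < RPO_TARGET


def alert_was_timely(failure_time: datetime, alert_time: datetime) -> bool:
    """True if the alert fired after the failure and no later than the alert window."""
    return timedelta(0) <= alert_time - failure_time <= ALERT_WINDOW
```

Using strict comparisons for RTO and RPO mirrors the "<" in the criteria, so a restore that takes exactly one hour counts as a miss.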
validation:
- Check backup storage → daily backups present
- Trigger restore test → staging database restored successfully
- Simulate backup failure → alert received
- Check retention → old backups purged per policy
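The retention check can be made concrete with a small function that decides which backups are due for purging. This is a sketch, not the spec's definition: it assumes the Sunday backup serves as the weekly, the backup taken on the 1st as the monthly, and the 1 January backup as the annual, and it approximates 12 months as 365 days and 7 years as 7 × 365 days.

```python
from datetime import date, timedelta


def should_keep(backup_day: date, today: date) -> bool:
    """Apply the daily/weekly/monthly/annual retention tiers to one backup."""
    age = today - backup_day
    if age < timedelta(0):
        return True  # future-dated backup: never purge on clock skew
    if age <= timedelta(days=7):
        return True  # daily tier: 7 days
    if backup_day.weekday() == 6 and age <= timedelta(weeks=4):
        return True  # weekly tier: Sunday backups, 4 weeks (assumption)
    if backup_day.day == 1 and age <= timedelta(days=365):
        return True  # monthly tier: 1st of month, about 12 months (assumption)
    if backup_day.month == 1 and backup_day.day == 1 and age <= timedelta(days=7 * 365):
        return True  # annual tier: 1 January, about 7 years (assumption)
    return False


def backups_to_purge(backup_days: list[date], today: date) -> list[date]:
    """Return the backup dates that fall outside every retention tier, oldest first."""
    return sorted(d for d in backup_days if not should_keep(d, today))
```

If backups live in object storage, lifecycle rules on the bucket may enforce the same tiers without custom code; a function like this is still useful for validating that the purge behaved as the policy intends.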
notes:
- Turso offers automatic backups for SQLite — verify configuration
- RDS automated backups are the lowest-effort option for PostgreSQL and include point-in-time recovery
- Test restores are critical — untested backups are useless
- Document restore process for on-call engineers