# 31. Backup Strategy & Point-in-Time Recovery
meta:
id: web-production-31
feature: web-production
priority: P1
depends_on: []
tags: [database, reliability, production]

objective:
- Implement automated database backups with point-in-time recovery capability

deliverables:
- Automated daily backups
- Point-in-time recovery setup
- Backup testing and verification
- Retention policy

steps:
1. Set up automated backups:
   - If PostgreSQL: use managed backups (RDS, Cloud SQL) or a cron-scheduled job; pg_dump produces logical dumps only, so point-in-time recovery also needs a physical base backup (e.g. pg_basebackup) plus WAL archiving
   - If SQLite/Turso: configure Turso database branching/backups
   - Daily full backups at off-peak hours (3 AM UTC)
   - Hourly incremental backups (for Postgres, continuous WAL archiving covers this)
2. Configure backup storage:
   - Store in a separate region or cloud provider (S3, GCS, R2)
   - Encrypt backups at rest
   - Enable object versioning (protects against accidental deletion)
3. Implement point-in-time recovery:
   - WAL archiving for PostgreSQL
   - Transaction log backups at least every 15 minutes (for Postgres, archive_timeout caps the interval between archived WAL segments)
   - Test recovery to a specific timestamp
4. Add backup monitoring:
   - Alert on backup failure
   - Track backup size and duration
   - Verify backup integrity (checksum)
5. Test restore procedures:
   - Monthly restore test to staging environment
   - Document step-by-step restore process
   - Measure RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
   - Target: RTO < 1 hour, RPO < 15 minutes
6. Document retention:
   - Daily backups: 7 days
   - Weekly backups: 4 weeks
   - Monthly backups: 12 months
   - Annual backups: 7 years (compliance)
7. Add Redis backup:
   - RDB snapshots every 6 hours
   - AOF persistence to limit data loss between snapshots
   - Copy RDB/AOF files to S3/GCS

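The integrity check in step 4 can be sketched in Python as follows. This is a minimal sketch, assuming each backup is a single file and that its SHA-256 digest was recorded when the backup was created; the names `sha256_of` and `verify_backup` are illustrative and not part of any tool named above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path: Path, expected_digest: str) -> bool:
    """Compare the file's digest with the one recorded at backup time."""
    return sha256_of(path) == expected_digest.strip().lower()
```

A matching digest only shows the file was not altered or truncated after it was written; it does not replace the restore tests in step 5.
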
tests:
- Integration: Test backup creation
- Recovery: Test restore to staging
- Monitoring: Verify backup alerts

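The monitoring test needs a concrete alert condition to exercise. The sketch below is one possible condition, assuming a daily schedule and a hypothetical 25-hour limit (24 hours plus one hour of grace); `backup_alert_reason` is an illustrative name, and a missed run is treated the same as a failed one.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed limit: one daily run plus an hour of grace. Tune to the real schedule.
MAX_BACKUP_AGE = timedelta(hours=25)


def backup_alert_reason(
    last_success: Optional[datetime],
    now: datetime,
    max_age: timedelta = MAX_BACKUP_AGE,
) -> Optional[str]:
    """Return a reason string when an alert should fire, or None when healthy."""
    if last_success is None:
        return "no successful backup recorded"
    age = now - last_success
    if age > max_age:
        return f"last successful backup is {age} old (limit {max_age})"
    return None


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(backup_alert_reason(now - timedelta(hours=30), now))
```

Simulating a failure in a test then amounts to passing a stale or missing `last_success` and asserting that a reason is returned.
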
acceptance_criteria:
- Daily automated backups running successfully
- Backups stored in separate region with encryption
- Point-in-time recovery tested and working
- Backup failures trigger alerts within 5 minutes
- Monthly restore test completed and documented
- RTO < 1 hour, RPO < 15 minutes
- Retention policy enforced automatically
- Redis backups included in strategy

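The RTO and RPO criteria can be checked from the timestamps recorded during a restore drill. The following is a sketch under assumptions: the `RestoreDrill` record and its field names are hypothetical, RTO is measured from the simulated failure until the restored database serves traffic, and RPO is the gap between the failure and the newest data recovered.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RTO_TARGET = timedelta(hours=1)
RPO_TARGET = timedelta(minutes=15)


@dataclass(frozen=True)
class RestoreDrill:
    incident_at: datetime          # simulated moment of failure
    last_recoverable_at: datetime  # newest committed data present after restore
    service_restored_at: datetime  # moment the restored database took traffic

    @property
    def rpo(self) -> timedelta:
        """Data-loss window: failure time minus newest recovered data."""
        return self.incident_at - self.last_recoverable_at

    @property
    def rto(self) -> timedelta:
        """Downtime: failure time until service was restored."""
        return self.service_restored_at - self.incident_at


def meets_targets(drill: RestoreDrill) -> bool:
    """True only when both measured values are strictly under target."""
    return drill.rto < RTO_TARGET and drill.rpo < RPO_TARGET


if __name__ == "__main__":
    failure = datetime(2024, 6, 15, 12, 0, tzinfo=timezone.utc)
    drill = RestoreDrill(
        incident_at=failure,
        last_recoverable_at=failure - timedelta(minutes=10),
        service_restored_at=failure + timedelta(minutes=45),
    )
    print(drill.rpo, drill.rto, meets_targets(drill))
```

Recording these three timestamps in each monthly drill report gives the documented evidence the criteria ask for.
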
validation:
- Check backup storage → daily backups present
- Trigger restore test → staging database restored successfully
- Simulate backup failure → alert received
- Check retention → old backups purged per policy

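The retention check can be made mechanical with a small policy function. This is a sketch under stated assumptions, not a prescribed implementation: a backup counts as weekly if taken on a Sunday, monthly if taken on the 1st, and annual if taken on January 1; 12 months and 7 years are approximated as 365 and 7 × 365 days. The names `should_keep` and `backups_to_purge` are illustrative.

```python
from datetime import date, timedelta
from typing import Iterable, List

DAILY_KEEP = timedelta(days=7)
WEEKLY_KEEP = timedelta(weeks=4)
MONTHLY_KEEP = timedelta(days=365)      # approximates 12 months
ANNUAL_KEEP = timedelta(days=7 * 365)   # approximates 7 years


def should_keep(taken: date, today: date) -> bool:
    """Apply the daily/weekly/monthly/annual tiers to one backup date."""
    age = today - taken
    if age < timedelta(0):
        return True  # future-dated backup: keep it and investigate the clock
    if age < DAILY_KEEP:
        return True
    if taken.weekday() == 6 and age < WEEKLY_KEEP:  # 6 == Sunday
        return True
    if taken.day == 1 and age < MONTHLY_KEEP:
        return True
    if taken.month == 1 and taken.day == 1 and age < ANNUAL_KEEP:
        return True
    return False


def backups_to_purge(taken_dates: Iterable[date], today: date) -> List[date]:
    """Return the backup dates that fall outside every retention tier."""
    return sorted(d for d in taken_dates if not should_keep(d, today))
```

Running `backups_to_purge` against the listing of the backup bucket and expecting an empty result is one way to confirm that old backups were purged per policy.
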
notes:
- Turso offers automatic backups for SQLite; verify the configuration
- RDS automated backups are the lowest-effort option for PostgreSQL and include point-in-time restore
- Test restores are critical: untested backups are useless
- Document the restore process for on-call engineers