get to prod tasks
77
tasks/web-production/31-db-backup.md
Normal file
@@ -0,0 +1,77 @@
# 31. Backup Strategy & Point-in-Time Recovery

meta:
  id: web-production-31
  feature: web-production
  priority: P1
  depends_on: []
  tags: [database, reliability, production]

objective:
- Implement automated database backups with point-in-time recovery capability

deliverables:
- Automated daily backups
- Point-in-time recovery setup
- Backup testing and verification
- Retention policy

steps:
1. Set up automated backups:
   - If PostgreSQL: configure a pg_dump cron job or managed backups (RDS, Cloud SQL); pg_dump alone cannot support point-in-time recovery, which needs a physical base backup (e.g. pg_basebackup) plus archived WAL
   - If SQLite/Turso: configure Turso database branching/backups
   - Daily full backups at off-peak hours (03:00 UTC)
   - Incremental backups at least hourly (for PostgreSQL, WAL archiving covers this continuously)
2. Configure backup storage:
   - Store in a separate region or cloud provider (S3, GCS, R2)
   - Encrypt backups at rest
   - Enable object versioning (protects against accidental deletion)
3. Implement point-in-time recovery:
   - WAL archiving for PostgreSQL
   - Ship transaction logs at intervals shorter than the 15-minute RPO target (for PostgreSQL, set archive_timeout accordingly)
   - Test recovery to a specific timestamp
4. Add backup monitoring:
   - Alert on backup failure
   - Track backup size and duration
   - Verify backup integrity (checksum)
5. Test restore procedures:
   - Monthly restore test to staging environment
   - Document step-by-step restore process
   - Measure RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
   - Target: RTO < 1 hour, RPO < 15 minutes
6. Document retention:
   - Daily backups: 7 days
   - Weekly backups: 4 weeks
   - Monthly backups: 12 months
   - Annual backups: 7 years (compliance)
7. Add Redis backup:
   - RDB snapshots every 6 hours
   - AOF persistence for finer-grained recovery between snapshots
   - Copy RDB/AOF files to S3/GCS
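The retention tiers in step 6 can be sketched as a pure selection function. This is a minimal illustration under stated assumptions, not the project's implementation: `RETENTION_TIERS` and `select_backups_to_keep` are hypothetical names, backups are identified only by their calendar date, and each tier is count-based (the newest backup in each of the N most recent days, ISO weeks, months, or years).

```python
from datetime import date, timedelta

# Hypothetical tier table: name -> (periods to keep, function mapping a date to its period key).
RETENTION_TIERS = {
    "daily": (7, lambda d: d),
    "weekly": (4, lambda d: tuple(d.isocalendar()[:2])),  # (ISO year, ISO week)
    "monthly": (12, lambda d: (d.year, d.month)),
    "annual": (7, lambda d: d.year),
}


def select_backups_to_keep(backup_dates):
    """Return the set of backup dates retained by at least one tier."""
    keep = set()
    newest_first = sorted(set(backup_dates), reverse=True)
    for limit, period_of in RETENTION_TIERS.values():
        seen_periods = set()
        for d in newest_first:
            period = period_of(d)
            if period in seen_periods:
                continue  # a newer backup already represents this period
            if len(seen_periods) == limit:
                break  # tier is full; older periods fall out of this tier
            seen_periods.add(period)
            keep.add(d)
    return keep


def select_backups_to_purge(backup_dates):
    """Everything not retained by any tier is eligible for deletion."""
    return set(backup_dates) - select_backups_to_keep(backup_dates)
```

A scheduled job could call `select_backups_to_purge` with the dates found in storage and delete only what it returns, so a backup is never removed while any tier still needs it.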

tests:
- Integration: Test backup creation
- Recovery: Test restore to staging
- Monitoring: Verify backup alerts
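A backup-creation test can pair with the checksum verification from step 4. The sketch below is one possible shape, assuming each backup file gets a sidecar file named `<backup>.sha256`; that naming convention and the helper names (`sha256_of`, `write_checksum`, `verify_backup`) are hypothetical and not taken from the task.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _sidecar_for(backup_path):
    # Hypothetical convention: checksum stored next to the backup as "<name>.sha256".
    return backup_path.with_name(backup_path.name + ".sha256")


def write_checksum(backup_path):
    """Record the checksum right after the backup is created."""
    backup_path = Path(backup_path)
    sidecar = _sidecar_for(backup_path)
    sidecar.write_text(sha256_of(backup_path) + "\n")
    return sidecar


def verify_backup(backup_path):
    """Return True only if the backup exists, is non-empty, and matches its checksum."""
    backup_path = Path(backup_path)
    sidecar = _sidecar_for(backup_path)
    if not backup_path.is_file() or backup_path.stat().st_size == 0:
        return False
    if not sidecar.is_file():
        return False
    return sidecar.read_text().strip() == sha256_of(backup_path)
```

An integration test could create a backup, call `write_checksum`, and assert that `verify_backup` passes; a second case that truncates or alters the file should make it fail.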

acceptance_criteria:
- Daily automated backups running successfully
- Backups stored in separate region with encryption
- Point-in-time recovery tested and working
- Backup failures trigger alerts within 5 minutes
- Monthly restore test completed and documented
- RTO < 1 hour, RPO < 15 minutes
- Retention policy enforced automatically
- Redis backups included in strategy
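The RTO and RPO criteria can be checked mechanically from timestamps. The following is a rough sketch: it assumes "recovery points" are the times at which a restorable artifact (a full backup or an archived log segment) landed in storage, and that a restore test records its start and finish times. The names `worst_case_rpo` and `meets_targets` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Targets taken from the acceptance criteria above (strict inequalities).
RTO_TARGET = timedelta(hours=1)
RPO_TARGET = timedelta(minutes=15)


def worst_case_rpo(recovery_points, now):
    """Longest window of potential data loss.

    This is the largest gap between consecutive recovery points, also counting
    the gap between the newest recovery point and `now`.
    Returns None when there are no recovery points at all.
    """
    points = sorted(recovery_points)
    if not points:
        return None
    gaps = [later - earlier for earlier, later in zip(points, points[1:])]
    gaps.append(now - points[-1])
    return max(gaps)


def meets_targets(recovery_points, now, restore_started, restore_finished):
    """Compare measured RPO and RTO against the targets."""
    rpo = worst_case_rpo(recovery_points, now)
    rto = restore_finished - restore_started
    return {
        "rpo": rpo,
        "rto": rto,
        "rpo_ok": rpo is not None and rpo < RPO_TARGET,
        "rto_ok": rto < RTO_TARGET,
    }
```

Feeding this the timestamps from the monthly restore test gives a concrete pass/fail record to attach to the restore documentation.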

validation:
- Check backup storage → daily backups present
- Trigger restore test → staging database restored successfully
- Simulate backup failure → alert received
- Check retention → old backups purged per policy
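The first and third validation checks can be combined into a small presence check that raises an alert when a daily backup is missing. This is only a sketch: it assumes the check runs after the daily backup window has closed, that the caller has already listed the backup dates found in storage, and that `send_alert` is whatever callable the project uses for paging (all three names below are hypothetical).

```python
from datetime import date, timedelta


def missing_daily_backups(present_dates, today, window_days=7):
    """Return the dates in the last `window_days` days (newest first) with no backup."""
    present = set(present_dates)
    expected = [today - timedelta(days=offset) for offset in range(window_days)]
    return [d for d in expected if d not in present]


def check_and_alert(present_dates, today, send_alert):
    """Call `send_alert` with a message if any expected daily backup is absent.

    Returns True when all expected backups are present.
    """
    missing = missing_daily_backups(present_dates, today)
    if missing:
        send_alert("Missing daily backups: " + ", ".join(d.isoformat() for d in missing))
    return not missing
```

To simulate a backup failure, remove one date from the input and confirm that the alert callable receives exactly one message naming that date.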

notes:
- Turso offers automatic backups for SQLite; verify the configuration
- For PostgreSQL on AWS, RDS automated backups are the lowest-effort option and include point-in-time restore within the retention window
- Test restores are critical: untested backups are useless
- Document the restore process for on-call engineers