2026-05-27 10:30:23 -04:00
parent 5214412fff
commit 1e1773c186
48 changed files with 5351 additions and 160 deletions

docs/BACKUPS.md Normal file

@@ -0,0 +1,59 @@
# Backup Strategy
## Database Backups
### Automated Backups
- **Frequency**: Daily at 3 AM UTC
- **Retention**: 7 daily, 4 weekly, and 12 monthly backups
- **Storage**: Encrypted S3 bucket in separate region
- **Type**: Full backup + WAL archiving for point-in-time recovery
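The retention rule above can be read as a daily/weekly/monthly rotation. The following is a minimal sketch of how such a rule might select backups to keep, assuming one backup per calendar date and ISO weeks as the weekly boundary. The function name `backups_to_keep` and its parameters are hypothetical and not part of this repository.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, daily=7, weekly=4, monthly=12):
    """Return the backup dates retained under a daily/weekly/monthly policy."""
    ordered = sorted(set(backup_dates), reverse=True)  # newest first

    # The most recent `daily` backups are always kept.
    keep = set(ordered[:daily])

    # Newest backup in each ISO week, for the latest `weekly` weeks seen.
    newest_per_week = {}
    for d in ordered:
        newest_per_week.setdefault(tuple(d.isocalendar()[:2]), d)
    keep.update(list(newest_per_week.values())[:weekly])

    # Newest backup in each calendar month, for the latest `monthly` months seen.
    newest_per_month = {}
    for d in ordered:
        newest_per_month.setdefault((d.year, d.month), d)
    keep.update(list(newest_per_month.values())[:monthly])

    return keep


# Example: 400 consecutive daily backups ending on an arbitrary date.
end = date(2026, 5, 27)
history = [end - timedelta(days=i) for i in range(400)]
retained = backups_to_keep(history)
expired = sorted(set(history) - retained)
```

Anything in `expired` would be eligible for deletion. Because the three tiers overlap (the newest backup counts as daily, weekly, and monthly at once), the retained set is smaller than 7 + 4 + 12.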
### Point-in-Time Recovery
- **RPO**: < 15 minutes
- **RTO**: < 1 hour
- **Method**: Restore the full backup, then replay archived WAL up to the target timestamp
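With WAL archiving, the data at risk at any moment is roughly the time since the last WAL segment was archived. The sketch below checks that gap against the 15-minute RPO. How the last-archived timestamp is obtained is an assumption: on PostgreSQL it could come from `pg_stat_archiver.last_archived_time`, but this document does not name the database engine. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RPO_TARGET = timedelta(minutes=15)  # the RPO stated in this document


def utc_now():
    """Current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def rpo_exposure(last_archived_wal_at, now):
    """Span of changes that would be lost if the primary failed at `now`."""
    return now - last_archived_wal_at


def rpo_breached(last_archived_wal_at, now, target=RPO_TARGET):
    """True when the unarchived window has reached or exceeded the target."""
    return rpo_exposure(last_archived_wal_at, now) >= target
```

A monitoring job could call `rpo_breached(last_archived, utc_now())` on a short interval and raise an alert when it returns `True`.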
### Backup Verification
- Monthly restore test to staging environment
- Automated integrity checks on backup files
- Alert on backup failure within 5 minutes
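One common form of automated integrity check is comparing a backup file's checksum with the digest recorded when the backup was written. The following is a sketch under that assumption; the document does not say which check is actually used, and `sha256_of` and `backup_is_intact` are illustrative names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path, expected_sha256):
    """Compare the file's digest with the one recorded at backup time."""
    return sha256_of(path) == expected_sha256.lower()
```

A matching checksum only shows that the bytes are unchanged since the digest was recorded. It does not prove the backup can be restored, which is what the monthly restore test covers.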
## Redis Backups
### Configuration
- **RDB snapshots**: Every 6 hours
- **AOF persistence**: Enabled for point-in-time recovery
- **Storage**: Backed up to S3 daily
### Recovery
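- Restore from the latest RDB snapshot when no usable AOF is available
- When the AOF is intact, start Redis with AOF enabled so it replays recent changes (with AOF on, Redis loads the AOF at startup rather than the RDB file)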
- Test data integrity after restore
## Backup Monitoring
### Alerts
- Backup failure → Immediate PagerDuty alert
- Backup size anomaly → Slack notification
- Restore test failure → Jira ticket creation
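The routing above amounts to a small lookup table. A minimal sketch follows, with hypothetical event and channel identifiers. Sending unknown events to PagerDuty is an assumption made here so that nothing is silently dropped; it is not a rule stated in this document.

```python
# Hypothetical identifiers; the real integrations are not described in this document.
ALERT_ROUTES = {
    "backup_failure": "pagerduty",      # immediate page
    "backup_size_anomaly": "slack",     # notification only
    "restore_test_failure": "jira",     # ticket for follow-up
}


def route_alert(event_type):
    """Return the destination for an event; unknown events escalate (assumption)."""
    return ALERT_ROUTES.get(event_type, "pagerduty")
```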
### Metrics
- Backup duration
- Backup size
- Restore time
- Data loss window, tracked against the RPO target
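The backup size metric is what a size anomaly alert would be computed from. One simple approach, sketched here, compares the newest backup against the median of recent ones. The 50% tolerance is an illustrative default rather than a value taken from this document, and `size_is_anomalous` is a hypothetical name.

```python
from statistics import median


def size_is_anomalous(recent_sizes, new_size, tolerance=0.5):
    """Flag `new_size` if it deviates from the recent median by more than `tolerance`."""
    if not recent_sizes:
        return False  # no baseline yet, so nothing to compare against
    baseline = median(recent_sizes)
    if baseline == 0:
        return new_size != 0
    return abs(new_size - baseline) / baseline > tolerance
```

The median is used instead of the mean so that one earlier outlier does not shift the baseline much.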
## Emergency Procedures
### Complete Data Loss
1. Activate disaster recovery plan
2. Restore from latest backup
3. Replay WAL/AOF for recent changes
4. Verify data integrity
5. Resume operations
### Partial Data Corruption
1. Identify affected data
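2. Restore the affected tables from backup (with full backups plus WAL, this usually means restoring to a separate instance and copying the tables back)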
3. Verify data consistency
4. Resume operations
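To illustrate the partial-corruption steps, the sketch below copies one table from a restored backup into the live database and then checks row counts. It uses SQLite purely so the example is self-contained; the production database is presumably something else, and `restore_table` is a hypothetical helper.

```python
import sqlite3


def restore_table(live_path, backup_path, table):
    """Replace `table` in the live database with its contents from the backup.

    `table` is interpolated into SQL, so pass only trusted identifiers.
    Returns True when the row counts match afterwards.
    """
    conn = sqlite3.connect(live_path)
    try:
        conn.execute("ATTACH DATABASE ? AS backup", (backup_path,))
        with conn:  # commits on success, rolls back if either statement fails
            conn.execute(f'DELETE FROM main."{table}"')
            conn.execute(f'INSERT INTO main."{table}" SELECT * FROM backup."{table}"')
        live = conn.execute(f'SELECT COUNT(*) FROM main."{table}"').fetchone()[0]
        saved = conn.execute(f'SELECT COUNT(*) FROM backup."{table}"').fetchone()[0]
        return live == saved
    finally:
        conn.close()
```

A row-count comparison is only a coarse consistency check. Foreign keys pointing at the restored table would need separate verification.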

docs/MIGRATIONS.md Normal file

@@ -0,0 +1,51 @@
# Database Migration Safety Guidelines
## Principles
1. **Additive changes only**: Production migrations should only add new columns, tables, or indexes
2. **No destructive changes in one step**: Never DROP columns or tables while deployed code may still use them
3. **Two-phase migrations**: When a destructive change is unavoidable, split it into two phases:
   - Phase 1: Add the new schema and deploy code that uses it
   - Phase 2: Remove the old schema once the new code is stable
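The two-phase approach can be sketched as a pair of separately deployed migrations. The example uses SQLite and hypothetical table names (`legacy_accounts`, `accounts`) so that it runs standalone; the project's migration tooling is not described in this document.

```python
import sqlite3


def phase_1_expand(conn):
    """Additive only: create the new table and copy existing rows into it."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS accounts "
            "(id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT OR IGNORE INTO accounts (id, email) "
            "SELECT id, mail FROM legacy_accounts"
        )


def phase_2_contract(conn):
    """Destructive: run only after deployed code no longer reads legacy_accounts."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS legacy_accounts")
```

Between the two phases the old table still exists, so reverting the application alone is enough to roll back. Once `phase_2_contract` has run, rollback requires restoring from backup.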
## Migration Process
### Before Migration
1. Test migration on staging database
2. Verify application works with new schema
3. Take database backup
4. Document rollback procedure
### During Migration
1. Run migration in dry-run mode first
2. Apply migration to production
3. Verify migration completed successfully
4. Monitor application for errors
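One way to implement a dry run is to execute the migration inside a transaction and roll it back instead of committing. The sketch below does this with SQLite, where schema changes are transactional. That is an assumption about the target database: some engines, MySQL for example, commit implicitly on DDL, so this technique would not work there. `run_migration` and `column_names` are hypothetical helpers.

```python
import sqlite3


def run_migration(conn, statements, dry_run=True):
    """Execute `statements` in one transaction; roll back on error or when dry_run.

    `conn` must be opened with isolation_level=None so that BEGIN, COMMIT and
    ROLLBACK are issued here rather than implicitly by the sqlite3 module.
    """
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    conn.execute("ROLLBACK" if dry_run else "COMMIT")


def column_names(conn, table):
    """List column names of `table` (trusted identifier only)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

Running with `dry_run=True` first shows whether every statement executes cleanly against the current schema without leaving any change behind.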
### After Migration
1. Verify all queries work correctly
2. Monitor performance metrics
3. Update documentation if needed
## Rollback Procedures
### Emergency Rollback
1. Stop application deployment
2. Restore database from backup
3. Revert to previous application version
4. Verify application functionality
### Planned Rollback
1. Deploy previous application version
2. Run rollback migration
3. Verify application functionality
4. Update monitoring dashboards
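A planned rollback depends on each migration shipping with a statement that undoes it. The following sketch shows such an up/down pair using SQLite; the `orders` table, the index name, and the `MIGRATION` structure are hypothetical and stand in for whatever migration tool the project uses.

```python
import sqlite3

# Hypothetical migration: the forward step and the statement that reverses it.
MIGRATION = {
    "up": "CREATE INDEX idx_orders_customer ON orders (customer_id)",
    "down": "DROP INDEX idx_orders_customer",
}


def migrate(conn, direction):
    """Apply the migration in the given direction: 'up' or 'down'."""
    with conn:
        conn.execute(MIGRATION[direction])


def index_exists(conn, name):
    """True if an index called `name` is present in the schema."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?", (name,)
    ).fetchone()
    return row is not None
```

Additive migrations like this one reverse cleanly. A `down` step for a migration that dropped data cannot bring that data back, which is why the emergency path restores from backup instead.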
## Migration Checklist
- [ ] Migration tested on staging
- [ ] Backup taken before production migration
- [ ] Rollback procedure documented
- [ ] Team notified of maintenance window
- [ ] Monitoring dashboards prepared
- [ ] Support team on standby
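The checklist could also be enforced mechanically, for example by a pre-flight script that refuses to proceed while any box is unticked. This is a sketch of that idea, not an existing tool in this repository; `unchecked_items` and `ready_to_migrate` are hypothetical names.

```python
import re

# Matches Markdown task-list lines such as "- [ ] item" or "- [x] item".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def unchecked_items(markdown_text):
    """Return the text of every task-list item that is still unticked."""
    pending = []
    for line in markdown_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            pending.append(match.group(2).strip())
    return pending


def ready_to_migrate(markdown_text):
    """True when no unticked items remain (also True if there are no checkboxes)."""
    return not unchecked_items(markdown_text)
```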