Data Retention & Cleanup
Retention Policy Model
flow8 implements automated data cleanup via retention policies that specify:
- Target scope — Which data to delete (flows, audit logs, etc.)
- Cadence — Time-based retention (e.g., keep 30 days)
- Entry count — Size-based retention (e.g., keep last 1000 entries)
- Enforced minimum — Safety floor (never delete recent data)
Policy Types
General Retention Policy
Applied to all flows in a company:
    type RetentionPolicy struct {
        ID              primitive.ObjectID
        CompanyID       primitive.ObjectID
        Scope           string        // "flows.all", "audit"
        Cadence         time.Duration // e.g., 30 * 24 * time.Hour
        MinEntries      int           // e.g., 100
        EnforcedMinimum time.Duration // e.g., 14 * 24 * time.Hour (safety floor)
        Enabled         bool
        CreatedAt       time.Time
        UpdatedAt       time.Time
    }

Flow-Specific Policy
Overrides the general policy for a specific flow (e.g., a high-priority audit flow):
    type FlowRetentionPolicy struct {
        ID              primitive.ObjectID
        CompanyID       primitive.ObjectID
        FlowID          primitive.ObjectID
        Cadence         time.Duration // Override general policy
        MinEntries      int
        EnforcedMinimum time.Duration
        Enabled         bool
        CreatedAt       time.Time
    }

Flow Group Policy
Applies one retention policy to every flow in a flow group:
    type FlowGroupRetentionPolicy struct {
        ID              primitive.ObjectID
        CompanyID       primitive.ObjectID
        FlowGroupID     primitive.ObjectID
        Scope           string // "flows.group"
        Cadence         time.Duration
        MinEntries      int
        EnforcedMinimum time.Duration
        Enabled         bool
        CreatedAt       time.Time
    }

Configuration
Environment Variables
    # Cleanup job interval
    RETENTION_CLEANUP_INTERVAL=2m

    # General policies
    RETENTION_AUDIT_CADENCE=30d
    RETENTION_AUDIT_MIN_ENTRIES=10
    RETENTION_AUDIT_ENFORCED_MINIMUM=14d

    RETENTION_FLOWS_ALL_CADENCE=90d
    RETENTION_FLOWS_ALL_MIN_ENTRIES=100

    RETENTION_FLOWS_FILTERED_CADENCE=30d
    RETENTION_FLOWS_FILTERED_MIN_ENTRIES=10
    RETENTION_FLOWS_FILTERED_ENFORCED_MINIMUM=3d

YAML Configuration
    retention:
      cleanup_interval: "2m"        # Run cleanup every 2 minutes
      policies:
        audit_logs:
          cadence: "30d"            # Keep 30 days
          min_entries: 10           # Keep at least 10 entries
          enforced_minimum: "14d"   # Never delete < 14 days old

        flows_all:
          cadence: "90d"
          min_entries: 100
          enforced_minimum: "0"     # No safety floor

        flows_filtered:
          cadence: "30d"
          min_entries: 10
          enforced_minimum: "3d"    # GDPR: min 3 days for data subject deletion

        plays:
          cadence: "7d"
          min_entries: 1000

        play_layers:
          cadence: "7d"
          min_entries: 5000

        kv_stores:
          cadence: "1d"
          min_entries: 100

Cleanup Job
Execution
The cleanup job runs periodically:
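    func (s *RetentionPolicyService) CleanupOldEntries(ctx context.Context, companyID string) error {
        policies := s.loadPolicies(companyID)

        for _, policy := range policies {
            // 1. Calculate cutoff time
            cutoffTime := time.Now().Add(-policy.Cadence)

            // 2. Apply enforced minimum: entries younger than the floor are never
            //    deleted, so the cutoff may only move further into the past.
            if policy.EnforcedMinimum > 0 {
                enforcedCutoff := time.Now().Add(-policy.EnforcedMinimum)
                if cutoffTime.After(enforcedCutoff) {
                    cutoffTime = enforcedCutoff
                }
            }

            coll := s.db.Collection(policy.Collection)
            filter := bson.M{
                "company_id": companyID,
                "created_at": bson.M{"$lt": cutoffTime},
            }

            // 3. Count entries to delete
            count, err := coll.CountDocuments(ctx, filter)
            if err != nil {
                return err
            }

            // 4. Honor min_entries
            if count <= int64(policy.MinEntries) {
                continue
            }

            // Delete in batches to avoid long locks. DeleteMany has no limit
            // option, so fetch one batch of _ids and delete those by _id.
            const batchSize = 500
            for {
                opts := options.Find().SetLimit(batchSize).SetProjection(bson.M{"_id": 1})
                cur, err := coll.Find(ctx, filter, opts)
                if err != nil {
                    return err
                }
                var batch []bson.M
                if err := cur.All(ctx, &batch); err != nil {
                    return err
                }
                if len(batch) == 0 {
                    break
                }
                ids := make([]interface{}, len(batch))
                for i, doc := range batch {
                    ids[i] = doc["_id"]
                }
                if _, err := coll.DeleteMany(ctx, bson.M{"_id": bson.M{"$in": ids}}); err != nil {
                    return err
                }
            }
        }
        return nil
    }

Audit trail: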
    {
      "action_type": "retention_cleanup_run",
      "collection": "plays",
      "company_id": "company_123",
      "entries_deleted": 1500,
      "duration_ms": 2345,
      "timestamp": "2026-04-04T10:15:00Z"
    }

Cleanup Schedule
    ┌─────────────────────────────────────────────┐
    │ Cleanup Job Runs Every 2 Minutes            │
    ├─────────────────────────────────────────────┤
    │ 10:00 → Check audit logs, delete old        │
    │         Check flows, delete old             │
    │         Check plays, delete old             │
    │         Duration: ~3 seconds                │
    │                                             │
    │ 10:02 → Repeat...                           │
    │ 10:04 → Repeat...                           │
    └─────────────────────────────────────────────┘

Enforced Minimums
Retention policies have safety floors to prevent accidental data loss:
Audit Logs
Enforced minimum: 14 days
Even if a policy specifies a 1-day cadence, audit logs are never deleted until they are at least 14 days old:
Cleanup Decision Tree:
    Is entry older than enforced minimum (14d)?
    ├─ YES → Can delete if exceeds min_entries
    └─ NO  → KEEP (always)

Reason: Regulatory compliance, incident investigation window
Filtered Flows
Enforced minimum: 3 days
GDPR right to erasure grace period:
    User requests: "Delete all my data"
        ↓
    Admin initiates deletion
        ↓
    System marks as "filtered" (soft delete)
        ↓
    After 3 days, hard delete via retention

This grace period allows rollback if deletion is requested by mistake.
All Collections
Minimum 10 entries
Even if policy specifies “keep 1”, at least 10 entries are retained:
    Count entries older than cutoff: 2
    Min entries threshold: 10
    Decision: KEEP (because 2 < 10)

Policy Hierarchy
When multiple policies apply, the most specific policy takes precedence, and the enforced minimum is applied to the result:
    Flow-specific policy (highest priority)
        ↓
    Flow group policy
        ↓
    General policy (lowest priority)
    Example:
    - General: Keep 30 days
    - Flow group: Keep 60 days
    - Flow-specific: Keep 7 days
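The precedence order and example values above can be sketched as a small resolution function. This is a hedged illustration: `resolveRetention` is a hypothetical name, a nil pointer stands for "no policy at this level", and the 14-day enforced minimum is assumed for the example.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// resolveRetention picks the most specific cadence that is set (flow, then
// group, then general) and raises it to the enforced minimum if needed.
// With no policy at any level, only the enforced minimum applies.
func resolveRetention(flow, group, general *time.Duration, enforcedMin time.Duration) time.Duration {
	chosen := time.Duration(0)
	for _, c := range []*time.Duration{flow, group, general} {
		if c != nil {
			chosen = *c
			break
		}
	}
	if chosen < enforcedMin {
		return enforcedMin
	}
	return chosen
}

func main() {
	general, group, flow := 30*day, 60*day, 7*day

	// Flow-specific policy exists, but 7 days is below the assumed 14-day floor.
	got := resolveRetention(&flow, &group, &general, 14*day)
	fmt.Println(got.Hours()/24, "days")

	// Without a flow-specific policy, the group policy applies.
	got = resolveRetention(nil, &group, &general, 14*day)
	fmt.Println(got.Hours()/24, "days")
}
```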
    Evaluation for flow X in group Y:
    1. Is there a flow-specific policy? YES → candidate is 7 days
    2. Is 7 days below the enforced minimum (14 days)? YES → use 14 days
    3. Result: Keep 14 days (enforced minimum wins)

REST API
Create Policy
    POST /api/v1/admin/retention-policies
    {
      "scope": "flows.filtered",
      "cadence": "7d",
      "min_entries": 50,
      "enforced_minimum": "3d",
      "enabled": true
    }

List Policies
GET /api/v1/admin/retention-policies?company_id=company_123
    Response:
    {
      "policies": [
        {
          "id": "policy_123",
          "scope": "audit_logs",
          "cadence": "30d",
          "enforced_minimum": "14d",
          "enabled": true,
          "next_cleanup": "2026-04-04T10:17:00Z"
        }
      ]
    }

Update Policy
    PATCH /api/v1/admin/retention-policies/policy_123
    {
      "cadence": "60d",
      "enabled": false
    }

Check Last Cleanup
GET /api/v1/admin/retention-policies/stats
    Response:
    {
      "last_cleanup": "2026-04-04T10:15:00Z",
      "last_duration_ms": 3456,
      "entries_deleted": 1234,
      "next_cleanup": "2026-04-04T10:17:00Z",
      "collections_processed": ["audit_logs", "plays", "flows"]
    }

Compliance Considerations
GDPR Right to Erasure
Users can request deletion of personal data:
    User Request: "Delete my data"
        ↓
    Admin deletes user account
        ↓
    System marks related flows as "filtered"
        ↓
    After 3-day grace period, hard deleted by retention
        ↓
    Audit log: "User data deleted" (kept for compliance)

Audit logs are NOT deleted (to satisfy audit trail requirements), but personal data is masked:
    func (s *AuditService) MaskPersonalData(log *DBAuditLog) {
        if time.Since(log.Timestamp) > 30*24*time.Hour {
            log.UserEmail = "[REDACTED]"
            log.UserIPAddress = "[REDACTED]"
        }
    }

HIPAA Retention
Healthcare data requires longer retention:
    retention:
      policies:
        healthcare_flows:
          cadence: "6y"            # 6 years per HIPAA
          enforced_minimum: "6y"
          enabled: true

SOC 2 Audit Trail
For SOC 2 certification, maintain 12+ months audit trail:
    retention:
      policies:
        audit_logs:
          cadence: "365d"          # 1 year
          enforced_minimum: "365d"
          enabled: true

Monitoring
Cleanup Metrics
    flow8_retention_cleanup_duration_seconds
      - collection
      - status

    flow8_retention_entries_deleted_total
      - collection
      - company_id

    flow8_retention_policy_evaluation_seconds
      - collection
      - action (keep / delete)

Alerting
    alerts:
    - alert: RetentionCleanupFailed
      expr: flow8_retention_cleanup_duration_seconds > 30
      for: 5m
      annotations:
        summary: "Retention cleanup took > 30 seconds"

    - alert: RetentionMinimumBreach
      expr: flow8_audit_log_count < 10
      for: 1h
      annotations:
        summary: "Audit logs below minimum threshold (< 10 entries)"

Troubleshooting
Data Deleted Too Aggressively
Cause: Policy cadence too short
Solution:
    # Increase retention period
    retention:
      policies:
        plays:
          cadence: "30d"  # was 7d

Data Not Being Deleted
Cause: Min entries preventing cleanup
Solution:
    // Check entry count
    db.plays.countDocuments({
      company_id: "company_123",
      created_at: { $lt: new Date(Date.now() - 30*24*60*60*1000) }
    })

    // If count <= min_entries, cleanup is skipped: lower min_entries or shorten the cadence.
    // Also check whether enforced_minimum is blocking (the effective retention is the
    // longer of enforced_minimum and cadence).

Cleanup Job Taking Too Long
Cause: Deleting too many entries in a single batch
Solution:
    // Reduce batch size in cleanup job
    const batchSize = 250 // was 500

Or increase cleanup interval:
    RETENTION_CLEANUP_INTERVAL=5m  # was 2m

Best Practices
- Set enforced minimums appropriately: Balance retention needs with storage cost
- Monitor cleanup metrics: Alert if cleanup exceeds expected duration
- Test before applying: Create test flow, verify policy behavior
- Review quarterly: Adjust policies as storage and compliance needs change
- Document policy rationale: Add comments explaining why policy was set
- Separate policies by sensitivity: High-sensitivity data → longer retention
- Plan for compliance: Ensure retention aligns with regulatory requirements
- Backup before aggressive cleanup: Take MongoDB backup before testing aggressive policies