
Data Retention & Cleanup

Retention Policy Model

flow8 implements automated data cleanup via retention policies that specify:

  1. Target scope — Which data to delete (flows, audit logs, etc.)
  2. Cadence — Time-based retention (e.g., keep 30 days)
  3. Entry count — Size-based retention (e.g., keep last 1000 entries)
  4. Enforced minimum — Safety floor (never delete recent data)

Policy Types

General Retention Policy

Applies company-wide to a scope such as all flows or audit logs:

type RetentionPolicy struct {
	ID              primitive.ObjectID
	CompanyID       primitive.ObjectID
	Scope           string        // "flows.all", "audit"
	Cadence         time.Duration // e.g., 30 * 24 * time.Hour
	MinEntries      int           // e.g., 100
	EnforcedMinimum time.Duration // e.g., 14 * 24 * time.Hour (safety floor)
	Enabled         bool
	CreatedAt       time.Time
	UpdatedAt       time.Time
}

Flow-Specific Policy

Overrides the general policy for a specific flow (e.g., a high-priority audit flow):

type FlowRetentionPolicy struct {
	ID              primitive.ObjectID
	CompanyID       primitive.ObjectID
	FlowID          primitive.ObjectID
	Cadence         time.Duration // Override general policy
	MinEntries      int
	EnforcedMinimum time.Duration
	Enabled         bool
	CreatedAt       time.Time
}

Flow Group Policy

Applies retention to all flows in a flow group:

type FlowGroupRetentionPolicy struct {
	ID              primitive.ObjectID
	CompanyID       primitive.ObjectID
	FlowGroupID     primitive.ObjectID
	Scope           string // "flows.group"
	Cadence         time.Duration
	MinEntries      int
	EnforcedMinimum time.Duration
	Enabled         bool
	CreatedAt       time.Time
}

Configuration

Environment Variables

# Cleanup job interval
RETENTION_CLEANUP_INTERVAL=2m
# General policies
RETENTION_AUDIT_CADENCE=30d
RETENTION_AUDIT_MIN_ENTRIES=10
RETENTION_AUDIT_ENFORCED_MINIMUM=14d
RETENTION_FLOWS_ALL_CADENCE=90d
RETENTION_FLOWS_ALL_MIN_ENTRIES=100
RETENTION_FLOWS_FILTERED_CADENCE=30d
RETENTION_FLOWS_FILTERED_MIN_ENTRIES=10
RETENTION_FLOWS_FILTERED_ENFORCED_MINIMUM=3d

YAML Configuration

config/config.yml
retention:
  cleanup_interval: "2m"       # Run cleanup every 2 minutes
  policies:
    audit_logs:
      cadence: "30d"           # Keep 30 days
      min_entries: 10          # Keep at least 10 entries
      enforced_minimum: "14d"  # Never delete entries < 14 days old
    flows_all:
      cadence: "90d"
      min_entries: 100
      enforced_minimum: "0"    # No safety floor
    flows_filtered:
      cadence: "30d"
      min_entries: 10
      enforced_minimum: "3d"   # Grace period before filtered (soft-deleted) data is hard-deleted
    plays:
      cadence: "7d"
      min_entries: 1000
    play_layers:
      cadence: "7d"
      min_entries: 5000
    kv_stores:
      cadence: "1d"
      min_entries: 100

Cleanup Job

Execution

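The cleanup job runs periodically, at the interval set by RETENTION_CLEANUP_INTERVAL (cleanup_interval in YAML), and processes every policy for a company: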

pkg/service/retention_policy_service.go
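func (s *RetentionPolicyService) CleanupOldEntries(ctx context.Context, companyID string) error {
	policies := s.loadPolicies(companyID)
	for _, policy := range policies {
		coll := s.db.Collection(policy.Collection)
		now := time.Now()
		// 1. Calculate cutoff time: entries created before it are deletion candidates
		cutoffTime := now.Add(-policy.Cadence)
		// 2. Apply enforced minimum: never delete entries newer than
		//    now - EnforcedMinimum, so move the cutoff back if it is later
		if policy.EnforcedMinimum > 0 {
			enforcedCutoff := now.Add(-policy.EnforcedMinimum)
			if cutoffTime.After(enforcedCutoff) {
				cutoffTime = enforcedCutoff
			}
		}
		filter := bson.M{"company_id": companyID, "created_at": bson.M{"$lt": cutoffTime}}
		// 3. Count entries older than the cutoff
		count, err := coll.CountDocuments(ctx, filter)
		if err != nil {
			return err
		}
		// 4. Honor min_entries: keep at least MinEntries of these entries
		toDelete := count - int64(policy.MinEntries)
		// Delete oldest first, in batches, to avoid long-running deletes.
		// DeleteMany has no limit option, so select a batch of _ids first.
		batchSize := 500
		for toDelete > 0 {
			limit := int64(batchSize)
			if toDelete < limit {
				limit = toDelete
			}
			opts := options.Find().
				SetSort(bson.D{{Key: "created_at", Value: 1}}).
				SetLimit(limit).
				SetProjection(bson.M{"_id": 1})
			cur, err := coll.Find(ctx, filter, opts)
			if err != nil {
				return err
			}
			var docs []bson.M
			if err := cur.All(ctx, &docs); err != nil {
				return err
			}
			ids := make([]interface{}, 0, len(docs))
			for _, d := range docs {
				ids = append(ids, d["_id"])
			}
			res, err := coll.DeleteMany(ctx, bson.M{"_id": bson.M{"$in": ids}})
			if err != nil {
				return err
			}
			if res.DeletedCount == 0 {
				break // nothing left to delete (e.g., removed concurrently)
			}
			toDelete -= res.DeletedCount
		}
	}
	return nil
}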

Example audit trail entry for a cleanup run:

{
  "action_type": "retention_cleanup_run",
  "collection": "plays",
  "company_id": "company_123",
  "entries_deleted": 1500,
  "duration_ms": 2345,
  "timestamp": "2026-04-04T10:15:00Z"
}

Cleanup Schedule

┌──────────────────────────────────────────┐
│ Cleanup Job Runs Every 2 Minutes         │
├──────────────────────────────────────────┤
│ 10:00 → Check audit logs, delete old     │
│         Check flows, delete old          │
│         Check plays, delete old          │
│         Duration: ~3 seconds             │
│                                          │
│ 10:02 → Repeat...                        │
│ 10:04 → Repeat...                        │
└──────────────────────────────────────────┘

Enforced Minimums

Retention policies have safety floors to prevent accidental data loss:

Audit Logs

Enforced minimum: 14 days

Even if a policy specifies 1 day, audit logs younger than 14 days are never deleted:

Cleanup Decision Tree:
Is entry older than enforced minimum (14d)?
├─ YES → Can delete if exceeds min_entries
└─ NO → KEEP (always)

Reason: Regulatory compliance, incident investigation window

Filtered Flows

Enforced minimum: 3 days

GDPR right to erasure grace period:

  1. User requests: "Delete all my data"
  2. Admin initiates deletion
  3. System marks the data as "filtered" (soft delete)
  4. After 3 days, retention hard-deletes it

This grace period allows rollback if deletion is requested by mistake.

All Collections

Minimum 10 entries

Even if a policy specifies “keep 1”, at least 10 entries are retained:

Count entries older than cutoff: 2
Min entries threshold: 10
Decision: KEEP (because 2 < 10)
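The arithmetic in this example fits in a one-line rule. This sketch assumes, as the example does, that `min_entries` is compared with the number of entries past the cutoff; `deletableCount` is a hypothetical name.

```go
package main

// deletableCount returns how many of the oldCount entries past the cutoff
// may be deleted while still retaining minEntries of them.
func deletableCount(oldCount, minEntries int64) int64 {
	if oldCount <= minEntries {
		return 0 // e.g., 2 old entries with min_entries 10: KEEP all
	}
	return oldCount - minEntries
}
```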

Policy Hierarchy

When multiple policies apply, the most specific policy wins, and the result is never shorter than the enforced minimum:

  1. Flow-specific policy (highest priority)
  2. Flow group policy
  3. General policy (lowest priority)

Example:
- General: keep 30 days
- Flow group: keep 60 days
- Flow-specific: keep 7 days
- Enforced minimum: 14 days

Evaluation for flow X in group Y:
  1. A flow-specific policy exists → candidate retention is 7 days (the group's 60 days is not used)
  2. 7 days is below the 14-day enforced minimum → use 14 days
  3. Result: keep 14 days (enforced minimum wins)

REST API

Create Policy

POST /api/v1/admin/retention-policies
{
  "scope": "flows.filtered",
  "cadence": "7d",
  "min_entries": 50,
  "enforced_minimum": "3d",
  "enabled": true
}

List Policies

GET /api/v1/admin/retention-policies?company_id=company_123
Response:
{
  "policies": [
    {
      "id": "policy_123",
      "scope": "audit_logs",
      "cadence": "30d",
      "enforced_minimum": "14d",
      "enabled": true,
      "next_cleanup": "2026-04-04T10:17:00Z"
    }
  ]
}
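A sketch of decoding this response in Go, covering only the fields shown in the example; `PolicySummary`, `ListPoliciesResponse`, and `enabledScopes` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"time"
)

// PolicySummary mirrors one entry of the example "policies" array.
type PolicySummary struct {
	ID              string    `json:"id"`
	Scope           string    `json:"scope"`
	Cadence         string    `json:"cadence"`
	EnforcedMinimum string    `json:"enforced_minimum"`
	Enabled         bool      `json:"enabled"`
	NextCleanup     time.Time `json:"next_cleanup"` // RFC 3339 timestamp
}

type ListPoliciesResponse struct {
	Policies []PolicySummary `json:"policies"`
}

// enabledScopes decodes a list response and returns the scopes of enabled policies.
func enabledScopes(data []byte) ([]string, error) {
	var resp ListPoliciesResponse
	if err := json.Unmarshal(data, &resp); err != nil {
		return nil, err
	}
	var scopes []string
	for _, p := range resp.Policies {
		if p.Enabled {
			scopes = append(scopes, p.Scope)
		}
	}
	return scopes, nil
}
```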

Update Policy

PATCH /api/v1/admin/retention-policies/policy_123
{
  "cadence": "60d",
  "enabled": false
}
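Because a PATCH body carries only the fields being changed, a Go client can model it with pointer fields and `omitempty`: unset fields are left out entirely, while an explicit `false` is still sent. A sketch with hypothetical type and helper names, assuming the server leaves omitted fields untouched (generics require Go 1.18 or later):

```go
package main

import "encoding/json"

// UpdatePolicyRequest: a nil field is omitted from the JSON body;
// a non-nil pointer is always encoded, even when it points to false or 0.
type UpdatePolicyRequest struct {
	Cadence         *string `json:"cadence,omitempty"`
	MinEntries      *int    `json:"min_entries,omitempty"`
	EnforcedMinimum *string `json:"enforced_minimum,omitempty"`
	Enabled         *bool   `json:"enabled,omitempty"`
}

// ptr returns a pointer to a copy of v, for building optional fields.
func ptr[T any](v T) *T { return &v }

// marshalUpdate encodes the PATCH body.
func marshalUpdate(u UpdatePolicyRequest) (string, error) {
	b, err := json.Marshal(u)
	return string(b), err
}
```

For the request above, `marshalUpdate(UpdatePolicyRequest{Cadence: ptr("60d"), Enabled: ptr(false)})` yields exactly the two fields shown.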

Check Last Cleanup

GET /api/v1/admin/retention-policies/stats
Response:
{
  "last_cleanup": "2026-04-04T10:15:00Z",
  "last_duration_ms": 3456,
  "entries_deleted": 1234,
  "next_cleanup": "2026-04-04T10:17:00Z",
  "collections_processed": ["audit_logs", "plays", "flows"]
}

Compliance Considerations

GDPR Right to Erasure

Users can request deletion of personal data:

  1. User requests: "Delete my data"
  2. Admin deletes the user account
  3. System marks related flows as "filtered"
  4. After the 3-day grace period, retention hard-deletes them
  5. Audit log records "User data deleted" (kept for compliance)

Audit log entries are NOT deleted as part of the erasure request (they are needed for the audit trail); instead, personal data in entries older than 30 days is masked:

func (s *AuditService) MaskPersonalData(log *DBAuditLog) {
	// Redact personal fields once the entry is older than 30 days
	if time.Since(log.Timestamp) > 30*24*time.Hour {
		log.UserEmail = "[REDACTED]"
		log.UserIPAddress = "[REDACTED]"
	}
}

HIPAA Retention

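HIPAA requires covered entities to retain required compliance documentation for six years, so healthcare-related flows need much longer retention: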

retention:
  policies:
    healthcare_flows:
      cadence: "6y"            # 6 years per HIPAA
      enforced_minimum: "6y"
      enabled: true

SOC 2 Audit Trail

For SOC 2 audits, retain at least 12 months of audit trail:

retention:
  policies:
    audit_logs:
      cadence: "365d"          # 1 year
      enforced_minimum: "365d"
      enabled: true

Monitoring

Cleanup Metrics

flow8_retention_cleanup_duration_seconds (labels: collection, status)
flow8_retention_entries_deleted_total (labels: collection, company_id)
flow8_retention_policy_evaluation_seconds (labels: collection, action = keep / delete)

Alerting

alerts:
  - alert: RetentionCleanupSlow
    expr: flow8_retention_cleanup_duration_seconds > 30
    for: 5m
    annotations:
      summary: "Retention cleanup took > 30 seconds"
  - alert: RetentionMinimumBreach
    expr: flow8_audit_log_count < 10
    for: 1h
    annotations:
      summary: "Audit logs below minimum threshold (< 10 entries)"

Troubleshooting

Data Deleted Too Aggressively

Cause: Policy cadence too short

Solution:

# Increase retention period
retention:
  policies:
    plays:
      cadence: "30d" # was 7d

Data Not Being Deleted

Cause: Min entries preventing cleanup

Solution:

// Count entries older than the cutoff (mongosh; assumes a 30d cadence)
db.plays.countDocuments({
  company_id: "company_123",
  created_at: { $lt: new Date(Date.now() - 30*24*60*60*1000) }
})
// If the count is below min_entries, nothing is deleted: lower min_entries or shorten the cadence
// Also check whether enforced_minimum is blocking deletion (the effective retention is the longer of cadence and enforced_minimum)

Cleanup Job Taking Too Long

Cause: Deleting too many entries in a single batch

Solution:

// Reduce the batch size in the cleanup job
batchSize := 250 // was 500

Or increase the cleanup interval:

RETENTION_CLEANUP_INTERVAL=5m # was 2m

Best Practices

  1. Set enforced minimums appropriately: Balance retention needs with storage cost

  2. Monitor cleanup metrics: Alert if cleanup exceeds expected duration

  3. Test before applying: Create a test flow and verify the policy's behavior

  4. Review quarterly: Adjust policies as storage and compliance needs change

  5. Document policy rationale: Add comments explaining why each policy was set

  6. Separate policies by sensitivity: High-sensitivity data → longer retention

  7. Plan for compliance: Ensure retention aligns with regulatory requirements

  8. Back up before aggressive cleanup: Take a MongoDB backup before testing aggressive policies