Scaling & Performance
Scaling Architecture
flow8 is designed for horizontal scalability with a stateless application layer and a shared data layer:
```
        ┌─────────────────────────────────────┐
        │        Load Balancer / Ingress      │
        │   (Round-robin, least connections)  │
        └───────┬───────────┬───────────┬─────┘
                │           │           │
            ┌───▼───┐   ┌───▼───┐   ┌───▼───┐
            │ flow8 │   │ flow8 │   │ flow8 │
            │  i1   │   │  i2   │   │  i3   │
            │ :4454 │   │ :4454 │   │ :4454 │
            └───┬───┘   └───┬───┘   └───┬───┘
                │           │           │
                └───────────┼───────────┘
                            │
                  ┌─────────▼─────────┐
                  │  MongoDB Replica  │
                  │   Set (shared)    │
                  │      or Atlas     │
                  └───────────────────┘
```
Stateless Application
Each instance is independent:
- No in-memory state shared between instances
- Sessions stored in MongoDB (not in-memory)
- Flows and plays retrieved from MongoDB (not cached locally)
- Cache is per-instance and non-critical (it can be lost without affecting correctness)
Benefits:
- Scale up or down instantly
- Survive instance failure without session loss
- Load balance with simple round-robin
- Deploy updates without stopping other instances
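Because session state lives in MongoDB rather than in process memory, any instance behind the load balancer can serve any request. A minimal sketch of what such an instance-agnostic session lookup might look like (the SessionStore type and the sessions collection are illustrative, not flow8's actual API):

```go
package session

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

// Session is a hypothetical session document stored in the shared MongoDB cluster.
type Session struct {
	ID     string `bson:"_id"`
	UserID string `bson:"user_id"`
}

// SessionStore reads sessions from MongoDB, so every instance sees the same data.
type SessionStore struct {
	col *mongo.Collection // e.g. db.Collection("sessions")
}

// Get loads a session by ID; it behaves identically on any flow8 instance.
func (s *SessionStore) Get(ctx context.Context, id string) (*Session, error) {
	var sess Session
	if err := s.col.FindOne(ctx, bson.M{"_id": id}).Decode(&sess); err != nil {
		return nil, err
	}
	return &sess, nil
}
```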
Channel Port Management
Each instance reserves a port range for async execution:
```
Instance 1: Ports 7701-7799 (100 concurrent channels)
Instance 2: Ports 7800-7899 (100 concurrent channels)
Instance 3: Ports 7900-7999 (100 concurrent channels)
```
Configuration:
```bash
# Instance 1
CHANNEL_PORT_RANGE=7701-7799
CHANNEL_POOL_SIZE=100

# Instance 2 (different node)
CHANNEL_PORT_RANGE=7800-7899
CHANNEL_POOL_SIZE=100
```
Kubernetes:
```yaml
# StatefulSet (ensures unique ordinals)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: flow8
spec:
  serviceName: flow8
  replicas: 3
  template:
    spec:
      containers:
        - name: flow8
          env:
            - name: POD_ORDINAL            # must be defined before it is referenced below
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['apps.kubernetes.io/pod-index']  # Kubernetes 1.28+
            - name: CHANNEL_PORT_RANGE
              value: "$(POD_ORDINAL)700-$(POD_ORDINAL)799"
```
flow8 itself is not stateful, however, so a standard Deployment (with manually assigned port ranges) is usually preferred.
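At startup, each instance can expand its assigned range into a pool of channel ports. A minimal sketch (parsePortRange is a hypothetical helper, not flow8's internal code):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePortRange expands a "low-high" spec such as "7800-7899" into individual ports.
func parsePortRange(spec string) ([]int, error) {
	parts := strings.SplitN(spec, "-", 2)
	if len(parts) != 2 {
		return nil, fmt.Errorf("invalid CHANNEL_PORT_RANGE %q", spec)
	}
	low, err := strconv.Atoi(parts[0])
	if err != nil {
		return nil, err
	}
	high, err := strconv.Atoi(parts[1])
	if err != nil || high < low {
		return nil, fmt.Errorf("invalid CHANNEL_PORT_RANGE %q", spec)
	}
	ports := make([]int, 0, high-low+1)
	for p := low; p <= high; p++ {
		ports = append(ports, p)
	}
	return ports, nil
}

func main() {
	ports, err := parsePortRange(os.Getenv("CHANNEL_PORT_RANGE")) // e.g. "7800-7899"
	if err != nil {
		panic(err)
	}
	fmt.Printf("this instance owns %d channel ports\n", len(ports))
}
```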
Horizontal Scaling
Adding Instances (Manual)
```bash
# Start second instance (different host)
docker run \
  -e MONGODB_URI=mongodb://mongo:27017 \
  -e SERVER_PORT=4454 \
  -e CHANNEL_PORT_RANGE=7800-7899 \
  -p 4454:4454 \
  -p 7800-7899:7800-7899 \
  flow8core:latest
```
Both instances connect to the same MongoDB and serve requests independently.
Kubernetes Scaling
```bash
# Scale to 5 replicas
kubectl scale deployment flow8 -n flow8 --replicas=5

# Or edit the deployment
kubectl edit deployment flow8 -n flow8
```
When scaling, ensure channel port ranges don't overlap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: flow8-channel-ranges
  namespace: flow8
data:
  ranges.yaml: |
    - replica: 0
      range: "7701-7799"
    - replica: 1
      range: "7800-7899"
    - replica: 2
      range: "7900-7999"
```
Load Balancing
Recommended: session-sticky load balancing to improve the per-instance cache hit rate (the example below balances by least connections)
```nginx
# nginx config
upstream flow8_backend {
    least_conn;   # Balance by active connections
    server flow8-1:4454;
    server flow8-2:4454;
    server flow8-3:4454;
}

server {
    listen 80;
    location / {
        proxy_pass http://flow8_backend;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```
In Kubernetes (Ingress):
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flow8
spec:
  rules:
    - host: app.flow8.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flow8
                port:
                  number: 80
# Load balancing handled by the Service (round-robin by default)
```
Vertical Scaling
Increase resources for a single instance (if horizontal scaling isn't feasible):
```yaml
# Kubernetes resource limits
resources:
  requests:
    cpu: 4
    memory: 4Gi
  limits:
    cpu: 8
    memory: 8Gi
```

```yaml
# Go runtime parallelism (GOMAXPROCS caps OS threads running Go code, not goroutines)
env:
  - name: GOMAXPROCS
    value: "8"
```
CPU scaling effect:
- 1 CPU: ~100 concurrent plays
- 2 CPUs: ~200 concurrent plays
- 4 CPUs: ~400 concurrent plays
Memory scaling effect:
- 512 MB: Very limited, high GC pressure
- 2 GB: Good for moderate loads
- 8 GB: Suitable for high loads
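The roughly linear CPU-to-concurrency numbers above suggest bounding concurrent plays by available parallelism. A sketch of that heuristic (the playsPerCPU constant and semaphore are illustrative, not flow8's scheduler):

```go
package main

import (
	"fmt"
	"runtime"
)

const playsPerCPU = 100 // heuristic from the list above

func main() {
	// GOMAXPROCS(0) reports how many CPUs the Go runtime will use in parallel.
	maxPlays := playsPerCPU * runtime.GOMAXPROCS(0)

	// A buffered channel works as a semaphore bounding concurrent plays:
	// acquire with `sem <- struct{}{}` before a play, release with `<-sem` after.
	sem := make(chan struct{}, maxPlays)
	fmt.Printf("allowing up to %d concurrent plays\n", cap(sem))
}
```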
MongoDB Scaling
Replica Sets
For redundancy and read scaling:
```javascript
// Initialize replica set
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-0:27017" },
    { _id: 1, host: "mongo-1:27017" },
    { _id: 2, host: "mongo-2:27017" }
  ]
})
```
Benefits:
- Automatic failover (if primary dies)
- Read scaling (secondary nodes can serve reads)
- No single point of failure
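To actually spread reads to secondaries, the client has to request it via a read preference; a minimal sketch with the official Go driver, assuming the replica set above (URI and error handling abbreviated):

```go
package main

import (
	"context"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
	"go.mongodb.org/mongo-driver/mongo/readpref"
)

func connect(ctx context.Context) (*mongo.Client, error) {
	opts := options.Client().
		ApplyURI("mongodb://mongo-0:27017,mongo-1:27017,mongo-2:27017/?replicaSet=rs0").
		SetReadPreference(readpref.SecondaryPreferred()) // allow reads from secondaries
	return mongo.Connect(ctx, opts)
}
```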
MongoDB Atlas (Managed)
Recommended for production (no operational overhead):
```bash
MONGODB_URI=mongodb+srv://user:password@cluster0.mongodb.net/flow8?retryWrites=true&w=majority
```
Scaling options:
- Vertical: Increase cluster tier (M10 → M20 → M30)
- Horizontal: Enable sharding (distribute data by shard key)
Sharding (for 100+ concurrent plays)
Partition data by company or flow for massive scale:
```javascript
// Enable sharding on the flow8 database, then shard the plays collection
sh.enableSharding("flow8")
sh.shardCollection("flow8.plays", { company_id: 1, created_at: 1 })

// Resulting distribution (illustrative):
// - Shard 1: company_a, company_b
// - Shard 2: company_c, company_d
// - Shard 3: company_e, company_f
```
Performance improvement:
- Single MongoDB: ~1000 plays/sec
- Sharded (3 shards): ~3000 plays/sec
- Sharded (10 shards): ~10,000 plays/sec
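The gains above assume queries include the shard key: a query that filters by company_id can be routed to a single shard instead of broadcast to all of them. A sketch with the Go driver (the function itself is illustrative; the field names follow the shard key above):

```go
package plays

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// RecentForCompany includes the company_id shard key in the filter, so mongos
// targets one shard rather than scattering the query across the cluster.
func RecentForCompany(ctx context.Context, plays *mongo.Collection, companyID string) (*mongo.Cursor, error) {
	filter := bson.M{"company_id": companyID}
	opts := options.Find().
		SetSort(bson.D{{Key: "created_at", Value: -1}}).
		SetLimit(50)
	return plays.Find(ctx, filter, opts)
}
```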
Caching Layer
flow8 implements three caching levels:
1. In-Memory Cache (Per Instance)
Caches frequently-accessed data with TTL:
```go
type TTLCache struct {
	data map[string]interface{}
	ttl  map[string]time.Time
}
```
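A sketch of how Get and Set with expiry might look for the struct above (a real implementation would also guard the maps with a mutex and sweep expired entries; this is not flow8's exact code):

```go
// Get returns a cached value if it exists and has not expired.
func (c *TTLCache) Get(key string) (interface{}, bool) {
	expiry, ok := c.ttl[key]
	if !ok || time.Now().After(expiry) {
		delete(c.data, key)
		delete(c.ttl, key)
		return nil, false
	}
	return c.data[key], true
}

// Set stores a value with a time-to-live.
func (c *TTLCache) Set(key string, value interface{}, ttl time.Duration) {
	c.data[key] = value
	c.ttl[key] = time.Now().Add(ttl)
}
```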
Cached data:
- Flow definitions (TTL: 5 minutes)
- Module catalog (TTL: 1 hour)
- Test case fixtures (TTL: 24 hours)
- Audit filter queries (TTL: 3 minutes, refreshed by background job)
Cache stats:
```bash
curl http://localhost:4454/metrics | grep cache
flow8_cache_hits_total 45000
flow8_cache_misses_total 5000
# Cache hit rate: 90%
```
Configuration:
```yaml
cache:
  flows:
    ttl_seconds: 300
    max_entries: 10000
  modules:
    ttl_seconds: 3600
    max_entries: 1000
  test_fixtures:
    ttl_seconds: 86400
    max_entries: 500
```
2. KV Store Concurrency
Concurrent writes to the same key use last-write-wins conflict resolution:
```
Write 1: kv["total"] = 100
Write 2: kv["total"] = 150   (concurrent)

Result:  kv["total"] = 150   (last write wins)
```
For critical state, use database transactions:
```go
func (s *LayerService) IncrementCounter(ctx context.Context, key string) error {
	session, err := s.db.Client().StartSession()
	if err != nil {
		return err
	}
	defer session.EndSession(ctx)

	_, err = session.WithTransaction(ctx, func(sc mongo.SessionContext) (interface{}, error) {
		current := s.kv.Get(key)
		s.kv.Set(key, current+1) // Atomic within the transaction
		return nil, nil
	})
	return err
}
```
3. Redis (Optional, Enterprise)
For distributed caching across instances:
```yaml
cache:
  backend: redis   # or "memory" (default)
  redis:
    host: redis.company.com
    port: 6379
    password: "[encrypted]"
    ttl_seconds: 300
```
Benefits:
- Shared cache across instances
- Reduced database queries
- Better cache hit rate on distributed systems
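A sketch of what a shared Redis-backed cache client could look like (this uses the go-redis library purely for illustration; flow8's actual Redis backend may differ):

```go
package cache

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// NewShared returns a client for the cache shared by all flow8 instances.
func NewShared(host string, port int, password string) *redis.Client {
	return redis.NewClient(&redis.Options{
		Addr:     fmt.Sprintf("%s:%d", host, port),
		Password: password,
	})
}

// GetFlow reads a cached flow definition; a miss falls through to MongoDB.
func GetFlow(ctx context.Context, rdb *redis.Client, id string) ([]byte, error) {
	return rdb.Get(ctx, "flow:"+id).Bytes()
}

// SetFlow caches a flow definition with the configured TTL.
func SetFlow(ctx context.Context, rdb *redis.Client, id string, raw []byte) error {
	return rdb.Set(ctx, "flow:"+id, raw, 300*time.Second).Err()
}
```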
Performance Optimization
Query Optimization
Index strategy:
```javascript
// Primary indexes for most queries
db.flows.createIndex({ company_id: 1, created_at: -1 })
db.flows.createIndex({ company_id: 1, name: 1 })

// Composite indexes for common filters
db.plays.createIndex({ flow_id: 1, created_at: -1 })
db.plays.createIndex({ company_id: 1, status: 1, created_at: -1 })
```
Avoid:
- Scanning without company_id filter
- Querying by arbitrary fields (add index first)
- Fetching all fields when only a subset is needed (see the projection sketch below)
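For the last point, a projection keeps unneeded fields off the wire; a sketch with the Go driver (the field set is illustrative):

```go
package flows

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// ListSummaries fetches only the fields a listing view needs instead of whole flow documents.
func ListSummaries(ctx context.Context, flows *mongo.Collection, companyID string) (*mongo.Cursor, error) {
	opts := options.Find().SetProjection(bson.M{"name": 1, "company_id": 1, "created_at": 1})
	return flows.Find(ctx, bson.M{"company_id": companyID}, opts)
}
```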
Module Execution Optimization
Parallel execution:
By default, flowlets with no dependencies execute in parallel:
```json
{
  "name": "process-documents",
  "flowlets": [
    {
      "name": "process-1",
      "module_ref": "extract",
      "depends_on": []            // No deps, executes in parallel
    },
    {
      "name": "process-2",
      "module_ref": "extract",
      "depends_on": []            // No deps, executes in parallel
    },
    {
      "name": "aggregate",
      "module_ref": "aggregator",
      "depends_on": ["process-1", "process-2"]   // Waits for parallel tasks
    }
  ]
}
```
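Conceptually, the engine can start process-1 and process-2 together and only run aggregate once both finish. A sketch of that fan-out/fan-in pattern with errgroup (runFlowlet is a hypothetical stand-in, not flow8's engine API):

```go
package engine

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// runFlowlet is a stand-in for executing a single flowlet by name.
func runFlowlet(ctx context.Context, name string) error { return nil }

func runProcessDocuments(ctx context.Context) error {
	g, ctx := errgroup.WithContext(ctx)

	// No dependencies: run in parallel.
	g.Go(func() error { return runFlowlet(ctx, "process-1") })
	g.Go(func() error { return runFlowlet(ctx, "process-2") })

	// Wait for both before running the dependent flowlet.
	if err := g.Wait(); err != nil {
		return err
	}
	return runFlowlet(ctx, "aggregate")
}
```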
Channel-based async execution:
For long-running tasks, defer execution to a channel:
```json
{
  "name": "heavy-processing",
  "module_ref": "document-ocr",
  "defer_after_ms": 0,       // Execute on a background channel
  "timeout": "300s"          // 5 minute timeout
}
```
Timeout tuning:
```yaml
execution:
  default_timeout_seconds: 300               # 5 min default
  max_timeout_seconds: 3600                  # Max 1 hour
  channel_execution_timeout_seconds: 1800    # 30 min for async
```
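These limits typically become per-execution deadlines; a sketch of how a value like default_timeout_seconds can be enforced with a context (executeModule is a hypothetical helper):

```go
package engine

import (
	"context"
	"time"
)

// executeModule is a stand-in for running one module; it should stop when ctx is done.
func executeModule(ctx context.Context, name string) error { return nil }

// runWithTimeout applies a deadline such as default_timeout_seconds (300s).
func runWithTimeout(parent context.Context, name string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return executeModule(ctx, name)
}
```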
Network Optimization
Reduce API calls:
```go
// BAD: Multiple round trips
for _, id := range ids {
	flow := getFlow(id) // 1 query per ID
}

// GOOD: Batch query
flows := getFlows(ids) // 1 query for all IDs
```
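A concrete version of the batched lookup, using a single $in query against MongoDB (the function and field names are illustrative):

```go
package flows

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

// GetFlows fetches all requested flows in one round trip instead of one query per ID.
func GetFlows(ctx context.Context, flows *mongo.Collection, ids []string) ([]bson.M, error) {
	cur, err := flows.Find(ctx, bson.M{"_id": bson.M{"$in": ids}})
	if err != nil {
		return nil, err
	}
	defer cur.Close(ctx)

	var out []bson.M
	if err := cur.All(ctx, &out); err != nil {
		return nil, err
	}
	return out, nil
}
```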
Compression:
```yaml
# Enable gzip compression
http:
  compression: true
  compression_level: 6   # 1-9, higher = slower but better compression
```
Scaling Guidelines
When to Scale Horizontally
Add instances when:
- CPU utilization > 70% consistently
- Response latency > 500ms (p95)
- Cannot increase instance memory further
- Need high availability (2+ instances)
```bash
# Add an instance
kubectl scale deployment flow8 -n flow8 --replicas=$((current + 1))
```
When to Scale Vertically
Increase resources when:
- The workload could use more resources on one node (CPU under 100% while memory use is high)
- Memory pressure (GC overhead > 20%)
- Single instance serves all traffic (no HA requirement)
```bash
# Update resources
kubectl set resources deployment flow8 \
  --limits=cpu=4,memory=4Gi \
  --requests=cpu=2,memory=2Gi
```
When to Scale MongoDB
Add MongoDB capacity when:
- Query latency > 100ms (slow query log)
- Connection pool exhausted (95%+ utilization)
- Disk I/O at capacity
- Replication lag > 10 seconds
Solutions:
- Add read replicas (for read-heavy workloads)
- Shard the database (for write-heavy workloads)
- Upgrade to higher tier (Atlas)
Performance Baseline
Single Instance
| Metric | Value |
|---|---|
| CPU | 2 cores |
| Memory | 2 GB |
| Concurrent plays | ~200 |
| Throughput | ~100 plays/min |
| p95 latency | < 500ms |
| Max connections | 100 |
Scaled (3 instances)
| Metric | Value |
|---|---|
| Concurrent plays | ~600 |
| Throughput | ~300 plays/min |
| MongoDB bottleneck | ~1000 plays/sec max |
| Failover time | < 30s |
With MongoDB Sharding
| Metric | Value |
|---|---|
| Concurrent plays | ~3000 |
| Throughput | ~1500 plays/min |
| MongoDB throughput | ~5000 plays/sec |
Bottleneck Identification
```
Check Response Latency
├─ p95 < 100ms  → Check for spikes (GC, lock contention)
├─ 100-500ms    → Normal load, monitor CPU/memory
├─ 500-2000ms   → Check MongoDB slow logs
└─ > 2000ms     → Likely I/O bound (disk, network)

Check CPU Utilization
├─ < 50%    → Likely I/O bound
├─ 50-70%   → Optimal range
├─ 70-90%   → Add instance or increase CPU
└─ > 90%    → Critical, scale immediately

Check Memory Usage
├─ < 50%    → Increase cache size
├─ 50-80%   → Optimal range
├─ 80-90%   → Monitor GC activity
└─ > 90%    → Likely memory leak, restart instances

Check MongoDB
├─ Query latency < 50ms → Healthy
├─ 50-100ms             → Monitor indexes
├─ 100-500ms            → Slow queries, optimize
└─ > 500ms              → Add indexes or shards
```
Cost Optimization
Compute
- Scale down replicas during off-peak hours
- Use spot/preemptible instances for non-critical workloads
- Right-size instances to actual usage
Database
- Use MongoDB Atlas (saves ops overhead)
- Implement effective retention policies (less data = lower storage cost)
- Archive old flows to separate cold storage
Network
- Enable compression (reduce bandwidth)
- Cache frequently-accessed data
- Use CDN for static assets
Example Savings
Before optimization:
- 10 instances (always on): $5,000/month
- MongoDB, 1 TB storage: $2,000/month
- Network, 100 GB/month: $1,000/month

Total: $8,000/month

After optimization:
- 3-5 instances (auto-scale): $2,000/month
- MongoDB, 100 GB (retention policy): $500/month
- Network, 50 GB (compression): $500/month

Total: $3,000/month
Savings: 62% ($5,000/month)