
Scaling & Performance

Scaling Architecture

flow8 is designed for horizontal scalability with a stateless application layer and a shared data layer:

┌───────────────────────────────────────────┐
│         Load Balancer / Ingress           │
│     (round-robin, least connections)      │
└─────────┬─────────────┬─────────────┬─────┘
          │             │             │
      ┌───▼───┐     ┌───▼───┐     ┌───▼───┐
      │ flow8 │     │ flow8 │     │ flow8 │
      │  i1   │     │  i2   │     │  i3   │
      │ :4454 │     │ :4454 │     │ :4454 │
      └───┬───┘     └───┬───┘     └───┬───┘
          │             │             │
          └─────────────┼─────────────┘
                        │
              ┌─────────▼─────────┐
              │  MongoDB Replica  │
              │   Set (shared)    │
              │     or Atlas      │
              └───────────────────┘

Stateless Application

Each instance is independent:

  • No in-memory state is shared between instances
  • Sessions are stored in MongoDB, not in process memory (a lookup sketch follows this list)
  • Flows and plays are read from MongoDB, which remains the source of truth (they are never pinned in local memory)
  • The per-instance cache is non-critical: losing it never affects correctness
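
For example, a request handler can resolve the session token against MongoDB on every call instead of trusting anything held in process memory. The sketch below assumes a hypothetical "sessions" collection and field names; it is illustrative, not flow8's actual schema:

import (
    "context"
    "time"

    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
)

// Session as stored in the shared "sessions" collection (illustrative schema).
type Session struct {
    Token     string    `bson:"token"`
    UserID    string    `bson:"user_id"`
    ExpiresAt time.Time `bson:"expires_at"`
}

// sessionFromToken looks the session up in MongoDB on every request, so any
// instance behind the load balancer can serve it without local session state.
func sessionFromToken(ctx context.Context, db *mongo.Database, token string) (*Session, error) {
    var sess Session
    err := db.Collection("sessions").
        FindOne(ctx, bson.M{"token": token, "expires_at": bson.M{"$gt": time.Now()}}).
        Decode(&sess)
    if err != nil {
        return nil, err // mongo.ErrNoDocuments for unknown or expired tokens
    }
    return &sess, nil
}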

Benefits:

  • Scale up or down instantly
  • Survive instance failure without session loss
  • Load balance with simple round-robin
  • Deploy updates without stopping other instances

Channel Port Management

Each instance reserves a port range for async execution:

Instance 1: Ports 7701-7799 (100 concurrent channels)
Instance 2: Ports 7800-7899 (100 concurrent channels)
Instance 3: Ports 7900-7999 (100 concurrent channels)

Configuration:

# Instance 1
CHANNEL_PORT_RANGE=7701-7799
CHANNEL_POOL_SIZE=100
# Instance 2 (different node)
CHANNEL_PORT_RANGE=7800-7899
CHANNEL_POOL_SIZE=100
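
To make the range concrete, here is a minimal Go sketch of parsing CHANNEL_PORT_RANGE and claiming the first free port for a channel. The function names are illustrative, not flow8's internal API:

import (
    "fmt"
    "net"
    "strings"
)

// parseRange turns a spec like "7701-7799" into a start/end pair.
func parseRange(spec string) (int, int, error) {
    var start, end int
    if _, err := fmt.Sscanf(strings.TrimSpace(spec), "%d-%d", &start, &end); err != nil {
        return 0, 0, fmt.Errorf("invalid CHANNEL_PORT_RANGE %q: %w", spec, err)
    }
    return start, end, nil
}

// claimPort walks the range and binds the first free port; the listener
// reserves the port for the channel until it is closed.
func claimPort(start, end int) (net.Listener, error) {
    for p := start; p <= end; p++ {
        if l, err := net.Listen("tcp", fmt.Sprintf(":%d", p)); err == nil {
            return l, nil
        }
    }
    return nil, fmt.Errorf("no free channel port in %d-%d", start, end)
}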

Kubernetes:

# StatefulSet (ensures unique ordinals)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: flow8
spec:
  serviceName: flow8
  replicas: 3
  selector:
    matchLabels:
      app: flow8
  template:
    metadata:
      labels:
        app: flow8
    spec:
      containers:
        - name: flow8
          env:
            # POD_ORDINAL must be defined first so it can be substituted below
            - name: POD_ORDINAL
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['apps.kubernetes.io/pod-index']  # pod index label (Kubernetes 1.28+)
            - name: CHANNEL_PORT_RANGE
              value: "$(POD_ORDINAL)700-$(POD_ORDINAL)799"  # e.g. ordinal 1 → 1700-1799

Since flow8 itself is stateless, however, a standard Deployment is preferred, with port ranges assigned manually per instance.

Horizontal Scaling

Adding Instances (Manual)

# Start second instance (different host)
docker run \
  -e MONGODB_URI=mongodb://mongo:27017 \
  -e SERVER_PORT=4454 \
  -e CHANNEL_PORT_RANGE=7800-7899 \
  -p 4454:4454 \
  -p 7800-7899:7800-7899 \
  flow8core:latest

Both instances connect to the same MongoDB and serve requests independently.

Kubernetes Scaling

# Scale to 5 replicas
kubectl scale deployment flow8 -n flow8 --replicas=5
# Or edit deployment
kubectl edit deployment flow8 -n flow8

When scaling, ensure channel port ranges don't overlap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: flow8-channel-ranges
  namespace: flow8
data:
  ranges.yaml: |
    - replica: 0
      range: "7701-7799"
    - replica: 1
      range: "7800-7899"
    - replica: 2
      range: "7900-7999"

Load Balancing

Recommended: session-sticky load balancing (for example nginx ip_hash or a cookie-based hash) to improve the per-instance cache hit rate. Because the application layer is stateless, least-connections balancing as shown below is also correct:

# nginx config
upstream flow8_backend {
    least_conn;  # balance by active connections
    # ip_hash;   # alternative: pin clients to one instance for better cache locality
    server flow8-1:4454;
    server flow8-2:4454;
    server flow8-3:4454;
}

server {
    listen 80;

    location / {
        proxy_pass http://flow8_backend;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

In Kubernetes (Ingress):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flow8
spec:
  rules:
    - host: app.flow8.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flow8
                port:
                  number: 80
# Load balancing is handled by the Service (round-robin by default)

Vertical Scaling

Increase resources for a single instance (if horizontal scaling isn't feasible):

# Kubernetes resource limits
resources:
  requests:
    cpu: 4
    memory: 4Gi
  limits:
    cpu: 8
    memory: 8Gi

# Cap the Go scheduler at the allocated CPUs (GOMAXPROCS limits the OS threads
# running Go code, not the number of goroutines)
env:
  - name: GOMAXPROCS
    value: "8"

CPU scaling effect:

  • 1 CPU: ~100 concurrent plays
  • 2 CPUs: ~200 concurrent plays
  • 4 CPUs: ~400 concurrent plays

Memory scaling effect:

  • 512 MB: Very limited, high GC pressure
  • 2 GB: Good for moderate loads
  • 8 GB: Suitable for high loads

MongoDB Scaling

Replica Sets

For redundancy and read scaling:

// Initialize the replica set
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-0:27017" },
    { _id: 1, host: "mongo-1:27017" },
    { _id: 2, host: "mongo-2:27017" }
  ]
})

Benefits:

  • Automatic failover (if primary dies)
  • Read scaling (secondary nodes can serve reads)
  • No single point of failure

MongoDB Atlas (Managed)

Recommended for production (minimal operational overhead):

MONGODB_URI=mongodb+srv://user:password@cluster0.mongodb.net/flow8?retryWrites=true&w=majority

Scaling options:

  • Vertical: Increase cluster tier (M10 → M20 → M30)
  • Horizontal: Enable sharding (distribute data by shard key)

Sharding (for 100+ concurrent plays)

Partition data by company or flow for massive scale:

// Enable sharding on flows collection
sh.enableSharding("flow8")
sh.shardCollection("flow8.plays", { company_id: 1, created_at: 1 })
// Results:
// - Shard 1: company_a, company_b
// - Shard 2: company_c, company_d
// - Shard 3: company_e, company_f

Performance improvement:

  • Single MongoDB: ~1000 plays/sec
  • Sharded (3 shards): ~3000 plays/sec
  • Sharded (10 shards): ~10,000 plays/sec

Caching Layer

flow8 implements three caching levels:

1. In-Memory Cache (Per Instance)

Caches frequently-accessed data with TTL:

type TTLCache struct {
    mu   sync.RWMutex            // guards data and ttl for concurrent access
    data map[string]interface{}  // cached values keyed by cache key
    ttl  map[string]time.Time    // absolute expiry time per key
}
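
A minimal sketch of the read path, assuming the fields above; the method name and miss semantics are illustrative rather than flow8's exact implementation:

// Get returns the cached value for key, or (nil, false) if the key is missing
// or its TTL has expired. Expired entries are left for the next Set or a
// background sweeper to clean up.
func (c *TTLCache) Get(key string) (interface{}, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()

    expiry, ok := c.ttl[key]
    if !ok || time.Now().After(expiry) {
        return nil, false
    }
    return c.data[key], true
}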

Cached data:

  • Flow definitions (TTL: 5 minutes)
  • Module catalog (TTL: 1 hour)
  • Test case fixtures (TTL: 24 hours)
  • Audit filter queries (TTL: 3 minutes, refreshed by background job)

Cache stats:

curl http://localhost:4454/metrics | grep cache

flow8_cache_hits_total 45000
flow8_cache_misses_total 5000
# Cache hit rate: 90%
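
The counters behind that output are typically registered with prometheus/client_golang; the sketch below reuses the metric names above, but the registration details are assumptions:

import "github.com/prometheus/client_golang/prometheus"

// Counters backing the /metrics output shown above.
var (
    cacheHits = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "flow8_cache_hits_total",
        Help: "Cache lookups served from the in-memory cache.",
    })
    cacheMisses = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "flow8_cache_misses_total",
        Help: "Cache lookups that fell through to MongoDB.",
    })
)

func init() {
    prometheus.MustRegister(cacheHits, cacheMisses)
}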

Configuration:

cache:
  flows:
    ttl_seconds: 300
    max_entries: 10000
  modules:
    ttl_seconds: 3600
    max_entries: 1000
  test_fixtures:
    ttl_seconds: 86400
    max_entries: 500

2. KV Store Concurrency

Concurrent writes to the same key use last-write-wins conflict resolution:

Write 1: kv["total"] = 100
Write 2: kv["total"] = 150 (concurrent)
Result: kv["total"] = 150 (last write wins)

For critical state, use database transactions:

// Wrap the read-modify-write in a MongoDB transaction so concurrent increments
// cannot interleave (mongo-go-driver session API).
func (s *LayerService) IncrementCounter(ctx context.Context, key string) error {
    session, err := s.db.Client().StartSession()
    if err != nil {
        return err
    }
    defer session.EndSession(ctx)

    _, err = session.WithTransaction(ctx, func(sc mongo.SessionContext) (interface{}, error) {
        current := s.kv.Get(sc, key) // kv helpers take the session context so both operations join the transaction
        s.kv.Set(sc, key, current+1) // atomic with the read above
        return nil, nil
    })
    return err
}

3. Redis (Optional, Enterprise)

For distributed caching across instances:

config/config.yml
cache:
  backend: redis  # or "memory" (default)
  redis:
    host: redis.company.com
    port: 6379
    password: "[encrypted]"
    ttl_seconds: 300
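
A hedged sketch of what a Redis-backed cache client could look like with github.com/redis/go-redis/v9; flow8's real integration is driven by the config above, so the names and key layout here are assumptions:

import (
    "context"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

// newRedisCache builds a client from the host/port/password settings above.
func newRedisCache(host string, port int, password string) *redis.Client {
    return redis.NewClient(&redis.Options{
        Addr:     fmt.Sprintf("%s:%d", host, port),
        Password: password,
    })
}

// cacheFlow stores a serialized flow definition shared by all instances.
// The 300 s expiry mirrors cache.redis.ttl_seconds above.
func cacheFlow(ctx context.Context, rdb *redis.Client, flowID string, payload []byte) error {
    return rdb.Set(ctx, "flow:"+flowID, payload, 300*time.Second).Err()
}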

Benefits:

  • Shared cache across instances
  • Reduced database queries
  • Better cache hit rate on distributed systems

Performance Optimization

Query Optimization

Index strategy:

// Primary index for most queries
db.flows.createIndex({ company_id: 1, created_at: -1 })
db.flows.createIndex({ company_id: 1, name: 1 })
// Composite indexes for common filters
db.plays.createIndex({ flow_id: 1, created_at: -1 })
db.plays.createIndex({ company_id: 1, status: 1, created_at: -1 })

Avoid:

  • Scanning without a company_id filter
  • Querying by arbitrary, unindexed fields (add an index first)
  • Fetching all fields when only a subset is needed (see the projection sketch below)
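
As a concrete illustration of the last two points, a sketch using the mongo-go-driver: the query stays on the company_id index and projects only the fields the caller needs (function and field names are illustrative, not flow8's API):

import (
    "context"

    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

// listFlowNames fetches only name and created_at instead of whole flow
// documents, filtered by company_id so the query can use the index above.
func listFlowNames(ctx context.Context, flows *mongo.Collection, companyID string) ([]bson.M, error) {
    opts := options.Find().
        SetProjection(bson.M{"name": 1, "created_at": 1}). // only the fields we need
        SetSort(bson.M{"created_at": -1})                  // matches the index order
    cur, err := flows.Find(ctx, bson.M{"company_id": companyID}, opts)
    if err != nil {
        return nil, err
    }

    var results []bson.M
    if err := cur.All(ctx, &results); err != nil { // All also closes the cursor
        return nil, err
    }
    return results, nil
}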

Module Execution Optimization

Parallel execution:

By default, flowlets with no dependencies execute in parallel:

{
  "name": "process-documents",
  "flowlets": [
    {
      "name": "process-1",
      "module_ref": "extract",
      "depends_on": []  // No deps, executes in parallel
    },
    {
      "name": "process-2",
      "module_ref": "extract",
      "depends_on": []  // No deps, executes in parallel
    },
    {
      "name": "aggregate",
      "module_ref": "aggregator",
      "depends_on": ["process-1", "process-2"]  // Waits for the parallel tasks
    }
  ]
}
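
Under the hood, this kind of fan-out/fan-in is commonly implemented with golang.org/x/sync/errgroup. The sketch below is illustrative only, not flow8's actual scheduler; extract and aggregate stand in for the module calls:

import (
    "context"

    "golang.org/x/sync/errgroup"
)

// runProcessDocuments runs the two independent "extract" flowlets concurrently
// and hands both results to "aggregate" once they have finished.
func runProcessDocuments(
    ctx context.Context,
    extract func(ctx context.Context, flowlet string) (string, error),
    aggregate func(ctx context.Context, inputs ...string) error,
) error {
    g, ctx := errgroup.WithContext(ctx)

    var out1, out2 string
    g.Go(func() error {
        var err error
        out1, err = extract(ctx, "process-1")
        return err
    })
    g.Go(func() error {
        var err error
        out2, err = extract(ctx, "process-2")
        return err
    })

    // "aggregate" depends on both flowlets, so it only runs after g.Wait().
    if err := g.Wait(); err != nil {
        return err
    }
    return aggregate(ctx, out1, out2)
}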

Channel-based async execution:

For long-running tasks, defer to channel:

{
  "name": "heavy-processing",
  "module_ref": "document-ocr",
  "defer_after_ms": 0,  // Execute on a background channel
  "timeout": "300s"     // 5 minute timeout
}

Timeout tuning:

config/config.yml
execution:
  default_timeout_seconds: 300             # 5 min default
  max_timeout_seconds: 3600                # Max 1 hour
  channel_execution_timeout_seconds: 1800  # 30 min for async
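
In Go, a per-flowlet timeout like the ones above is typically enforced by deriving a context deadline around the module call. A minimal sketch (runWithTimeout and its callers are illustrative names):

import (
    "context"
    "time"
)

// runWithTimeout runs the module body with a deadline; the engine would derive
// d from the flowlet's "timeout" field, falling back to default_timeout_seconds.
func runWithTimeout(ctx context.Context, d time.Duration, run func(context.Context) error) error {
    ctx, cancel := context.WithTimeout(ctx, d)
    defer cancel()

    done := make(chan error, 1)
    go func() { done <- run(ctx) }()

    select {
    case err := <-done:
        return err
    case <-ctx.Done():
        return ctx.Err() // context.DeadlineExceeded when the timeout fires
    }
}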

Network Optimization

Reduce API calls:

// BAD: multiple round trips, one query per ID
var flows []Flow
for _, id := range ids {
    flows = append(flows, getFlow(id))
}

// GOOD: one batch query for all IDs
flows := getFlows(ids)

Compression:

# Enable gzip compression
http:
  compression: true
  compression_level: 6  # 1-9; higher = slower but better compression

Scaling Guidelines

When to Scale Horizontally

Add instances when:

  • CPU utilization > 70% consistently
  • Response latency > 500ms (p95)
  • Cannot increase instance memory further
  • Need high availability (2+ instances)
# Add instance
kubectl scale deployment flow8 -n flow8 --replicas=$((current + 1))

When to Scale Vertically

Increase resources when:

  • The workload could use more cores on one node (CPU not yet saturated while memory usage is high)
  • Memory pressure (GC overhead > 20%)
  • Single instance serves all traffic (no HA requirement)
# Update resources
kubectl set resources deployment flow8 \
  --limits=cpu=4,memory=4Gi \
  --requests=cpu=2,memory=2Gi

When to Scale MongoDB

Add MongoDB capacity when:

  • Query latency > 100ms (slow query log)
  • Connection pool exhausted (95%+ utilization)
  • Disk I/O at capacity
  • Replication lag > 10 seconds

Solutions:

  1. Add read replicas (for read-heavy workloads)
  2. Shard the database (for write-heavy workloads)
  3. Upgrade to higher tier (Atlas)

Performance Baseline

Single Instance

Metric              Value
CPU                 2 cores
Memory              2 GB
Concurrent plays    ~200
Throughput          ~100 plays/min
p95 latency         < 500ms
Max connections     100

Scaled (3 instances)

Metric              Value
Concurrent plays    ~600
Throughput          ~300 plays/min
MongoDB bottleneck  ~1000 plays/sec max
Failover time       < 30s

With MongoDB Sharding

Metric              Value
Concurrent plays    ~3000
Throughput          ~1500 plays/min
MongoDB throughput  ~5000 plays/sec

Bottleneck Identification

Check Response Latency
├─ p95 < 100ms  → Check for spikes (GC, lock contention)
├─ 100-500ms    → Normal load, monitor CPU/memory
├─ 500-2000ms   → Check MongoDB slow logs
└─ > 2000ms     → Likely I/O bound (disk, network)

Check CPU Utilization
├─ < 50%        → Likely I/O bound
├─ 50-70%       → Optimal range
├─ 70-90%       → Add instance or increase CPU
└─ > 90%        → Critical, scale immediately

Check Memory Usage
├─ < 50%        → Increase cache size
├─ 50-80%       → Optimal range
├─ 80-90%       → Monitor GC activity
└─ > 90%        → Likely memory leak, restart instances

Check MongoDB
├─ Query latency < 50ms → Healthy
├─ 50-100ms     → Monitor indexes
├─ 100-500ms    → Slow queries, optimize
└─ > 500ms      → Add indexes or shards

Cost Optimization

Compute

  • Scale down replicas during off-peak hours
  • Use spot/preemptible instances for non-critical workloads
  • Right-size instances to actual usage

Database

  • Use MongoDB Atlas (saves ops overhead)
  • Implement effective retention policies (less data = lower storage cost)
  • Archive old flows to separate cold storage

Network

  • Enable compression (reduce bandwidth)
  • Cache frequently-accessed data
  • Use CDN for static assets

Example Savings

Before optimization:
- 10 instances (always on): $5,000/month
- MongoDB, 1 TB storage: $2,000/month
- Network, 100 GB/month: $1,000/month
Total: $8,000/month

After optimization:
- 3-5 instances (auto-scale): $2,000/month
- MongoDB, 100 GB (retention policy): $500/month
- Network, 50 GB (compression): $500/month
Total: $3,000/month

Savings: 62% ($5,000/month)