Scaling & Performance
Scaling Architecture
flow8 is designed for horizontal scalability with a stateless application layer and a shared data layer:
```
        ┌─────────────────────────────────────┐
        │        Load Balancer / Ingress      │
        │   (Round-robin, least connections)  │
        └───────┬───────────┬───────────┬─────┘
                │           │           │
            ┌───▼───┐   ┌───▼───┐   ┌───▼───┐
            │ flow8 │   │ flow8 │   │ flow8 │
            │  i1   │   │  i2   │   │  i3   │
            │ :4454 │   │ :4454 │   │ :4454 │
            └───┬───┘   └───┬───┘   └───┬───┘
                │           │           │
                └───────────┼───────────┘
                            │
                  ┌─────────▼─────────┐
                  │  MongoDB Replica  │
                  │   Set (shared)    │
                  │      or Atlas     │
                  └───────────────────┘
```
Stateless Application
Each instance is independent:
- No in-memory state shared between instances
- Sessions stored in MongoDB (not in-memory)
- Flows and plays retrieved from MongoDB (not cached locally)
- Cache is per-instance and non-critical (it can be lost without affecting correctness)
Benefits:
- Scale up or down instantly
- Survive instance failure without session loss
- Load balance with simple round-robin
- Deploy updates without stopping other instances
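Because session state lives in MongoDB rather than in process memory, any instance behind the load balancer can serve any request. A minimal sketch of what such an instance-agnostic session lookup might look like (the SessionStore type and the sessions collection are illustrative, not flow8's actual API):

```go
package session

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

// Session is a hypothetical session document stored in the shared MongoDB cluster.
type Session struct {
	ID     string `bson:"_id"`
	UserID string `bson:"user_id"`
}

// SessionStore reads sessions from MongoDB, so every instance sees the same data.
type SessionStore struct {
	col *mongo.Collection // e.g. db.Collection("sessions")
}

// Get loads a session by ID; it behaves identically on any flow8 instance.
func (s *SessionStore) Get(ctx context.Context, id string) (*Session, error) {
	var sess Session
	if err := s.col.FindOne(ctx, bson.M{"_id": id}).Decode(&sess); err != nil {
		return nil, err
	}
	return &sess, nil
}
```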
Channel Port Management
Each instance reserves a port range for async execution:
```
Instance 1: Ports 7701-7799 (100 concurrent channels)
Instance 2: Ports 7800-7899 (100 concurrent channels)
Instance 3: Ports 7900-7999 (100 concurrent channels)
```
Configuration:
```bash
# Instance 1
CHANNEL_PORT_RANGE=7701-7799
CHANNEL_POOL_SIZE=100

# Instance 2 (different node)
CHANNEL_PORT_RANGE=7800-7899
CHANNEL_POOL_SIZE=100
```
Kubernetes:
```yaml
# StatefulSet (ensures unique ordinals)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: flow8
spec:
  serviceName: flow8
  replicas: 3
  template:
    spec:
      containers:
        - name: flow8
          env:
            - name: POD_ORDINAL            # must be defined before it is referenced below
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['apps.kubernetes.io/pod-index']  # Kubernetes 1.28+
            - name: CHANNEL_PORT_RANGE
              value: "$(POD_ORDINAL)700-$(POD_ORDINAL)799"
```
flow8 itself is not stateful, however, so a standard Deployment (with manually assigned port ranges) is usually preferred.
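At startup, each instance can expand its assigned range into a pool of channel ports. A minimal sketch (parsePortRange is a hypothetical helper, not flow8's internal code):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePortRange expands a "low-high" spec such as "7800-7899" into individual ports.
func parsePortRange(spec string) ([]int, error) {
	parts := strings.SplitN(spec, "-", 2)
	if len(parts) != 2 {
		return nil, fmt.Errorf("invalid CHANNEL_PORT_RANGE %q", spec)
	}
	low, err := strconv.Atoi(parts[0])
	if err != nil {
		return nil, err
	}
	high, err := strconv.Atoi(parts[1])
	if err != nil || high < low {
		return nil, fmt.Errorf("invalid CHANNEL_PORT_RANGE %q", spec)
	}
	ports := make([]int, 0, high-low+1)
	for p := low; p <= high; p++ {
		ports = append(ports, p)
	}
	return ports, nil
}

func main() {
	ports, err := parsePortRange(os.Getenv("CHANNEL_PORT_RANGE")) // e.g. "7800-7899"
	if err != nil {
		panic(err)
	}
	fmt.Printf("this instance owns %d channel ports\n", len(ports))
}
```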
Horizontal Scaling
Adding Instances (Manual)
```bash
# Start second instance (different host)
docker run \
  -e MONGODB_URI=mongodb://mongo:27017 \
  -e SERVER_PORT=4454 \
  -e CHANNEL_PORT_RANGE=7800-7899 \
  -p 4454:4454 \
  -p 7800-7899:7800-7899 \
  flow8core:latest
```
Both instances connect to the same MongoDB and serve requests independently.
Kubernetes Scaling
```bash
# Scale to 5 replicas
kubectl scale deployment flow8 -n flow8 --replicas=5

# Or edit the deployment
kubectl edit deployment flow8 -n flow8
```
When scaling, ensure channel port ranges don't overlap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: flow8-channel-ranges
  namespace: flow8
data:
  ranges.yaml: |
    - replica: 0
      range: "7701-7799"
    - replica: 1
      range: "7800-7899"
    - replica: 2
      range: "7900-7999"
```
Load Balancing
Recommended: session-sticky load balancing to improve the per-instance cache hit rate (the example below balances by least connections)
```nginx
# nginx config
upstream flow8_backend {
    least_conn;   # Balance by active connections
    server flow8-1:4454;
    server flow8-2:4454;
    server flow8-3:4454;
}

server {
    listen 80;
    location / {
        proxy_pass http://flow8_backend;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```
In Kubernetes (Ingress):
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flow8
spec:
  rules:
    - host: app.flow8.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flow8
                port:
                  number: 80
# Load balancing handled by the Service (round-robin by default)
```
Vertical Scaling
Increase resources for a single instance (if horizontal scaling isn't feasible):
```yaml
# Kubernetes resource limits
resources:
  requests:
    cpu: 4
    memory: 4Gi
  limits:
    cpu: 8
    memory: 8Gi
```

```yaml
# Go runtime parallelism (GOMAXPROCS caps OS threads running Go code, not goroutines)
env:
  - name: GOMAXPROCS
    value: "8"
```
CPU scaling effect:
- 1 CPU: ~100 concurrent plays
- 2 CPUs: ~200 concurrent plays
- 4 CPUs: ~400 concurrent plays
Memory scaling effect:
- 512 MB: Very limited, high GC pressure
- 2 GB: Good for moderate loads
- 8 GB: Suitable for high loads
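The roughly linear CPU-to-concurrency numbers above suggest bounding concurrent plays by available parallelism. A sketch of that heuristic (the playsPerCPU constant and semaphore are illustrative, not flow8's scheduler):

```go
package main

import (
	"fmt"
	"runtime"
)

const playsPerCPU = 100 // heuristic from the list above

func main() {
	// GOMAXPROCS(0) reports how many CPUs the Go runtime will use in parallel.
	maxPlays := playsPerCPU * runtime.GOMAXPROCS(0)

	// A buffered channel works as a semaphore bounding concurrent plays:
	// acquire with `sem <- struct{}{}` before a play, release with `<-sem` after.
	sem := make(chan struct{}, maxPlays)
	fmt.Printf("allowing up to %d concurrent plays\n", cap(sem))
}
```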
MongoDB Scaling
Replica Sets
For redundancy and read scaling:
```javascript
// Initialize replica set
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-0:27017" },
    { _id: 1, host: "mongo-1:27017" },
    { _id: 2, host: "mongo-2:27017" }
  ]
})
```
Benefits:
- Automatic failover (if primary dies)
- Read scaling (secondary nodes can serve reads)
- No single point of failure
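To actually spread reads to secondaries, the client has to request it via a read preference; a minimal sketch with the official Go driver, assuming the replica set above (URI and error handling abbreviated):

```go
package main

import (
	"context"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
	"go.mongodb.org/mongo-driver/mongo/readpref"
)

func connect(ctx context.Context) (*mongo.Client, error) {
	opts := options.Client().
		ApplyURI("mongodb://mongo-0:27017,mongo-1:27017,mongo-2:27017/?replicaSet=rs0").
		SetReadPreference(readpref.SecondaryPreferred()) // allow reads from secondaries
	return mongo.Connect(ctx, opts)
}
```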
MongoDB Atlas (Managed)
Recommended for production (no operational overhead):
```bash
MONGODB_URI=mongodb+srv://user:password@cluster0.mongodb.net/flow8?retryWrites=true&w=majority
```
Scaling options:
- Vertical: Increase cluster tier (M10 → M20 → M30)
- Horizontal: Enable sharding (distribute data by shard key)
Sharding (for 100+ concurrent plays)
Partition data by company or flow for massive scale:
```javascript
// Enable sharding on the flow8 database, then shard the plays collection
sh.enableSharding("flow8")
sh.shardCollection("flow8.plays", { company_id: 1, created_at: 1 })

// Resulting distribution (illustrative):
// - Shard 1: company_a, company_b
// - Shard 2: company_c, company_d
// - Shard 3: company_e, company_f
```
Performance improvement:
- Single MongoDB: ~1000 plays/sec
- Sharded (3 shards): ~3000 plays/sec
- Sharded (10 shards): ~10,000 plays/sec
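The gains above assume queries include the shard key: a query that filters by company_id can be routed to a single shard instead of broadcast to all of them. A sketch with the Go driver (the function itself is illustrative; the field names follow the shard key above):

```go
package plays

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// RecentForCompany includes the company_id shard key in the filter, so mongos
// targets one shard rather than scattering the query across the cluster.
func RecentForCompany(ctx context.Context, plays *mongo.Collection, companyID string) (*mongo.Cursor, error) {
	filter := bson.M{"company_id": companyID}
	opts := options.Find().
		SetSort(bson.D{{Key: "created_at", Value: -1}}).
		SetLimit(50)
	return plays.Find(ctx, filter, opts)
}
```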
Caching Layer
flow8 implements three caching levels:
1. In-Memory Cache (Per Instance)
Caches frequently-accessed data with TTL:
```go
type TTLCache struct {
	data map[string]interface{}
	ttl  map[string]time.Time
}
```
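A sketch of how Get and Set with expiry might look for the struct above (a real implementation would also guard the maps with a mutex and sweep expired entries; this is not flow8's exact code):

```go
// Get returns a cached value if it exists and has not expired.
func (c *TTLCache) Get(key string) (interface{}, bool) {
	expiry, ok := c.ttl[key]
	if !ok || time.Now().After(expiry) {
		delete(c.data, key)
		delete(c.ttl, key)
		return nil, false
	}
	return c.data[key], true
}

// Set stores a value with a time-to-live.
func (c *TTLCache) Set(key string, value interface{}, ttl time.Duration) {
	c.data[key] = value
	c.ttl[key] = time.Now().Add(ttl)
}
```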
Cached data:
- Flow definitions (TTL: 5 minutes)
- Module catalog (TTL: 1 hour)
- Test case fixtures (TTL: 24 hours)
- Audit filter queries (TTL: 3 minutes, refreshed by background job)
Cache stats:
```bash
curl http://localhost:4454/metrics | grep cache
flow8_cache_hits_total 45000
flow8_cache_misses_total 5000
# Cache hit rate: 90%
```
Configuration:
```yaml
cache:
  flows:
    ttl_seconds: 300
    max_entries: 10000
  modules:
    ttl_seconds: 3600
    max_entries: 1000
  test_fixtures:
    ttl_seconds: 86400
    max_entries: 500
```
2. KV Store Concurrency
Concurrent writes to the same key use last-write-wins conflict resolution:
```
Write 1: kv["total"] = 100
Write 2: kv["total"] = 150   (concurrent)

Result:  kv["total"] = 150   (last write wins)
```
For critical state, use database transactions:
```go
func (s *LayerService) IncrementCounter(ctx context.Context, key string) error {
	session, err := s.db.Client().StartSession()
	if err != nil {
		return err
	}
	defer session.EndSession(ctx)

	_, err = session.WithTransaction(ctx, func(sc mongo.SessionContext) (interface{}, error) {
		current := s.kv.Get(key)
		s.kv.Set(key, current+1) // Atomic within the transaction
		return nil, nil
	})
	return err
}
```
3. Redis (Optional, Enterprise)
For distributed caching across instances:
```yaml
cache:
  backend: redis   # or "memory" (default)
  redis:
    host: redis.company.com
    port: 6379
    password: "[encrypted]"
    ttl_seconds: 300
```
Benefits:
- Shared cache across instances
- Reduced database queries
- Better cache hit rate on distributed systems
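A sketch of what a shared Redis-backed cache client could look like (this uses the go-redis library purely for illustration; flow8's actual Redis backend may differ):

```go
package cache

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// NewShared returns a client for the cache shared by all flow8 instances.
func NewShared(host string, port int, password string) *redis.Client {
	return redis.NewClient(&redis.Options{
		Addr:     fmt.Sprintf("%s:%d", host, port),
		Password: password,
	})
}

// GetFlow reads a cached flow definition; a miss falls through to MongoDB.
func GetFlow(ctx context.Context, rdb *redis.Client, id string) ([]byte, error) {
	return rdb.Get(ctx, "flow:"+id).Bytes()
}

// SetFlow caches a flow definition with the configured TTL.
func SetFlow(ctx context.Context, rdb *redis.Client, id string, raw []byte) error {
	return rdb.Set(ctx, "flow:"+id, raw, 300*time.Second).Err()
}
```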
Performance Optimization
Query Optimization
Index strategy:
```javascript
// Primary indexes for most queries
db.flows.createIndex({ company_id: 1, created_at: -1 })
db.flows.createIndex({ company_id: 1, name: 1 })

// Composite indexes for common filters
db.plays.createIndex({ flow_id: 1, created_at: -1 })
db.plays.createIndex({ company_id: 1, status: 1, created_at: -1 })
```
Avoid:
- Scanning without company_id filter
- Querying by arbitrary fields (add index first)
- Fetching all fields when only a subset is needed (see the projection sketch below)
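For the last point, a projection keeps unneeded fields off the wire; a sketch with the Go driver (the field set is illustrative):

```go
package flows

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// ListSummaries fetches only the fields a listing view needs instead of whole flow documents.
func ListSummaries(ctx context.Context, flows *mongo.Collection, companyID string) (*mongo.Cursor, error) {
	opts := options.Find().SetProjection(bson.M{"name": 1, "company_id": 1, "created_at": 1})
	return flows.Find(ctx, bson.M{"company_id": companyID}, opts)
}
```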
Module Execution Optimization
Parallel execution:
By default, flowlets with no dependencies execute in parallel:
```json
{
  "name": "process-documents",
  "flowlets": [
    {
      "name": "process-1",
      "module_ref": "extract",
      "depends_on": []            // No deps, executes in parallel
    },
    {
      "name": "process-2",
      "module_ref": "extract",
      "depends_on": []            // No deps, executes in parallel
    },
    {
      "name": "aggregate",
      "module_ref": "aggregator",
      "depends_on": ["process-1", "process-2"]   // Waits for parallel tasks
    }
  ]
}
```
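Conceptually, the engine can start process-1 and process-2 together and only run aggregate once both finish. A sketch of that fan-out/fan-in pattern with errgroup (runFlowlet is a hypothetical stand-in, not flow8's engine API):

```go
package engine

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// runFlowlet is a stand-in for executing a single flowlet by name.
func runFlowlet(ctx context.Context, name string) error { return nil }

func runProcessDocuments(ctx context.Context) error {
	g, ctx := errgroup.WithContext(ctx)

	// No dependencies: run in parallel.
	g.Go(func() error { return runFlowlet(ctx, "process-1") })
	g.Go(func() error { return runFlowlet(ctx, "process-2") })

	// Wait for both before running the dependent flowlet.
	if err := g.Wait(); err != nil {
		return err
	}
	return runFlowlet(ctx, "aggregate")
}
```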
Channel-based async execution:
For long-running tasks, defer execution to a channel:
```json
{
  "name": "heavy-processing",
  "module_ref": "document-ocr",
  "defer_after_ms": 0,       // Execute on a background channel
  "timeout": "300s"          // 5 minute timeout
}
```
Timeout tuning:
```yaml
execution:
  default_timeout_seconds: 300               # 5 min default
  max_timeout_seconds: 3600                  # Max 1 hour
  channel_execution_timeout_seconds: 1800    # 30 min for async
```
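These limits typically become per-execution deadlines; a sketch of how a value like default_timeout_seconds can be enforced with a context (executeModule is a hypothetical helper):

```go
package engine

import (
	"context"
	"time"
)

// executeModule is a stand-in for running one module; it should stop when ctx is done.
func executeModule(ctx context.Context, name string) error { return nil }

// runWithTimeout applies a deadline such as default_timeout_seconds (300s).
func runWithTimeout(parent context.Context, name string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return executeModule(ctx, name)
}
```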
Network Optimization
Reduce API calls:
```go
// BAD: Multiple round trips
for _, id := range ids {
	flow := getFlow(id) // 1 query per ID
}

// GOOD: Batch query
flows := getFlows(ids) // 1 query for all IDs
```
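A concrete version of the batched lookup, using a single $in query against MongoDB (the function and field names are illustrative):

```go
package flows

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

// GetFlows fetches all requested flows in one round trip instead of one query per ID.
func GetFlows(ctx context.Context, flows *mongo.Collection, ids []string) ([]bson.M, error) {
	cur, err := flows.Find(ctx, bson.M{"_id": bson.M{"$in": ids}})
	if err != nil {
		return nil, err
	}
	defer cur.Close(ctx)

	var out []bson.M
	if err := cur.All(ctx, &out); err != nil {
		return nil, err
	}
	return out, nil
}
```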
Compression:
```yaml
# Enable gzip compression
http:
  compression: true
  compression_level: 6   # 1-9, higher = slower but better compression
```
Scaling Guidelines
When to Scale Horizontally
Add instances when:
- CPU utilization > 70% consistently
- Response latency > 500ms (p95)
- Cannot increase instance memory further
- Need high availability (2+ instances)
```bash
# Add an instance
kubectl scale deployment flow8 -n flow8 --replicas=$((current + 1))
```
When to Scale Vertically
Increase resources when:
- The workload could use more resources on one node (CPU under 100% while memory use is high)
- Memory pressure (GC overhead > 20%)
- Single instance serves all traffic (no HA requirement)
```bash
# Update resources
kubectl set resources deployment flow8 \
  --limits=cpu=4,memory=4Gi \
  --requests=cpu=2,memory=2Gi
```
When to Scale MongoDB
Add MongoDB capacity when:
- Query latency > 100ms (slow query log)
- Connection pool exhausted (95%+ utilization)
- Disk I/O at capacity
- Replication lag > 10 seconds
Solutions:
- Add read replicas (for read-heavy workloads)
- Shard the database (for write-heavy workloads)
- Upgrade to higher tier (Atlas)
Performance Baseline
Single Instance
| Metric | Value |
|---|---|
| CPU | 2 cores |
| Memory | 2 GB |
| Concurrent plays | ~200 |
| Throughput | ~100 plays/min |
| p95 latency | < 500ms |
| Max connections | 100 |
Scaled (3 instances)
| Metric | Value |
|---|---|
| Concurrent plays | ~600 |
| Throughput | ~300 plays/min |
| MongoDB bottleneck | ~1000 plays/sec max |
| Failover time | < 30s |
With MongoDB Sharding
| Metric | Value |
|---|---|
| Concurrent plays | ~3000 |
| Throughput | ~1500 plays/min |
| MongoDB throughput | ~5000 plays/sec |
Bottleneck Identification
```
Check Response Latency
├─ p95 < 100ms  → Check for spikes (GC, lock contention)
├─ 100-500ms    → Normal load, monitor CPU/memory
├─ 500-2000ms   → Check MongoDB slow logs
└─ > 2000ms     → Likely I/O bound (disk, network)

Check CPU Utilization
├─ < 50%    → Likely I/O bound
├─ 50-70%   → Optimal range
├─ 70-90%   → Add instance or increase CPU
└─ > 90%    → Critical, scale immediately

Check Memory Usage
├─ < 50%    → Increase cache size
├─ 50-80%   → Optimal range
├─ 80-90%   → Monitor GC activity
└─ > 90%    → Likely memory leak, restart instances

Check MongoDB
├─ Query latency < 50ms → Healthy
├─ 50-100ms             → Monitor indexes
├─ 100-500ms            → Slow queries, optimize
└─ > 500ms              → Add indexes or shards
```
Cost Optimization
Compute
- Scale down replicas during off-peak hours
- Use spot/preemptible instances for non-critical workloads
- Right-size instances to actual usage
Database
- Use MongoDB Atlas (saves ops overhead)
- Implement effective retention policies (less data = lower storage cost)
- Archive old flows to separate cold storage
Network
- Enable compression (reduce bandwidth)
- Cache frequently-accessed data
- Use CDN for static assets
Example Savings
Before optimization:
- 10 instances (always on): $5,000/month
- MongoDB, 1 TB storage: $2,000/month
- Network, 100 GB/month: $1,000/month

Total: $8,000/month

After optimization:
- 3-5 instances (auto-scale): $2,000/month
- MongoDB, 100 GB (retention policy): $500/month
- Network, 50 GB (compression): $500/month

Total: $3,000/month
Savings: 62% ($5,000/month)