AI Provider Configuration
Supported AI Providers
flow8 integrates with multiple large language model (LLM) providers, allowing you to choose the best model for each use case without changing flow definitions.
| Provider | Models | Capabilities | Cost | Latency | Self-Hosted |
|---|---|---|---|---|---|
| OpenAI | GPT-4, GPT-3.5, text-davinci-003 | Chat, embeddings, function calling | High | Low (optimized) | No |
| Anthropic Claude | Claude 3 (Opus, Sonnet, Haiku) | Chat, long context (200K tokens) | Medium | Low | No |
| Mistral | Mistral Large, Medium, Small | Chat, function calling | Low | Medium | No |
| Ollama | Llama 2, Neural Chat, Orca, Mistral | Chat (limited capability) | Free | High (depends on hardware) | Yes |
| OpenAI Compatible | Any (LM Studio, text-generation-webui) | Chat, depends on model | Free | Depends on hardware | Yes |
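As a rough decision aid, the trade-offs in the table can be encoded as data and filtered by requirements. The sketch below is illustrative only (it is not a flow8 API); the ratings mirror the table above.

```python
# Qualitative ratings from the provider table above, encoded for filtering.
PROVIDERS = [
    {"name": "openai",    "cost": "high",   "latency": "low",    "self_hosted": False},
    {"name": "anthropic", "cost": "medium", "latency": "low",    "self_hosted": False},
    {"name": "mistral",   "cost": "low",    "latency": "medium", "self_hosted": False},
    {"name": "ollama",    "cost": "free",   "latency": "high",   "self_hosted": True},
]

def pick_providers(max_cost: str, self_hosted_only: bool = False) -> list[str]:
    """Return provider names whose cost rating is at or below max_cost."""
    order = ["free", "low", "medium", "high"]
    budget = order.index(max_cost)
    return [p["name"] for p in PROVIDERS
            if order.index(p["cost"]) <= budget
            and (p["self_hosted"] or not self_hosted_only)]
```

For example, `pick_providers("low")` keeps only the low-cost options, and `self_hosted_only=True` restricts the result to providers you can run on your own hardware.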
Component Configuration
AI providers are configured as named components in the component_configs MongoDB collection.
Configuration Structure
```js
{
  "_id": ObjectId(),
  "name": "openai-gpt4",
  "kind": "ai",
  "company_id": ObjectId("company_123"),
  "config": {
    "provider": "openai",
    "api_key": "[encrypted: sk-...]",
    "model": "gpt-4-turbo-preview",
    "base_url": "https://api.openai.com/v1",  // optional
    "temperature": 0.7,
    "max_tokens": 2048,
    "timeout_seconds": 30,
    "top_p": 1.0,               // optional
    "frequency_penalty": 0.0,   // optional
    "presence_penalty": 0.0     // optional
  },
  "is_default": true,
  "created_at": ISODate("2026-04-04T10:00:00Z"),
  "updated_at": ISODate("2026-04-04T10:00:00Z")
}
```

Creating Components via REST API
```bash
# Create default OpenAI component
curl -X POST http://localhost:4454/api/v1/admin/components \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "gpt4-default",
    "kind": "ai",
    "config": {
      "provider": "openai",
      "api_key": "sk-...",
      "model": "gpt-4-turbo-preview",
      "temperature": 0.7,
      "max_tokens": 2048
    },
    "is_default": true
  }'
```
```bash
# Create named Mistral component (for cost savings)
curl -X POST http://localhost:4454/api/v1/admin/components \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mistral-budget",
    "kind": "ai",
    "config": {
      "provider": "mistral",
      "api_key": "...",
      "model": "mistral-large",
      "temperature": 0.8,
      "max_tokens": 1024
    },
    "is_default": false
  }'
```

Provider Details
OpenAI
Configuration:
```json
{
  "provider": "openai",
  "api_key": "sk-...",                       // Required
  "model": "gpt-4-turbo-preview",            // Required
  "base_url": "https://api.openai.com/v1",   // Optional
  "temperature": 0.7,        // 0-2, default: 0.7
  "max_tokens": 2048,        // Max response length
  "timeout_seconds": 30,
  "top_p": 1.0,              // 0-1, nucleus sampling
  "frequency_penalty": 0.0,  // -2 to 2
  "presence_penalty": 0.0    // -2 to 2
}
```

Available Models:

- `gpt-4-turbo-preview` – Most capable, best for complex reasoning
- `gpt-4` – Stable GPT-4 release
- `gpt-3.5-turbo` – Fast, cost-effective for simple tasks
- `text-davinci-003` – Legacy, not recommended for new deployments
Supported via `base_url`:

- Azure OpenAI (`base_url: https://{resource}.openai.azure.com/v1`)
- Local OpenAI proxy (e.g., LiteLLM)
Example: Azure OpenAI
```json
{
  "provider": "openai",
  "api_key": "your-azure-api-key",
  "model": "gpt-4",
  "base_url": "https://myresource.openai.azure.com/v1",
  "temperature": 0.3
}
```

Anthropic Claude
Configuration:
```json
{
  "provider": "anthropic",
  "api_key": "sk-ant-...",                        // Required
  "model": "claude-3-opus-20240229",              // Required
  "temperature": 0.7,                             // 0-1
  "max_tokens": 2048,
  "timeout_seconds": 30,
  "system_prompt": "You are a helpful assistant"  // Optional
}
```

Available Models:
| Model | Capability | Cost | Best For |
|---|---|---|---|
| claude-3-opus-20240229 | Highest | High | Complex reasoning, long context (200K) |
| claude-3-sonnet-20240229 | High | Medium | Balanced capability and cost |
| claude-3-haiku-20240307 | Moderate | Low | Fast, cost-effective classification |
Key Features:
- 200K context window: Process entire documents without summarization
- Vision: Can analyze images (via base64 encoding)
- Structured output: Supports JSON mode for parsing
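For the vision capability, images are supplied base64-encoded. A minimal encoding helper is sketched below; the surrounding message structure required by the Anthropic API is not shown here, so consult the provider's documentation for the exact payload shape.

```python
import base64
from pathlib import Path

def encode_image(path: str) -> str:
    """Read an image file and return its base64 text, as needed
    when sending images to vision-capable models."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```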
Mistral
Configuration:
```json
{
  "provider": "mistral",
  "api_key": "...",         // Required
  "model": "mistral-large", // Required
  "temperature": 0.7,
  "max_tokens": 2048,
  "timeout_seconds": 30,
  "top_p": 1.0,
  "safe_prompt": false      // Optional
}
```

Available Models:
| Model | Context | Cost | Use Case |
|---|---|---|---|
| mistral-large | 8K | Medium | General purpose, function calling |
| mistral-medium | 8K | Low | Cost-effective |
| mistral-small | 8K | Very Low | Simple tasks, lightweight |
| mistral-tiny | 32K | Lowest | Classification, lightweight |
Advantages:
- Open source (Mistral 7B available self-hosted)
- Competitive pricing
- Strong European presence (privacy considerations)
Ollama (Self-Hosted)
Configuration:
```json
{
  "provider": "ollama",
  "base_url": "http://ollama:11434",  // Required
  "model": "mistral:7b",              // Required
  "temperature": 0.7,
  "top_p": 1.0,
  "top_k": 40,
  "timeout_seconds": 60,
  "stream": false  // Optional: stream responses
}
```

Available Models:
Pull from the Ollama registry:

```bash
ollama pull mistral:7b
ollama pull neural-chat:7b
ollama pull orca-mini:3b
ollama pull llama2:13b
ollama run mistral:7b
```

Setup:
```bash
# Run Ollama in Docker
docker run -d --name ollama -p 11434:11434 ollama/ollama:latest

# Pull a model inside the container
docker exec ollama ollama pull mistral:7b
```

In flow8, configure:

```json
{
  "provider": "ollama",
  "base_url": "http://ollama:11434",
  "model": "mistral:7b",
  "timeout_seconds": 120  // Longer timeout for slow hardware
}
```

Hardware Requirements:
- 7B model: 8GB VRAM (GPU) or 16GB RAM
- 13B model: 16GB VRAM (GPU) or 32GB RAM
- 70B model: 48GB+ VRAM
OpenAI-Compatible Endpoints
For any OpenAI-compatible API (LM Studio, text-generation-webui, etc.):
Configuration:
```json
{
  "provider": "openai_compatible",
  "base_url": "http://lm-studio:1234/v1",
  "api_key": "not-needed",  // Optional
  "model": "local-model",
  "temperature": 0.7,
  "max_tokens": 2048,
  "timeout_seconds": 60
}
```

Examples:
LM Studio:
```json
{
  "provider": "openai_compatible",
  "base_url": "http://localhost:1234/v1",
  "model": "local-model",
  "temperature": 0.7
}
```

text-generation-webui:

```json
{
  "provider": "openai_compatible",
  "base_url": "http://localhost:5000/v1",
  "api_key": "not-needed",
  "model": "text-generation-webui-model",
  "temperature": 0.7
}
```

Default vs Named Components
Setting Default AI Provider
Only one AI component per company can be the default:
```bash
# Set as default (replaces previous default)
curl -X PATCH http://localhost:4454/api/v1/admin/components/gpt4/default \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{"is_default": true}'
```

All flows that use the default AI provider will automatically switch to `gpt4`.
Using Named Components in Flows
Override default in specific flowlets:
```json
{
  "name": "extract-with-claude",
  "module_ref": "text-extraction",
  "component_config_ids": {
    "ai": "claude-opus"  // Override default
  }
}
```

This flowlet uses Claude instead of the default provider.
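The lookup rule is: for each component kind, a flowlet's explicit `component_config_ids` entry wins; otherwise the company's default component is used. A minimal sketch of that resolution logic (the data shapes are simplified illustrations, not flow8's internals):

```python
def resolve_component(flowlet: dict, kind: str, components: list[dict]) -> dict:
    """Pick the component a flowlet should use for a given kind (e.g. 'ai')."""
    override = flowlet.get("component_config_ids", {}).get(kind)
    if override is not None:
        # Explicit override: look up the named component.
        for c in components:
            if c["name"] == override and c["kind"] == kind:
                return c
        raise LookupError(f"no component named {override!r} of kind {kind!r}")
    # No override: fall back to the default component for this kind.
    for c in components:
        if c["kind"] == kind and c.get("is_default"):
            return c
    raise LookupError(f"no default component of kind {kind!r}")
```

With a default Mistral component and a named `claude-opus` component registered, a flowlet that sets `"ai": "claude-opus"` resolves to Claude, while one with no override resolves to the default.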
Multi-Provider Strategy
Cost Optimization Example
```
Default: Mistral (budget, general use)
├── Most flows use Mistral
└── Cost: ~$0.50 per 1M tokens

Named: claude-opus (expensive, complex tasks)
├── Legal document analysis
├── Multi-step reasoning
└── Cost: ~$15 per 1M tokens

Named: gpt-3.5 (fast, lightweight)
├── Classification tasks
├── Simple summarization
└── Cost: ~$0.50 per 1M tokens
```

Flow configuration:

```json
{
  "name": "document-workflow",
  "flowlets": [
    {
      "name": "classify-document",
      "module_ref": "chat-completion",
      "component_config_ids": { "ai": "gpt35-fast" }  // Fast classification
    },
    {
      "name": "analyze-legal",
      "module_ref": "text-extraction",
      "component_config_ids": { "ai": "claude-opus" }  // Complex analysis
    },
    {
      "name": "summarize",
      "module_ref": "document-summary"  // No override, uses default (Mistral)
    }
  ]
}
```

Regional/Availability Strategy
```
Primary: OpenAI (US)
├── Default for most flows
└── Latency: ~200ms

Fallback: Azure OpenAI (EU)
├── For EU data residency
└── Uses same API, different endpoint

Emergency: Ollama (on-prem)
├── If cloud unavailable
└── Degraded quality but available
```

Rate Limiting & Cost Management
Per-Provider Rate Limits
```json
{
  "name": "openai-gpt4",
  "kind": "ai",
  "config": {
    "provider": "openai",
    "api_key": "sk-...",
    "model": "gpt-4-turbo-preview",
    "rate_limit": {
      "requests_per_minute": 60,
      "tokens_per_minute": 10000
    }
  }
}
```

flow8 implements exponential backoff and queuing if limits are exceeded.
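The backoff behaviour can be pictured as a generic retry wrapper that doubles the delay after each rate-limit error. This is an illustration of the pattern, not flow8's actual implementation:

```python
import time

class RateLimitError(Exception):
    """Raised (in this sketch) when the provider returns a rate-limit error."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn() with exponential backoff on RateLimitError.

    Delays grow as base_delay * 2**attempt: 1s, 2s, 4s, ...
    The sleep function is injectable so the logic can be tested without waiting.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * (2 ** attempt))
```

A production version would also honor the provider's `Retry-After` hint when present and add jitter to avoid synchronized retries across workers.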
Cost Tracking
Monitor token usage per provider:
```bash
# Get cost metrics (requires Prometheus)
curl 'http://localhost:9090/api/v1/query?query=flow8_ai_tokens_total{provider="openai"}'
```
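Token counts from a counter like `flow8_ai_tokens_total` can be turned into rough spend figures. The per-million-token prices below are placeholders for illustration; always use the provider's current price list.

```python
# Placeholder USD prices per 1M tokens -- NOT real provider pricing.
PRICE_PER_1M = {"openai": 15.0, "mistral": 0.5}

def estimate_cost(provider: str, tokens: int) -> float:
    """Rough spend estimate from a cumulative token counter."""
    return tokens * PRICE_PER_1M[provider] / 1_000_000
```

For example, 10M tokens through the cheap provider and 2M through the expensive one cost `estimate_cost("mistral", 10_000_000) + estimate_cost("openai", 2_000_000)` under these assumed prices.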
Export to a cost management system:

```json
{
  "provider": "openai",
  "tokens_used": 45000,
  "cost": 0.67,
  "period": "2026-04-04"
}
```

Testing Providers
Health Check
```bash
# Test connectivity to provider
curl -X POST http://localhost:4454/api/v1/admin/components/test \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{
    "provider": "openai",
    "api_key": "sk-...",
    "model": "gpt-4"
  }'
```
Response:

```json
{
  "status": "ok",
  "latency_ms": 234,
  "model_available": true,
  "error": null
}
```

Smoke Test
Test each provider with sample prompts:
Create a test flow:

```json
{
  "name": "provider-test",
  "flowlets": [
    {
      "name": "test-openai",
      "module_ref": "chat-completion",
      "component_config_ids": { "ai": "gpt4" },
      "input_mapping": { "prompt": "Hello, world! What is 2+2?" }
    },
    {
      "name": "test-claude",
      "module_ref": "chat-completion",
      "component_config_ids": { "ai": "claude-opus" },
      "input_mapping": { "prompt": "Hello, world! What is 2+2?" }
    }
  ]
}
```
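The same comparison can also be driven from a script that sends an identical prompt to each provider and diffs the answers. In this sketch the provider callables are placeholders to be wired up to real API clients:

```python
def smoke_test(providers: dict, prompt: str) -> dict:
    """providers maps component name -> callable(prompt) -> str.

    Returns each provider's answer plus whether they all agree,
    which is a cheap first-pass quality signal for simple prompts.
    """
    answers = {name: call(prompt) for name, call in providers.items()}
    agree = len(set(answers.values())) == 1
    return {"answers": answers, "agree": agree}
```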
Execute the flow and compare the responses.

Migration & Switching
Zero-Downtime Provider Switch
1. Create the new component (don't make it the default yet):

   ```bash
   curl -X POST http://localhost:4454/api/v1/admin/components \
     -H "Authorization: Bearer $ADMIN_TOKEN" \
     -d '{
       "name": "openai-gpt4-new",
       "kind": "ai",
       "config": { ... },
       "is_default": false
     }'
   ```

2. Test with canary flows:

   ```json
   {
     "name": "canary-flow",
     "flowlets": [
       {
         "component_config_ids": {
           "ai": "openai-gpt4-new"  // New provider
         }
       }
     ]
   }
   ```

3. Monitor quality and latency metrics.

4. Update other flows to use the new provider.

5. Make it the new default:

   ```bash
   curl -X PATCH http://localhost:4454/api/v1/admin/components/openai-gpt4-new/default \
     -H "Authorization: Bearer $ADMIN_TOKEN" \
     -d '{"is_default": true}'
   ```

6. Update remaining flows that explicitly referenced the old provider.

7. Decommission the old component:

   ```bash
   curl -X DELETE http://localhost:4454/api/v1/admin/components/openai-gpt4-old \
     -H "Authorization: Bearer $ADMIN_TOKEN"
   ```

Troubleshooting
"API key invalid" Error

```
Error: invalid_api_key
```

Solution:

- Verify the key format (it should start with the provider prefix: `sk-` for OpenAI, `sk-ant-` for Anthropic)
- Ensure the key has the required permissions
- Check if the key is expired or revoked
- Verify the key belongs to the correct organization
"Model not found" Error

```
Error: model_not_found
```

Solution:

- Verify the model name is exact (case-sensitive)
- Check the provider's docs for available models
- For Ollama, ensure the model is pulled: `ollama pull mistral:7b`
Timeout/Slow Responses
```
timeout waiting for response
```

Solution:

- Increase `timeout_seconds` in the component config
- Check network latency to the provider
- For self-hosted Ollama, increase `timeout_seconds` further (120+ for large models)
- Monitor the provider's status page for outages
Rate Limit Exceeded
```
Error: rate_limit_exceeded
```

Solution:

- Increase `rate_limit.requests_per_minute` if your quota allows
- Reduce the number of concurrent flows
- Implement request queuing/backpressure
- Use a cheaper/faster model as a fallback
Best Practices
1. Use the appropriate model for the task:
   - GPT-4 for complex reasoning
   - Mistral for cost-effective general use
   - Haiku for fast classification
2. Monitor costs: track token usage per provider and flow.
3. Test before production: verify quality with canary flows.
4. Have a fallback: configure multiple providers for redundancy.
5. Set reasonable timeouts: account for network latency and model size.
6. Cache when possible: use the KV store to cache expensive API calls.
7. Stream for long outputs: enable streaming for responses over 2K tokens.
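For the caching advice, a minimal sketch of a prompt-keyed cache follows. The KV store here is a plain dict for illustration; flow8's actual KV component API may differ, and caching is only safe when generation is deterministic (e.g., `temperature: 0`).

```python
import hashlib
import json

def cache_key(provider: str, model: str, prompt: str, **params) -> str:
    """Deterministic key over everything that affects the completion."""
    payload = json.dumps({"provider": provider, "model": model,
                          "prompt": prompt, **params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_completion(kv: dict, call_fn, provider: str, model: str,
                      prompt: str, **params) -> str:
    """Return a cached answer if present; otherwise call the API and store it."""
    key = cache_key(provider, model, prompt, **params)
    if key not in kv:
        kv[key] = call_fn(prompt)  # only pay for the API call on a miss
    return kv[key]
```

Because the key covers provider, model, prompt, and sampling parameters, changing any of them produces a fresh cache entry rather than returning a stale answer.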