AI Provider Configuration

Supported AI Providers

flow8 integrates with multiple large language model (LLM) providers, allowing you to choose the best model for each use case without changing flow definitions.

| Provider | Models | Capabilities | Cost | Latency | Self-Hosted |
| --- | --- | --- | --- | --- | --- |
| OpenAI | GPT-4, GPT-3.5, text-davinci-003 | Chat, embeddings, function calling | High | Low (optimized) | No |
| Anthropic Claude | Claude 3 (Opus, Sonnet, Haiku) | Chat, long context (200K tokens) | Medium | Low | No |
| Mistral | Mistral Large, Medium, Small | Chat, function calling | Low | Medium | No |
| Ollama | Llama 2, Neural Chat, Orca, Mistral | Chat (limited capability) | Free | High (depends on hardware) | Yes |
| OpenAI Compatible | Any (LM Studio, text-generation-webui) | Chat, depends on model | Free | Depends on hardware | Yes |

Component Configuration

AI providers are configured as named components in the component_configs MongoDB collection.

Configuration Structure

```js
{
  "_id": ObjectId(),
  "name": "openai-gpt4",
  "kind": "ai",
  "company_id": ObjectId("company_123"),
  "config": {
    "provider": "openai",
    "api_key": "[encrypted: sk-...]",
    "model": "gpt-4-turbo-preview",
    "base_url": "https://api.openai.com/v1", // optional
    "temperature": 0.7,
    "max_tokens": 2048,
    "timeout_seconds": 30,
    "top_p": 1.0, // optional
    "frequency_penalty": 0.0, // optional
    "presence_penalty": 0.0 // optional
  },
  "is_default": true,
  "created_at": ISODate("2026-04-04T10:00:00Z"),
  "updated_at": ISODate("2026-04-04T10:00:00Z")
}
```
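
Required fields vary by provider (see the per-provider sections below). As a sketch, assuming the field requirements shown in this page's examples (this is not flow8's actual schema validation), a config can be sanity-checked before insertion:

```python
# Hypothetical validator; field requirements are inferred from the
# configuration examples on this page, not from flow8's real schema.
REQUIRED_FIELDS = {
    "openai": {"api_key", "model"},
    "anthropic": {"api_key", "model"},
    "mistral": {"api_key", "model"},
    "ollama": {"base_url", "model"},
    "openai_compatible": {"base_url", "model"},
}

def validate_ai_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    provider = config.get("provider")
    if provider not in REQUIRED_FIELDS:
        return [f"unknown provider: {provider!r}"]
    problems = [f"missing required field: {field}"
                for field in sorted(REQUIRED_FIELDS[provider] - set(config))]
    # OpenAI accepts 0-2; Anthropic caps at 1. Check the widest range here.
    if not 0 <= config.get("temperature", 0.7) <= 2:
        problems.append("temperature out of range (0-2)")
    return problems
```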

Creating Components via REST API

```shell
# Create default OpenAI component
curl -X POST http://localhost:4454/api/v1/admin/components \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "gpt4-default",
    "kind": "ai",
    "config": {
      "provider": "openai",
      "api_key": "sk-...",
      "model": "gpt-4-turbo-preview",
      "temperature": 0.7,
      "max_tokens": 2048
    },
    "is_default": true
  }'

# Create named Mistral component (for cost savings)
curl -X POST http://localhost:4454/api/v1/admin/components \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mistral-budget",
    "kind": "ai",
    "config": {
      "provider": "mistral",
      "api_key": "...",
      "model": "mistral-large",
      "temperature": 0.8,
      "max_tokens": 1024
    },
    "is_default": false
  }'
```
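
The same calls can be scripted. A minimal sketch using Python's standard library, mirroring the curl examples above (the endpoint path and payload shape are taken from this page, not independently verified against the flow8 API):

```python
import json
import urllib.request

def build_component_request(base_url: str, token: str, name: str,
                            config: dict, is_default: bool = False):
    """Build the POST request for the admin components endpoint shown above."""
    payload = {"name": name, "kind": "ai", "config": config,
               "is_default": is_default}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/admin/components",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Sending is a one-liner once the request is built:
# with urllib.request.urlopen(build_component_request(...)) as resp:
#     print(json.load(resp))
```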

Provider Details

OpenAI

Configuration:

```jsonc
{
  "provider": "openai",
  "api_key": "sk-...",                      // Required
  "model": "gpt-4-turbo-preview",           // Required
  "base_url": "https://api.openai.com/v1",  // Optional
  "temperature": 0.7,                       // 0-2, default: 0.7
  "max_tokens": 2048,                       // Max response length
  "timeout_seconds": 30,
  "top_p": 1.0,                             // 0-1, nucleus sampling
  "frequency_penalty": 0.0,                 // -2 to 2
  "presence_penalty": 0.0                   // -2 to 2
}
```

Available Models:

  • gpt-4-turbo-preview – Most capable, best for complex reasoning
  • gpt-4 – Stable GPT-4 release
  • gpt-3.5-turbo – Fast, cost-effective for simple tasks
  • text-davinci-003 – Legacy, not recommended for new deployments

Supported via base_url:

  • Azure OpenAI (base_url: https://{resource}.openai.azure.com/v1)
  • Local OpenAI proxy (e.g., LiteLLM)

Example: Azure OpenAI

```json
{
  "provider": "openai",
  "api_key": "your-azure-api-key",
  "model": "gpt-4",
  "base_url": "https://myresource.openai.azure.com/v1",
  "temperature": 0.3
}
```

Anthropic Claude

Configuration:

```jsonc
{
  "provider": "anthropic",
  "api_key": "sk-ant-...",            // Required
  "model": "claude-3-opus-20240229",  // Required
  "temperature": 0.7,                 // 0-1
  "max_tokens": 2048,
  "timeout_seconds": 30,
  "system_prompt": "You are a helpful assistant" // Optional
}
```

Available Models:

| Model | Capability | Cost | Best For |
| --- | --- | --- | --- |
| claude-3-opus-20240229 | Highest | High | Complex reasoning, long context (200K) |
| claude-3-sonnet-20240229 | High | Medium | Balanced capability and cost |
| claude-3-haiku-20240307 | Moderate | Low | Fast, cost-effective classification |

Key Features:

  • 200K context window: Process entire documents without summarization
  • Vision: Can analyze images (via base64 encoding)
  • Structured output: Supports JSON mode for parsing
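
For the vision feature, images are passed as base64-encoded content blocks. A sketch of the encoding step (the block shape follows Anthropic's messages API image format; check the provider docs for the current schema):

```python
import base64

def image_content_block(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Wrap raw image bytes as a base64 content block for a vision message."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            # b64encode returns bytes; the API expects an ASCII string
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }
```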

Mistral

Configuration:

```jsonc
{
  "provider": "mistral",
  "api_key": "...",          // Required
  "model": "mistral-large",  // Required
  "temperature": 0.7,
  "max_tokens": 2048,
  "timeout_seconds": 30,
  "top_p": 1.0,
  "safe_prompt": false       // Optional
}
```

Available Models:

| Model | Context | Cost | Use Case |
| --- | --- | --- | --- |
| mistral-large | 8K | Medium | General purpose, function calling |
| mistral-medium | 8K | Low | Cost-effective |
| mistral-small | 8K | Very Low | Simple tasks, lightweight |
| mistral-tiny | 32K | Lowest | Classification, lightweight |

Advantages:

  • Open source (Mistral 7B available self-hosted)
  • Competitive pricing
  • Strong European presence (privacy considerations)

Ollama (Self-Hosted)

Configuration:

```jsonc
{
  "provider": "ollama",
  "base_url": "http://ollama:11434",  // Required
  "model": "mistral:7b",              // Required
  "temperature": 0.7,
  "top_p": 1.0,
  "top_k": 40,
  "timeout_seconds": 60,
  "stream": false                     // Optional: stream responses
}
```

Available Models:

Pull from Ollama registry:

```shell
ollama pull mistral:7b
ollama pull neural-chat:7b
ollama pull orca-mini:3b
ollama pull llama2:13b

# Verify a pulled model runs interactively
ollama run mistral:7b
```

Setup:

```shell
# Start Ollama in Docker
docker run -d --name ollama -p 11434:11434 ollama/ollama:latest

# Pull a model inside the container
docker exec ollama ollama pull mistral:7b
```

Then configure the component in flow8:

```jsonc
{
  "provider": "ollama",
  "base_url": "http://ollama:11434",
  "model": "mistral:7b",
  "timeout_seconds": 120 // Longer timeout for slow hardware
}
```

Hardware Requirements:

  • 7B model: 8GB VRAM (GPU) or 16GB RAM
  • 13B model: 16GB VRAM (GPU) or 32GB RAM
  • 70B model: 48GB+ VRAM

OpenAI-Compatible Endpoints

For any OpenAI-compatible API (LM Studio, text-generation-webui, etc.):

Configuration:

```jsonc
{
  "provider": "openai_compatible",
  "base_url": "http://lm-studio:1234/v1",
  "api_key": "not-needed",  // Optional
  "model": "local-model",
  "temperature": 0.7,
  "max_tokens": 2048,
  "timeout_seconds": 60
}
```

Examples:

LM Studio:

```json
{
  "provider": "openai_compatible",
  "base_url": "http://localhost:1234/v1",
  "model": "local-model",
  "temperature": 0.7
}
```

text-generation-webui:

```json
{
  "provider": "openai_compatible",
  "base_url": "http://localhost:5000/v1",
  "api_key": "not-needed",
  "model": "text-generation-webui-model",
  "temperature": 0.7
}
```
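
Because these servers emulate the OpenAI API, one request helper covers all of them. A sketch using Python's standard library (the /chat/completions route and payload shape follow the OpenAI convention these servers implement; adjust for your server):

```python
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str,
                 temperature: float = 0.7) -> urllib.request.Request:
    """Build a chat completion request for any OpenAI-compatible endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# with urllib.request.urlopen(chat_request("http://localhost:1234/v1",
#                                          "local-model", "What is 2+2?")) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```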

Default vs Named Components

Setting Default AI Provider

Only one AI component per company can be the default:

```shell
# Set as default (replaces the previous default)
curl -X PATCH http://localhost:4454/api/v1/admin/components/gpt4/default \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"is_default": true}'
```

All flows that use the default AI provider will automatically switch to gpt4.

Using Named Components in Flows

Override default in specific flowlets:

```jsonc
{
  "name": "extract-with-claude",
  "module_ref": "text-extraction",
  "component_config_ids": {
    "ai": "claude-opus" // Override the default
  }
}
```

This flowlet uses Claude instead of the default provider.
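
The override rule can be modeled in a few lines. A sketch (flow8's actual lookup runs against MongoDB; plain dicts stand in here):

```python
# A flowlet's explicit component_config_ids entry wins; otherwise the
# company's default component is used.
def resolve_ai_component(flowlet: dict, components: list[dict]) -> dict:
    override = flowlet.get("component_config_ids", {}).get("ai")
    if override is not None:
        return next(c for c in components if c["name"] == override)
    return next(c for c in components if c.get("is_default"))

components = [
    {"name": "mistral-budget", "is_default": True},
    {"name": "claude-opus", "is_default": False},
]
```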

Multi-Provider Strategy

Cost Optimization Example

Default: Mistral (budget, general use)
├─ Most flows use Mistral
└─ Cost: ~$0.50 per 1M tokens

Named: claude-opus (expensive, complex tasks)
├─ Legal document analysis
├─ Multi-step reasoning
└─ Cost: ~$15 per 1M tokens

Named: gpt-3.5 (fast, lightweight)
├─ Classification tasks
├─ Simple summarization
└─ Cost: ~$0.50 per 1M tokens
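
With per-million-token rates like those above, the savings from routing are easy to estimate (the rates here are the illustrative figures from this section, not current pricing; component names match the examples on this page):

```python
# Illustrative USD rates per 1M tokens, taken from this section's examples.
RATES = {"mistral-budget": 0.50, "claude-opus": 15.00, "gpt35-fast": 0.50}

def estimate_cost(usage: dict[str, int]) -> float:
    """usage maps component name -> tokens consumed."""
    return sum(tokens / 1_000_000 * RATES[name] for name, tokens in usage.items())

# Routing only complex work to Claude: 40M * $0.50 + 2M * $15 = $50
# Sending everything to Claude instead: 42M * $15 = $630
```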

Flow configuration:

```jsonc
{
  "name": "document-workflow",
  "flowlets": [
    {
      "name": "classify-document",
      "module_ref": "chat-completion",
      "component_config_ids": {
        "ai": "gpt35-fast" // Fast classification
      }
    },
    {
      "name": "analyze-legal",
      "module_ref": "text-extraction",
      "component_config_ids": {
        "ai": "claude-opus" // Complex analysis
      }
    },
    {
      "name": "summarize",
      "module_ref": "document-summary"
      // No override, uses the default (Mistral)
    }
  ]
}
```

Regional/Availability Strategy

Primary: OpenAI (US)
├─ Default for most flows
└─ Latency: ~200ms

Fallback: Azure OpenAI (EU)
├─ For EU data residency
└─ Uses same API, different endpoint

Emergency: Ollama (on-prem)
├─ If cloud unavailable
└─ Degraded quality but available

Rate Limiting & Cost Management

Per-Provider Rate Limits

```json
{
  "name": "openai-gpt4",
  "kind": "ai",
  "config": {
    "provider": "openai",
    "api_key": "sk-...",
    "model": "gpt-4-turbo-preview",
    "rate_limit": {
      "requests_per_minute": 60,
      "tokens_per_minute": 10000
    }
  }
}
```

flow8 applies exponential backoff and request queuing when these limits are exceeded.
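
The backoff schedule follows the usual pattern of doubling delays up to a cap. A sketch of that schedule (flow8's internal implementation is not shown here):

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 1.0,
                   cap: float = 30.0, jitter: bool = False):
    """Yield exponentially growing delays: base, 2*base, 4*base, ... capped."""
    for attempt in range(max_retries):
        delay = min(cap, base * 2 ** attempt)
        # Jitter spreads retries out so many clients don't retry in lockstep
        yield delay * random.random() if jitter else delay

# for delay in backoff_delays():
#     time.sleep(delay); retry_request()
```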

Cost Tracking

Monitor token usage per provider:

```shell
# Get token usage metrics (requires Prometheus)
curl 'http://localhost:9090/api/v1/query?query=flow8_ai_tokens_total{provider="openai"}'
```

A sample record exported to a cost management system:

```json
{
  "provider": "openai",
  "tokens_used": 45000,
  "cost": 0.67,
  "period": "2026-04-04"
}
```

Testing Providers

Health Check

```shell
# Test connectivity to a provider
curl -X POST http://localhost:4454/api/v1/admin/components/test \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "api_key": "sk-...",
    "model": "gpt-4"
  }'
```

Response:

```json
{
  "status": "ok",
  "latency_ms": 234,
  "model_available": true,
  "error": null
}
```

Smoke Test

Test each provider with sample prompts:

Create a test flow:

```jsonc
{
  "name": "provider-test",
  "flowlets": [
    {
      "name": "test-openai",
      "module_ref": "chat-completion",
      "component_config_ids": { "ai": "gpt4" },
      "input_mapping": { "prompt": "Hello, world! What is 2+2?" }
    },
    {
      "name": "test-claude",
      "module_ref": "chat-completion",
      "component_config_ids": { "ai": "claude-opus" },
      "input_mapping": { "prompt": "Hello, world! What is 2+2?" }
    }
  ]
}
```

Execute the flow and compare the responses.

Migration & Switching

Zero-Downtime Provider Switch

  1. Create the new component (don't make it the default yet):

```shell
curl -X POST http://localhost:4454/api/v1/admin/components \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai-gpt4-new",
    "kind": "ai",
    "config": { ... },
    "is_default": false
  }'
```

  2. Test with canary flows:

```jsonc
{
  "name": "canary-flow",
  "flowlets": [
    {
      "component_config_ids": {
        "ai": "openai-gpt4-new" // New provider
      }
    }
  ]
}
```

  3. Monitor quality and latency metrics.

  4. Update other flows to use the new provider.

  5. Make it the new default:

```shell
curl -X PATCH http://localhost:4454/api/v1/admin/components/openai-gpt4-new/default \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{"is_default": true}'
```

  6. Update any remaining flows that explicitly referenced the old provider.

  7. Decommission the old component:

```shell
curl -X DELETE http://localhost:4454/api/v1/admin/components/openai-gpt4-old \
  -H "Authorization: Bearer $ADMIN_TOKEN"
```

Troubleshooting

"API key invalid" Error

Error: invalid_api_key

Solution:

  1. Verify key format (should start with provider prefix: sk- for OpenAI, sk-ant- for Anthropic)
  2. Ensure key has required permissions
  3. Check if key is expired or revoked
  4. Verify key is for correct organization

"Model not found" Error

Error: model_not_found

Solution:

  1. Verify model name is exact (case-sensitive)
  2. Check provider docs for available models
  3. For Ollama, ensure model is pulled: ollama pull mistral:7b

Timeout/Slow Responses

timeout waiting for response

Solution:

  1. Increase timeout_seconds in component config
  2. Check network latency to provider
  3. For self-hosted Ollama, increase timeout_seconds (120+ for large models)
  4. Monitor provider status page for outages

Rate Limit Exceeded

Error: rate_limit_exceeded

Solution:

  1. Increase rate_limit.requests_per_minute if quota allows
  2. Reduce number of concurrent flows
  3. Implement request queuing/backpressure
  4. Use cheaper/faster model as fallback
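
One way to implement the queuing/backpressure suggested in step 3 is a client-side token bucket sized to the provider quota. A sketch:

```python
import time

class TokenBucket:
    """Client-side rate limiter: allow roughly `rate` requests per `per` seconds."""

    def __init__(self, rate: int, per: float = 60.0):
        self.capacity = rate
        self.tokens = float(rate)
        self.refill_rate = rate / per      # tokens added per second
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                       # caller should queue or back off
```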

Best Practices

  1. Use appropriate model for task:

    • GPT-4 for complex reasoning
    • Mistral for cost-effective general use
    • Haiku for fast classification
  2. Monitor costs: Track token usage per provider and flow

  3. Test before production: Verify quality with canary flows

  4. Have fallback: Configure multiple providers for redundancy

  5. Set reasonable timeouts: Account for network latency and model size

  6. Cache when possible: Use KV store to cache expensive API calls

  7. Stream for long outputs: Enable streaming for responses > 2K tokens
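
Best practice 6 can be sketched as a cache keyed by a hash of the model and prompt (an in-memory dict stands in for the KV store here; `call_provider` is a hypothetical placeholder for the real API call):

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cached_completion(config: dict, prompt: str, call_provider) -> str:
    """Return a cached completion if one exists; otherwise call the provider."""
    key = hashlib.sha256(
        json.dumps({"model": config["model"], "prompt": prompt},
                   sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = call_provider(prompt)  # only hit the API on a miss
    return _cache[key]
```

Note the key must include every field that changes the output (model, temperature, system prompt); caching only makes sense for deterministic or near-deterministic calls.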