AI Provider Configuration
Supported AI Providers
flow8 integrates with multiple large language model (LLM) providers, allowing you to choose the best model for each use case without changing flow definitions.
| Provider | Models | Capabilities | Cost | Latency | Self-Hosted |
|---|---|---|---|---|---|
| OpenAI | GPT-4, GPT-3.5, text-davinci-003 | Chat, embeddings, function calling | High | Low (optimized) | No |
| Anthropic Claude | Claude 3 (Opus, Sonnet, Haiku) | Chat, long context (200K tokens) | Medium | Low | No |
| Mistral | Mistral Large, Medium, Small | Chat, function calling | Low | Medium | No |
| Ollama | Llama 2, Neural Chat, Orca, Mistral | Chat (limited capability) | Free | High (depends on hardware) | Yes |
| OpenAI Compatible | Any (LM Studio, text-generation-webui) | Chat, depends on model | Free | Depends on hardware | Yes |
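As a rough decision aid, the trade-offs in the table can be encoded as data and filtered by requirements. The sketch below is illustrative only (it is not a flow8 API); the ratings mirror the table above.

```python
# Qualitative ratings from the provider table above, encoded for filtering.
PROVIDERS = [
    {"name": "openai",    "cost": "high",   "latency": "low",    "self_hosted": False},
    {"name": "anthropic", "cost": "medium", "latency": "low",    "self_hosted": False},
    {"name": "mistral",   "cost": "low",    "latency": "medium", "self_hosted": False},
    {"name": "ollama",    "cost": "free",   "latency": "high",   "self_hosted": True},
]

def pick_providers(max_cost: str, self_hosted_only: bool = False) -> list[str]:
    """Return provider names whose cost rating is at or below max_cost."""
    order = ["free", "low", "medium", "high"]
    budget = order.index(max_cost)
    return [p["name"] for p in PROVIDERS
            if order.index(p["cost"]) <= budget
            and (p["self_hosted"] or not self_hosted_only)]
```

For example, `pick_providers("low")` keeps only the low-cost options, and `self_hosted_only=True` restricts the result to providers you can run on your own hardware.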
Component Configuration
AI providers are configured as named components in the component_configs MongoDB collection.
Configuration Structure
```js
{
  "_id": ObjectId(),
  "name": "openai-gpt4",
  "kind": "ai",
  "company_id": ObjectId("company_123"),
  "config": {
    "provider": "openai",
    "api_key": "[encrypted: sk-...]",
    "model": "gpt-4-turbo-preview",
    "base_url": "https://api.openai.com/v1",  // optional
    "temperature": 0.7,
    "max_tokens": 2048,
    "timeout_seconds": 30,
    "top_p": 1.0,               // optional
    "frequency_penalty": 0.0,   // optional
    "presence_penalty": 0.0     // optional
  },
  "is_default": true,
  "created_at": ISODate("2026-04-04T10:00:00Z"),
  "updated_at": ISODate("2026-04-04T10:00:00Z")
}
```

Creating Components via REST API
```bash
# Create default OpenAI component
curl -X POST http://localhost:4454/api/v1/admin/components \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "gpt4-default",
    "kind": "ai",
    "config": {
      "provider": "openai",
      "api_key": "sk-...",
      "model": "gpt-4-turbo-preview",
      "temperature": 0.7,
      "max_tokens": 2048
    },
    "is_default": true
  }'
```
```bash
# Create named Mistral component (for cost savings)
curl -X POST http://localhost:4454/api/v1/admin/components \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mistral-budget",
    "kind": "ai",
    "config": {
      "provider": "mistral",
      "api_key": "...",
      "model": "mistral-large",
      "temperature": 0.8,
      "max_tokens": 1024
    },
    "is_default": false
  }'
```

Provider Details
OpenAI
Configuration:
```json
{
  "provider": "openai",
  "api_key": "sk-...",                       // Required
  "model": "gpt-4-turbo-preview",            // Required
  "base_url": "https://api.openai.com/v1",   // Optional
  "temperature": 0.7,        // 0-2, default: 0.7
  "max_tokens": 2048,        // Max response length
  "timeout_seconds": 30,
  "top_p": 1.0,              // 0-1, nucleus sampling
  "frequency_penalty": 0.0,  // -2 to 2
  "presence_penalty": 0.0    // -2 to 2
}
```

Available Models:

- `gpt-4-turbo-preview` – Most capable, best for complex reasoning
- `gpt-4` – Stable GPT-4 release
- `gpt-3.5-turbo` – Fast, cost-effective for simple tasks
- `text-davinci-003` – Legacy, not recommended for new deployments
Supported via `base_url`:

- Azure OpenAI (`base_url: https://{resource}.openai.azure.com/v1`)
- Local OpenAI proxy (e.g., LiteLLM)
Example: Azure OpenAI
```json
{
  "provider": "openai",
  "api_key": "your-azure-api-key",
  "model": "gpt-4",
  "base_url": "https://myresource.openai.azure.com/v1",
  "temperature": 0.3
}
```

Anthropic Claude
Configuration:
```json
{
  "provider": "anthropic",
  "api_key": "sk-ant-...",                        // Required
  "model": "claude-3-opus-20240229",              // Required
  "temperature": 0.7,                             // 0-1
  "max_tokens": 2048,
  "timeout_seconds": 30,
  "system_prompt": "You are a helpful assistant"  // Optional
}
```

Available Models:
| Model | Capability | Cost | Best For |
|---|---|---|---|
| claude-3-opus-20240229 | Highest | High | Complex reasoning, long context (200K) |
| claude-3-sonnet-20240229 | High | Medium | Balanced capability and cost |
| claude-3-haiku-20240307 | Moderate | Low | Fast, cost-effective classification |
Key Features:
- 200K context window: Process entire documents without summarization
- Vision: Can analyze images (via base64 encoding)
- Structured output: Supports JSON mode for parsing
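For the vision capability, images are supplied base64-encoded. A minimal encoding helper is sketched below; the surrounding message structure required by the Anthropic API is not shown here, so consult the provider's documentation for the exact payload shape.

```python
import base64
from pathlib import Path

def encode_image(path: str) -> str:
    """Read an image file and return its base64 text, as needed
    when sending images to vision-capable models."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```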
Mistral
Configuration:
```json
{
  "provider": "mistral",
  "api_key": "...",         // Required
  "model": "mistral-large", // Required
  "temperature": 0.7,
  "max_tokens": 2048,
  "timeout_seconds": 30,
  "top_p": 1.0,
  "safe_prompt": false      // Optional
}
```

Available Models:
| Model | Context | Cost | Use Case |
|---|---|---|---|
| mistral-large | 8K | Medium | General purpose, function calling |
| mistral-medium | 8K | Low | Cost-effective |
| mistral-small | 8K | Very Low | Simple tasks, lightweight |
| mistral-tiny | 32K | Lowest | Classification, lightweight |
Advantages:
- Open source (Mistral 7B available self-hosted)
- Competitive pricing
- Strong European presence (privacy considerations)
Ollama (Self-Hosted)
Configuration:
```json
{
  "provider": "ollama",
  "base_url": "http://ollama:11434",  // Required
  "model": "mistral:7b",              // Required
  "temperature": 0.7,
  "top_p": 1.0,
  "top_k": 40,
  "timeout_seconds": 60,
  "stream": false  // Optional: stream responses
}
```

Available Models:
Pull from the Ollama registry:

```bash
ollama pull mistral:7b
ollama pull neural-chat:7b
ollama pull orca-mini:3b
ollama pull llama2:13b
ollama run mistral:7b
```

Setup:
```bash
# Run Ollama in Docker
docker run -d --name ollama -p 11434:11434 ollama/ollama:latest

# Pull a model inside the container
docker exec ollama ollama pull mistral:7b
```

In flow8, configure:

```json
{
  "provider": "ollama",
  "base_url": "http://ollama:11434",
  "model": "mistral:7b",
  "timeout_seconds": 120  // Longer timeout for slow hardware
}
```

Hardware Requirements:
- 7B model: 8GB VRAM (GPU) or 16GB RAM
- 13B model: 16GB VRAM (GPU) or 32GB RAM
- 70B model: 48GB+ VRAM
OpenAI-Compatible Endpoints
For any OpenAI-compatible API (LM Studio, text-generation-webui, etc.):
Configuration:
```json
{
  "provider": "openai_compatible",
  "base_url": "http://lm-studio:1234/v1",
  "api_key": "not-needed",  // Optional
  "model": "local-model",
  "temperature": 0.7,
  "max_tokens": 2048,
  "timeout_seconds": 60
}
```

Examples:
LM Studio:
```json
{
  "provider": "openai_compatible",
  "base_url": "http://localhost:1234/v1",
  "model": "local-model",
  "temperature": 0.7
}
```

text-generation-webui:

```json
{
  "provider": "openai_compatible",
  "base_url": "http://localhost:5000/v1",
  "api_key": "not-needed",
  "model": "text-generation-webui-model",
  "temperature": 0.7
}
```

Default vs Named Components
Setting Default AI Provider
Only one AI component per company can be the default:
```bash
# Set as default (replaces previous default)
curl -X PATCH http://localhost:4454/api/v1/admin/components/gpt4/default \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{"is_default": true}'
```

All flows that use the default AI provider will automatically switch to `gpt4`.
Using Named Components in Flows
Override default in specific flowlets:
```json
{
  "name": "extract-with-claude",
  "module_ref": "text-extraction",
  "component_config_ids": {
    "ai": "claude-opus"  // Override default
  }
}
```

This flowlet uses Claude instead of the default provider.
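The lookup rule is: for each component kind, a flowlet's explicit `component_config_ids` entry wins; otherwise the company's default component is used. A minimal sketch of that resolution logic (the data shapes are simplified illustrations, not flow8's internals):

```python
def resolve_component(flowlet: dict, kind: str, components: list[dict]) -> dict:
    """Pick the component a flowlet should use for a given kind (e.g. 'ai')."""
    override = flowlet.get("component_config_ids", {}).get(kind)
    if override is not None:
        # Explicit override: look up the named component.
        for c in components:
            if c["name"] == override and c["kind"] == kind:
                return c
        raise LookupError(f"no component named {override!r} of kind {kind!r}")
    # No override: fall back to the default component for this kind.
    for c in components:
        if c["kind"] == kind and c.get("is_default"):
            return c
    raise LookupError(f"no default component of kind {kind!r}")
```

With a default Mistral component and a named `claude-opus` component registered, a flowlet that sets `"ai": "claude-opus"` resolves to Claude, while one with no override resolves to the default.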
Multi-Provider Strategy
Cost Optimization Example
```
Default: Mistral (budget, general use)
├── Most flows use Mistral
└── Cost: ~$0.50 per 1M tokens

Named: claude-opus (expensive, complex tasks)
├── Legal document analysis
├── Multi-step reasoning
└── Cost: ~$15 per 1M tokens

Named: gpt-3.5 (fast, lightweight)
├── Classification tasks
├── Simple summarization
└── Cost: ~$0.50 per 1M tokens
```

Flow configuration:

```json
{
  "name": "document-workflow",
  "flowlets": [
    {
      "name": "classify-document",
      "module_ref": "chat-completion",
      "component_config_ids": { "ai": "gpt35-fast" }  // Fast classification
    },
    {
      "name": "analyze-legal",
      "module_ref": "text-extraction",
      "component_config_ids": { "ai": "claude-opus" }  // Complex analysis
    },
    {
      "name": "summarize",
      "module_ref": "document-summary"  // No override, uses default (Mistral)
    }
  ]
}
```

Regional/Availability Strategy
```
Primary: OpenAI (US)
├── Default for most flows
└── Latency: ~200ms

Fallback: Azure OpenAI (EU)
├── For EU data residency
└── Uses same API, different endpoint

Emergency: Ollama (on-prem)
├── If cloud unavailable
└── Degraded quality but available
```

Rate Limiting & Cost Management
Per-Provider Rate Limits
```json
{
  "name": "openai-gpt4",
  "kind": "ai",
  "config": {
    "provider": "openai",
    "api_key": "sk-...",
    "model": "gpt-4-turbo-preview",
    "rate_limit": {
      "requests_per_minute": 60,
      "tokens_per_minute": 10000
    }
  }
}
```

flow8 implements exponential backoff and queuing if limits are exceeded.
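The backoff behaviour can be pictured as a generic retry wrapper that doubles the delay after each rate-limit error. This is an illustration of the pattern, not flow8's actual implementation:

```python
import time

class RateLimitError(Exception):
    """Raised (in this sketch) when the provider returns a rate-limit error."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn() with exponential backoff on RateLimitError.

    Delays grow as base_delay * 2**attempt: 1s, 2s, 4s, ...
    The sleep function is injectable so the logic can be tested without waiting.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * (2 ** attempt))
```

A production version would also honor the provider's `Retry-After` hint when present and add jitter to avoid synchronized retries across workers.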
Cost Tracking
Monitor token usage per provider:
```bash
# Get cost metrics (requires Prometheus)
curl 'http://localhost:9090/api/v1/query?query=flow8_ai_tokens_total{provider="openai"}'
```
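Token counts from a counter like `flow8_ai_tokens_total` can be turned into rough spend figures. The per-million-token prices below are placeholders for illustration; always use the provider's current price list.

```python
# Placeholder USD prices per 1M tokens -- NOT real provider pricing.
PRICE_PER_1M = {"openai": 15.0, "mistral": 0.5}

def estimate_cost(provider: str, tokens: int) -> float:
    """Rough spend estimate from a cumulative token counter."""
    return tokens * PRICE_PER_1M[provider] / 1_000_000
```

For example, 10M tokens through the cheap provider and 2M through the expensive one cost `estimate_cost("mistral", 10_000_000) + estimate_cost("openai", 2_000_000)` under these assumed prices.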
Export to a cost management system:

```json
{
  "provider": "openai",
  "tokens_used": 45000,
  "cost": 0.67,
  "period": "2026-04-04"
}
```

Testing Providers
Health Check
```bash
# Test connectivity to provider
curl -X POST http://localhost:4454/api/v1/admin/components/test \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{
    "provider": "openai",
    "api_key": "sk-...",
    "model": "gpt-4"
  }'
```
Response:

```json
{
  "status": "ok",
  "latency_ms": 234,
  "model_available": true,
  "error": null
}
```

Smoke Test
Test each provider with sample prompts:
Create a test flow:

```json
{
  "name": "provider-test",
  "flowlets": [
    {
      "name": "test-openai",
      "module_ref": "chat-completion",
      "component_config_ids": { "ai": "gpt4" },
      "input_mapping": { "prompt": "Hello, world! What is 2+2?" }
    },
    {
      "name": "test-claude",
      "module_ref": "chat-completion",
      "component_config_ids": { "ai": "claude-opus" },
      "input_mapping": { "prompt": "Hello, world! What is 2+2?" }
    }
  ]
}
```
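The same comparison can also be driven from a script that sends an identical prompt to each provider and diffs the answers. In this sketch the provider callables are placeholders to be wired up to real API clients:

```python
def smoke_test(providers: dict, prompt: str) -> dict:
    """providers maps component name -> callable(prompt) -> str.

    Returns each provider's answer plus whether they all agree,
    which is a cheap first-pass quality signal for simple prompts.
    """
    answers = {name: call(prompt) for name, call in providers.items()}
    agree = len(set(answers.values())) == 1
    return {"answers": answers, "agree": agree}
```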
Execute the flow and compare the responses.

Migration & Switching
Zero-Downtime Provider Switch
1. Create the new component (don't make it the default yet):

   ```bash
   curl -X POST http://localhost:4454/api/v1/admin/components \
     -H "Authorization: Bearer $ADMIN_TOKEN" \
     -d '{
       "name": "openai-gpt4-new",
       "kind": "ai",
       "config": { ... },
       "is_default": false
     }'
   ```

2. Test with canary flows:

   ```json
   {
     "name": "canary-flow",
     "flowlets": [
       {
         "component_config_ids": {
           "ai": "openai-gpt4-new"  // New provider
         }
       }
     ]
   }
   ```

3. Monitor quality and latency metrics.

4. Update other flows to use the new provider.

5. Make it the new default:

   ```bash
   curl -X PATCH http://localhost:4454/api/v1/admin/components/openai-gpt4-new/default \
     -H "Authorization: Bearer $ADMIN_TOKEN" \
     -d '{"is_default": true}'
   ```

6. Update remaining flows that explicitly referenced the old provider.

7. Decommission the old component:

   ```bash
   curl -X DELETE http://localhost:4454/api/v1/admin/components/openai-gpt4-old \
     -H "Authorization: Bearer $ADMIN_TOKEN"
   ```

Troubleshooting
"API key invalid" Error

```
Error: invalid_api_key
```

Solution:

- Verify the key format (it should start with the provider prefix: `sk-` for OpenAI, `sk-ant-` for Anthropic)
- Ensure the key has the required permissions
- Check if the key is expired or revoked
- Verify the key belongs to the correct organization
"Model not found" Error

```
Error: model_not_found
```

Solution:

- Verify the model name is exact (case-sensitive)
- Check the provider's docs for available models
- For Ollama, ensure the model is pulled: `ollama pull mistral:7b`
Timeout/Slow Responses
```
timeout waiting for response
```

Solution:

- Increase `timeout_seconds` in the component config
- Check network latency to the provider
- For self-hosted Ollama, increase `timeout_seconds` further (120+ for large models)
- Monitor the provider's status page for outages
Rate Limit Exceeded
```
Error: rate_limit_exceeded
```

Solution:

- Increase `rate_limit.requests_per_minute` if your quota allows
- Reduce the number of concurrent flows
- Implement request queuing/backpressure
- Use a cheaper/faster model as a fallback
Best Practices
1. Use the appropriate model for the task:
   - GPT-4 for complex reasoning
   - Mistral for cost-effective general use
   - Haiku for fast classification
2. Monitor costs: track token usage per provider and flow.
3. Test before production: verify quality with canary flows.
4. Have a fallback: configure multiple providers for redundancy.
5. Set reasonable timeouts: account for network latency and model size.
6. Cache when possible: use the KV store to cache expensive API calls.
7. Stream for long outputs: enable streaming for responses over 2K tokens.
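For the caching advice, a minimal sketch of a prompt-keyed cache follows. The KV store here is a plain dict for illustration; flow8's actual KV component API may differ, and caching is only safe when generation is deterministic (e.g., `temperature: 0`).

```python
import hashlib
import json

def cache_key(provider: str, model: str, prompt: str, **params) -> str:
    """Deterministic key over everything that affects the completion."""
    payload = json.dumps({"provider": provider, "model": model,
                          "prompt": prompt, **params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_completion(kv: dict, call_fn, provider: str, model: str,
                      prompt: str, **params) -> str:
    """Return a cached answer if present; otherwise call the API and store it."""
    key = cache_key(provider, model, prompt, **params)
    if key not in kv:
        kv[key] = call_fn(prompt)  # only pay for the API call on a miss
    return kv[key]
```

Because the key covers provider, model, prompt, and sampling parameters, changing any of them produces a fresh cache entry rather than returning a stale answer.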