For iOS Developers: If your company already pays for AWS Bedrock or Google Cloud AI, this guide shows how to use those models in Xcode with Alex Sidebar instead of buying separate API keys.
What is LiteLLM?
LiteLLM is an open-source proxy that translates between the OpenAI API format and 100+ AI providers, which means Alex Sidebar can work with enterprise AI services that don't natively support the OpenAI API format. LiteLLM is what Alex Sidebar uses internally for model connections, making it a well-tested choice for enterprise deployments.

Current stable version: v1.73.6-stable (June 2025)
Why Use LiteLLM?
Use Your Company's AI
If your company uses AWS Bedrock or Google Cloud AI, LiteLLM lets you access those models through Alex Sidebar.

Data Never Leaves Your Infrastructure
Your code stays within your company's cloud; no data goes to Alex Sidebar servers.

Track Costs by Project
See exactly how much each project costs, set budgets, and get alerts.

One Interface for All Models
Switch between Claude 4 on Bedrock, Gemini 2.5 on Vertex, or GPT-4 on Azure without changing code.
Quick Start
1. Install LiteLLM

Choose your deployment method:

- Option 1: pip install (simplest)
- Option 2: Docker (recommended for production)
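A sketch of both options, assuming the proxy's default port 4000; pin whatever image tag your team has validated:

```bash
# Option 1: install the proxy extras and run directly
pip install 'litellm[proxy]'
litellm --config config.yaml --port 4000

# Option 2: run the Docker image, mounting your config into the container
docker run \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-stable \
  --config /app/config.yaml
```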
2. Configure Your Providers

Create a `config.yaml` file in your LiteLLM directory:
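A minimal sketch; the model IDs, project name, and key reference below are placeholders, so substitute the exact values from your provider consoles:

```yaml
model_list:
  - model_name: claude-4-sonnet              # the name you will enter in Alex Sidebar
    litellm_params:
      model: bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0
      aws_region_name: us-east-1
  - model_name: gemini-2.5-pro
    litellm_params:
      model: vertex_ai/gemini-2.5-pro
      vertex_project: my-gcp-project         # placeholder project ID
      vertex_location: us-central1

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY  # the API key Alex Sidebar clients will send
```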
3. Connect Alex Sidebar
In Alex Sidebar, add a custom model pointing to your LiteLLM proxy:
- Open Settings → Models → Custom Models
- Click “Add New Model”
- Configure:
  - Model ID: your model name from config.yaml (e.g., `claude-4-sonnet`)
  - Base URL: your LiteLLM URL plus `/v1` (e.g., `https://litellm.company.com/v1`)
  - API Key: your LiteLLM proxy key (if configured)
Provider-Specific Setup
Amazon Bedrock
- Ensure your AWS credentials are configured on the LiteLLM server
- Enable the models you need in the AWS Bedrock console
- Add to your LiteLLM config:
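A sketch of a Bedrock entry; the model ID and role ARN are placeholders, so copy the exact values from your Bedrock console and IAM setup:

```yaml
model_list:
  - model_name: claude-4-sonnet
    litellm_params:
      model: bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0  # exact ID from the Bedrock console
      aws_region_name: us-east-1
      # To assume a dedicated IAM role instead of instance credentials:
      # aws_role_name: arn:aws:iam::123456789012:role/litellm-bedrock
      # aws_session_name: litellm-proxy
```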
Google Vertex AI
- Enable the Vertex AI API in your GCP project
- Set up authentication (service account recommended)
- Add to your LiteLLM config:
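A sketch of a Vertex AI entry; the project ID and credentials path are placeholders:

```yaml
model_list:
  - model_name: gemini-2.5-pro
    litellm_params:
      model: vertex_ai/gemini-2.5-pro
      vertex_project: my-gcp-project       # placeholder GCP project ID
      vertex_location: us-central1
      # Point at a service-account key if not using application default credentials:
      # vertex_credentials: /path/to/service-account.json
```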
Azure OpenAI
Configure Azure OpenAI with the latest models:
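A sketch assuming a deployment named my-gpt4o-deployment; substitute your own resource endpoint and an API version enabled on your resource:

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/my-gpt4o-deployment                 # your Azure deployment name
      api_base: https://my-resource.openai.azure.com/  # placeholder resource endpoint
      api_key: os.environ/AZURE_API_KEY                # read from the proxy's environment
      api_version: "2024-06-01"
```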
Advanced Features
Reasoning and Thinking Capabilities
Enable advanced reasoning for supported models:
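One way this can look, assuming an Anthropic-family model on Bedrock; the `thinking` parameter follows Anthropic's extended-thinking API, and the token budget is illustrative:

```yaml
model_list:
  - model_name: claude-4-sonnet-thinking
    litellm_params:
      model: bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0
      thinking: {"type": "enabled", "budget_tokens": 2048}  # default extended-thinking budget
```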
Multimodal Support

Configure models for text, image, audio, and video:
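A sketch using `model_info` capability flags; verify the exact flag names against your LiteLLM version and the provider's model card, as these are assumptions:

```yaml
model_list:
  - model_name: gemini-2.5-pro
    litellm_params:
      model: vertex_ai/gemini-2.5-pro
    model_info:
      supports_vision: true       # advertise image input to clients
      supports_audio_input: true  # confirm against the provider's model card
```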
MCP Gateway Integration

Enable the Model Context Protocol (MCP) for enhanced tool use:
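A minimal sketch using LiteLLM's `mcp_servers` config key; the server name and URL are placeholders for whatever MCP server your team runs:

```yaml
mcp_servers:
  docs_mcp:                              # placeholder server alias
    url: "https://mcp.example.com/mcp"   # placeholder MCP endpoint
    transport: "http"
```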
Team Configuration

For team accounts, you can override all Alex Sidebar model endpoints:
- Go to the Alex Sidebar Admin Portal
- Navigate to Models tab
- Add your LiteLLM proxy URL as Base URL for each model type
- All team members automatically use your proxy
All AI requests from your team go through your infrastructure. You control the data and costs.
Advanced Configuration
Load Balancing with Fallbacks
Distribute requests across multiple model deployments with intelligent routing:
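A sketch assuming two Bedrock deployments of the same model in different regions; the regional model IDs are illustrative:

```yaml
model_list:
  # Same public name registered twice; the router spreads traffic across both deployments
  - model_name: claude-4-sonnet
    litellm_params:
      model: bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0
      aws_region_name: us-east-1
  - model_name: claude-4-sonnet
    litellm_params:
      model: bedrock/eu.anthropic.claude-sonnet-4-20250514-v1:0
      aws_region_name: eu-central-1

router_settings:
  routing_strategy: simple-shuffle   # weighted random across healthy deployments
  num_retries: 2

litellm_settings:
  fallbacks: [{"claude-4-sonnet": ["gpt-4o"]}]  # fail over to another configured model_name
```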
Cost Tracking and Budget Management

Enable comprehensive cost tracking:
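A sketch assuming a Postgres database for spend logs; the connection string and budget values are placeholders:

```yaml
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: "postgresql://litellm:password@db:5432/litellm"  # spend logs require Postgres

litellm_settings:
  max_budget: 500        # proxy-wide budget in USD
  budget_duration: 30d   # budget resets every 30 days
```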
Security and Rate Limiting

Secure your LiteLLM deployment with advanced controls:
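A sketch of proxy-level controls; per-key limits (`rpm_limit`, `tpm_limit`, `max_parallel_requests`) can additionally be set when issuing keys through the `/key/generate` endpoint:

```yaml
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY  # never hard-code secrets in the config

model_list:
  - model_name: claude-4-sonnet
    litellm_params:
      model: bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0
      rpm: 1000     # per-deployment requests-per-minute cap
      tpm: 200000   # per-deployment tokens-per-minute cap
```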
Vector Store Integration

Connect to knowledge bases and vector stores:
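A hedged sketch based on LiteLLM's `vector_store_registry` config; the store name and Knowledge Base ID are placeholders, and the exact schema should be verified against your LiteLLM version:

```yaml
vector_store_registry:
  - vector_store_name: internal-docs   # name exposed to clients
    litellm_params:
      vector_store_id: KB12345678      # placeholder Bedrock Knowledge Base ID
      custom_llm_provider: bedrock
```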
Monitoring & Observability

LiteLLM v1.73.6 provides enhanced monitoring capabilities:

Performance Metrics
- 2x Higher RPS: Enhanced aiohttp transport for improved performance
- 50ms Median Latency: Optimized for high-throughput applications
- Multi-instance Rate Limiting: Accurate rate limiting across deployments
Dashboard Features
Real-time Monitoring
Troubleshooting
Connection refused error
- Verify LiteLLM is running and accessible
- Check firewall rules and security groups
- Ensure you’re using the correct URL format with the `/v1` suffix
- For Docker: check port mapping and container status
Authentication errors
- Bedrock: Verify AWS credentials, IAM permissions, and model access
- Vertex: Check service account permissions and project settings
- Azure: Ensure API keys and resource endpoints are correct
- Verify master key matches if configured
Model not found or deprecated
- Check model name matches exactly in config.yaml
- Verify the model is enabled in your cloud provider console
- Update to the latest model versions (e.g., claude-4 instead of claude-3)
- Check region/location settings and model availability
High latency or rate limiting
- Enable aiohttp transport: `USE_AIOHTTP_TRANSPORT=True`
- Implement load balancing across multiple deployments
- Adjust `max_parallel_requests` and rate limiting settings
- Consider regional deployment distribution
Cost tracking issues
- Ensure database connection is properly configured
- Check that `track_cost_callback: true` is set
- Verify model pricing information is up to date
- Review spend logs retention settings
Common Use Cases for iOS Teams
Scenario 1: Company Uses AWS with Latest Models
Your company has AWS Bedrock with Claude 4 models. Instead of buying separate Anthropic API keys:
- Deploy LiteLLM v1.73.6-stable on an EC2 instance
- Configure it to use your Bedrock Claude 4 models with reasoning capabilities
- Developers connect Alex Sidebar to your LiteLLM endpoint
- All costs go to your AWS bill with detailed tracking
Scenario 2: Multi-Cloud Model Testing
Test the latest models across providers without changing code:
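For example, each provider's model can be registered under its own name so developers switch providers from Alex Sidebar's model picker; all IDs below are placeholders:

```yaml
model_list:
  - model_name: claude-4-bedrock
    litellm_params:
      model: bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0
  - model_name: gemini-2.5-vertex
    litellm_params:
      model: vertex_ai/gemini-2.5-pro
      vertex_project: my-gcp-project
  - model_name: gpt-4o-azure
    litellm_params:
      model: azure/my-gpt4o-deployment
      api_base: https://my-resource.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
```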
Scenario 3: Development vs Production with New Models

Scenario 4: Enterprise Security and Compliance
Enterprise Features (LiteLLM v1.73.6+)
SCIM Integration
Automatic user provisioning from your identity provider:
- Okta, Azure AD, and OneLogin support
- Automatic team creation and user assignment
- Deprovisioning when users are removed
Advanced Analytics
- Team and tag-based usage tracking
- Daily spend analysis by model and user
- Session grouping and analysis
- Audit logs for compliance
Enhanced Security
- Vector store permissions by team/user
- MCP server access controls
- IP allowlisting and rate limiting
- End-to-end encryption options
Next Steps
- Review LiteLLM’s official documentation for detailed configuration options
- Check the Proxy UI to monitor costs and usage with the new dashboard
- Explore Vector Store integration for RAG applications
- Join the Alex Sidebar Discord for help with enterprise setups
- Contact team@alexcodes.app for business support and enterprise features
LiteLLM v1.73.6-stable gives you control over your AI infrastructure with the latest models and enterprise-grade features, working seamlessly with all Alex Sidebar capabilities.