How to Set Up LiteLLM on Google Cloud Platform: A Complete Guide
Setting up a unified API gateway for multiple Large Language Models (LLMs) can be challenging, especially when you need to manage different providers, handle authentication, implement load balancing, and track costs. LiteLLM solves these problems by providing a single interface for calling 100+ LLM APIs across providers, and Google Cloud Platform offers the perfect serverless infrastructure to host it.
In this comprehensive guide, we'll walk through setting up LiteLLM on Google Cloud Platform using Cloud Run, from basic deployment to production-ready configurations with monitoring, security, and scalability.
What is LiteLLM?
LiteLLM is an open-source proxy server that provides a unified OpenAI-compatible API interface for accessing multiple LLM providers. It acts as a translation layer between your applications and various AI model providers, including OpenAI, Anthropic, Google, Cohere, and many others.
Key Features
- Unified Interface: All LLM providers are accessible through a single OpenAI-compatible API, eliminating the need to learn multiple provider-specific APIs (see the example after this list).
- Load Balancing: Distribute requests across multiple model deployments to avoid rate limits and improve performance.
- Cost Tracking: Monitor spending across different models and projects with built-in usage tracking.
- Fallback Support: Automatically switch to backup models when primary ones fail or reach rate limits.
- Security: Built-in authentication, rate limiting, and secure credential management.
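To make the unified interface concrete, here is a minimal sketch: two requests that differ only in the "model" field. It assumes a proxy is already running locally on port 4000 with master key sk-1234 and the gpt-4 and claude-3 model names configured; the rest of this guide builds exactly such a deployment.
# Same OpenAI-compatible request shape for both providers; only "model" changes
curl -s http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
curl -s http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{"model": "claude-3", "messages": [{"role": "user", "content": "Hello"}]}'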
Why Choose Google Cloud Platform?
Google Cloud Platform provides several advantages for hosting LiteLLM:
- Serverless Scaling: Cloud Run automatically scales your LiteLLM proxy based on demand, scaling to zero when not in use.
- Cost Efficiency: Pay only for the resources you use, with generous free tiers for development and testing.
- Enterprise Security: Built-in security features, IAM integration, and compliance certifications.
- Global Infrastructure: Deploy across multiple regions for low latency worldwide.
- Integrated Services: Seamless integration with Cloud SQL, Secret Manager, and monitoring services.
Prerequisites
Before we begin, ensure you have:
- Google Cloud Account with billing enabled
- gcloud CLI installed and configured (a quick verification check follows this list)
- Docker installed locally (optional, for custom builds)
- API keys for the LLM providers you plan to use
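A quick way to verify the CLI prerequisite before continuing:
# Confirm the gcloud CLI is installed and an account is authenticated
gcloud --version
gcloud auth list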
Enable Required APIs
gcloud services enable run.googleapis.com
gcloud services enable cloudbuild.googleapis.com
gcloud services enable secretmanager.googleapis.com
gcloud services enable sql-component.googleapis.com
Step-by-Step Deployment Process
Let's walk through the complete deployment process, starting with a basic setup and progressing to production configurations.
Step 1: Project Setup
Create a new Google Cloud project or select an existing one:
# Create a new project
gcloud projects create your-litellm-project
# Set the project as default
gcloud config set project your-litellm-project
# Set your preferred region
gcloud config set run/region us-central1
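If the new project has no billing account attached yet, later deployments will fail. You can link one from the CLI; this is a sketch, and the placeholder ID should be replaced with one from gcloud billing accounts list:
# Attach a billing account to the project (placeholder ID)
gcloud billing projects link your-litellm-project \
  --billing-account=0X0X0X-0X0X0X-0X0X0X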
Step 2: Basic Configuration
Create a basic configuration file for LiteLLM. This example shows how to configure multiple LLM providers:
# basic_config.yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3
    litellm_params:
      model: claude-3-sonnet-20240229
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gemini-pro
    litellm_params:
      model: gemini/gemini-pro
      api_key: os.environ/GEMINI_API_KEY
litellm_settings:
  set_verbose: true
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
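Before deploying, you can sanity-check this file by running the proxy locally. This is a minimal smoke test, assuming Python is available and every environment variable the file references (the provider keys, LITELLM_MASTER_KEY, and DATABASE_URL) is exported in your shell:
# Install the proxy extras and start LiteLLM against the config on port 4000
pip install 'litellm[proxy]'
litellm --config basic_config.yaml --port 4000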
Step 3: Secure Credential Management
Never store API keys directly in your configuration. Use Google Secret Manager:
# Store your API keys securely
echo "your-openai-api-key" | gcloud secrets create openai-api-key --data-file=-
echo "your-anthropic-api-key" | gcloud secrets create anthropic-api-key --data-file=-
echo "sk-your-strong-master-key" | gcloud secrets create litellm-master-key --data-file=-Step 4: Deploy to Cloud Run
Deploy LiteLLM using Cloud Run with the official Docker image. The --set-secrets flag pulls the values you stored in Secret Manager and exposes them to the container as environment variables; the service's runtime service account needs the Secret Manager Secret Accessor role (the Security Best Practices section below sets up a dedicated service account for this):
gcloud run deploy litellm-proxy \
  --image ghcr.io/berriai/litellm:main-latest \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-secrets="LITELLM_MASTER_KEY=litellm-master-key:latest,OPENAI_API_KEY=openai-api-key:latest,ANTHROPIC_API_KEY=anthropic-api-key:latest" \
  --memory 2Gi \
  --cpu 1 \
  --port 4000
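Note that this command runs the image with its built-in defaults and never loads basic_config.yaml from Step 2. One way to ship the config (a sketch, assuming the file sits in your working directory and Cloud Build is enabled) is to bake it into a thin custom image and point the deploy command's --image flag at that image instead:
# Wrap the official image with your config and push the result to your project's registry
cat > Dockerfile <<'EOF'
FROM ghcr.io/berriai/litellm:main-latest
COPY basic_config.yaml /app/config.yaml
# Start the proxy with the baked-in config
CMD ["--port", "4000", "--config", "/app/config.yaml"]
EOF
gcloud builds submit --tag gcr.io/your-litellm-project/litellm-proxy .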
Step 5: Test Your Deployment
Once deployed, test your LiteLLM proxy:
# Get the service URL
SERVICE_URL=$(gcloud run services describe litellm-proxy --region=us-central1 --format='value(status.url)')
# Test the endpoint
curl -X POST "$SERVICE_URL/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-your-master-key" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello, world!"}]
}'
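You can also list the models the proxy currently exposes through the OpenAI-compatible /v1/models endpoint, using the same master key:
# List the models served by the proxy
curl -s "$SERVICE_URL/v1/models" \
  -H "Authorization: Bearer sk-your-master-key"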
Production Configuration
For production deployments, you'll need additional components like databases, caching, and monitoring. Here's a comprehensive production configuration:
# production_config.yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      max_tokens: 4096
      cost_per_1k_tokens: 0.03
  - model_name: claude-3
    litellm_params:
      model: claude-3-sonnet-20240229
      api_key: os.environ/ANTHROPIC_API_KEY
    model_info:
      max_tokens: 4096
      cost_per_1k_tokens: 0.015
  - model_name: gemini-pro
    litellm_params:
      model: gemini/gemini-pro
      api_key: os.environ/GEMINI_API_KEY
    model_info:
      max_tokens: 8192
      cost_per_1k_tokens: 0.0005
litellm_settings:
  set_verbose: false
  json_logs: true
  success_callback: ["datadog", "langfuse"]
  failure_callback: ["datadog", "langfuse"]
  cache: true  # enable caching so the cache_params below take effect
  cache_params:
    type: redis
    host: os.environ/REDIS_HOST
    port: 6379
    password: os.environ/REDIS_PASSWORD
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  store_model_predictions: true
  store_model_feedback: true
router_settings:
  routing_strategy: least-busy
  model_group_alias:
    gpt-4: ["gpt-4", "gpt-4-turbo"]
    claude: ["claude-3-sonnet", "claude-3-haiku"]
Database Setup for Production
LiteLLM can use Cloud SQL for logging and storing model configurations:
# Create a Cloud SQL instance
gcloud sql instances create litellm-db \
--database-version=POSTGRES_15 \
--tier=db-f1-micro \
--region=us-central1
# Create a database
gcloud sql databases create litellm --instance=litellm-db
# Create a user
gcloud sql users create litellm-user \
--instance=litellm-db \
--password=your-secure-password
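The production config reads DATABASE_URL from the environment. One way to assemble the connection string and keep it out of your deploy commands is to store it as another Secret Manager secret; this is a sketch, the secret name database-url is a suggestion, and the instance IP is looked up from Cloud SQL:
# Build the Postgres connection string from the instance's IP and store it as a secret
DB_IP=$(gcloud sql instances describe litellm-db --format='value(ipAddresses[0].ipAddress)')
echo "postgresql://litellm-user:your-secure-password@${DB_IP}:5432/litellm" | \
  gcloud secrets create database-url --data-file=-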
Redis Setup for Caching
For high-performance caching, set up Redis using Memorystore:
# Create a Redis instance
gcloud redis instances create litellm-cache \
--size=1 \
--region=us-central1 \
--network=default
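Once the instance is ready, look up its host IP; this is the value the REDIS_HOST environment variable in the production config should point to:
# Memorystore exposes a private IP on the chosen network
gcloud redis instances describe litellm-cache \
  --region=us-central1 \
  --format='value(host)'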
Advanced Cloud Run Deployment
Deploy with production settings:
gcloud run deploy litellm-proxy \
--image ghcr.io/berriai/litellm:main-latest \
--platform managed \
--region us-central1 \
--min-instances 1 \
--max-instances 100 \
--cpu 2 \
--memory 4Gi \
--concurrency 80 \
--timeout 3600 \
--set-env-vars="DATABASE_URL=postgresql://litellm-user:password@db-ip:5432/litellm" \
--set-env-vars="REDIS_HOST=redis-ip" \
--set-env-vars="STORE_MODEL_IN_DB=True"Security Best Practices
Security is crucial when deploying LiteLLM in production. The configurations below cover the essentials:
Key Security Configurations
Strong Master Keys: Generate cryptographically strong master keys for LiteLLM authentication:
# Generate a strong master key (use "gcloud secrets versions add" instead if the secret already exists from Step 3)
openssl rand -base64 32 | gcloud secrets create litellm-master-key --data-file=-
IAM and Service Accounts: Create dedicated service accounts with minimal permissions:
# Create a service account for LiteLLM
gcloud iam service-accounts create litellm-service \
--display-name="LiteLLM Service Account"
# Grant necessary permissions
gcloud projects add-iam-policy-binding your-project \
--member="serviceAccount:litellm-service@your-project.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"Network Security: Configure VPC and firewall rules to restrict access:
Network Security: Configure VPC and firewall rules to restrict access:
# Create VPC connector for private resources
gcloud compute networks vpc-access connectors create litellm-connector \
--region=us-central1 \
--subnet=default \
--subnet-project=your-project
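The connector takes effect once the service's egress is routed through it, which is also what lets Cloud Run reach the private IPs of Cloud SQL and Memorystore:
# Route the service's outbound traffic through the VPC connector
gcloud run services update litellm-proxy \
  --region us-central1 \
  --vpc-connector litellm-connector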
Monitoring and Observability
Effective monitoring is essential for production deployments. LiteLLM integrates with various observability platforms.
Cloud Monitoring Setup
Enable comprehensive monitoring for your LiteLLM deployment:
# Enable monitoring APIs
gcloud services enable monitoring.googleapis.com
gcloud services enable logging.googleapis.com
Logging Configuration
Configure structured logging in your LiteLLM configuration:
litellm_settings:
  set_verbose: false
  json_logs: true
  success_callback: ["datadog", "langfuse"]
  failure_callback: ["datadog", "langfuse"]
Key Metrics to Monitor
- Request latency and throughput
- Error rates by provider and model
- Cost tracking across different models
- Rate limit violations
- Resource utilization (CPU, memory)
Troubleshooting Common Issues
Even with careful setup, you may encounter issues. The checks below cover the most common problems:
Debugging Tips
Enable Debug Logging: For detailed troubleshooting, enable verbose logging:
# --update-env-vars adds the variable without clearing the ones already set
gcloud run services update litellm-proxy \
  --update-env-vars="LITELLM_LOG_LEVEL=DEBUG"
Check Cloud Run Logs: View detailed logs from the Cloud Console:
gcloud logging read 'resource.type="cloud_run_revision"' \
  --limit=50 \
  --format="table(timestamp,severity,textPayload)"
Test Individual Components: Verify each component separately:
# Test database connectivity
gcloud sql connect litellm-db --user=litellm-user
# Test Redis connectivity
redis-cli -h redis-ip ping
Cost Optimization
Understanding and optimizing costs is crucial for sustainable deployments:
Cloud Run Pricing Factors
- CPU allocation: Choose appropriate CPU based on your workload
- Memory allocation: Start with 2GB and adjust based on usage
- Request volume: Monitor your request patterns
- Cold starts: Use minimum instances for frequently accessed services
Cost-Saving Strategies
Right-sizing Resources: Monitor actual resource usage and adjust allocations:
# Update resource allocation based on monitoring data
gcloud run services update litellm-proxy \
--cpu 1 \
--memory 2Gi \
--concurrency 80
Efficient Caching: Implement Redis caching to reduce provider API calls and costs.
Load Balancing: Distribute requests across multiple deployments to optimize costs and performance.
Scaling and Multi-Region Deployment
For global applications, consider multi-region deployments:
Multi-Region Setup
Deploy LiteLLM across multiple regions for reduced latency:
# Deploy to multiple regions
for region in us-central1 europe-west1 asia-southeast1; do
gcloud run deploy litellm-proxy-$region \
--image ghcr.io/berriai/litellm:main-latest \
--region $region \
--platform managed
done
Global Load Balancing
Use Cloud Load Balancing to route traffic to the nearest region:
# Create a global load balancer
gcloud compute url-maps create litellm-lb \
--default-service=litellm-backend-service
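The URL map's default service has to exist first. It is built from serverless network endpoint groups (NEGs) that point at the regional Cloud Run services; here is a sketch for one region, to be repeated for each region deployed above:
# Create a serverless NEG for the us-central1 service and attach it to a global backend service
gcloud compute network-endpoint-groups create litellm-neg-us-central1 \
  --region=us-central1 \
  --network-endpoint-type=serverless \
  --cloud-run-service=litellm-proxy-us-central1
gcloud compute backend-services create litellm-backend-service --global
gcloud compute backend-services add-backend litellm-backend-service \
  --global \
  --network-endpoint-group=litellm-neg-us-central1 \
  --network-endpoint-group-region=us-central1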
Maintenance and Updates
Regular Maintenance Tasks
- Update LiteLLM images regularly for security patches and new features
- Monitor API key expiration and rotate them proactively
- Review logs for errors and performance issues
- Update model configurations as new models become available
Automated Updates
Set up automated deployments using Cloud Build:
# cloudbuild.yaml
steps:
  - name: 'gcr.io/cloud-builders/gcloud'
    args: ['run', 'deploy', 'litellm-proxy',
           '--image', 'ghcr.io/berriai/litellm:main-latest',
           '--region', 'us-central1']
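You can run this pipeline manually, or attach it to a Cloud Build trigger for fully automated rollouts:
# Execute the pipeline from the directory containing cloudbuild.yaml
gcloud builds submit --config cloudbuild.yaml .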
Conclusion
Setting up LiteLLM on Google Cloud Platform provides a powerful, scalable, and cost-effective solution for managing multiple LLM providers. By following this guide, you've learned how to:
- Deploy LiteLLM on Cloud Run with proper security configurations
- Implement production-ready setups with databases and caching
- Monitor and troubleshoot your deployment effectively
- Scale globally with multi-region deployments
The combination of LiteLLM's unified interface and Google Cloud's serverless infrastructure creates an ideal platform for building robust AI applications. Whether you're starting with a simple development setup or deploying enterprise-grade infrastructure, this foundation will serve your LLM integration needs effectively.
Remember to regularly update your deployment, monitor costs and performance, and follow security best practices to maintain a healthy and efficient LiteLLM proxy on Google Cloud Platform.