Configuration Options Reference
Complete reference of all configuration options available in Synthetic Data Studio.
Environment Variables​
Core Application Settings​
SECRET_KEY​
- Type: string
- Required: Yes
- Description: Secret key for JWT token signing
- Default: None
- Example: your-256-bit-secret-key-here
- Security: Must be kept secret and randomly generated
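One way to produce a randomly generated 256-bit value for SECRET_KEY is Python's standard secrets module. This is a minimal sketch; the variable name secret_key is only illustrative.

```python
import secrets

# 32 random bytes = 256 bits, hex-encoded into a 64-character string.
secret_key = secrets.token_hex(32)

# Print a line that can be pasted into a .env file.
print(f"SECRET_KEY={secret_key}")
```

Generate a separate value for each environment rather than reusing one key.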
ALGORITHM​
- Type: string
- Required: No
- Description: JWT algorithm for token signing
- Default: HS256
- Allowed Values: HS256, HS384, HS512
- Example: HS256
ACCESS_TOKEN_EXPIRE_MINUTES
- Type: integer
- Required: No
- Description: JWT access token expiration time, in minutes
- Default: 30
- Range: 5-1440 (24 hours)
- Example: 60
REFRESH_TOKEN_EXPIRE_DAYS
- Type: integer
- Required: No
- Description: Refresh token expiration time, in days
- Default: 7
- Range: 1-365
- Example: 30
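To illustrate how the two expiration settings and their documented ranges could translate into token lifetimes, here is a small sketch. The helper names read_int and token_expiries are hypothetical, and the settings are passed in as a plain mapping rather than read by the application itself.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping


def read_int(env: Mapping[str, str], name: str, default: int, low: int, high: int) -> int:
    """Read an integer setting and enforce its documented range."""
    value = int(env.get(name, default))
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside the range {low}-{high}")
    return value


def token_expiries(env: Mapping[str, str], now: datetime) -> tuple[datetime, datetime]:
    """Return (access_expiry, refresh_expiry) for tokens issued at `now`."""
    access_minutes = read_int(env, "ACCESS_TOKEN_EXPIRE_MINUTES", 30, 5, 1440)
    refresh_days = read_int(env, "REFRESH_TOKEN_EXPIRE_DAYS", 7, 1, 365)
    return now + timedelta(minutes=access_minutes), now + timedelta(days=refresh_days)


# Example: a 60-minute access token, with the refresh lifetime left at its default.
issued_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
access_exp, refresh_exp = token_expiries({"ACCESS_TOKEN_EXPIRE_MINUTES": "60"}, issued_at)
print(access_exp.isoformat(), refresh_exp.isoformat())
```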
Database Configuration​
DATABASE_URL​
- Type: string
- Required: Yes
- Description: Database connection URL
- Default: sqlite:///./synth_studio.db
- Supported: SQLite, PostgreSQL, MySQL
- Examples:
  - SQLite: sqlite:///./synth_studio.db
  - PostgreSQL: postgresql://user:pass@localhost:5432/db
  - MySQL: mysql://user:pass@localhost:3306/db
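A quick way to check which backend a DATABASE_URL points at is to inspect its scheme with the standard library. The sketch below is illustrative only: database_backend is a hypothetical helper, and the handling of a driver suffix such as postgresql+psycopg2 is an assumption rather than documented behavior.

```python
from urllib.parse import urlsplit

# Backends listed as supported in this reference.
SUPPORTED_BACKENDS = {"sqlite", "postgresql", "mysql"}


def database_backend(url: str) -> str:
    """Return the backend name for a connection URL, or raise ValueError."""
    scheme = urlsplit(url).scheme
    # Assumption: a "+driver" suffix may follow the backend name.
    backend = scheme.split("+", 1)[0]
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported or malformed DATABASE_URL scheme: {scheme!r}")
    return backend


print(database_backend("sqlite:///./synth_studio.db"))
print(database_backend("postgresql://user:pass@localhost:5432/db"))
```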
Server Configuration​
HOST​
- Type: string
- Required: No
- Description: Server bind address
- Default: 0.0.0.0
- Example: 127.0.0.1
PORT
- Type: integer
- Required: No
- Description: Server port
- Default: 8000
- Range: 1024-65535
- Example: 8080
DEBUG
- Type: boolean
- Required: No
- Description: Enable debug mode
- Default: false
- Note: Disable in production
RELOAD
- Type: boolean
- Required: No
- Description: Enable auto-reload on code changes
- Default: false
- Note: Disable in production
File Storage​
UPLOAD_DIR​
- Type: string
- Required: No
- Description: Directory for uploaded files
- Default: ./uploads
- Example: /app/uploads
MAX_FILE_SIZE
- Type: string
- Required: No
- Description: Maximum file size for uploads
- Default: 100MB
- Format: Size with unit (MB, GB)
- Example: 500MB
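Because MAX_FILE_SIZE is a string with a unit, it has to be converted to a byte count before it can be compared with an upload's size. The following sketch shows one way to do that. The function name parse_size is hypothetical, and treating MB and GB as binary multiples (1024-based) is an assumption; the application may use decimal units.

```python
import re

# Assumption: units are binary multiples (1 MB = 1024 * 1024 bytes).
UNIT_BYTES = {"MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a value such as '100MB' or '2 GB' into a number of bytes."""
    match = re.fullmatch(r"(\d+)\s*(MB|GB)", text.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Expected a size such as '100MB' or '1GB', got {text!r}")
    number, unit = match.groups()
    return int(number) * UNIT_BYTES[unit.upper()]


max_bytes = parse_size("100MB")
print(max_bytes)
```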
ALLOWED_EXTENSIONS​
- Type: string
- Required: No
- Description: Comma-separated list of allowed file extensions
- Default: csv,json,xlsx,parquet
- Example: csv,json,xlsx
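As an illustration of how a comma-separated extension list might be applied to an uploaded filename, consider the sketch below. Both helper names are hypothetical, and the case-insensitive comparison is an assumption about the desired behavior.

```python
from pathlib import Path


def parse_extensions(raw: str) -> set[str]:
    """Split 'csv,json,xlsx' into a normalized set of extensions."""
    return {item.strip().lower().lstrip(".") for item in raw.split(",") if item.strip()}


def is_allowed(filename: str, allowed: set[str]) -> bool:
    """Check the final extension of a filename against the allowed set."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return bool(suffix) and suffix in allowed


allowed = parse_extensions("csv,json,xlsx,parquet")
print(is_allowed("report.CSV", allowed))
print(is_allowed("notes.txt", allowed))
```

Only the last suffix is inspected, so a name like archive.tar.gz is checked as gz.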
External Services​
AWS S3 Configuration​
USE_S3​
- Type: boolean
- Required: No
- Description: Enable AWS S3 storage
- Default: false
AWS_ACCESS_KEY_ID
- Type: string
- Required: Conditional (if USE_S3=true)
- Description: AWS access key ID
- Example: AKIAEXAMPLEKEY
AWS_SECRET_ACCESS_KEY
- Type: string
- Required: Conditional (if USE_S3=true)
- Description: AWS secret access key
- Security: Must be kept secret
AWS_DEFAULT_REGION
- Type: string
- Required: Conditional (if USE_S3=true)
- Description: AWS region
- Default: us-east-1
- Example: eu-west-1
S3_BUCKET
- Type: string
- Required: Conditional (if USE_S3=true)
- Description: S3 bucket name
- Example: my-synth-studio-bucket
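The conditional requirements above can be checked before starting the application. The sketch below is a rough illustration: missing_for_s3 is a hypothetical helper, the set of strings accepted as true is an assumption, and AWS_DEFAULT_REGION is left out of the check because it has a documented default.

```python
from typing import Mapping

# Assumption: these spellings count as "true" for boolean settings.
TRUE_VALUES = {"1", "true", "yes", "on"}

# Settings without a default that are required once USE_S3 is enabled.
S3_REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET")


def missing_for_s3(env: Mapping[str, str]) -> list[str]:
    """Return the S3 settings that are required but unset or empty."""
    if env.get("USE_S3", "false").strip().lower() not in TRUE_VALUES:
        return []
    return [name for name in S3_REQUIRED if not env.get(name)]


missing = missing_for_s3({"USE_S3": "true", "AWS_ACCESS_KEY_ID": "AKIAEXAMPLEKEY"})
print(missing)
```

The same pattern would apply to the other feature flags in this reference, such as USE_GCS or USE_GEMINI, each with its own list of dependent settings.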
Google Cloud Storage​
USE_GCS​
- Type: boolean
- Required: No
- Description: Enable Google Cloud Storage
- Default: false
GOOGLE_APPLICATION_CREDENTIALS
- Type: string
- Required: Conditional (if USE_GCS=true)
- Description: Path to GCS service account JSON file
- Example: ./service-account.json
GCS_BUCKET
- Type: string
- Required: Conditional (if USE_GCS=true)
- Description: GCS bucket name
- Example: my-synth-studio-bucket
AI/LLM Services​
Google Gemini​
USE_GEMINI​
- Type: boolean
- Required: No
- Description: Enable Google Gemini AI
- Default: false
GEMINI_API_KEY
- Type: string
- Required: Conditional (if USE_GEMINI=true)
- Description: Google Gemini API key
- Security: Must be kept secret
GEMINI_MODEL
- Type: string
- Required: No
- Description: Gemini model to use
- Default: gemini-1.5-flash
- Options: gemini-1.5-flash, gemini-1.5-pro
GEMINI_MAX_TOKENS
- Type: integer
- Required: No
- Description: Maximum tokens for Gemini responses
- Default: 2048
- Range: 1-8192
GEMINI_TEMPERATURE
- Type: float
- Required: No
- Description: Response creativity (0.0 = deterministic, 1.0 = creative)
- Default: 0.7
- Range: 0.0-2.0
Groq​
USE_GROQ​
- Type: boolean
- Required: No
- Description: Enable Groq AI
- Default: false
GROQ_API_KEY
- Type: string
- Required: Conditional (if USE_GROQ=true)
- Description: Groq API key
- Security: Must be kept secret
GROQ_MODEL
- Type: string
- Required: No
- Description: Groq model to use
- Default: llama-3.1-70b-versatile
- Options: llama-3.1-70b-versatile, llama-3.1-8b-instant, mixtral-8x7b-32768
GROQ_MAX_TOKENS
- Type: integer
- Required: No
- Description: Maximum tokens for Groq responses
- Default: 4096
- Range: 1-8192
GROQ_TEMPERATURE
- Type: float
- Required: No
- Description: Response creativity
- Default: 0.1
- Range: 0.0-2.0
OpenAI​
USE_OPENAI​
- Type: boolean
- Required: No
- Description: Enable OpenAI
- Default: false
OPENAI_API_KEY
- Type: string
- Required: Conditional (if USE_OPENAI=true)
- Description: OpenAI API key
- Security: Must be kept secret
OPENAI_MODEL
- Type: string
- Required: No
- Description: OpenAI model to use
- Default: gpt-4
- Options: gpt-4, gpt-3.5-turbo
OPENAI_MAX_TOKENS
- Type: integer
- Required: No
- Description: Maximum tokens for OpenAI responses
- Default: 2048
OPENAI_TEMPERATURE
- Type: float
- Required: No
- Description: Response creativity
- Default: 0.3
Synthesis Configuration​
Default Parameters​
DEFAULT_GENERATOR_TYPE​
- Type: string
- Required: No
- Description: Default synthesis method
- Default: ctgan
- Options: ctgan, tvae, gaussian_copula
DEFAULT_EPOCHS
- Type: integer
- Required: No
- Description: Default training epochs
- Default: 50
- Range: 1-1000
DEFAULT_BATCH_SIZE
- Type: integer
- Required: No
- Description: Default batch size
- Default: 500
- Range: 10-10000
DEFAULT_NUM_ROWS
- Type: integer
- Required: No
- Description: Default number of synthetic rows
- Default: 1000
- Range: 100-1000000
GPU Configuration​
USE_GPU​
- Type: boolean
- Required: No
- Description: Enable GPU acceleration
- Default: true
CUDA_VISIBLE_DEVICES
- Type: string
- Required: No
- Description: GPU device IDs to use
- Default: 0
- Example: 0,1 (use GPUs 0 and 1)
Differential Privacy Defaults​
DEFAULT_EPSILON​
- Type: float
- Required: No
- Description: Default privacy budget
- Default: 10.0
- Range: 0.1-100.0
DEFAULT_DELTA
- Type: string
- Required: No
- Description: Default failure probability
- Default: auto
- Options: auto or a numeric value (e.g., 1e-5)
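Since DEFAULT_DELTA accepts either the keyword auto or a number, whatever reads it has to handle both forms. The sketch below shows one possible approach. How the application actually resolves auto is not documented here; deriving it as 1 / n_rows, a common differential privacy heuristic, is purely an assumption, and resolve_delta is a hypothetical name.

```python
def resolve_delta(raw: str, n_rows: int) -> float:
    """Turn a DEFAULT_DELTA setting into a numeric failure probability."""
    if raw.strip().lower() == "auto":
        if n_rows <= 0:
            raise ValueError("n_rows must be positive to derive delta automatically")
        # Assumption: 'auto' means the reciprocal of the dataset size.
        delta = 1.0 / n_rows
    else:
        delta = float(raw)
    if not 0.0 < delta < 1.0:
        raise ValueError(f"delta must be between 0 and 1 (exclusive), got {delta}")
    return delta


print(resolve_delta("auto", n_rows=1000))
print(resolve_delta("1e-5", n_rows=1000))
```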
DEFAULT_MAX_GRAD_NORM​
- Type: float
- Required: No
- Description: Gradient clipping threshold
- Default: 1.0
- Range: 0.1-10.0
Evaluation Configuration​
Statistical Tests​
KS_TEST_SIGNIFICANCE​
- Type: float
- Required: No
- Description: Significance level for KS test
- Default: 0.05
- Range: 0.001-0.1
CHI_SQUARE_SIGNIFICANCE
- Type: float
- Required: No
- Description: Significance level for Chi-square test
- Default: 0.05
- Range: 0.001-0.1
WASSERSTEIN_THRESHOLD
- Type: float
- Required: No
- Description: Acceptable Wasserstein distance
- Default: 0.1
- Range: 0.01-1.0
ML Utility​
ML_UTILITY_TEST_SIZE​
- Type: float
- Required: No
- Description: Test set proportion for ML evaluation
- Default: 0.2
- Range: 0.1-0.5
ML_UTILITY_RANDOM_STATE
- Type: integer
- Required: No
- Description: Random seed for reproducible results
- Default: 42
Background Processing​
Redis Configuration​
REDIS_URL​
- Type: string
- Required: No
- Description: Redis connection URL
- Default: redis://localhost:6379/0
- Example: redis://username:password@host:port/db
Celery Configuration​
CELERY_BROKER_URL​
- Type: string
- Required: No
- Description: Celery message broker URL
- Default: redis://localhost:6379/0
CELERY_RESULT_BACKEND
- Type: string
- Required: No
- Description: Celery result backend URL
- Default: redis://localhost:6379/0
CELERY_TASK_SERIALIZER
- Type: string
- Required: No
- Description: Task serialization format
- Default: json
- Options: json, pickle, yaml
Logging Configuration​
LOG_LEVEL​
- Type: string
- Required: No
- Description: Logging level
- Default: INFO
- Options: DEBUG, INFO, WARNING, ERROR, CRITICAL
LOG_FORMAT
- Type: string
- Required: No
- Description: Log message format
- Default: %(asctime)s - %(name)s - %(levelname)s - %(message)s
- Example: [%(levelname)s] %(message)s
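The LOG_FORMAT placeholders are the standard attributes of Python's logging module, so their effect can be previewed with a few lines of code. This is a sketch rather than the application's own logging setup; build_logger and the logger name synth_studio.demo are made up for the example.

```python
import io
import logging

DEFAULT_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def build_logger(name: str, level: str = "INFO", fmt: str = DEFAULT_FORMAT, stream=None) -> logging.Logger:
    """Create a logger that applies the given LOG_LEVEL and LOG_FORMAT values."""
    logger = logging.getLogger(name)
    logger.setLevel(level.upper())  # raises ValueError for an unknown level name
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(fmt))
    logger.handlers = [handler]
    logger.propagate = False
    return logger


# Preview the example format at WARNING level, capturing output in memory.
buffer = io.StringIO()
log = build_logger("synth_studio.demo", level="WARNING",
                   fmt="[%(levelname)s] %(message)s", stream=buffer)
log.info("hidden at WARNING level")
log.warning("disk almost full")
print(buffer.getvalue(), end="")
```

With LOG_LEVEL=WARNING, only the second message is emitted, formatted as [WARNING] disk almost full.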
LOG_FILE​
- Type: string
- Required: No
- Description: Log file path (optional)
- Default: None
- Example: ./logs/app.log
Security Configuration​
CORS Settings​
ALLOWED_ORIGINS​
- Type: string
- Required: No
- Description: Comma-separated allowed origins
- Default: * (allow all in development)
- Example: http://localhost:3000,https://myapp.com
ALLOW_CREDENTIALS
- Type: boolean
- Required: No
- Description: Allow credentials in CORS
- Default: true
ALLOW_METHODS
- Type: string
- Required: No
- Description: Comma-separated allowed HTTP methods
- Default: GET,POST,PUT,DELETE,OPTIONS
ALLOW_HEADERS
- Type: string
- Required: No
- Description: Comma-separated allowed headers
- Default: *
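The four CORS settings are comma-separated strings that need to be split into lists before a web framework can use them. The sketch below shows one way to do that, and flags the combination of a wildcard origin with credentials, which browsers reject for credentialed requests. The helper names are hypothetical, and the dictionary keys mirror the argument names of Starlette's CORSMiddleware, which is an assumption about the framework in use.

```python
from typing import Mapping


def split_csv(raw: str) -> list[str]:
    """Split a comma-separated setting into a list of trimmed items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def cors_options(env: Mapping[str, str]) -> dict:
    """Build CORS options from the settings, falling back to documented defaults."""
    return {
        "allow_origins": split_csv(env.get("ALLOWED_ORIGINS", "*")),
        "allow_credentials": env.get("ALLOW_CREDENTIALS", "true").strip().lower() == "true",
        "allow_methods": split_csv(env.get("ALLOW_METHODS", "GET,POST,PUT,DELETE,OPTIONS")),
        "allow_headers": split_csv(env.get("ALLOW_HEADERS", "*")),
    }


def wildcard_with_credentials(options: dict) -> bool:
    """True when '*' origins are combined with credentials."""
    return "*" in options["allow_origins"] and options["allow_credentials"]


options = cors_options({"ALLOWED_ORIGINS": "http://localhost:3000,https://myapp.com"})
print(options["allow_origins"], wildcard_with_credentials(options))
```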
Monitoring & Metrics​
ENABLE_METRICS​
- Type: boolean
- Required: No
- Description: Enable Prometheus metrics
- Default: false
METRICS_PORT
- Type: integer
- Required: No
- Description: Metrics server port
- Default: 9090
HEALTH_CHECK_INTERVAL
- Type: integer
- Required: No
- Description: Health check interval (seconds)
- Default: 30
Configuration File Examples​
Development Configuration​
# Development Environment
DEBUG=true
RELOAD=true
LOG_LEVEL=DEBUG
DATABASE_URL=sqlite:///./dev.db
# AI Services (optional)
USE_GEMINI=true
GEMINI_API_KEY=your-dev-key
# File limits
MAX_FILE_SIZE=50MB
Production Configuration​
# Production Environment
DEBUG=false
RELOAD=false
LOG_LEVEL=WARNING
DATABASE_URL=postgresql://user:pass@db-host:5432/synth_studio
# Security
SECRET_KEY=your-production-secret-key
ALLOWED_ORIGINS=https://your-app.com,https://admin.your-app.com
# External Services
USE_S3=true
AWS_ACCESS_KEY_ID=production-key
AWS_SECRET_ACCESS_KEY=production-secret
S3_BUCKET=prod-synth-studio
# AI Services
USE_GEMINI=true
GEMINI_API_KEY=production-gemini-key
# Monitoring
ENABLE_METRICS=true
METRICS_PORT=9090
# Background Jobs
REDIS_URL=redis://prod-redis:6379/0
CELERY_BROKER_URL=redis://prod-redis:6379/0
Testing Configuration​
# Testing Environment
DEBUG=false
TESTING=true
DATABASE_URL=sqlite:///./test.db
SECRET_KEY=test-secret-key
# Disable external services in tests
USE_S3=false
USE_GEMINI=false
USE_GROQ=false
# Fast test execution
DEFAULT_EPOCHS=5
DEFAULT_BATCH_SIZE=100
Configuration Validation​
The application validates configuration on startup. Invalid configurations will prevent startup with clear error messages.
Common Validation Errors​
- Missing SECRET_KEY: Required for JWT token signing
- Invalid DATABASE_URL: Must be properly formatted connection string
- Invalid file paths: UPLOAD_DIR must exist and be writable
- Conflicting settings: Cannot enable multiple AI providers with same model
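To make the checks above concrete, here is a rough sketch of a validator that collects every problem instead of stopping at the first one. It is not the application's actual startup validation: validate is a hypothetical function, the accepted true values are an assumption, and "same model" is interpreted as two enabled providers sharing an identical model name.

```python
import os
import tempfile
from typing import Mapping

TRUE_VALUES = {"1", "true", "yes", "on"}  # assumption: accepted boolean spellings
PROVIDER_MODELS = {"USE_GEMINI": "GEMINI_MODEL", "USE_GROQ": "GROQ_MODEL", "USE_OPENAI": "OPENAI_MODEL"}


def validate(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable configuration errors (empty if valid)."""
    errors = []
    if not env.get("SECRET_KEY"):
        errors.append("SECRET_KEY is required for JWT token signing")

    upload_dir = env.get("UPLOAD_DIR", "./uploads")
    if not os.path.isdir(upload_dir) or not os.access(upload_dir, os.W_OK):
        errors.append(f"UPLOAD_DIR {upload_dir!r} must exist and be writable")

    seen = {}  # model name -> flag of the provider that first claimed it
    for flag, model_var in PROVIDER_MODELS.items():
        if env.get(flag, "false").strip().lower() not in TRUE_VALUES:
            continue
        model = env.get(model_var)
        if model is None:
            continue
        if model in seen:
            errors.append(f"{flag} and {seen[model]} both use model {model!r}")
        else:
            seen[model] = flag
    return errors


# Demo against a temporary directory so the example does not depend on local paths.
with tempfile.TemporaryDirectory() as upload_dir:
    ok = validate({"SECRET_KEY": "change-me", "UPLOAD_DIR": upload_dir})
    bad = validate({
        "UPLOAD_DIR": os.path.join(upload_dir, "missing"),
        "USE_GEMINI": "true", "GEMINI_MODEL": "shared-model",
        "USE_OPENAI": "true", "OPENAI_MODEL": "shared-model",
    })
print(ok)
print(len(bad))
```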
Configuration Testing​
# Test configuration loading
python -c "from app.core.config import settings; print('Config loaded successfully')"
# Validate database connection
python -c "from app.database.database import engine; print('Database connected')"
# Test AI service connections (if enabled)
python -c "from app.services.llm.chat_service import ChatService; print('AI services initialized')"
Environment-Specific Overrides​
Using Multiple .env Files​
# .env.base - Shared configuration
SECRET_KEY=base-secret
DATABASE_URL=sqlite:///./app.db
# .env.development - Development overrides
DEBUG=true
LOG_LEVEL=DEBUG
# .env.production - Production overrides
DEBUG=false
LOG_LEVEL=WARNING
DATABASE_URL=postgresql://...
Loading Environment Files​
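# In config.py
try:
    # pydantic v2 ships BaseSettings in the separate pydantic-settings package
    from pydantic_settings import BaseSettings
except ImportError:
    from pydantic import BaseSettings  # pydantic v1


class Settings(BaseSettings):
    # Configuration fields...

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"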
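The base-plus-override layout can be illustrated without any framework: read the shared file first, then let the environment-specific file overwrite matching keys. The sketch below is a simplified stand-in for whatever loader the application uses. The functions parse_env_file and load_layered are hypothetical, the parser handles only plain KEY=VALUE lines, and the LOG_LEVEL=INFO line in the base file is added solely to demonstrate an override.

```python
import tempfile
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_layered(*paths: Path) -> dict[str, str]:
    """Merge env files in order; later files override earlier ones."""
    merged = {}
    for path in paths:
        if path.exists():
            merged.update(parse_env_file(path))
    return merged


# Demo with temporary files standing in for .env.base and .env.production.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / ".env.base"
    prod = Path(tmp) / ".env.production"
    base.write_text("# Shared configuration\nSECRET_KEY=base-secret\nLOG_LEVEL=INFO\n", encoding="utf-8")
    prod.write_text("# Production overrides\nDEBUG=false\nLOG_LEVEL=WARNING\n", encoding="utf-8")
    config = load_layered(base, prod)
print(config)
```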
Security Considerations​
Secret Management​
- Never commit secrets to version control
- Use environment variables for sensitive data
- Rotate secrets regularly
- Use different secrets for different environments
File Permissions​
- Ensure UPLOAD_DIR is writable by application user
- Restrict access to configuration files
- Use secure file permissions (e.g., 600 for secret files)
Network Security​
- Restrict ALLOWED_ORIGINS in production
- Use HTTPS/TLS encryption
- Implement proper firewall rules
- Apply security updates regularly
Performance Tuning​
Database Optimization​
- Use connection pooling for high traffic
- Enable database query logging in development
- Configure appropriate connection limits
Memory Management​
- Adjust batch sizes based on available RAM
- Monitor memory usage in production
- Configure garbage collection settings
GPU Optimization​
- Set CUDA_VISIBLE_DEVICES for multi-GPU systems
- Monitor GPU memory usage
- Use appropriate batch sizes for GPU memory
Troubleshooting Configuration​
Debug Configuration Loading​
# Enable debug logging
LOG_LEVEL=DEBUG
# Check loaded configuration
python -c "
from app.core.config import settings
import json
print(json.dumps(settings.dict(), indent=2, default=str))
"
Common Issues​
- Configuration not loading: Check that the .env file exists and is readable
- Database connection failed: Verify the DATABASE_URL format and credentials
- AI services not working: Check API keys and network connectivity
- File upload issues: Verify that UPLOAD_DIR exists and is writable
Configuration Reset​
# Reset to defaults
rm .env
cp .env.example .env
# Or manually set minimal config
export SECRET_KEY="temp-key-for-testing"
export DATABASE_URL="sqlite:///./temp.db"
Need help with configuration? Check the Configuration Guide or create an issue on GitHub.