Configuration Guide

Learn how to configure Synthetic Data Studio for different environments and use cases.

Configuration Overview​

Synthetic Data Studio uses environment variables for configuration. The main configuration file is .env in the backend directory.

Basic Configuration​

Creating the Environment File​

# Copy the example file
cp .env.example .env

Essential Settings​

# ===========================================
# SYNTHETIC DATA STUDIO CONFIGURATION
# ===========================================

# Database Configuration
DATABASE_URL=sqlite:///./synth_studio.db

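# Security Settings
SECRET_KEY=your-super-secret-key-here-change-this-in-production
# Note: the algorithm is currently fixed to HS256
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30

# File Upload Settings
UPLOAD_DIR=./uploads
# Note: the file size limit (100MB) and allowed extensions (CSV, JSON) are
# enforced by the application and are not configurable here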

# Server Settings
HOST=0.0.0.0
PORT=8000
DEBUG=true
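
Every value in .env reaches the application as a string, so numbers and booleans need converting before use. The following is a minimal sketch of that conversion, not the application's actual loader: the Settings class and the load_settings and parse_bool helpers are hypothetical names, and the real code in app.core.config may work differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    secret_key: str
    access_token_expire_minutes: int
    upload_dir: str
    host: str
    port: int
    debug: bool


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of environment variables.

    Hypothetical helper for illustration; defaults mirror the example above.
    """
    return Settings(
        database_url=env.get("DATABASE_URL", "sqlite:///./synth_studio.db"),
        secret_key=env["SECRET_KEY"],  # required: raises KeyError if missing
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        debug=parse_bool(env.get("DEBUG", "false")),
    )
```

Passing the mapping in as an argument keeps the function easy to test with a plain dictionary instead of the real process environment.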

Database Configuration​

SQLite (Development/Default)​

DATABASE_URL=sqlite:///./synth_studio.db

Pros: Simple, no additional setup, file-based

Cons: Not suitable for production, limited concurrency

PostgreSQL (Production)​

DATABASE_URL=postgresql://username:password@localhost:5432/synth_studio

Setup:

# Install PostgreSQL driver
pip install psycopg2-binary

# Create database
createdb synth_studio

# Or via psql
psql -c "CREATE DATABASE synth_studio;"

MySQL/MariaDB​

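# Assuming SQLAlchemy: the +pymysql suffix selects the PyMySQL driver installed below
# (a plain mysql:// URL defaults to the mysqlclient driver instead)
DATABASE_URL=mysql+pymysql://username:password@localhost:3306/synth_studio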

Setup:

# Install MySQL driver
pip install pymysql

# Create database
mysql -u root -p -e "CREATE DATABASE synth_studio;"
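
A malformed DATABASE_URL is easier to spot when it is broken into its parts. The sketch below does this with the standard library only and never opens a connection; describe_database_url is a hypothetical helper name, not part of the application.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a DATABASE_URL into its parts without connecting to anything."""
    parts = urlsplit(url)
    # The scheme may carry a driver suffix, e.g. "mysql+pymysql".
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }


if __name__ == "__main__":
    for example in (
        "sqlite:///./synth_studio.db",
        "postgresql://username:password@localhost:5432/synth_studio",
    ):
        print(describe_database_url(example))
```

The password is deliberately left out of the returned dictionary so the result can be logged safely.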

Security Configuration

JWT Authentication​

# JWT Secret Key (REQUIRED - Generate a strong random key)
SECRET_KEY=your-256-bit-secret-key-here

# Token Expiration
ACCESS_TOKEN_EXPIRE_MINUTES=30

[!NOTE] Algorithm is currently fixed to HS256.

Generating a secure secret key:

import secrets
print(secrets.token_hex(32)) # 256-bit key

CORS Settings​

# CORS Origins (comma-separated)
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8080

# CORS Credentials
ALLOW_CREDENTIALS=true

# CORS Methods
ALLOW_METHODS=GET,POST,PUT,DELETE,OPTIONS

# CORS Headers
ALLOW_HEADERS=*
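
The comma-separated values above have to be split into lists before a CORS middleware can use them. The sketch below shows one way to do that, assuming a Starlette-based app such as FastAPI: the dictionary keys match the keyword arguments of Starlette's CORSMiddleware, while cors_options and split_csv are hypothetical helper names.

```python
from typing import Mapping


def split_csv(value: str) -> list[str]:
    """Split a comma-separated setting, dropping blanks and stray spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def cors_options(env: Mapping[str, str]) -> dict:
    """Translate the CORS variables above into middleware keyword arguments.

    Whether the application wires them exactly this way is an assumption.
    """
    return {
        "allow_origins": split_csv(env.get("ALLOWED_ORIGINS", "")),
        "allow_credentials": env.get("ALLOW_CREDENTIALS", "false").lower() == "true",
        "allow_methods": split_csv(env.get("ALLOW_METHODS", "GET")),
        "allow_headers": split_csv(env.get("ALLOW_HEADERS", "*")),
    }
```

In a FastAPI app the result could be passed on as app.add_middleware(CORSMiddleware, **cors_options(os.environ)).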

File Storage Configuration​

Local Storage (Default)​

# Upload Directory
UPLOAD_DIR=./uploads

[!NOTE] File size limit (100MB) and allowed extensions (CSV, JSON) are currently enforced by the application and are not configurable via environment variables.
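
To make the two built-in rules concrete, here is a minimal sketch of such a check. It is not the application's code: check_upload is a hypothetical name, and reading 100MB as 100 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: "100MB" means 100 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024
ALLOWED_EXTENSIONS = {"csv", "json"}


def check_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the upload breaks the extension or size rule."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {extension or 'none'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        raise ValueError(f"File exceeds {MAX_FILE_SIZE_BYTES} bytes")
```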

AWS S3 Storage​

# Enable S3 Storage
USE_S3=true

# AWS Credentials
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_DEFAULT_REGION=us-east-1

# S3 Bucket
S3_BUCKET=your-synth-studio-bucket

# Optional: Custom S3 Endpoint (for MinIO, etc.)
S3_ENDPOINT_URL=https://minio.example.com

Google Cloud Storage​

# Enable GCS
USE_GCS=true

# GCS Credentials (JSON key file path)
GOOGLE_APPLICATION_CREDENTIALS=./service-account.json

# GCS Bucket
GCS_BUCKET=your-synth-studio-bucket
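
With three storage options, the flags have to resolve to exactly one backend. The sketch below shows one possible resolution order; select_storage_backend is a hypothetical helper, and rejecting USE_S3 and USE_GCS together is an assumption, since the application's behavior in that case is not documented here.

```python
from typing import Mapping


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Read a true/false flag; a missing flag counts as false."""
    return env.get(name, "false").strip().lower() == "true"


def select_storage_backend(env: Mapping[str, str]) -> str:
    """Return "s3", "gcs", or "local" based on the USE_* flags.

    Rejecting both flags at once is an assumption made for this sketch.
    """
    use_s3, use_gcs = _flag(env, "USE_S3"), _flag(env, "USE_GCS")
    if use_s3 and use_gcs:
        raise ValueError("Enable only one of USE_S3 and USE_GCS")
    if use_s3:
        if not env.get("S3_BUCKET"):
            raise ValueError("USE_S3 is true but S3_BUCKET is not set")
        return "s3"
    if use_gcs:
        if not env.get("GCS_BUCKET"):
            raise ValueError("USE_GCS is true but GCS_BUCKET is not set")
        return "gcs"
    return "local"  # default: files go to UPLOAD_DIR
```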

AI/LLM Configuration​

Google Gemini (Free Tier)​

# Enable Gemini
USE_GEMINI=true

# API Key
GEMINI_API_KEY=your-gemini-api-key

# Model Settings
GEMINI_MODEL=gemini-1.5-flash
GEMINI_MAX_TOKENS=2048
GEMINI_TEMPERATURE=0.7

Groq (Free Tier)​

# Enable Groq
USE_GROQ=true

# API Key
GROQ_API_KEY=your-groq-api-key

# Model Settings
GROQ_MODEL=llama-3.1-70b-versatile
GROQ_MAX_TOKENS=4096
GROQ_TEMPERATURE=0.1

OpenAI (Optional)​

# Enable OpenAI
USE_OPENAI=true

# API Key
OPENAI_API_KEY=your-openai-api-key

# Model Settings
OPENAI_MODEL=gpt-4
OPENAI_MAX_TOKENS=2048
OPENAI_TEMPERATURE=0.3
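
The three provider blocks share one naming pattern (USE_X, X_API_KEY, X_MODEL, X_MAX_TOKENS, X_TEMPERATURE), so they can be read with a single loop. The sketch below illustrates that pattern; LLMConfig and enabled_llm_configs are hypothetical names, and the fallback values are simply the ones shown in the examples above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    api_key: str
    model: str
    max_tokens: int
    temperature: float


# Fallback values copied from the examples above.
_DEFAULTS = {
    "GEMINI": ("gemini-1.5-flash", 2048, 0.7),
    "GROQ": ("llama-3.1-70b-versatile", 4096, 0.1),
    "OPENAI": ("gpt-4", 2048, 0.3),
}


def enabled_llm_configs(env: Mapping[str, str]) -> list[LLMConfig]:
    """Collect settings for every provider whose USE_* flag is true."""
    configs = []
    for prefix, (model, max_tokens, temperature) in _DEFAULTS.items():
        if env.get(f"USE_{prefix}", "false").lower() != "true":
            continue
        api_key = env.get(f"{prefix}_API_KEY", "")
        if not api_key:
            raise ValueError(f"USE_{prefix} is true but {prefix}_API_KEY is not set")
        configs.append(
            LLMConfig(
                provider=prefix.lower(),
                api_key=api_key,
                model=env.get(f"{prefix}_MODEL", model),
                max_tokens=int(env.get(f"{prefix}_MAX_TOKENS", max_tokens)),
                temperature=float(env.get(f"{prefix}_TEMPERATURE", temperature)),
            )
        )
    return configs
```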

Synthesis Configuration​

[!NOTE] Synthesis parameters (epochs, batch size, privacy budget) are configured per job via the API. Only application-level settings, such as the GPU options below, are set through environment variables.

GPU Settings​

# GPU Settings
USE_GPU=true
CUDA_VISIBLE_DEVICES=0

Evaluation Configuration​

Statistical Tests​

# Test Settings
KS_TEST_SIGNIFICANCE=0.05
CHI_SQUARE_SIGNIFICANCE=0.05
WASSERSTEIN_THRESHOLD=0.1

# ML Utility
ML_UTILITY_TEST_SIZE=0.2
ML_UTILITY_RANDOM_STATE=42
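
As an illustration of the two ML utility settings, the sketch below assumes they carry the same meaning as the similarly named scikit-learn parameters: a held-out fraction and a seed that makes the split reproducible. This reading is an assumption, and train_test_split_indices is a hypothetical standard-library helper, not the application's evaluation code.

```python
import random


def train_test_split_indices(n_rows: int, test_size: float = 0.2, random_state: int = 42):
    """Shuffle row indices reproducibly and split them into train and test."""
    indices = list(range(n_rows))
    # A dedicated Random instance keeps the global random state untouched.
    random.Random(random_state).shuffle(indices)
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]
```

With the same seed the function returns the same split on every call, which is what makes utility scores comparable between runs.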

Privacy Tests​

# Membership Inference
MI_ATTACK_TRAIN_SIZE=0.5
MI_ATTACK_TEST_SIZE=0.3

# Attribute Inference
AI_ATTACK_SAMPLE_SIZE=1000
AI_ATTACK_SIGNIFICANCE=0.05

Server Configuration​

Development​

# Debug Mode
DEBUG=true

# Auto Reload
RELOAD=true

# Server Settings
HOST=0.0.0.0
PORT=8000

# Logging
LOG_LEVEL=INFO
LOG_FORMAT=%(asctime)s - %(name)s - %(levelname)s - %(message)s

Production​

# Production Settings
DEBUG=false
RELOAD=false

# Server
HOST=0.0.0.0
PORT=8000

# Workers (for Gunicorn)
WORKERS=4
WORKER_CLASS=uvicorn.workers.UvicornWorker

# Logging
LOG_LEVEL=WARNING
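
Gunicorn configuration files are plain Python, which makes them a convenient place to turn the variables above into server settings. As far as its documented settings go, Gunicorn does not pick up variables named WORKERS or WORKER_CLASS on its own, so a sketch like the following could bridge the gap. The file name gunicorn.conf.py and the fallback values are assumptions; bind, workers, worker_class, and loglevel are standard Gunicorn setting names.

```python
# gunicorn.conf.py (hypothetical file name and location)
import os

# Address and port Gunicorn listens on, built from HOST and PORT.
bind = f"{os.environ.get('HOST', '0.0.0.0')}:{os.environ.get('PORT', '8000')}"

# Number of worker processes and the ASGI worker class from the settings above.
workers = int(os.environ.get("WORKERS", "4"))
worker_class = os.environ.get("WORKER_CLASS", "uvicorn.workers.UvicornWorker")

# Gunicorn expects a lowercase level name such as "warning".
loglevel = os.environ.get("LOG_LEVEL", "WARNING").lower()
```

The server could then be started with gunicorn -c gunicorn.conf.py app.main:app.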

Background Jobs​

Celery Configuration​

# Redis Broker (for Celery)
REDIS_URL=redis://localhost:6379/0

# Celery Settings
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0

# Task Settings
CELERY_TASK_SERIALIZER=json
CELERY_ACCEPT_CONTENT=['json']
CELERY_RESULT_SERIALIZER=json
CELERY_TIMEZONE=UTC
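
Celery itself uses lowercase setting names (broker_url, result_backend, and so on), and CELERY_ACCEPT_CONTENT above is written as a Python list literal, so the environment values need a small translation step. The sketch below shows one way to do it; celery_settings is a hypothetical helper, and the fallback values are the ones from the example above.

```python
import ast
from typing import Mapping


def celery_settings(env: Mapping[str, str]) -> dict:
    """Map the CELERY_* variables onto Celery's lowercase setting names."""
    # CELERY_ACCEPT_CONTENT is a list literal such as ['json'], so it has to
    # be parsed; ast.literal_eval only evaluates literals, never code.
    accept_content = ast.literal_eval(env.get("CELERY_ACCEPT_CONTENT", "['json']"))
    return {
        "broker_url": env.get("CELERY_BROKER_URL", "redis://localhost:6379/0"),
        "result_backend": env.get("CELERY_RESULT_BACKEND", "redis://localhost:6379/0"),
        "task_serializer": env.get("CELERY_TASK_SERIALIZER", "json"),
        "accept_content": list(accept_content),
        "result_serializer": env.get("CELERY_RESULT_SERIALIZER", "json"),
        "timezone": env.get("CELERY_TIMEZONE", "UTC"),
    }
```

The resulting dictionary could be applied to a Celery app with its conf.update method.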

Monitoring & Observability​

Logging​

# Log Level
LOG_LEVEL=INFO

# Log Format
LOG_FORMAT=%(asctime)s - %(name)s - %(levelname)s - %(message)s

# Log File (Optional)
LOG_FILE=./logs/synth_studio.log

# Log Rotation (10485760 bytes = 10MB)
LOG_MAX_BYTES=10485760
LOG_BACKUP_COUNT=5
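
These variables map closely onto Python's standard logging module. The following sketch shows how they might be applied, using RotatingFileHandler for the rotation settings; configure_logging and the logger name synth_studio are hypothetical, and the application's real logging setup may differ.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path
from typing import Mapping

DEFAULT_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def configure_logging(env: Mapping[str, str]) -> logging.Logger:
    """Configure a logger from the LOG_* variables (hypothetical helper)."""
    logger = logging.getLogger("synth_studio")
    logger.setLevel(env.get("LOG_LEVEL", "INFO").upper())
    formatter = logging.Formatter(env.get("LOG_FORMAT", DEFAULT_FORMAT))

    log_file = env.get("LOG_FILE")
    if log_file:
        # Create the log directory if needed, then rotate at LOG_MAX_BYTES.
        Path(log_file).parent.mkdir(parents=True, exist_ok=True)
        handler: logging.Handler = RotatingFileHandler(
            log_file,
            maxBytes=int(env.get("LOG_MAX_BYTES", "10485760")),
            backupCount=int(env.get("LOG_BACKUP_COUNT", "5")),
        )
    else:
        # Without LOG_FILE, log to the console (stderr).
        handler = logging.StreamHandler()

    handler.setFormatter(formatter)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    return logger
```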

Metrics (Optional)​

# Prometheus Metrics
ENABLE_METRICS=true
METRICS_PORT=9090

# Health Checks
HEALTH_CHECK_INTERVAL=30

Environment-Specific Configurations

Development Environment​

# .env.development
DEBUG=true
RELOAD=true
DATABASE_URL=sqlite:///./dev.db
LOG_LEVEL=DEBUG
USE_GPU=false

Testing Environment​

# .env.test
DEBUG=false
DATABASE_URL=sqlite:///./test.db
TESTING=true
USE_GPU=false
LOG_LEVEL=WARNING

Production Environment​

# .env.production
DEBUG=false
RELOAD=false
DATABASE_URL=postgresql://user:pass@prod-db:5432/synth_studio
USE_S3=true
LOG_LEVEL=WARNING
ENABLE_METRICS=true
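
With one file per environment, something has to decide which file to load. The sketch below shows one simple convention; the APP_ENV variable and the env_file_for helper are hypothetical and not defined by Synthetic Data Studio.

```python
import os
from pathlib import Path
from typing import Mapping


def env_file_for(env: Mapping[str, str] = os.environ) -> Path:
    """Pick the .env file for the current environment.

    APP_ENV is a hypothetical variable used here for illustration.
    """
    name = env.get("APP_ENV", "development").lower()
    if name not in {"development", "test", "production"}:
        raise ValueError(f"Unknown APP_ENV: {name}")
    return Path(f".env.{name}")
```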

Validation & Troubleshooting​

Configuration Validation​

The application validates configuration on startup. Common issues:

Database Connection:

ERROR: Database connection failed
SOLUTION: Check DATABASE_URL format and credentials

Missing Secret Key:

ERROR: SECRET_KEY not set
SOLUTION: Generate a secure random key

Invalid File Paths:

ERROR: UPLOAD_DIR does not exist
SOLUTION: Create directory or update path
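
The three checks above can be approximated before starting the server. The sketch below is a hypothetical pre-flight helper, not the application's own validation: validate_config only inspects the values and the file system and does not open a database connection.

```python
from pathlib import Path
from typing import Mapping


def validate_config(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []

    database_url = env.get("DATABASE_URL", "")
    if "://" not in database_url:
        problems.append("DATABASE_URL is missing or not a valid URL")

    if not env.get("SECRET_KEY"):
        problems.append("SECRET_KEY not set")

    upload_dir = env.get("UPLOAD_DIR", "./uploads")
    if not Path(upload_dir).is_dir():
        problems.append(f"UPLOAD_DIR does not exist: {upload_dir}")

    return problems
```

Collecting all problems in a list, rather than stopping at the first one, lets every misconfiguration be reported in a single run.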

Testing Configuration​

# Test database connection (assumes engine is a SQLAlchemy Engine; raises if the connection fails)
python -c "from app.database.database import engine; engine.connect().close(); print('DB OK')"

# Test configuration loading
python -c "from app.core.config import settings; print('Config OK')"

# Test that the API application imports without starting the server
python -c "from app.main import app; print('App import OK')"

Complete Example Configuration​

Here is a complete example configuration for production. Replace every placeholder value before deploying:

# ===========================================
# PRODUCTION CONFIGURATION
# ===========================================

# Database
DATABASE_URL=postgresql://synth_user:secure_password@db.example.com:5432/synth_studio

# Security
SECRET_KEY=256-bit-hex-key-here
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=60

# File Storage
USE_S3=true
AWS_ACCESS_KEY_ID=AKIAEXAMPLE
AWS_SECRET_ACCESS_KEY=secret-key-here
AWS_DEFAULT_REGION=us-east-1
S3_BUCKET=synth-studio-production

# AI Services
USE_GEMINI=true
GEMINI_API_KEY=gemini-key-here
USE_GROQ=true
GROQ_API_KEY=groq-key-here

# Server
DEBUG=false
HOST=0.0.0.0
PORT=8000
WORKERS=4

# Logging
LOG_LEVEL=INFO
LOG_FILE=./logs/app.log

# Background Jobs
REDIS_URL=redis://redis.example.com:6379/0
CELERY_BROKER_URL=redis://redis.example.com:6379/0

# Monitoring
ENABLE_METRICS=true
METRICS_PORT=9090

Need help? Check the Troubleshooting Guide or create an issue on GitHub.