Configuration Guide

Learn how to configure Synthetic Data Studio for different environments and use cases.

Configuration Overview

Synthetic Data Studio uses environment variables for configuration. The main configuration file is .env in the backend directory.

Basic Configuration

Creating the Environment File

# Copy the example file
cp .env.example .env

Essential Settings

# ===========================================
# SYNTHETIC DATA STUDIO CONFIGURATION
# ===========================================

# Database Configuration
DATABASE_URL=sqlite:///./synth_studio.db

# Security Settings
SECRET_KEY=your-super-secret-key-here-change-this-in-production
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=30

# File Upload Settings
UPLOAD_DIR=./uploads
MAX_FILE_SIZE=100MB
ALLOWED_EXTENSIONS=csv,json,xlsx

# Server Settings
HOST=0.0.0.0
PORT=8000
DEBUG=true
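
Every value in the file above reaches the application as a string, so each one needs a type conversion and a fallback. A minimal sketch of that step using only the standard library follows. The `Settings` dataclass and `load_settings` helper are illustrative names, not the application's actual `app.core.config` module, and the fallback values are taken from the example above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings used in .env files."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    database_url: str
    secret_key: str
    access_token_expire_minutes: int
    upload_dir: str
    host: str
    port: int
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping (hypothetical helper)."""
    return Settings(
        database_url=env.get("DATABASE_URL", "sqlite:///./synth_studio.db"),
        secret_key=env["SECRET_KEY"],  # no fallback: a missing key raises KeyError
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        # Assumption: debug stays off unless it is explicitly enabled.
        debug=parse_bool(env.get("DEBUG", "false")),
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"SECRET_KEY": "example-only", "PORT": "9000", "DEBUG": "true"})
print(settings.port, settings.debug)
```

Leaving `SECRET_KEY` without a fallback makes a missing key fail at startup rather than silently running with a known value.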

Database Configuration

SQLite (Development/Default)

DATABASE_URL=sqlite:///./synth_studio.db

Pros: Simple, no additional setup, file-based
Cons: Not suitable for production, limited concurrency

PostgreSQL (Production)

DATABASE_URL=postgresql://username:password@localhost:5432/synth_studio

Setup:

# Install PostgreSQL driver
pip install psycopg2-binary

# Create database
createdb synth_studio

# Or via psql
psql -c "CREATE DATABASE synth_studio;"

MySQL/MariaDB

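# The +pymysql suffix selects the PyMySQL driver installed below (assumes SQLAlchemy-style URLs)
DATABASE_URL=mysql+pymysql://username:password@localhost:3306/synth_studio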

Setup:

# Install MySQL driver
pip install pymysql

# Create database
mysql -u root -p -e "CREATE DATABASE synth_studio;"
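
Most database connection problems come from a malformed `DATABASE_URL`. The sketch below splits a URL into the parts that are most often wrong so they can be checked before starting the server. `describe_database_url` is a hypothetical helper written for this guide, not part of the application.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a DATABASE_URL into backend, driver, host, port and database."""
    parts = urlsplit(url)
    # The scheme may carry a driver suffix, for example "mysql+pymysql".
    backend, _, driver = parts.scheme.partition("+")
    return {
        "backend": backend,
        "driver": driver or None,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }


print(describe_database_url("postgresql://username:password@localhost:5432/synth_studio"))
print(describe_database_url("sqlite:///./synth_studio.db"))
```

For SQLite the host and port are empty and the database field holds the file path.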

Security Configuration

JWT Authentication

# JWT Secret Key (REQUIRED - Generate a strong random key)
SECRET_KEY=your-256-bit-secret-key-here

# Token Expiration
ACCESS_TOKEN_EXPIRE_MINUTES=30

[!NOTE] The signing algorithm is currently fixed to HS256, so changing ALGORITHM has no effect.

Generating a secure secret key:

import secrets
print(secrets.token_hex(32))  # 256-bit key
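
To show how `SECRET_KEY` and `ACCESS_TOKEN_EXPIRE_MINUTES` interact, here is a minimal standard-library sketch of HS256 signing. It is not necessarily how the application issues tokens, and a real service would normally rely on a maintained JWT library. `create_token` and `verify_signature` are illustrative names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def create_token(subject: str, secret_key: str, expire_minutes: int,
                 now: Optional[float] = None) -> str:
    """Sign a minimal HS256 JWT whose `exp` claim lies `expire_minutes` ahead."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": issued_at + expire_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret_key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_signature(token: str, secret_key: str) -> bool:
    """Recompute the HMAC over header.payload and compare in constant time."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(
        secret_key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return hmac.compare_digest(_b64url(expected), signature)
```

Anyone who knows the secret can mint valid tokens, which is why the key must be random and must differ between environments.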

CORS Settings

# CORS Origins (comma-separated)
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8080

# CORS Credentials
ALLOW_CREDENTIALS=true

# CORS Methods
ALLOW_METHODS=GET,POST,PUT,DELETE,OPTIONS

# CORS Headers
ALLOW_HEADERS=*
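
The CORS values are comma-separated strings, so they must be split into lists before a middleware can use them. A minimal sketch follows. The returned keys mirror the argument names of Starlette's `CORSMiddleware`; whether the application passes them in exactly this way is an assumption, and the fallback values are illustrative.

```python
from typing import Mapping


def parse_csv(value: str) -> list[str]:
    """Split a comma-separated value, dropping blanks and surrounding spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def cors_options(env: Mapping[str, str]) -> dict:
    """Translate the CORS variables into middleware keyword arguments."""
    return {
        "allow_origins": parse_csv(env.get("ALLOWED_ORIGINS", "")),
        "allow_credentials": env.get("ALLOW_CREDENTIALS", "false").strip().lower() == "true",
        "allow_methods": parse_csv(env.get("ALLOW_METHODS", "GET")),
        "allow_headers": parse_csv(env.get("ALLOW_HEADERS", "*")),
    }


options = cors_options({
    "ALLOWED_ORIGINS": "http://localhost:3000, http://localhost:8080",
    "ALLOW_CREDENTIALS": "true",
    "ALLOW_METHODS": "GET,POST,PUT,DELETE,OPTIONS",
})
print(options["allow_origins"])
```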

File Storage Configuration

Local Storage (Default)

# Upload Directory
UPLOAD_DIR=./uploads

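[!NOTE] The file size limit (100MB) and the allowed extensions (CSV, JSON) are currently enforced by the application and are not configurable via environment variables, so the MAX_FILE_SIZE and ALLOWED_EXTENSIONS entries shown under Essential Settings have no effect.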
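
A client can run the same checks before uploading to avoid a rejected request. The sketch below hard-codes the limits from the note. Reading 100MB as 100 × 1024 × 1024 bytes is an assumption, and `check_upload` is a hypothetical helper rather than the application's own validator.

```python
from pathlib import Path

# Limits copied from the note above; binary megabytes are assumed.
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024
ALLOWED_EXTENSIONS = {".csv", ".json"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"extension {suffix or '(none)'} is not allowed")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_SIZE_BYTES}")
    return problems


print(check_upload("customers.csv", 2_000_000))
print(check_upload("customers.xlsx", 2_000_000))
```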

AWS S3 Storage

# Enable S3 Storage
USE_S3=true

# AWS Credentials
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_DEFAULT_REGION=us-east-1

# S3 Bucket
S3_BUCKET=your-synth-studio-bucket

# Optional: Custom S3 Endpoint (for MinIO, etc.)
S3_ENDPOINT_URL=https://minio.example.com

Google Cloud Storage

# Enable GCS
USE_GCS=true

# GCS Credentials (JSON key file path)
GOOGLE_APPLICATION_CREDENTIALS=./service-account.json

# GCS Bucket
GCS_BUCKET=your-synth-studio-bucket
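
With three storage options controlled by separate flags, it helps to check that exactly one backend is selected and that its required variables are present. The sketch below does this. Rejecting `USE_S3` and `USE_GCS` together is an assumption about the intended behavior, and `select_storage_backend` is an illustrative name.

```python
from typing import Mapping

# Variables each backend needs, taken from the examples above.
REQUIRED = {
    "s3": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET"),
    "gcs": ("GOOGLE_APPLICATION_CREDENTIALS", "GCS_BUCKET"),
    "local": (),  # UPLOAD_DIR has a usable default
}


def _flag(env: Mapping[str, str], name: str) -> bool:
    return env.get(name, "false").strip().lower() == "true"


def select_storage_backend(env: Mapping[str, str]) -> str:
    """Return "s3", "gcs" or "local", raising ValueError on inconsistent input."""
    use_s3, use_gcs = _flag(env, "USE_S3"), _flag(env, "USE_GCS")
    if use_s3 and use_gcs:
        raise ValueError("Enable only one of USE_S3 and USE_GCS")
    backend = "s3" if use_s3 else "gcs" if use_gcs else "local"
    missing = [key for key in REQUIRED[backend] if not env.get(key)]
    if missing:
        raise ValueError(f"{backend} storage is missing: {', '.join(missing)}")
    return backend


print(select_storage_backend({"UPLOAD_DIR": "./uploads"}))
```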

AI/LLM Configuration

Google Gemini (Free Tier)

# Enable Gemini
USE_GEMINI=true

# API Key
GEMINI_API_KEY=your-gemini-api-key

# Model Settings
GEMINI_MODEL=gemini-1.5-flash
GEMINI_MAX_TOKENS=2048
GEMINI_TEMPERATURE=0.7

Groq (Free Tier)

# Enable Groq
USE_GROQ=true

# API Key
GROQ_API_KEY=your-groq-api-key

# Model Settings
GROQ_MODEL=llama-3.1-70b-versatile
GROQ_MAX_TOKENS=4096
GROQ_TEMPERATURE=0.1

OpenAI (Optional)

# Enable OpenAI
USE_OPENAI=true

# API Key
OPENAI_API_KEY=your-openai-api-key

# Model Settings
OPENAI_MODEL=gpt-4
OPENAI_MAX_TOKENS=2048
OPENAI_TEMPERATURE=0.3
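
The three providers share one naming pattern (`USE_<NAME>`, `<NAME>_API_KEY`, `<NAME>_MODEL` and so on), so they can be read with a single loop. The following is a sketch under assumptions: the fallback values are the example values from this guide, which may not match the application's real defaults, and `LLMSettings` and `enabled_llm_providers` are illustrative names.

```python
from dataclasses import dataclass
from typing import Mapping

# Fallbacks copied from the examples in this guide (assumed, not confirmed).
EXAMPLE_DEFAULTS = {
    "GEMINI": ("gemini-1.5-flash", 2048, 0.7),
    "GROQ": ("llama-3.1-70b-versatile", 4096, 0.1),
    "OPENAI": ("gpt-4", 2048, 0.3),
}


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    api_key: str
    model: str
    max_tokens: int
    temperature: float


def enabled_llm_providers(env: Mapping[str, str]) -> list[LLMSettings]:
    """Collect settings for every provider whose USE_<NAME> flag is true."""
    providers = []
    for name, (model, max_tokens, temperature) in EXAMPLE_DEFAULTS.items():
        if env.get(f"USE_{name}", "false").strip().lower() != "true":
            continue
        api_key = env.get(f"{name}_API_KEY", "")
        if not api_key:
            raise ValueError(f"USE_{name} is true but {name}_API_KEY is not set")
        providers.append(
            LLMSettings(
                provider=name.lower(),
                api_key=api_key,
                model=env.get(f"{name}_MODEL", model),
                max_tokens=int(env.get(f"{name}_MAX_TOKENS", max_tokens)),
                temperature=float(env.get(f"{name}_TEMPERATURE", temperature)),
            )
        )
    return providers
```

Failing when a provider is enabled without a key surfaces the mistake at startup instead of on the first request.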

Synthesis Configuration

[!NOTE] Synthesis parameters (epochs, batch size, privacy budget) are configured per job via the API. The settings below are application-level defaults.

GPU Settings

# GPU Settings
USE_GPU=true
CUDA_VISIBLE_DEVICES=0

Evaluation Configuration

Statistical Tests

# Test Settings
KS_TEST_SIGNIFICANCE=0.05
CHI_SQUARE_SIGNIFICANCE=0.05
WASSERSTEIN_THRESHOLD=0.1

# ML Utility
ML_UTILITY_TEST_SIZE=0.2
ML_UTILITY_RANDOM_STATE=42
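
`ML_UTILITY_TEST_SIZE` and `ML_UTILITY_RANDOM_STATE` control how rows are divided for the ML utility evaluation: the first is the fraction held out, the second makes the shuffle repeatable. The evaluation code itself is not shown in this guide and may use a library splitter, so the following is only a standard-library illustration of what the two settings mean.

```python
import random


def train_test_split_indices(n_rows: int, test_size: float, random_state: int):
    """Shuffle row indices with a fixed seed and hold out `test_size` of them."""
    rng = random.Random(random_state)  # same seed, same shuffle
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(100, test_size=0.2, random_state=42)
print(len(train_idx), len(test_idx))
```

Keeping the seed fixed means two evaluation runs on the same data compare like with like.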

Privacy Tests

# Membership Inference
MI_ATTACK_TRAIN_SIZE=0.5
MI_ATTACK_TEST_SIZE=0.3

# Attribute Inference
AI_ATTACK_SAMPLE_SIZE=1000
AI_ATTACK_SIGNIFICANCE=0.05

Server Configuration

Development

# Debug Mode
DEBUG=true

# Auto Reload
RELOAD=true

# Server Settings
HOST=0.0.0.0
PORT=8000

# Logging
LOG_LEVEL=INFO
LOG_FORMAT=%(asctime)s - %(name)s - %(levelname)s - %(message)s

Production

# Production Settings
DEBUG=false
RELOAD=false

# Server
HOST=0.0.0.0
PORT=8000

# Workers (for Gunicorn)
WORKERS=4
WORKER_CLASS=uvicorn.workers.UvicornWorker

# Logging
LOG_LEVEL=WARNING
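
Gunicorn reads its settings from a Python file, so the production variables above can be mapped onto it directly. A minimal sketch of such a file follows. The file name `gunicorn.conf.py` and the idea that the project wires its variables this way are assumptions; `bind`, `workers`, `worker_class` and `loglevel` are standard Gunicorn settings.

```python
# gunicorn.conf.py (assumed file name, e.g. `gunicorn -c gunicorn.conf.py app.main:app`)
import os

# Address and port to listen on, built from HOST and PORT.
bind = f"{os.environ.get('HOST', '0.0.0.0')}:{os.environ.get('PORT', '8000')}"

# Number of worker processes.
workers = int(os.environ.get("WORKERS", "4"))

# The Uvicorn worker class lets Gunicorn serve an ASGI application.
worker_class = os.environ.get("WORKER_CLASS", "uvicorn.workers.UvicornWorker")

# Gunicorn expects lowercase level names such as "warning".
loglevel = os.environ.get("LOG_LEVEL", "WARNING").lower()
```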

Background Jobs

Celery Configuration

# Redis Broker (for Celery)
REDIS_URL=redis://localhost:6379/0

# Celery Settings
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0

# Task Settings
CELERY_TASK_SERIALIZER=json
CELERY_ACCEPT_CONTENT=['json']
CELERY_RESULT_SERIALIZER=json
CELERY_TIMEZONE=UTC
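
`CELERY_ACCEPT_CONTENT` is written as a Python list literal, which an environment variable delivers as a plain string. The sketch below parses it safely and collects the settings under Celery's lowercase setting names, which could then be passed to `app.conf.update(...)`. Falling back from `CELERY_BROKER_URL` to `REDIS_URL` is an assumption, and both helper names are illustrative.

```python
import ast
from typing import Mapping


def parse_list_setting(raw: str) -> list[str]:
    """Parse a list literal such as "['json']" without executing code."""
    value = ast.literal_eval(raw)
    if not isinstance(value, list) or not all(isinstance(item, str) for item in value):
        raise ValueError(f"expected a list of strings, got {raw!r}")
    return value


def celery_config(env: Mapping[str, str]) -> dict:
    """Map the CELERY_* variables onto Celery's lowercase setting names."""
    broker = env.get("CELERY_BROKER_URL") or env.get("REDIS_URL", "redis://localhost:6379/0")
    return {
        "broker_url": broker,
        "result_backend": env.get("CELERY_RESULT_BACKEND", broker),
        "task_serializer": env.get("CELERY_TASK_SERIALIZER", "json"),
        "accept_content": parse_list_setting(env.get("CELERY_ACCEPT_CONTENT", "['json']")),
        "result_serializer": env.get("CELERY_RESULT_SERIALIZER", "json"),
        "timezone": env.get("CELERY_TIMEZONE", "UTC"),
    }


print(celery_config({"REDIS_URL": "redis://localhost:6379/0"}))
```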

Monitoring & Observability

Logging

# Log Level
LOG_LEVEL=INFO

# Log Format
LOG_FORMAT=%(asctime)s - %(name)s - %(levelname)s - %(message)s

# Log File (Optional)
LOG_FILE=./logs/synth_studio.log

# Log Rotation (LOG_MAX_BYTES is in bytes; 10485760 = 10 MiB)
LOG_MAX_BYTES=10485760
LOG_BACKUP_COUNT=5
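
These logging variables map closely onto Python's standard `logging` module. The sketch below shows one way they could be combined: a rotating file handler when `LOG_FILE` is set, and console output otherwise. `build_log_handler` is a hypothetical helper, and the application's own logging setup may differ.

```python
import logging
import os
from logging.handlers import RotatingFileHandler
from typing import Mapping

DEFAULT_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def build_log_handler(env: Mapping[str, str]) -> logging.Handler:
    """Return a rotating file handler if LOG_FILE is set, else a stream handler."""
    log_file = env.get("LOG_FILE")
    if log_file:
        # Create the log directory first; the handler opens the file immediately.
        os.makedirs(os.path.dirname(log_file) or ".", exist_ok=True)
        handler: logging.Handler = RotatingFileHandler(
            log_file,
            maxBytes=int(env.get("LOG_MAX_BYTES", "10485760")),
            backupCount=int(env.get("LOG_BACKUP_COUNT", "5")),
        )
    else:
        handler = logging.StreamHandler()
    handler.setLevel(env.get("LOG_LEVEL", "INFO").upper())
    handler.setFormatter(logging.Formatter(env.get("LOG_FORMAT", DEFAULT_FORMAT)))
    return handler


# Attach the handler to the root logger.
logging.getLogger().addHandler(build_log_handler({"LOG_LEVEL": "INFO"}))
```

With `LOG_MAX_BYTES=10485760` and `LOG_BACKUP_COUNT=5`, the log directory holds at most the active file plus five rotated files of roughly 10 MiB each.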

Metrics (Optional)

# Prometheus Metrics
ENABLE_METRICS=true
METRICS_PORT=9090

# Health Checks
HEALTH_CHECK_INTERVAL=30

Environment-Specific Configurations

Development Environment

# .env.development
DEBUG=true
RELOAD=true
DATABASE_URL=sqlite:///./dev.db
LOG_LEVEL=DEBUG
USE_GPU=false

Testing Environment

# .env.test
DEBUG=false
DATABASE_URL=sqlite:///./test.db
TESTING=true
USE_GPU=false
LOG_LEVEL=WARNING

Production Environment

# .env.production
DEBUG=false
RELOAD=false
DATABASE_URL=postgresql://user:pass@prod-db:5432/synth_studio
USE_S3=true
LOG_LEVEL=WARNING
ENABLE_METRICS=true
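
One way to switch between these files is to choose the file name from a single variable and parse it as `KEY=VALUE` lines. The sketch below assumes a hypothetical `APP_ENV` variable, which this guide does not define, and uses a deliberately simple parser that ignores quoting and inline comments.

```python
import os
from pathlib import Path


def env_file_for(app_env: str, base_dir: str = ".") -> Path:
    """Return the path of the .env file for an environment name."""
    return Path(base_dir) / f".env.{app_env}"


def parse_env_file(path: Path) -> dict[str, str]:
    """Read KEY=VALUE lines, skipping blank lines and full-line comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# APP_ENV is an assumed variable name; "development" is the fallback here.
selected = env_file_for(os.environ.get("APP_ENV", "development"))
print(selected)
```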

Validation & Troubleshooting

Configuration Validation

The application validates configuration on startup. Common issues:

Database Connection:

ERROR: Database connection failed
SOLUTION: Check DATABASE_URL format and credentials

Missing Secret Key:

ERROR: SECRET_KEY not set
SOLUTION: Generate a secure random key

Invalid File Paths:

ERROR: UPLOAD_DIR does not exist
SOLUTION: Create directory or update path
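
The same problems can be caught before the server starts. The sketch below is a pre-flight check covering the secret key, the upload directory and the URL shape. It does not open a database connection, and `validate_config` is an illustrative helper, not the application's own startup validation.

```python
import os
from typing import Mapping


def validate_config(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    errors = []
    if not env.get("SECRET_KEY"):
        errors.append("SECRET_KEY not set: generate a secure random key")
    upload_dir = env.get("UPLOAD_DIR", "./uploads")
    if not os.path.isdir(upload_dir):
        errors.append(f"UPLOAD_DIR does not exist: create {upload_dir} or update the path")
    if "://" not in env.get("DATABASE_URL", ""):
        errors.append("DATABASE_URL is missing or malformed: check its format")
    return errors


for problem in validate_config(dict(os.environ)):
    print("ERROR:", problem)
```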

Testing Configuration

# Test database connection (assumes engine is a SQLAlchemy Engine)
python -c "from app.database.database import engine; engine.connect().close(); print('DB OK')"

# Test configuration loading
python -c "from app.core.config import settings; print('Config OK')"

# Test that the API application imports without errors
python -c "from app.main import app; print('App import OK')"

Complete Example Configuration

Here is a complete example of a production configuration. Replace every placeholder value (keys, passwords, hostnames) before use:

# ===========================================
# PRODUCTION CONFIGURATION
# ===========================================

# Database
DATABASE_URL=postgresql://synth_user:secure_password@db.example.com:5432/synth_studio

# Security
SECRET_KEY=256-bit-hex-key-here
ALGORITHM=HS256
ACCESS_TOKEN_EXPIRE_MINUTES=60

# File Storage
USE_S3=true
AWS_ACCESS_KEY_ID=AKIAEXAMPLE
AWS_SECRET_ACCESS_KEY=secret-key-here
AWS_DEFAULT_REGION=us-east-1
S3_BUCKET=synth-studio-production

# AI Services
USE_GEMINI=true
GEMINI_API_KEY=gemini-key-here
USE_GROQ=true
GROQ_API_KEY=groq-key-here

# Server
DEBUG=false
HOST=0.0.0.0
PORT=8000
WORKERS=4

# Logging
LOG_LEVEL=INFO
LOG_FILE=./logs/app.log

# Background Jobs
REDIS_URL=redis://redis.example.com:6379/0
CELERY_BROKER_URL=redis://redis.example.com:6379/0

# Monitoring
ENABLE_METRICS=true
METRICS_PORT=9090

Need help? Check the Troubleshooting Guide or create an issue on GitHub.
