Quick Start Tutorial

Get up and running with Synthetic Data Studio in 5 minutes! This tutorial will guide you through generating your first synthetic dataset.

What You'll Learn

By the end of this tutorial, you'll know how to:

- Start the Synthetic Data Studio server
- Upload a sample dataset
- Generate synthetic data using CTGAN
- Evaluate the quality of your synthetic data
- Download the results

Step 1: Start the Server

First, make sure you have completed the Installation Guide.

# Navigate to the backend directory
cd synthetic-data-studio/backend

# Activate virtual environment
# Windows:
.venv\Scripts\activate
# Linux/macOS:
source .venv/bin/activate

# Start the server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

You should see output like:

INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Step 2: Access the API

Open your browser and visit: http://localhost:8000/docs

You'll see the FastAPI interactive documentation. This is your playground for testing the API!

Step 3: Upload a Dataset

You can upload the sample customer data that comes with the project, or any CSV file of your own that has a header row.

Note: All API requests require authentication. First, register/login to get your JWT token.
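Every request in the remaining steps carries the same Authorization header, so a small helper can keep the token in one place. This is a minimal sketch: the environment variable name SDS_ACCESS_TOKEN is a hypothetical choice for illustration, not something the project defines, and the token itself still comes from the register/login step.

```python
import os


def auth_headers(token=None):
    """Return the Authorization header used by the curl examples in this tutorial.

    SDS_ACCESS_TOKEN is a hypothetical environment variable name chosen
    for this sketch; pass the JWT explicitly if you store it elsewhere.
    """
    token = token or os.environ.get("SDS_ACCESS_TOKEN", "")
    if not token:
        raise ValueError("No access token: log in first and pass the JWT here")
    return {"Authorization": f"Bearer {token}"}
```

Calling auth_headers("YOUR_ACCESS_TOKEN") produces the same header that the curl commands pass with -H.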

Option A: Use the Swagger UI

- In the API docs, find the POST /datasets/upload endpoint
- Click "Try it out"
- Choose a CSV file to upload (it must have a header row) and click "Execute"

Option B: Use curl

# Upload your dataset
curl -X POST "http://localhost:8000/datasets/upload" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your-dataset.csv"

Expected Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "filename": "sample_data.csv",
  "row_count": 1000,
  "column_count": 8,
  "file_size": 45632,
  "upload_timestamp": "2025-11-27T10:30:00Z"
}

Copy the id from the response - you'll need it in the next steps.
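If you would rather script the upload than use curl, the same request can be built with the Python standard library. The sketch below assumes the endpoint accepts a multipart form field named file (as in the curl example) and that the JSON response contains an id field (as in the expected response). The function names are illustrative, not part of the project.

```python
import json
import os
import urllib.request
import uuid


def build_upload_request(base_url, token, filename, file_bytes):
    """Build (but do not send) a multipart POST for /datasets/upload."""
    boundary = uuid.uuid4().hex
    # One form part named "file", terminated by the closing boundary.
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: text/csv",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ])
    return urllib.request.Request(
        f"{base_url}/datasets/upload",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def upload_dataset(base_url, token, path):
    """Send the file and return the dataset id from the JSON response."""
    with open(path, "rb") as f:
        request = build_upload_request(base_url, token, os.path.basename(path), f.read())
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]
```

With the server running, upload_dataset("http://localhost:8000", "YOUR_ACCESS_TOKEN", "your-dataset.csv") would return the dataset ID to use in the next steps.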

Step 4: Explore Your Data

Let's profile the uploaded dataset to understand its structure.

Generate a Data Profile

- Find POST /datasets/{dataset_id}/profile in the API docs
- Click "Try it out"
- Enter your dataset ID in the dataset_id field and click "Execute"

Expected Response:

{
  "dataset_id": "550e8400-e29b-41d4-a716-446655440000",
  "row_count": 1000,
  "column_count": 8,
  "columns": [
    {
      "name": "customer_id",
      "type": "integer",
      "nullable": false,
      "unique_count": 1000
    },
    {
      "name": "age",
      "type": "integer",
      "nullable": false,
      "min": 18,
      "max": 80,
      "mean": 42.5
    }
  ],
  "correlations": {...}
}
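A profile response can get long for wide datasets, so it may help to reduce it to one line per column. The following sketch assumes only the fields shown in the expected response above (row_count, column_count, and per-column name, type, min, max, unique_count); any other fields are ignored.

```python
def summarize_profile(profile):
    """Return a short, human-readable summary of a profile response."""
    lines = [f"{profile['row_count']} rows x {profile['column_count']} columns"]
    for col in profile.get("columns", []):
        parts = [f"{col['name']} ({col['type']})"]
        # Numeric columns report a range; others may report a unique count.
        if "min" in col and "max" in col:
            parts.append(f"range {col['min']}-{col['max']}")
        if "unique_count" in col:
            parts.append(f"{col['unique_count']} unique")
        lines.append(", ".join(parts))
    return lines
```

Printing the returned lines gives a quick overview before you decide how many rows and epochs to request.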

Step 5: Generate Synthetic Data

Now for the exciting part - generating synthetic data!

Basic CTGAN Generation

- Find POST /generators/dataset/{dataset_id}/generate in the API docs
- Use your dataset ID
- Set these parameters:
  - generator_type: "ctgan"
  - num_rows: 500 (half the original row count, for a quick demo)

Request Body:

{
  "generator_type": "ctgan",
  "num_rows": 500,
  "epochs": 10,
  "batch_size": 100
}

Expected Response:

{
  "message": "Generation started",
  "generator_id": "660e8400-e29b-41d4-a716-446655440001",
  "estimated_time": "2-3 minutes"
}
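The same generation request can be prepared from a script. This sketch builds the request shown above with the standard library and leaves sending it to the caller. The function name and its defaults are illustrative; the endpoint path and body fields are the ones from this step.

```python
import json
import urllib.request


def build_generate_request(base_url, token, dataset_id, num_rows,
                           generator_type="ctgan", epochs=10, batch_size=100):
    """Build (but do not send) the POST that starts a generation job."""
    payload = {
        "generator_type": generator_type,
        "num_rows": num_rows,
        "epochs": epochs,
        "batch_size": batch_size,
    }
    return urllib.request.Request(
        f"{base_url}/generators/dataset/{dataset_id}/generate",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen and decoding the JSON body should give you the generator_id from the expected response.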

The generation runs asynchronously. Check the status:

# Check generator status (replace the ID with your generator_id)
curl -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  http://localhost:8000/generators/660e8400-e29b-41d4-a716-446655440001

Wait for "status": "completed".
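Rather than re-running the status check by hand, you can poll until the job finishes. The sketch below is one way to do it: "completed" is the status named in this tutorial, while "failed" is an assumed failure value that may differ in the actual API. The status lookup is passed in as a function so the loop can be tried without a running server.

```python
import json
import time
import urllib.request


def make_status_fetcher(base_url, token, generator_id):
    """Return a function that reads the "status" field of the generator."""
    def fetch_status():
        request = urllib.request.Request(
            f"{base_url}/generators/{generator_id}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)["status"]
    return fetch_status


def wait_for_completion(fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_status() until it returns "completed" or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == "completed":
            return status
        if status == "failed":  # assumed failure value, not confirmed by the docs
            raise RuntimeError("generation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout} seconds")
        sleep(interval)
```

For example, wait_for_completion(make_status_fetcher("http://localhost:8000", "YOUR_ACCESS_TOKEN", "your-generator-id")) blocks until the generator reports "completed".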

Step 6: Evaluate Quality

Let's assess how good our synthetic data is.

Quick Statistical Evaluation

- Find POST /evaluations/quick/{generator_id} in the API docs
- Use your generator ID

Expected Response:

{
  "generator_id": "660e8400-e29b-41d4-a716-446655440001",
  "quality_level": "Good",
  "overall_score": 0.85,
  "statistical_similarity": {
    "ks_test": 0.92,
    "chi_square": 0.88,
    "wasserstein_distance": 0.15
  },
  "recommendations": [
    "Data quality looks good for most use cases",
    "Consider increasing training epochs for better similarity"
  ]
}
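In a script or CI job you may want to act on the evaluation rather than read it. The sketch below compares overall_score with a threshold you choose. The 0.8 default is an arbitrary value for illustration, not a cutoff defined by Synthetic Data Studio, and the field names are taken from the expected response above.

```python
def check_evaluation(evaluation, min_score=0.8):
    """Summarize an evaluation response against a caller-chosen threshold."""
    score = evaluation["overall_score"]
    return {
        "passed": score >= min_score,
        "score": score,
        "quality_level": evaluation.get("quality_level"),
        "recommendations": evaluation.get("recommendations", []),
    }
```

A result with "passed" set to False is a cue to review the recommendations list, for example by training for more epochs.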

Step 7: Download Results

Download Synthetic Dataset

- Find GET /datasets/{dataset_id}/download in the API docs
- Use the output_dataset_id from your generator (check the generator details)

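# Download the synthetic data (replace {output_dataset_id} with your output dataset ID)
# -o sets the local filename; with -O the file would be saved as "download"
curl -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -o synthetic_data.csv \
  "http://localhost:8000/datasets/{output_dataset_id}/download"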
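Once the file is downloaded, a quick sanity check is to compare its header and row count with the original dataset. This sketch assumes the download is a CSV file with a header row, like the upload. It works on the file contents as text and uses only the standard library.

```python
import csv
import io


def csv_shape(text):
    """Return (header, number_of_data_rows) for CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        raise ValueError("CSV is empty")
    return rows[0], len(rows) - 1


def compare_shapes(real_text, synthetic_text):
    """Compare column names and row counts of the real and synthetic data."""
    real_header, real_rows = csv_shape(real_text)
    synthetic_header, synthetic_rows = csv_shape(synthetic_text)
    return {
        "same_columns": real_header == synthetic_header,
        "real_rows": real_rows,
        "synthetic_rows": synthetic_rows,
    }
```

With the parameters used in this tutorial you would expect matching columns and 500 synthetic rows against 1000 real ones.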

Congratulations!

You've successfully:

- Started Synthetic Data Studio
- Uploaded a real dataset
- Generated synthetic data with CTGAN
- Evaluated data quality
- Downloaded your results

Next Steps

Try Advanced Features

Differential Privacy Generation:

{
  "generator_type": "dp-ctgan",
  "num_rows": 500,
  "target_epsilon": 10.0,
  "epochs": 20
}

AI-Powered Chat:

curl -X POST http://localhost:8000/llm/chat \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "How good is my synthetic data?",
    "evaluation_id": "your-evaluation-id"
  }'

Explore More

- User Guides: Learn about all platform features
- API Examples: Code examples and API usage
- Tutorials: Step-by-step learning paths
- Privacy Features: Differential privacy deep dive

Troubleshooting

Common Issues

Server won't start:

# Check whether port 8000 is already in use
# Linux/macOS:
netstat -an | grep 8000
# Windows:
netstat -an | findstr 8000

# Try a different port
uvicorn app.main:app --reload --host 0.0.0.0 --port 8001
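If you prefer a check that behaves the same on Windows, Linux, and macOS, a few lines of Python can test whether something is already listening on the port. This is a generic sketch using the standard socket module. It is not part of Synthetic Data Studio.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

If port_in_use(8000) returns True before you start the server, another process holds the port and you should pick a different one.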

Upload fails:

- Check file size (max 100MB by default)
- Ensure CSV format with headers
- Verify file path is correct
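The three checks above can be run locally before uploading. The sketch below makes two assumptions: it treats 100MB as 100 * 1024 * 1024 bytes (the server may count it differently), and it only verifies that the first row has non-empty cells, which is a rough stand-in for "has headers".

```python
import csv
import os

# Assumed interpretation of the 100MB default; adjust if your server differs.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def check_csv_for_upload(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return a list of problems found; an empty list means no problem was detected."""
    if not os.path.isfile(path):
        return [f"file not found: {path}"]
    problems = []
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"file is {size} bytes, limit is {max_bytes}")
    # Read only the first row; errors="replace" avoids failing on odd encodings.
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        header = next(csv.reader(f), None)
    if not header or any(not name.strip() for name in header):
        problems.append("first row is missing or has empty column names")
    return problems
```

An empty result does not guarantee the upload will succeed, but a non-empty one points to the likely cause.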

Generation takes too long:

- Reduce epochs to 5-10 for testing
- Use smaller batch_size
- Try TVAE instead of CTGAN (faster)

Evaluation fails:

- Ensure generator status is "completed"
- Check that synthetic data was generated
- Verify dataset IDs are correct

Get Help

- API Docs: http://localhost:8000/docs (comprehensive endpoint reference)
- Issues: GitHub Issues
- Discussions: GitHub Discussions

Ready for more? Try the Basic Synthesis Tutorial for a deeper dive!
