LLM API Testing Guide
Quick Start
Option 1: Postman
- Import `LLM_API_Tests.postman_collection.json` into Postman
- Replace placeholder IDs with your actual IDs:
  - `YOUR_EVALUATION_ID_HERE`
  - `YOUR_GENERATOR_ID_HERE`
  - `YOUR_DATASET_ID_HERE`
- Run requests!
Option 2: Swagger UI (Recommended for Quick Testing)
- Open http://localhost:8000/docs
- Find the endpoint you want to test
- Click "Try it out"
- Fill in parameters
- Click "Execute"
LLM Endpoints
1. Interactive Chat
Endpoint: POST /llm/chat
Example Request:
{
"message": "What's the overall quality of this synthetic data?",
"evaluation_id": "abc-123",
"history": []
}
Use Cases:
- Ask questions about evaluation results
- Get recommendations
- Understand quality metrics
- Explore trade-offs
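To exercise the chat endpoint from the command line, a curl call like the one below works against the examples in this guide (server on localhost:8000). The shape of the history entries shown here (role/content pairs) is an assumption; confirm the exact schema in /docs before relying on it.
# Ask a follow-up question, carrying prior turns in "history"
curl -X POST http://localhost:8000/llm/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Which metric should I focus on improving first?",
    "evaluation_id": "abc-123",
    "history": [
      {"role": "user", "content": "What is the overall quality of this synthetic data?"},
      {"role": "assistant", "content": "Overall quality looks strong; most statistical tests passed."}
    ]
  }'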
2. Improvement Suggestions
Endpoint: POST /llm/suggest-improvements/{evaluation_id}
Example:
POST /llm/suggest-improvements/abc-123
Returns:
{
"suggestions": [
"Increase training epochs from 50 to 100 for better convergence",
"Reduce batch size to 250 to improve gradient stability",
"Consider using DP-CTGAN with epsilon=8 for stronger privacy"
]
}
3. Metric Explanations
Endpoint: GET /llm/explain-metric
Example:
GET /llm/explain-metric?metric_name=ks_statistic&metric_value=0.087
Common Metrics to Explain:
- `ks_statistic` - Kolmogorov-Smirnov test
- `utility_ratio` - ML utility score
- `epsilon` - Privacy budget
- `wasserstein_distance` - Distribution distance
- `dcr_score` - Distance to closest record
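Since this is a GET endpoint, the metric name and value go in the query string. Quote the URL so the shell does not split it at the `&`:
curl "http://localhost:8000/llm/explain-metric?metric_name=ks_statistic&metric_value=0.087"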
4. Natural Language Insights
Endpoint: POST /evaluations/{evaluation_id}/explain
Example:
POST /evaluations/abc-123/explain
Returns:
{
"executive_summary": "The synthetic data shows excellent quality...",
"key_findings": [
" Statistical similarity: 93% of tests passed",
" ML utility: Models trained on synthetic data achieve 89% of original performance"
],
"recommendations": [
"Safe for production use",
"Consider differential privacy for additional protection"
],
"business_impact": "High-quality synthetic data ready for analytics and ML training"
}
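If you only want the executive summary out of this response, a small jq filter keeps the output readable (field name taken from the sample response above):
curl -s -X POST http://localhost:8000/evaluations/abc-123/explain | jq -r '.executive_summary'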
5. Compare Evaluations
Endpoint: POST /evaluations/compare
Example Request:
["eval-1", "eval-2", "eval-3"]
Returns:
{
"summary": "Evaluation 1 (CTGAN) offers best quality, Evaluation 2 (DP-CTGAN) offers best privacy",
"winner": 1,
"trade_offs": [
"CTGAN: Higher quality (0.93) but no formal privacy",
"DP-CTGAN: Strong privacy (ε=10) with slight quality trade-off (0.89)"
],
"recommendation": "Use DP-CTGAN for production, CTGAN for development/testing"
}
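Note that the request body is a bare JSON array, not an object, so the curl call looks slightly different from the other POST examples:
curl -X POST http://localhost:8000/evaluations/compare \
  -H "Content-Type: application/json" \
  -d '["eval-1", "eval-2", "eval-3"]'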
6. Model Card Generation
Endpoint: POST /generators/{generator_id}/model-card
Example:
POST /generators/gen-456/model-card
Returns: Markdown-formatted model card with:
- Model details and purpose
- Intended use cases
- Performance metrics
- Privacy considerations
- Compliance mappings
- Usage guidelines
Use Cases:
- Documentation for stakeholders
- Compliance audits
- Model registry
- Team onboarding
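Because the response is Markdown, it is convenient to write it straight to a file for sharing. This sketch assumes the endpoint returns the card as plain Markdown rather than wrapped in JSON; check /docs if the saved file looks like JSON:
curl -s -X POST http://localhost:8000/generators/gen-456/model-card -o model_card.md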
7. Audit Narratives
Endpoint: GET /generators/{generator_id}/audit-narrative
Example:
GET /generators/gen-456/audit-narrative
Returns:
# Audit Trail: Healthcare Data Generator
**10:00 AM** - Generator Created
A DP-CTGAN generator was initialized for healthcare data synthesis...
**10:15 AM** - Training Started
Model training began with 300 epochs and batch size 500...
**11:30 AM** - Training Completed
Training completed successfully. Privacy budget spent: ε=9.8...
8. GDPR Compliance Report
Endpoint: POST /generators/{generator_id}/compliance-report?framework=GDPR
Example:
POST /generators/gen-456/compliance-report?framework=GDPR
Returns:
{
"framework": "GDPR",
"compliance_level": "Partial",
"controls_addressed": [
{
"control_id": "Article 25",
"control_name": "Data Protection by Design",
"how_addressed": "DP-CTGAN implements differential privacy",
"evidence": "epsilon=10.0, delta=1e-5"
}
],
"gaps": ["Explicit consent mechanism not implemented"],
"recommendations": [
"Add data retention policies",
"Implement right to erasure"
]
}
Supported Frameworks:
- GDPR
- HIPAA
- CCPA
- SOC2
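Only the framework query parameter changes between frameworks. For example, to request a HIPAA report instead of GDPR:
curl -X POST "http://localhost:8000/generators/gen-456/compliance-report?framework=HIPAA"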
9. Enhanced PII Detection
Endpoint: POST /datasets/{dataset_id}/pii-detection-enhanced
Example:
POST /datasets/data-789/pii-detection-enhanced
Returns:
{
"analysis": {
"total_columns": 10,
"columns_with_pii": 3,
"overall_risk_level": "Medium",
"high_risk_columns": ["user_id"],
"medium_risk_columns": ["customer_code", "account_number"],
"column_analyses": {
"user_id": {
"contains_pii": true,
"pii_type": "indirect_identifier",
"confidence": 0.95,
"risk_level": "High",
"explanation": "Unique identifier that could enable re-identification",
"recommendations": ["Remove before synthesis", "Use DP with ε≤8"]
}
},
"recommendations": [
"High-risk PII detected in 1 column: user_id. Consider removing.",
"Use DP-CTGAN with epsilon ≤ 10 for privacy protection."
]
}
}
Detects:
- Direct identifiers (names, emails, SSNs)
- Indirect identifiers (user codes, IDs)
- Quasi-identifiers (combinations)
- Encoded/obfuscated PII
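To get a quick risk summary without reading the full JSON, you can filter the response with jq (field names follow the sample response above):
curl -s -X POST http://localhost:8000/datasets/data-789/pii-detection-enhanced \
  | jq '{risk: .analysis.overall_risk_level, high_risk: .analysis.high_risk_columns, medium_risk: .analysis.medium_risk_columns}'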
Testing Workflow
Step 1: Get Your IDs
First, get actual IDs from your system:
# Get datasets
GET /datasets/
# Get generators
GET /generators/
# Get evaluations
GET /evaluations/generator/{generator_id}
Step 2: Test in Order
1. Start with Enhanced PII Detection
   `POST /datasets/{id}/pii-detection-enhanced` → Understand your data privacy risks
2. Generate Synthetic Data
   `POST /generators/dataset/{id}/generate` → Create synthetic data
3. Run Evaluation
   `POST /evaluations/run` → Assess quality
4. Get Natural Language Insights
   `POST /evaluations/{id}/explain` → Understand results
5. Ask Questions via Chat
   `POST /llm/chat` → Interactive exploration
6. Generate Documentation
   `POST /generators/{id}/model-card`
   `POST /generators/{id}/compliance-report?framework=GDPR` → Compliance docs
Tips for Testing
1. Use Real Data
- Test with actual evaluation results for best insights
- LLM responses are context-aware
2. Try Different Questions
Chat examples:
- "Is this data safe for production?"
- "How does the privacy level compare to industry standards?"
- "What's the trade-off between quality and privacy?"
- "Should I use CTGAN or DP-CTGAN?"
3. Test Fallbacks
- Disconnect internet to test fallback mechanisms
- All endpoints have rule-based fallbacks
4. Check Response Times
- Groq is ultra-fast (usually less than 1 second)
- Chat: ~500ms
- Model cards: ~2-3 seconds
- Compliance reports: ~1-2 seconds
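One simple way to measure these times yourself is curl's built-in timing output; the request below discards the response body and prints only the total time in seconds:
curl -o /dev/null -s -w "%{time_total}\n" \
  -X POST http://localhost:8000/evaluations/YOUR_EVAL_ID/explain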
Common Issues
Issue: "Evaluation not found"
Solution: Use actual evaluation ID from your database
Issue: "No LLM providers available"​
Solution: Check that GROQ_API key is set in .env
Issue: "Endpoints not showing in /docs"​
Solution: Restart the server:
python -m uvicorn app.main:app --reload
Issue: "Rate limit exceeded"​
Solution: Groq free tier: 30 req/min. Wait or upgrade.
Quick Test Script
# 1. Get a dataset ID
curl http://localhost:8000/datasets/ | jq '.[0].id'
# 2. Run enhanced PII detection
curl -X POST http://localhost:8000/datasets/YOUR_ID/pii-detection-enhanced
# 3. Get a generator ID
curl http://localhost:8000/generators/ | jq '.[0].id'
# 4. Generate model card
curl -X POST http://localhost:8000/generators/YOUR_ID/model-card
# 5. Get an evaluation ID
curl http://localhost:8000/evaluations/generator/YOUR_GEN_ID | jq '.[0].id'
# 6. Get natural language insights
curl -X POST http://localhost:8000/evaluations/YOUR_EVAL_ID/explain
# 7. Chat about the evaluation
curl -X POST http://localhost:8000/llm/chat \
-H "Content-Type: application/json" \
-d '{"message": "What is the overall quality?", "evaluation_id": "YOUR_EVAL_ID"}'
Next Steps
- Import Postman collection
- Replace placeholder IDs
- Test each endpoint
- Try different questions in chat
- Generate compliance docs
- Integrate into your workflow
Happy Testing!