Developer Documentation
Production-ready REST API for AI model inference with enterprise authentication, observability, and global edge infrastructure.
Quickstart
Set up authentication, deploy a model, and hit your first endpoint in under five minutes.
API Reference
Complete endpoint documentation with request/response schemas and examples.
Authentication
Manage scoped API keys, rotate credentials, and audit usage across teams.
Webhooks
Subscribe to inference lifecycle events and monitor production traffic in real-time.
Quickstart Guide
Get your first AI inference running in 3 simple steps
Deploy a Model
Choose a model from the gallery and deploy it. You'll receive a deployment ID and endpoint URL.
Get Your API Key
Generate an API key from your dashboard to authenticate your requests.
Make Your First Request
Send a POST request to your endpoint with your input data.
curl -X POST https://api.efgwatch.com/v1/inference/dep_abc123 \
  -H "Authorization: Bearer EFG_xxx" \
  -H "Content-Type: application/json" \
  -d '{"input":{"prompt":"Hello EFG!"}}'

Example Response
{
"id": "req_xyz789",
"deployment_id": "dep_abc123",
"status": "completed",
"output": {
"text": "Hello! I'm an AI assistant powered by EFG. How can I help you today?"
},
"metrics": {
"latency_ms": 245,
"tokens_generated": 18,
"tokens_per_second": 73.5
},
"created_at": "2025-01-20T10:30:00Z",
"completed_at": "2025-01-20T10:30:00.245Z"
}

API Reference
Complete endpoint documentation for the EFGWatch API
Authentication
EFGWatch uses API keys to authenticate requests. Include your API key in the Authorization header.
Authorization: Bearer EFG_sk_live_xxxxxxxxxxxxxxxx

Keep your API keys secure
Never expose API keys in client-side code or public repositories. Use environment variables.
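One way to follow this advice is to load the key from the environment at startup and build the request headers from it. A minimal sketch; the variable name EFGWATCH_API_KEY and the helper itself are illustrative, not part of the official SDK:

```python
import os

def build_auth_headers(env_var: str = "EFGWATCH_API_KEY") -> dict:
    """Read the API key from the environment and build request headers.

    Raises instead of silently sending an unauthenticated request.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"{env_var} is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing fast when the variable is missing surfaces misconfiguration at deploy time rather than as a 401 in production.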
Core Endpoints
POST /v1/inference/:deployment_id
Run inference on a deployed model with input data.
{
"input": {
"prompt": "Explain quantum computing in simple terms"
},
"parameters": {
"max_tokens": 100,
"temperature": 0.7,
"top_p": 0.9
}
}

GET /v1/deployments
List all your active model deployments.
{
"deployments": [
{
"id": "dep_abc123",
"model": "llama3-8b",
"gpu_tier": "A100",
"status": "running",
"endpoint": "https://api.efgwatch.com/v1/inference/dep_abc123",
"created_at": "2025-01-15T10:00:00Z"
}
],
"total": 1
}

POST /v1/deployments
Create a new model deployment.
{
"model_id": "llama3-8b",
"gpu_tier": "A100",
"replicas": 1,
"autoscaling": {
"enabled": true,
"min_replicas": 1,
"max_replicas": 5
}
}

GET /v1/usage
Retrieve usage statistics and billing information.
{
"period": "2025-01",
"total_requests": 15420,
"total_tokens": 3847500,
"gpu_hours": 127.5,
"cost_usd": 89.25,
"by_deployment": [
{
"deployment_id": "dep_abc123",
"requests": 15420,
"tokens": 3847500,
"gpu_hours": 127.5
}
]
}

Streaming Responses
Stream responses token-by-token for real-time user experiences. Set stream: true in your request.
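On the client side, a streamed response has to be reassembled token by token. A sketch of that loop, assuming (the docs do not specify the wire format) a Server-Sent-Events style stream of `data: {"token": "..."}` lines terminated by `data: [DONE]`:

```python
import json

def collect_tokens(lines):
    """Reassemble streamed tokens, assuming SSE-style `data:` lines."""
    tokens = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and keep-alive comments
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        tokens.append(json.loads(payload)["token"])
    return "".join(tokens)
```

In a real client you would render each token as it arrives instead of buffering; check the actual stream framing against your responses before relying on this shape.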
curl -N https://api.efgwatch.com/v1/inference/dep_abc123 \
-H "Authorization: Bearer EFG_sk_live_xxx" \
-H "Content-Type: application/json" \
-d '{
"input": {"prompt": "Write a haiku about AI"},
"stream": true
}'

Webhooks
Subscribe to events like deployment state changes, inference completion, and billing alerts.
Available Events:
deployment.created: New deployment initiated
deployment.ready: Deployment ready for inference
deployment.failed: Deployment failed to start
inference.completed: Inference request completed
inference.failed: Inference request failed
usage.threshold: Usage threshold exceeded
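A webhook receiver typically parses the payload and routes on the event name. A minimal dispatcher sketch; the function name and return strings are illustrative, not part of the EFGWatch API:

```python
import json

def handle_webhook(raw_body: str, handlers: dict) -> str:
    """Parse a webhook body and dispatch to a handler keyed by event name."""
    event = json.loads(raw_body)
    handler = handlers.get(event["event"])
    if handler is None:
        # Unknown event types should be acknowledged, not treated as errors,
        # so new event types don't trigger redelivery storms.
        return "ignored"
    handler(event["data"])
    return "handled"
```

Return a 2xx quickly and do heavy work asynchronously, since most webhook senders retry on slow or failed deliveries.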
Example Webhook Payload:
{
"event": "inference.completed",
"timestamp": "2025-01-20T10:30:00Z",
"data": {
"inference_id": "req_xyz789",
"deployment_id": "dep_abc123",
"status": "completed",
"metrics": {
"latency_ms": 245,
"tokens_generated": 18
}
}
}

Error Handling
EFGWatch uses conventional HTTP response codes and provides detailed error messages.
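A client can unpack the error object and decide whether a retry makes sense. A hedged sketch; the set of retryable error codes below is an assumption, not a documented list:

```python
import json

# Assumed retryable codes; confirm against the errors your integration sees.
RETRYABLE_CODES = {"rate_limit_exceeded", "server_error"}

def parse_error(body: str):
    """Extract code and message from an error response body.

    Returns (code, message, retryable).
    """
    err = json.loads(body)["error"]
    return err["code"], err["message"], err["code"] in RETRYABLE_CODES
```

Logging the `request_id` alongside the code makes support escalations much faster.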
Error Response Format:
{
"error": {
"code": "invalid_api_key",
"message": "The API key provided is invalid or has been revoked",
"type": "authentication_error",
"request_id": "req_xyz789"
}
}

Rate Limits
Rate limits are tier-based and returned in response headers.
Free: 100 requests per hour
Pro: 1,000 requests per hour
Enterprise: custom limits
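Clients can stay under these limits by reading the rate-limit response headers and pausing when the budget is exhausted. A sketch, assuming (as the example value suggests) that X-RateLimit-Reset is a Unix timestamp in seconds:

```python
def seconds_until_reset(headers: dict, now: float) -> float:
    """How long to pause before the next request, given rate-limit headers."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # budget left; no need to wait
    reset = float(headers["X-RateLimit-Reset"])  # assumed Unix seconds
    return max(0.0, reset - now)
```

Pair this with jittered retries so many clients don't all resume at the exact reset instant.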
Rate Limit Headers:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 987
X-RateLimit-Reset: 1642694400

Official SDKs
Production-ready SDKs with built-in retries, streaming support, and comprehensive type safety.
npm install @efgwatch/sdk
pip install efgwatch
go get github.com/efgwatch/go-sdk
implementation "com.efgwatch:sdk:1.0.0"

Security & Compliance
All endpoints enforce TLS 1.3+ encryption
Support for signed requests and mutual TLS
Complete audit trails and access logs
SOC2 Type II certification (in progress)
GDPR and HIPAA compliance support
Need help with security reviews? Contact our security team
Performance & Monitoring
Real-time metrics and performance monitoring for all deployments.
P50 Latency: <200ms (median response time)
P99 Latency: <500ms (99th percentile)
Uptime: 99.9% (SLA guarantee)
Ready to start building?
Get $10 in free credits and deploy your first AI model in minutes.