Developer Documentation
Production-ready REST API for AI model inference with enterprise authentication, observability, and global edge infrastructure.
Quickstart
Set up authentication, deploy a model, and hit your first endpoint in under five minutes.
API Reference
Complete endpoint documentation with request/response schemas and examples.
Authentication
Manage scoped API keys, rotate credentials, and audit usage across teams.
Webhooks
Subscribe to inference lifecycle events and monitor production traffic in real-time.
Quickstart Guide
Get your first AI inference running in 3 simple steps
Deploy a Model
Choose a model from the gallery and deploy it. You'll receive a deployment ID and endpoint URL.
Get Your API Key
Generate an API key from your dashboard to authenticate your requests.
Make Your First Request
Send a POST request to your endpoint with your input data.
curl -X POST https://api.efgwatch.com/v1/inference/dep_abc123 \
  -H "Authorization: Bearer EFG_xxx" \
  -H "Content-Type: application/json" \
  -d '{"input":{"prompt":"Hello EFG!"}}'

Example Response
{
"id": "req_xyz789",
"deployment_id": "dep_abc123",
"status": "completed",
"output": {
"text": "Hello! I'm an AI assistant powered by EFG. How can I help you today?"
},
"metrics": {
"latency_ms": 245,
"tokens_generated": 18,
"tokens_per_second": 73.5
},
"created_at": "2025-01-20T10:30:00Z",
"completed_at": "2025-01-20T10:30:00.245Z"
}

API Reference
Complete endpoint documentation for the EFGWatch API
Authentication
EFGWatch uses API keys to authenticate requests. Include your API key in the Authorization header.
Authorization: Bearer EFG_sk_live_xxxxxxxxxxxxxxxx

Keep your API keys secure
Never expose API keys in client-side code or public repositories. Use environment variables.
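One way to follow this advice is to load the key from the environment at startup and build the request headers from it. A minimal sketch; the variable name EFGWATCH_API_KEY and the helper itself are illustrative, not part of the official SDK:

```python
import os

def build_auth_headers(env_var: str = "EFGWATCH_API_KEY") -> dict:
    """Read the API key from the environment and build request headers.

    Raises instead of silently sending an unauthenticated request.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"{env_var} is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing fast when the variable is missing surfaces misconfiguration at deploy time rather than as a 401 in production.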
Core Endpoints
POST /v1/inference/:deployment_id
Run inference on a deployed model with input data.
{
"input": {
"prompt": "Explain quantum computing in simple terms"
},
"parameters": {
"max_tokens": 100,
"temperature": 0.7,
"top_p": 0.9
}
}

GET /v1/deployments
List all your active model deployments.
{
"deployments": [
{
"id": "dep_abc123",
"model": "llama3-8b",
"gpu_tier": "A100",
"status": "running",
"endpoint": "https://api.efgwatch.com/v1/inference/dep_abc123",
"created_at": "2025-01-15T10:00:00Z"
}
],
"total": 1
}

POST /v1/deployments
Create a new model deployment.
{
"model_id": "llama3-8b",
"gpu_tier": "A100",
"replicas": 1,
"autoscaling": {
"enabled": true,
"min_replicas": 1,
"max_replicas": 5
}
}

GET /v1/usage
Retrieve usage statistics and billing information.
{
"period": "2025-01",
"total_requests": 15420,
"total_tokens": 3847500,
"gpu_hours": 127.5,
"cost_usd": 89.25,
"by_deployment": [
{
"deployment_id": "dep_abc123",
"requests": 15420,
"tokens": 3847500,
"gpu_hours": 127.5
}
]
}

Streaming Responses
Stream responses token-by-token for real-time user experiences. Set stream: true in your request.
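On the client side, a streamed response has to be reassembled token by token. A sketch of that loop, assuming (the docs do not specify the wire format) a Server-Sent-Events style stream of `data: {"token": "..."}` lines terminated by `data: [DONE]`:

```python
import json

def collect_tokens(lines):
    """Reassemble streamed tokens, assuming SSE-style `data:` lines."""
    tokens = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and keep-alive comments
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        tokens.append(json.loads(payload)["token"])
    return "".join(tokens)
```

In a real client you would render each token as it arrives instead of buffering; check the actual stream framing against your responses before relying on this shape.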
curl -N https://api.efgwatch.com/v1/inference/dep_abc123 \
-H "Authorization: Bearer EFG_sk_live_xxx" \
-H "Content-Type: application/json" \
-d '{
"input": {"prompt": "Write a haiku about AI"},
"stream": true
}'

Webhooks
Subscribe to events like deployment state changes, inference completion, and billing alerts.
Available Events:
deployment.created: New deployment initiated
deployment.ready: Deployment ready for inference
deployment.failed: Deployment failed to start
inference.completed: Inference request completed
inference.failed: Inference request failed
usage.threshold: Usage threshold exceeded
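A webhook receiver typically parses the payload and routes on the event name. A minimal dispatcher sketch; the function name and return strings are illustrative, not part of the EFGWatch API:

```python
import json

def handle_webhook(raw_body: str, handlers: dict) -> str:
    """Parse a webhook body and dispatch to a handler keyed by event name."""
    event = json.loads(raw_body)
    handler = handlers.get(event["event"])
    if handler is None:
        # Unknown event types should be acknowledged, not treated as errors,
        # so new event types don't trigger redelivery storms.
        return "ignored"
    handler(event["data"])
    return "handled"
```

Return a 2xx quickly and do heavy work asynchronously, since most webhook senders retry on slow or failed deliveries.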
Example Webhook Payload:
{
"event": "inference.completed",
"timestamp": "2025-01-20T10:30:00Z",
"data": {
"inference_id": "req_xyz789",
"deployment_id": "dep_abc123",
"status": "completed",
"metrics": {
"latency_ms": 245,
"tokens_generated": 18
}
}
}

Error Handling
EFGWatch uses conventional HTTP response codes and provides detailed error messages.
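A client can unpack the error object and decide whether a retry makes sense. A hedged sketch; the set of retryable error codes below is an assumption, not a documented list:

```python
import json

# Assumed retryable codes; confirm against the errors your integration sees.
RETRYABLE_CODES = {"rate_limit_exceeded", "server_error"}

def parse_error(body: str):
    """Extract code and message from an error response body.

    Returns (code, message, retryable).
    """
    err = json.loads(body)["error"]
    return err["code"], err["message"], err["code"] in RETRYABLE_CODES
```

Logging the `request_id` alongside the code makes support escalations much faster.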
Error Response Format:
{
"error": {
"code": "invalid_api_key",
"message": "The API key provided is invalid or has been revoked",
"type": "authentication_error",
"request_id": "req_xyz789"
}
}

Rate Limits
Rate limits are tier-based and returned in response headers.
Free: 100 requests per hour
Pro: 1,000 requests per hour
Enterprise: custom limits
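Clients can stay under these limits by reading the rate-limit response headers and pausing when the budget is exhausted. A sketch, assuming (as the example value suggests) that X-RateLimit-Reset is a Unix timestamp in seconds:

```python
def seconds_until_reset(headers: dict, now: float) -> float:
    """How long to pause before the next request, given rate-limit headers."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # budget left; no need to wait
    reset = float(headers["X-RateLimit-Reset"])  # assumed Unix seconds
    return max(0.0, reset - now)
```

Pair this with jittered retries so many clients don't all resume at the exact reset instant.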
Rate Limit Headers:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 987
X-RateLimit-Reset: 1642694400

Official SDKs
Production-ready SDKs with built-in retries, streaming support, and comprehensive type safety.
npm install @efgwatch/sdk
pip install efgwatch
go get github.com/efgwatch/go-sdk
implementation "com.efgwatch:sdk:1.0.0"

Security & Compliance
All endpoints enforce TLS 1.3+ encryption
Support for signed requests and mutual TLS
Complete audit trails and access logs
SOC2 Type II certification (in progress)
GDPR and HIPAA compliance support
Need help with security reviews? Contact our security team
Performance & Monitoring
Real-time metrics and performance monitoring for all deployments.
P50 Latency: <200ms (median response time)
P99 Latency: <500ms (99th percentile)
Uptime: 99.9% (SLA guarantee)
Ready to start building?
Get $10 in free credits and deploy your first AI model in minutes.