EFGWatch API v1.0

Developer Documentation

Production-ready REST API for AI model inference with enterprise authentication, observability, and global edge infrastructure.

REST API · Webhooks · Streaming · SDKs

Quickstart

Set up authentication, deploy a model, and hit your first endpoint in under five minutes.

API Reference

Complete endpoint documentation with request/response schemas and examples.

Authentication

Manage scoped API keys, rotate credentials, and audit usage across teams.

Webhooks

Subscribe to inference lifecycle events and monitor production traffic in real-time.

Quickstart Guide

Get your first AI inference running in 3 simple steps

1. Deploy a Model

Choose a model from the gallery and deploy it. You'll receive a deployment ID and endpoint URL.

Endpoint:
https://api.efgwatch.com/v1/inference/dep_abc123
2. Get Your API Key

Generate an API key from your dashboard to authenticate your requests.

API Key:
EFG_sk_live_xxxxxxxxxxxxxxxx
3. Make Your First Request

Send a POST request to your endpoint with your input data.

curl -X POST https://api.efgwatch.com/v1/inference/dep_abc123 \
  -H "Authorization: Bearer EFG_sk_live_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"input":{"prompt":"Hello EFG!"}}'
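The same request can be assembled with Python's standard library. This is a sketch using the placeholder endpoint and key from the steps above; the request is built but not sent, so substitute real values before calling `urlopen`.

```python
import json
import urllib.request

API_KEY = "EFG_sk_live_xxxxxxxxxxxxxxxx"  # placeholder from the dashboard
ENDPOINT = "https://api.efgwatch.com/v1/inference/dep_abc123"

payload = {"input": {"prompt": "Hello EFG!"}}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Send with urllib.request.urlopen(req) once a valid key and
# deployment ID are in place.
```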

Example Response

{
  "id": "req_xyz789",
  "deployment_id": "dep_abc123",
  "status": "completed",
  "output": {
    "text": "Hello! I'm an AI assistant powered by EFG. How can I help you today?"
  },
  "metrics": {
    "latency_ms": 245,
    "tokens_generated": 18,
    "tokens_per_second": 73.5
  },
  "created_at": "2025-01-20T10:30:00Z",
  "completed_at": "2025-01-20T10:30:00.245Z"
}
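The metrics fields are internally consistent: tokens_per_second is tokens_generated divided by latency in seconds. A quick check in Python against the example response above:

```python
import json

# A trimmed copy of the example response shown above.
response_body = """{
  "id": "req_xyz789",
  "status": "completed",
  "metrics": {"latency_ms": 245, "tokens_generated": 18, "tokens_per_second": 73.5}
}"""

result = json.loads(response_body)
metrics = result["metrics"]

# Derive throughput from the other two metrics:
throughput = metrics["tokens_generated"] / (metrics["latency_ms"] / 1000)
print(round(throughput, 1))  # 73.5, matching tokens_per_second
```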

API Reference

Complete endpoint documentation for the EFGWatch API

Authentication

EFGWatch uses API keys to authenticate requests. Include your API key in the Authorization header.

Authorization: Bearer EFG_sk_live_xxxxxxxxxxxxxxxx

Keep your API keys secure

Never expose API keys in client-side code or public repositories. Use environment variables.
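A minimal sketch of reading the key from an environment variable instead of source code. The variable name `EFGWATCH_API_KEY` is illustrative, not a documented convention; here it is set inline purely so the example runs.

```python
import os

def load_api_key(var="EFGWATCH_API_KEY"):
    """Fetch the API key from the environment; fail loudly if missing."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set the {var} environment variable to your EFGWatch key")
    return key

# For demonstration only -- in real code, export the variable in your shell
# or secrets manager rather than setting it here.
os.environ["EFGWATCH_API_KEY"] = "EFG_sk_live_xxxxxxxxxxxxxxxx"

headers = {"Authorization": f"Bearer {load_api_key()}"}
```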

Core Endpoints

POST /v1/inference/:deployment_id

Run inference on a deployed model with input data.

{
  "input": {
    "prompt": "Explain quantum computing in simple terms"
  },
  "parameters": {
    "max_tokens": 100,
    "temperature": 0.7,
    "top_p": 0.9
  }
}
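A small helper for assembling a body matching this schema. The default values mirror the example above; the API's actual server-side defaults are not documented here, so pass parameters explicitly when they matter.

```python
import json

def build_inference_body(prompt, max_tokens=100, temperature=0.7, top_p=0.9):
    """Build a request body matching the inference schema above.
    Defaults mirror the documentation example, not known server defaults."""
    return {
        "input": {"prompt": prompt},
        "parameters": {
            "max_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
        },
    }

body = build_inference_body("Explain quantum computing in simple terms")
print(json.dumps(body, indent=2))
```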
GET /v1/deployments

List all your active model deployments.

{
  "deployments": [
    {
      "id": "dep_abc123",
      "model": "llama3-8b",
      "gpu_tier": "A100",
      "status": "running",
      "endpoint": "https://api.efgwatch.com/v1/inference/dep_abc123",
      "created_at": "2025-01-15T10:00:00Z"
    }
  ],
  "total": 1
}
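A typical client-side use of this response is picking out the endpoints of deployments that are ready to serve traffic. Sketch against the example payload above:

```python
# The example response from GET /v1/deployments, as a Python dict.
deployments_response = {
    "deployments": [
        {
            "id": "dep_abc123",
            "model": "llama3-8b",
            "status": "running",
            "endpoint": "https://api.efgwatch.com/v1/inference/dep_abc123",
        },
    ],
    "total": 1,
}

# Map deployment ID -> endpoint for everything currently running.
running = {
    d["id"]: d["endpoint"]
    for d in deployments_response["deployments"]
    if d["status"] == "running"
}
print(running["dep_abc123"])
```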
POST /v1/deployments

Create a new model deployment.

{
  "model_id": "llama3-8b",
  "gpu_tier": "A100",
  "replicas": 1,
  "autoscaling": {
    "enabled": true,
    "min_replicas": 1,
    "max_replicas": 5
  }
}
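Before sending a create-deployment request, it can be worth sanity-checking the autoscaling block client-side. This is an illustrative helper, not part of any SDK; the server performs its own validation.

```python
def validate_autoscaling(cfg):
    """Client-side sanity check for the autoscaling block above (sketch only)."""
    if not cfg.get("enabled"):
        return
    lo, hi = cfg["min_replicas"], cfg["max_replicas"]
    if not (1 <= lo <= hi):
        raise ValueError(
            f"min_replicas must be >= 1 and <= max_replicas (got {lo}..{hi})"
        )

deployment = {
    "model_id": "llama3-8b",
    "gpu_tier": "A100",
    "replicas": 1,
    "autoscaling": {"enabled": True, "min_replicas": 1, "max_replicas": 5},
}
validate_autoscaling(deployment["autoscaling"])  # passes silently
```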
GET /v1/usage

Retrieve usage statistics and billing information.

{
  "period": "2025-01",
  "total_requests": 15420,
  "total_tokens": 3847500,
  "gpu_hours": 127.5,
  "cost_usd": 89.25,
  "by_deployment": [
    {
      "deployment_id": "dep_abc123",
      "requests": 15420,
      "tokens": 3847500,
      "gpu_hours": 127.5
    }
  ]
}
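The per-deployment figures sum to the account totals, which makes reconciliation straightforward. A sketch that checks the example above and derives an effective cost per million tokens (a derived figure, not a field the API returns):

```python
# The example response from GET /v1/usage, as a Python dict.
usage = {
    "period": "2025-01",
    "total_requests": 15420,
    "total_tokens": 3847500,
    "gpu_hours": 127.5,
    "cost_usd": 89.25,
    "by_deployment": [
        {"deployment_id": "dep_abc123", "requests": 15420,
         "tokens": 3847500, "gpu_hours": 127.5}
    ],
}

# Per-deployment figures should reconcile with the account totals.
assert sum(d["requests"] for d in usage["by_deployment"]) == usage["total_requests"]
assert sum(d["tokens"] for d in usage["by_deployment"]) == usage["total_tokens"]

# Effective cost per million tokens for the period (derived, not an API field):
cost_per_m_tokens = usage["cost_usd"] / usage["total_tokens"] * 1_000_000
print(round(cost_per_m_tokens, 2))  # 23.2
```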

Streaming Responses

Stream responses token-by-token for real-time user experiences. Set stream: true in your request.

curl -N https://api.efgwatch.com/v1/inference/dep_abc123 \
  -H "Authorization: Bearer EFG_sk_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {"prompt": "Write a haiku about AI"},
    "stream": true
  }'
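The wire format of streamed chunks is not documented in this section. Assuming newline-delimited JSON chunks with a `token` field (purely an assumption for illustration), a client loop might accumulate output like this:

```python
import json

# Hypothetical stream contents: the chunk shape below is an illustrative
# assumption, NOT the documented EFGWatch wire format.
raw_stream = [
    b'{"token": "Silicon"}\n',
    b'{"token": " minds"}\n',
    b'{"token": " dreaming"}\n',
]

text = ""
for line in raw_stream:
    chunk = json.loads(line)
    text += chunk["token"]

print(text)  # Silicon minds dreaming
```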

Webhooks

Subscribe to events like deployment state changes, inference completion, and billing alerts.

Available Events:

  • deployment.created

    New deployment initiated

  • deployment.ready

    Deployment ready for inference

  • deployment.failed

    Deployment failed to start

  • inference.completed

    Inference request completed

  • inference.failed

    Inference request failed

  • usage.threshold

    Usage threshold exceeded

Example Webhook Payload:

{
  "event": "inference.completed",
  "timestamp": "2025-01-20T10:30:00Z",
  "data": {
    "inference_id": "req_xyz789",
    "deployment_id": "dep_abc123",
    "status": "completed",
    "metrics": {
      "latency_ms": 245,
      "tokens_generated": 18
    }
  }
}
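A webhook receiver typically routes on the `event` field. A minimal dispatcher sketch against the payload above; a production handler would also verify the request's authenticity (for example via a signature header) before trusting the body.

```python
import json

def handle_webhook(body):
    """Route an incoming webhook by its event field (minimal sketch)."""
    payload = json.loads(body)
    event = payload["event"]
    if event == "inference.completed":
        m = payload["data"]["metrics"]
        return f"inference {payload['data']['inference_id']} done in {m['latency_ms']}ms"
    if event == "deployment.failed":
        return f"ALERT: deployment {payload['data']['deployment_id']} failed"
    return f"unhandled event: {event}"

# The example payload from above, serialized as it would arrive on the wire.
example = json.dumps({
    "event": "inference.completed",
    "timestamp": "2025-01-20T10:30:00Z",
    "data": {
        "inference_id": "req_xyz789",
        "deployment_id": "dep_abc123",
        "status": "completed",
        "metrics": {"latency_ms": 245, "tokens_generated": 18},
    },
})
print(handle_webhook(example))  # inference req_xyz789 done in 245ms
```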

Error Handling

EFGWatch uses conventional HTTP response codes and provides detailed error messages.

200 OK - Request successful
400 Bad Request - Invalid parameters
401 Unauthorized - Invalid API key
404 Not Found - Resource not found
429 Too Many Requests - Rate limit exceeded
500 Internal Server Error

Error Response Format:

{
  "error": {
    "code": "invalid_api_key",
    "message": "The API key provided is invalid or has been revoked",
    "type": "authentication_error",
    "request_id": "req_xyz789"
  }
}
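Client code can turn this envelope into a typed exception so callers can branch on `code` and log `request_id`. An illustrative helper, not part of any official SDK:

```python
import json

class EFGWatchError(Exception):
    """Mirrors the documented error envelope (illustrative, not an SDK class)."""
    def __init__(self, code, message, error_type, request_id):
        super().__init__(message)
        self.code = code
        self.type = error_type
        self.request_id = request_id

def raise_for_error(status, body):
    """Turn a non-2xx response into a typed exception."""
    if 200 <= status < 300:
        return
    err = json.loads(body)["error"]
    raise EFGWatchError(err["code"], err["message"], err["type"], err["request_id"])

# The example error body from above:
body = ('{"error": {"code": "invalid_api_key", '
        '"message": "The API key provided is invalid or has been revoked", '
        '"type": "authentication_error", "request_id": "req_xyz789"}}')

caught = None
try:
    raise_for_error(401, body)
except EFGWatchError as e:
    caught = e
print(caught.code)  # invalid_api_key
```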

Rate Limits

Rate limits are tier-based and returned in response headers.

Free: 100 requests per hour
Pro: 1,000 requests per hour
Enterprise: custom limits

Rate Limit Headers:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 987
X-RateLimit-Reset: 1642694400
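When X-RateLimit-Remaining reaches 0, a client should sleep until the reset time before retrying. Assuming X-RateLimit-Reset is a Unix timestamp (consistent with the example value above, though the docs do not state it explicitly), a backoff sketch:

```python
import time

def seconds_until_reset(headers, now=None):
    """How long to wait before retrying, assuming X-RateLimit-Reset is a
    Unix timestamp (an assumption based on the example value)."""
    now = time.time() if now is None else now
    return max(0, int(headers["X-RateLimit-Reset"]) - now)

headers = {
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1642694400",
}

# Pretend "now" is 60 seconds before the reset time:
print(seconds_until_reset(headers, now=1642694400 - 60))  # 60
```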

Official SDKs

Production-ready SDKs with built-in retries, streaming support, and comprehensive type safety.

JavaScript/TypeScript
npm install @efgwatch/sdk
Python
pip install efgwatch
Go
go get github.com/efgwatch/go-sdk
Kotlin
implementation "com.efgwatch:sdk:1.0.0"

Security & Compliance

All endpoints enforce TLS 1.3+ encryption

Support for signed requests and mutual TLS

Complete audit trails and access logs

SOC2 Type II certification (in progress)

GDPR and HIPAA compliance support

Need help with security reviews? Contact our security team

Performance & Monitoring

Real-time metrics and performance monitoring for all deployments.

P50 Latency: <200ms (median response time)

P99 Latency: <500ms (99th percentile)

Uptime: 99.9% (SLA guarantee)

Ready to start building?

Get $10 in free credits and deploy your first AI model in minutes.

EFGWatch - Run AI Models Instantly | Efficient • Fast • GPU