API Reference
GraphQL primary API + REST endpoints for agents and webhooks.
Overview
| Endpoint | Protocol | Auth | Purpose |
|---|---|---|---|
| /health, /ready | GET | None | Health + readiness probes (returns version on /health) |
| /graphql | POST | JWT Bearer | Frontend API (queries, mutations) |
| /graphql/ws | WebSocket | JWT | Real-time subscriptions |
| /api/v1/operator/* | REST | X-API-Key or Bearer | Agent communication (status, heartbeat, register, services) |
| /api/v1/health/report | REST POST | X-API-Key or Bearer | External / bare-metal service health push (auto-creates service) |
| /api/v1/storage/report | REST POST | X-API-Key or Bearer | Storage metrics push (auto-creates service with storage tag) |
| /api/v1/backup/report | REST POST | X-API-Key or Bearer | Backup completion webhook (service / slaGroup / namespace) |
| /api/v1/sla/exclusion-window/start | REST POST | X-API-Key or Bearer | Start a maintenance window (pauses SLA timer) |
| /api/v1/sla/exclusion-window/stop | REST POST | X-API-Key or Bearer | Stop a maintenance window |
| /api/v1/sla/report/generate | REST POST | X-API-Key or Bearer | Manually trigger the daily SLA report |
| /api/v1/templates/export | GET | X-API-Key or Bearer | Export workflows / catalog / SLA definitions as YAML |
| /api/v1/templates/import | POST | X-API-Key or Bearer | Import workflows / catalog / SLA definitions from YAML |
| /api/v1/auth/providers | GET | None | Active auth providers (login page uses this) |
| /api/v1/license/activate | REST POST | X-API-Key or Bearer | License activation (authenticated since v4.1.2) |
Authentication
Frontend (JWT)
Frontend tokens are signed with HMAC-SHA256 using ITOPS_JWT_SECRET (minimum 32 characters). Tokens expire after 1 hour by default; refresh them via refreshToken.
# Login
POST /graphql
{
"query": "mutation { login(email: \"user@example.com\", password: \"...\") { token refreshToken user { id email } } }"
}
# Use token
POST /graphql
Authorization: Bearer <jwt-token>
Content-Type: application/json
License tokens are different: Ed25519-signed (not HMAC), issued by the ITOps license-gen CLI, validated once at backend startup. They go in ITOPS_LICENSE_KEY, not in the Authorization header.
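The login and bearer-token flow above can be sketched in Python with only the standard library. This is a minimal sketch, not the frontend's actual client: the helper names are hypothetical, the `String!` variable types are an assumption about the schema, and the requests are built but not sent.

```python
import json
import urllib.request

# Assumption: login() takes String arguments; the variable types below are a guess.
LOGIN_MUTATION = (
    "mutation Login($email: String!, $password: String!) {"
    " login(email: $email, password: $password)"
    " { token refreshToken user { id email } } }"
)


def graphql_request(base_url, query, variables=None, token=None):
    """Build (but do not send) a POST /graphql request."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Frontend JWT goes in the Authorization header as a Bearer token.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/graphql", data=body, headers=headers, method="POST"
    )


def login_request(base_url, email, password):
    """Build the unauthenticated login mutation request."""
    return graphql_request(base_url, LOGIN_MUTATION, {"email": email, "password": password})


def extract_tokens(response_body):
    """Pull token and refreshToken out of a decoded login response."""
    login = response_body["data"]["login"]
    return login["token"], login["refreshToken"]
```

To send a request, pass it to `urllib.request.urlopen` and decode the JSON body before calling `extract_tokens`. Passing credentials as GraphQL variables avoids hand-escaping quotes inside the query string.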
Agent / webhook callers (API Key)
Both headers are accepted. Pick one — CronJobs and shell scripts usually prefer X-API-Key.
X-API-Key: <operator-api-key>
# or
Authorization: Bearer <operator-api-key>
GraphQL API
Key Queries
# SLA Groups with backup status
{
slaGroups {
id name displayName tier status currentUptime
services {
serviceName status replicas readyReplicas
backupStatus {
backupExpected lastBackupAt lastBackupStatus
backupMaxAgeDays backupOverdue
}
}
}
}
# Real-time snapshot trend (5-min buckets)
{
slaSnapshotTrend(serviceId: "uuid", hoursBack: 1) {
periodKey periodStart actualValue targetValue
incidentCount downtimeMinutes status
}
}
# SLA trend data (daily/monthly)
{
slaTrendData(filter: { periodType: "DAILY", monthsBack: 1 }) {
periodKey periodStart actualValue status
}
}
# Dashboard stats
{
slaDashboardStats {
totalServices servicesWithSla metCount
atRiskCount breachedCount averageUptime
}
}
Operator REST API
POST /api/v1/operator/status
Syncs service statuses from the agent. The agent calls this endpoint every 30 seconds.
{
"nodeId": "myorg/platform/prod/cluster1",
"operatorVersion": "1.0.0",
"services": [
{
"name": "my-api",
"status": "OPERATIONAL",
"replicas": 3,
"readyReplicas": 3,
"slaGroup": "payment-system",
"workloadType": "Deployment"
}
],
"slaGroups": [
{
"name": "payment-system",
"displayName": "Payment System",
"tier": "critical"
}
]
}
POST /api/v1/operator/heartbeat
{
"nodeId": "myorg/platform/prod/cluster1",
"version": "1.0.0",
"watchedServices": 7,
"healthyServices": 6,
"unhealthyServices": 1
}
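As an illustration of how an agent might derive the heartbeat counters from the same service list it sends to /api/v1/operator/status, here is a minimal sketch. It assumes that only OPERATIONAL counts as healthy and that every other status counts as unhealthy; the source does not define this, and the function names are hypothetical.

```python
import json
import urllib.request


def build_heartbeat(node_id, version, services):
    """Summarise service statuses into the heartbeat body shown above."""
    # Assumption: "healthy" means status == "OPERATIONAL"; anything else is unhealthy.
    healthy = sum(1 for s in services if s["status"] == "OPERATIONAL")
    return {
        "nodeId": node_id,
        "version": version,
        "watchedServices": len(services),
        "healthyServices": healthy,
        "unhealthyServices": len(services) - healthy,
    }


def heartbeat_request(base_url, api_key, heartbeat):
    """Build (but do not send) the POST request, authenticating with X-API-Key."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/operator/heartbeat",
        data=json.dumps(heartbeat).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```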
POST /api/v1/operator/register
Registers a new service discovered from the it-ops.yaml ConfigMap.
{
"name": "my-database",
"displayName": "My Database",
"nodeId": "myorg/platform/prod/cluster1",
"criticality": "critical",
"operations": {
"backup": {
"expected": true,
"maxAgeDays": 1
}
}
}
GET /api/v1/operator/services?nodeId=...
Returns expected services for the agent to monitor.
Backup Webhook
POST /api/v1/backup/report
Three addressing modes: service, slaGroup, or namespace. For service-level reports, always include nodeId. A missing nodeId falls back to "unknown" and the service lands under a red badge in the UI (detection over silence).
# Service-level (include nodeId for correct placement)
{
"service": "my-database",
"nodeId": "myorg/platform/prod/cluster1",
"status": "success", // success | failed | partial
"sizeBytes": 5242880,
"message": "pg_dump completed"
}
# SLA Group-level (propagates to every member with backup.expected=true)
{
"slaGroup": "payment-system",
"status": "success"
}
# Namespace-level
{
"namespace": "production",
"status": "success"
}
# Response
{
"success": true,
"message": "backup report recorded for 3 services",
"affected": 3,
"services": ["db-1", "db-2", "cache-1"]
}
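The three addressing modes can be wrapped in a small payload builder so that a backup script cannot forget nodeId on a service-level report. This is a sketch under assumptions: the function name is hypothetical, and the rules that exactly one target must be set and that nodeId is sent only with service-level reports are client-side choices made here, not documented server behaviour.

```python
VALID_BACKUP_STATUSES = {"success", "failed", "partial"}


def build_backup_report(status, *, service=None, node_id=None, sla_group=None,
                        namespace=None, size_bytes=None, message=None):
    """Build the JSON body for POST /api/v1/backup/report."""
    if status not in VALID_BACKUP_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_BACKUP_STATUSES)}")

    # Assumption: a report addresses exactly one of service / slaGroup / namespace.
    targets = {"service": service, "slaGroup": sla_group, "namespace": namespace}
    chosen = {key: value for key, value in targets.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError("set exactly one of service, sla_group, namespace")

    payload = dict(chosen, status=status)
    if service is not None:
        # Without nodeId the backend files the service under "unknown".
        if node_id is None:
            raise ValueError("service-level reports need node_id")
        payload["nodeId"] = node_id
    if size_bytes is not None:
        payload["sizeBytes"] = size_bytes
    if message is not None:
        payload["message"] = message
    return payload
```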
Health Push (external / bare-metal)
POST /api/v1/health/report
Push health for services that aren't running in Kubernetes (VMs, physical hardware, external SaaS). First push auto-creates the service and its hierarchy node; subsequent pushes update status only.
{
"service": "galera-node1",
"nodeId": "myorg/infra/prod/baremetal",
"status": "OPERATIONAL", // OPERATIONAL | DEGRADED | DOWN | MAINTENANCE | UNKNOWN
"message": "wsrep_cluster_size=3",
"criticality": "critical",
"slaGroup": "database-cluster",
"serviceType": "database",
"tags": ["database", "baremetal"]
}
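A bare-metal check script has to translate its own probe result into one of the five status values. The sketch below shows one way to do that for the Galera example; the mapping from cluster size to status is purely illustrative, and both function names are hypothetical.

```python
VALID_STATUSES = {"OPERATIONAL", "DEGRADED", "DOWN", "MAINTENANCE", "UNKNOWN"}


def galera_status(cluster_size, expected_size):
    """Map a wsrep_cluster_size reading to a status (illustrative mapping only)."""
    if cluster_size is None:
        return "UNKNOWN"       # probe could not read the value
    if cluster_size <= 0:
        return "DOWN"
    if cluster_size < expected_size:
        return "DEGRADED"      # some nodes have left the cluster
    return "OPERATIONAL"


def build_health_report(service, node_id, status, message=None, **extra):
    """Build the JSON body for POST /api/v1/health/report."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    payload = {"service": service, "nodeId": node_id, "status": status}
    if message is not None:
        payload["message"] = message
    # Optional fields such as criticality, slaGroup, serviceType, tags pass through as-is.
    payload.update(extra)
    return payload
```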
Storage Push
POST /api/v1/storage/report
Push disk / storage usage. Auto-creates the service with the storage tag on first push so it shows up on the Storage tab immediately.
{
"service": "postgresql",
"nodeId": "myorg/platform/prod/cluster1",
"allocatedBytes": 107374182400,
"usedBytes": 53687091200,
"storageType": "pvc", // disk | pvc | s3 | rds | efs | ...
"mountPath": "/var/lib/postgresql"
}
# Response
{
"success": true,
"serviceName": "postgresql",
"freePercent": 50,
"status": "healthy" // healthy (>30% free) | warning (10-30%) | critical (<10%)
}
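The freePercent and status values in the response follow from the two byte counts in the request. The sketch below reproduces that arithmetic on the client side, for example to predict the status before pushing. It is not the backend's code: the function name is hypothetical, and treating exactly 30% and exactly 10% free as "warning" is one reading of the thresholds listed above.

```python
def classify_storage(allocated_bytes, used_bytes):
    """Return (free_percent, status) using the thresholds from the response comment."""
    if allocated_bytes <= 0:
        raise ValueError("allocated_bytes must be positive")
    # Multiply before dividing so whole-number percentages come out exact.
    free_percent = 100 * (allocated_bytes - used_bytes) / allocated_bytes
    if free_percent > 30:
        status = "healthy"
    elif free_percent >= 10:
        status = "warning"   # assumption: both boundary values count as warning
    else:
        status = "critical"
    return free_percent, status
```

For the request shown above, 107374182400 bytes allocated and 53687091200 bytes used leave half the volume free, so the function returns 50% and "healthy", matching the sample response.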
SLA Exclusion Windows (Maintenance)
# Start a maintenance window (pauses SLA calculation for the service)
POST /api/v1/sla/exclusion-window/start
{
"service": "postgresql",
"nodeId": "myorg/platform/prod/cluster1",
"reason": "scheduled patching",
"expectedEndAt": "2026-04-20T02:00:00Z"
}
# Stop the active window
POST /api/v1/sla/exclusion-window/stop
{
"service": "postgresql",
"nodeId": "myorg/platform/prod/cluster1"
}
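A patching script typically knows how long the maintenance will take rather than the absolute end time. The following sketch builds both payloads and formats expectedEndAt in the UTC "Z" form used in the example above. The function names and the duration parameter are hypothetical conveniences, not part of the API.

```python
from datetime import datetime, timedelta, timezone


def start_window_payload(service, node_id, reason, duration, now=None):
    """Build the body for POST /api/v1/sla/exclusion-window/start."""
    now = now or datetime.now(timezone.utc)
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    expected_end = (now + duration).astimezone(timezone.utc)
    return {
        "service": service,
        "nodeId": node_id,
        "reason": reason,
        # Same format as the example: 2026-04-20T02:00:00Z
        "expectedEndAt": expected_end.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def stop_window_payload(service, node_id):
    """Build the body for POST /api/v1/sla/exclusion-window/stop."""
    return {"service": service, "nodeId": node_id}
```

Calling the stop endpoint in a `finally` block (or a shell `trap`) keeps the SLA timer from staying paused when the maintenance script fails partway through.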
License API
POST /api/v1/license/activate
{
"licenseKey": "eyJhbGciOiJFZERTQSIs..."
}
# Response
{
"success": true,
"message": "License activated",
"customer": "My Company",
"plugins": ["ticketing", "sla", "audit"]
}
Outbound HTTP safety (webhooks & workflow HTTP steps)
Every outbound HTTP call the backend makes on behalf of an admin (webhooks and workflow HTTP_REQUEST steps) goes through a shared SSRF validator. A webhook URL that matches any of the following produces an ExecFailed execution with a clear error in the webhook history UI (no socket is opened):
- Non-http(s) schemes: file://, ldap://, gopher://, ftp://
- Loopback: 127.0.0.0/8, ::1
- Private (RFC1918): 10/8, 172.16/12, 192.168/16
- Link-local / cloud metadata: 169.254.0.0/16 (AWS IMDS, GCP metadata)
- Unspecified / multicast / CGNAT
- 302/307 redirects to any of the above (the check re-runs on every hop)
For legitimate in-cluster targets, use the host allowlist env var: ITOPS_SECURITY_WEBHOOK_HOST_ALLOWLIST=host1,host2,.... In dev environments the whole block can be lifted with ITOPS_SECURITY_ALLOW_PRIVATE_WEBHOOKS=true — don't do this in production.
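To check a webhook URL against the rules above before saving it, the same categories can be approximated with Python's `ipaddress` module. This is an illustrative sketch, not the backend's validator: the function names are hypothetical, the allowlist is assumed to match hostnames exactly, 100.64.0.0/10 is used as the CGNAT range, and hostnames are not resolved. A real validator has to resolve the host and check every returned address and every redirect hop.

```python
import ipaddress
from urllib.parse import urlsplit

# Ranges named in the list above, plus 100.64.0.0/10 for CGNAT (assumed range).
BLOCKED_NETWORKS = {
    "loopback": ["127.0.0.0/8", "::1/128"],
    "private": ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"],
    "link-local": ["169.254.0.0/16"],
    "cgnat": ["100.64.0.0/10"],
}


def ip_block_reason(ip_text):
    """Return the blocked category for an IP literal, or None if it is allowed."""
    ip = ipaddress.ip_address(ip_text)  # raises ValueError for non-IP text
    if ip.is_unspecified:
        return "unspecified"
    if ip.is_multicast:
        return "multicast"
    for reason, cidrs in BLOCKED_NETWORKS.items():
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr)
            if ip.version == net.version and ip in net:
                return reason
    return None


def url_block_reason(url, allowlist=()):
    """Return why a URL would be rejected, or None if this sketch lets it through."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return "scheme"
    host = parts.hostname or ""
    if not host:
        return "no-host"
    if host in allowlist:  # assumption: exact-match host allowlist
        return None
    try:
        return ip_block_reason(host)
    except ValueError:
        # A DNS name, not an IP literal: this sketch does not resolve it.
        return None
```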
WebSocket Subscriptions
Connect to /graphql/ws for real-time updates via GraphQL subscriptions.
# Events available:
- ticket:created, ticket:updated, ticket:deleted
- sla:alert, sla:incident
- license:updated
- service:status_changed