Agent Deploy
The agent watches K8s resources and reports service status every 30 seconds. It discovers services from ConfigMaps labeled with itops.io/config: "true". Install one agent per cluster.
Agent values.yaml
```yaml
node:
  id: "myorg/myplatform/prod/cluster1"   # 4-level hierarchy path (REQUIRED)
  name: "production-cluster"
itops:
  url: "https://api.yourdomain.com"      # ITOps Core API URL (REQUIRED)
  apiKey:
    value: "your-operator-api-key"       # Must match ITOPS_SECURITY_OPERATOR_API_KEY
    # OR use an existing secret:
    # existingSecret: "itops-api-key"
    # existingSecretKey: "api-key"
slaGroups:                               # Optional: define SLA groups from the agent
  - name: "payment-system"
    displayName: "Payment System"
    tier: "critical"
    targets:
      uptime: 99.99
watch:
  namespaces: []                         # Empty = watch all namespaces
```
Important: The apiKey.value must match the ITOPS_SECURITY_OPERATOR_API_KEY value set in the ITOps platform chart. If the key is not set, the agent pod fails with CreateContainerConfigError.
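If you use the existingSecret variant instead of an inline value, the referenced Secret could look like the following sketch. The name and key mirror the commented defaults in the agent values above; the literal value is a placeholder.

```yaml
# Illustrative Secret for the existingSecret variant; name and key
# match the commented defaults (existingSecret / existingSecretKey) above.
apiVersion: v1
kind: Secret
metadata:
  name: itops-api-key
type: Opaque
stringData:
  api-key: "your-operator-api-key"   # must equal ITOPS_SECURITY_OPERATOR_API_KEY
```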
Service Config
Each service is configured via a ConfigMap that the agent discovers. The agent watches for ConfigMaps with label <labelPrefix>/config: "true" (prefix defaults to itops.io, configurable via watch.labelPrefix in the agent Helm values) and reads the data under one of these keys: it-ops.yaml, itops.yaml, it-ops.yml, itops.yml.
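For example, if you override the prefix in the agent Helm values, the ConfigMap label must use the same prefix. The acme.io prefix here is illustrative:

```yaml
# Agent Helm values (illustrative custom prefix):
watch:
  labelPrefix: "acme.io"

# A ConfigMap must then carry the matching label to be discovered:
metadata:
  labels:
    acme.io/config: "true"
```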
Required fields:
- A hierarchy block (all 5 levels) and service.name are mandatory. Without them the agent will not register the service.
- service.workloadType + service.workloadName are needed for health monitoring (status, replicas).
- Legacy fallback: a single-line placement.node: "org/platform/env/cluster" is still accepted by the parser for backwards compatibility with older manifests, but the structured hierarchy block is preferred going forward.
SLA groups — agent block vs service field: The agent-level slaGroups: block (seen above) is where you define groups and their uptime targets, once per cluster. Per-service ConfigMaps then just reference an existing group via service.slaGroup: "payment-system" to add that service as a member. If two agents (or an agent plus a push webhook) reference the same group name across clusters, they all merge into the same group row — sla_groups is UNIQUE(name).
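A per-service config then only needs the reference. A minimal sketch (the service name is hypothetical; hierarchy and the other required fields are omitted here for brevity but remain mandatory):

```yaml
# Per-service fragment (illustrative): references a group defined at the
# agent level; does not define targets itself.
service:
  name: payments-api            # hypothetical service name
  slaGroup: "payment-system"    # must match a name in the agent's slaGroups block
```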
ConfigMap Template (recommended)
```yaml
# templates/itops-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Chart.Name }}-itops
  labels:
    itops.io/config: "true"        # REQUIRED - agent discovers ConfigMaps by this label
data:
  it-ops.yaml: |                   # one of the accepted data keys (it-ops.yaml recommended)
    version: "1"
    hierarchy:                     # REQUIRED - all 5 levels
      organization: {{ .Values.itops.organization | default "myorg" }}
      platform: {{ .Values.itops.platform | default "myplatform" }}
      environment: {{ .Values.itops.environment | default "prod" }}
      cluster: {{ .Values.itops.cluster | default "cluster1" }}
      service: {{ .Chart.Name }}
    service:
      name: {{ .Chart.Name }}      # REQUIRED - service identifier
      criticality: {{ .Values.itops.criticality | default "medium" }}
      slaGroup: {{ .Values.itops.slaGroup | default "" }}
      workloadType: "deployment"   # REQUIRED for health - deployment/statefulset/daemonset
      workloadName: {{ .Chart.Name }}  # REQUIRED for health - K8s workload name
    operations:
      backup:
        expected: {{ .Values.itops.backup.expected | default false }}
        maxAgeDays: {{ .Values.itops.backup.maxAgeDays | default 1 }}
```
Service values.yaml
```yaml
# helmcharts/my-service/values.yaml
itops:
  organization: "myorg"
  platform: "myplatform"
  environment: "prod"
  cluster: "cluster1"
  criticality: "critical"       # critical / high / medium / low
  slaGroup: "payment-system"    # SLA group membership (optional)
  backup:
    expected: true              # backup monitoring enabled
    maxAgeDays: 1               # alert if older than N days
```
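For reference, with these values and a chart named my-service, the ConfigMap template from the previous section would render roughly as follows. This is traced by hand from the template defaults, not captured from actual helm template output:

```yaml
# Approximate rendered ConfigMap for a chart named "my-service" (illustrative)
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-service-itops
  labels:
    itops.io/config: "true"
data:
  it-ops.yaml: |
    version: "1"
    hierarchy:
      organization: myorg
      platform: myplatform
      environment: prod
      cluster: cluster1
      service: my-service
    service:
      name: my-service
      criticality: critical
      slaGroup: payment-system
      workloadType: "deployment"
      workloadName: my-service
    operations:
      backup:
        expected: true
        maxAgeDays: 1
```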
Common Mistakes
| Mistake | Result | Fix |
|---|---|---|
| Label itops.io/managed: "true" | Agent ignores ConfigMap | Use itops.io/config: "true" |
| Missing hierarchy block AND placement.node | Parser error: "hierarchy or placement.node is required" | Add the hierarchy block (or a placement.node fallback) |
| Missing service.name | Parser error: "service.name is required" | Add name under the service: block |
| Missing workloadName | Service status stays UNKNOWN | Add workloadType + workloadName |
| Label prefix mismatch (custom watch.labelPrefix) | Agent doesn't see the ConfigMap | ConfigMap label must use the same prefix, e.g. acme.io/config: "true" |