GitOps Repository Structure
How to organize your ITOps configuration in a GitOps repository. Everything is declarative, version-controlled, and ArgoCD-syncable.
Recommended Repository Layout
infra-gitops-repo/
├── platform/ # ITOps platform installation
│ ├── itops-values.yaml # Core platform Helm values
│ ├── itops-agent-values.yaml # Agent Helm values (per cluster)
│ └── sla-portal-values.yaml # SLA Portal Helm values
│
├── services/ # Service definitions (ConfigMaps)
│ ├── postgresql-itops.yaml # Database service
│ ├── redis-itops.yaml # Cache service
│ ├── payment-api-itops.yaml # Application service
│ └── galera-itops.yaml # Bare metal service
│
├── monitoring/ # Push webhook CronJobs
│ ├── storage-reporter.yaml # Storage metrics push
│ ├── backup-reporter.yaml # Backup completion push
│ └── health-push.yaml # Bare metal health push
│
└── sla/ # SLA configuration
└── sla-portal-values.yaml # SLA targets (error budgets)
Why Separate Directories?
| Directory | What | Lifecycle | Who changes it |
|---|---|---|---|
| platform/ | Helm values for ITOps core, agent, portal | Rarely (upgrades) | Platform team |
| services/ | ConfigMaps for each monitored service | When services are added/removed | Service owners |
| monitoring/ | CronJobs for storage, backup, health push | When monitoring requirements change | Operations team |
| sla/ | SLA Portal targets and error budgets | When SLA contracts change | Management / compliance |
Security defaults (since v4.1.3 / chart 1.10.0)
The chart ships with defense-in-depth on by default — you don't opt in, you opt out. Specifically:
- SSRF validator on every outbound HTTP request (webhooks, workflow HTTP_REQUEST steps) blocks loopback, RFC1918, link-local and cloud metadata targets, including on redirects.
- License activation authenticated (X-API-Key or admin JWT).
- Pod securityContext: non-root UID 1000, read-only rootfs, drop all capabilities, seccomp RuntimeDefault.
- Dedicated ServiceAccount with automountServiceAccountToken: false — core pod can't talk to the K8s API.
- Egress NetworkPolicy: deny everything except DNS, the bundled PostgreSQL, explicitly allowed namespaces (default: sla-portal), and the public internet excluding private ranges.
- Writable paths: /var/log/itops (SLA reports) and /tmp are emptyDir-mounted. Anything else is read-only.
If a webhook target or workflow HTTP step needs to reach an in-cluster private service, add the host to the ITOPS_SECURITY_WEBHOOK_HOST_ALLOWLIST env var AND add its namespace to networkPolicy.allowedEgressNamespaces. The Installation page has full recipes; a minimal sketch of the pairing follows.
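A minimal sketch, assuming the webhook target is Alertmanager in an alerting namespace; the hostname, the namespace, and the comma-separated list format are illustrative assumptions, not confirmed chart behaviour:
# platform/itops-values.yaml (fragment; illustrative names throughout)
env:
  # Let this one in-cluster host through the SSRF validator. Single-host vs.
  # comma-separated list format is an assumption; check the Installation page.
  ITOPS_SECURITY_WEBHOOK_HOST_ALLOWLIST: "alertmanager.alerting.svc.cluster.local"
networkPolicy:
  allowedEgressNamespaces:
    - sla-portal   # chart default
    - alerting     # namespace of the allowed webhook target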
1. Platform Installation
Install ITOps from the Helm repo. The values file is your IaC definition.
# platform/itops-values.yaml
imagePullSecrets:
- name: ghcr-secret
ui:
apiUrl: "https://api.yourdomain.com"
wsHost: "api.yourdomain.com"
env:
ITOPS_SERVER_ENVIRONMENT: production
ITOPS_FEATURE_LOCAL_AUTH: "true"
ITOPS_FEATURE_OPERATOR_API: "true"
secretEnv:
ITOPS_DATABASE_PASSWORD: "strong-password"
ITOPS_JWT_SECRET: "random-jwt-secret"
ITOPS_SECURITY_OPERATOR_API_KEY: "random-api-key"
ITOPS_LICENSE_KEY: "eyJhbGci..."
ingress:
hosts:
- host: api.yourdomain.com
paths: [{ path: /, pathType: Prefix }]
uiIngress:
hosts:
- host: app.yourdomain.com
paths: [{ path: /, pathType: Prefix }]
# Install / upgrade
helm repo add itops https://charts.mlops.hu
helm upgrade --install itops itops/itops -n itops --create-namespace -f platform/itops-values.yaml
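A quick post-install sanity check (plain Helm and kubectl, nothing chart-specific assumed):
helm status itops -n itops      # release deployed?
kubectl get pods -n itops       # core, UI, bundled PostgreSQL all Running?
kubectl get ingress -n itops    # api/app hostnames wired to the right services?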
1b. Auth providers (GitOps, since v4.1.5 / chart 1.13.0)
Authentication providers — local login, LDAP, future SSO — are now declared in the same values file and bootstrapped into the database on every pod start. The admin UI shows them read-only: to add, edit, or remove a provider you change the Helm values and redeploy. No manual UI clicks, no config drift.
# platform/itops-auth-values.yaml
auth:
local:
enabled: true
isDefault: true
passwordPolicy:
minLength: 12
requireUppercase: true
requireDigit: true
ldap:
enabled: true
name: "corporate-ldap"
displayName: "Corporate LDAP"
host: "ldap.corp.internal"
port: 389
bindDn: "cn=service-itops,ou=ServiceAccounts,dc=corp,dc=internal"
bindPasswordSecret: # recommended — external-secrets or similar
name: my-ldap-creds
key: bindPassword
baseDn: "dc=corp,dc=internal"
userFilter: "(sAMAccountName=%s)" # AD
groupFilter: "(member=%s)"
userAttrs:
username: "sAMAccountName"
email: "mail"
displayName: "displayName"
# Optional explicit mappings. If omitted, LDAP groups whose CN matches an
# ITOps group name (e.g. cn=itops-admins) auto-join. The itops- prefix
# protects you from a generic company "admins" group accidentally
# granting platform admin.
groupMappings: []
Apply with:
helm upgrade itops itops/itops -n itops \
-f platform/itops-values.yaml \
-f platform/itops-auth-values.yaml
External secrets (Vault / ESO / SOPS)
For production, do not hand LDAP bind passwords or DB credentials to Helm directly. The chart gives you three GitOps-safe knobs:
# 1) Per-key reference. Each ITOPS_* env var can point to an existing Secret.
# Works great with external-secrets-operator / Vault CSI / SOPS-decoded
# manifests.
secretRefs:
ITOPS_DATABASE_PASSWORD:
name: itops-db-creds # existing K8s Secret
key: password
ITOPS_JWT_SECRET:
name: itops-jwt
key: secret
ITOPS_SECURITY_OPERATOR_API_KEY:
name: itops-operator-api-key
key: key
# 2) Bulk envFrom. Every key in the Secret becomes an env var of the same
# name. Minimum plumbing — perfect when ESO syncs a whole bag at once.
extraEnvFrom:
- secretRef:
name: itops-bulk-secrets # ExternalSecret → K8s Secret → here
# 3) LDAP bind password via existing Secret (no plaintext in values).
auth:
ldap:
enabled: true
host: ldap.corp.internal
baseDn: dc=corp,dc=internal
bindDn: cn=svc-itops,ou=Service,dc=corp,dc=internal
bindPasswordSecret:
name: itops-ldap-creds # existing K8s Secret
key: bindPassword
All three coexist. If the same env var is set in both secretEnv (plain) and secretRefs (reference), the reference wins.
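For knob 2, the Secret itself can live in Git as well. A sketch using external-secrets-operator, where the ClusterSecretStore name vault-backend and the Vault path secret/data/itops are illustrative assumptions:
# platform/itops-bulk-secrets.yaml: ESO syncs Vault -> K8s Secret -> extraEnvFrom
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: itops-bulk-secrets
  namespace: itops
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend          # illustrative store name
  target:
    name: itops-bulk-secrets     # the Secret referenced by extraEnvFrom above
  dataFrom:
    - extract:
        key: secret/data/itops   # illustrative path; keys must match the ITOPS_* env var names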
Group semantics
ITOps ships with four admin-relevant built-in groups. The admin UI Groups page shows each one's internal UUID (copy button) — use it in groupMappings when you need explicit control.
| ITOps group | What it grants | LDAP CN to auto-match |
|---|---|---|
| itops-admins | Full platform admin | cn=itops-admins |
| itops-trust-admins | Trusted admin ops (PKI / HSM) | cn=itops-trust-admins |
| itops-operators | Day-to-day service/ticket ops | cn=itops-operators |
| itops-users | Read-only | cn=itops-users |
Auto-match is bidirectional (since 4.1.12): an LDAP user whose groups include cn=itops-admins joins the ITOps itops-admins group at login via JIT provisioning. Remove the LDAP group, and on next login the ITOps membership is revoked — LDAP is the source of truth. The revocation only touches groups considered "managed" (explicitly mapped in groupMappings, or itops-* named in CN auto-match mode); manually-added custom groups are never clobbered.
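When auto-match isn't enough, for instance mapping a company LDAP group that doesn't follow the itops- naming, declare the mapping explicitly. The field names below are hypothetical (copy the real schema from the chart docs); the UUID comes from the Groups page copy button:
# Hypothetical groupMappings entry; field names are illustrative
auth:
  ldap:
    groupMappings:
      - ldapGroup: "cn=platform-eng,ou=Groups,dc=corp,dc=internal"
        itopsGroupId: "0b6c2c1e-0000-4000-8000-000000000000"  # UUID from the admin UI Groups page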
Authoritative reconciliation (since 4.1.14)
GitOps reconciliation is authoritative: the state of the Helm values is the state of the system, including deletions. When you remove an entry from values and redeploy:
| Entity | Behaviour on removal |
|---|---|
| Auth provider (auth.ldap) | GitOps-managed row (config._managedByGitOps=true) is DELETED on next core startup. Admin-created rows without the flag are never touched. |
| Group mapping (groupMappings[]) | Transactional replace — mapping table re-synced in full every reconcile cycle. |
| SLA group (agent's slaGroups) | Memberships pruned per-node immediately. When the last node stops contributing to a group, the group row itself is DELETED on the next sync cycle. |
| Group-member link (node X services in group Y) | Pruned per-node every sync. |
| User (LDAP JIT-provisioned) | User row is NOT deleted — historical ownership of tickets / workflows / audit entries is preserved. The user simply can't log in anymore. Group memberships are revoked on the last attempted login. |
Safety: the reconciler only prunes when it successfully processed at least one entry this cycle. If the config file is missing or empty, nothing is deleted — a misconfigured deploy can never wipe login. Historical SLA snapshots are indexed by service_id, not group_id, so deleting an SLA group has no effect on past uptime numbers.
Result on the admin UI: you only ever see what is currently declared in GitOps. No ghost providers, no stale SLA groups carrying over from a previous layout.
2. Service Definitions (ConfigMaps)
Each monitored service gets a ConfigMap. The agent discovers them automatically via the itops.io/config: "true" label.
# services/postgresql-itops.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: postgresql-itops
namespace: production
labels:
itops.io/config: "true"
data:
it-ops.yaml: |
version: "1"
hierarchy:
organization: "myorg"
platform: "myplatform"
environment: "prod"
cluster: "cluster1"
service: "postgresql"
service:
name: "postgresql"
criticality: "critical"
slaGroup: "payment-system"
workloadType: "statefulset"
workloadName: "postgresql"
tags:
- database
- storage
metadata:
serviceType: "PostgreSQL"
usedBy:
- name: "payment-api"
displayName: "Payment API"
- name: "user-service"
displayName: "User Service"
operations:
backup:
expected: true
maxAgeDays: 1
Apply via ArgoCD (recommended for production) or kubectl apply -f services/ for one-off testing. Mixing kubectl apply with an ArgoCD-managed path causes drift and is not GitOps — pick one.
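To see exactly what the agent will pick up, filter on the discovery label:
# Every service definition the agent discovers, across all namespaces
kubectl get configmaps -A -l itops.io/config=true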
Bare-Metal Auto-Register (no ConfigMap)
Services that aren't running in Kubernetes skip the ConfigMap step entirely. The first call to /api/v1/health/report or /api/v1/storage/report auto-creates the service (source=external, hierarchy node from nodeId). Subsequent pushes update status. This is ideal for databases on VMs, external S3 buckets, RDS instances, routers, etc.
# Bare-metal service — CronJob does everything, no ConfigMap needed
curl -X POST https://api.yourdomain.com/api/v1/health/report \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "service": "galera-node1",
    "nodeId": "myorg/infra/prod/baremetal",
    "status": "OPERATIONAL",
    "criticality": "critical",
    "slaGroup": "database-cluster",
    "tags": ["database", "baremetal"]
  }'
3. Monitoring CronJobs (Push Webhooks)
CronJobs push storage, backup, and health data to the ITOps API. They authenticate with the same ITOPS_SECURITY_OPERATOR_API_KEY the agent uses.
Storage Reporter
# monitoring/storage-reporter.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: itops-storage-reporter
namespace: itops
spec:
schedule: "*/15 * * * *"
jobTemplate:
spec:
template:
spec:
restartPolicy: Never
containers:
- name: reporter
image: curlimages/curl:latest
env:
- name: API_URL
value: "http://itops-core.itops:8080"
- name: API_KEY
valueFrom:
secretKeyRef:
name: itops-secrets
key: ITOPS_SECURITY_OPERATOR_API_KEY
command: ["/bin/sh", "-c"]
args:
- |
# PostgreSQL storage
curl -s -X POST "$API_URL/api/v1/storage/report" \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{"service":"postgresql","nodeId":"myorg/myplatform/prod/cluster1","allocatedBytes":107374182400,"usedBytes":64424509440,"storageType":"pvc"}'
Backup Reporter
# monitoring/backup-reporter.yaml
# Add to your existing backup CronJob (pg_dump, mysqldump, etc.)
# After successful backup, report to ITOps:
curl -s -X POST "$API_URL/api/v1/backup/report" \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d "{\"service\":\"postgresql\",\"nodeId\":\"myorg/myplatform/prod/cluster1\",\"status\":\"success\",\"sizeBytes\":$BACKUP_SIZE}"
Bare Metal Health Push
# monitoring/health-push.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: itops-health-push
namespace: itops
spec:
schedule: "* * * * *"
jobTemplate:
spec:
template:
spec:
restartPolicy: Never
containers:
- name: health
image: curlimages/curl:latest
env:
- name: API_URL
value: "http://itops-core.itops:8080"
- name: API_KEY
valueFrom:
secretKeyRef:
name: itops-secrets
key: ITOPS_SECURITY_OPERATOR_API_KEY
command: ["/bin/sh", "-c"]
args:
- |
# Galera cluster health
curl -s -X POST "$API_URL/api/v1/health/report" \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{"service":"galera-node1","status":"OPERATIONAL","message":"cluster_size=3","nodeId":"myorg/infra/prod/baremetal","criticality":"critical","slaGroup":"database-cluster","serviceType":"database"}'
4. SLA Portal Targets
Define SLA targets in the SLA Portal Helm values. Error budgets are calculated automatically from the uptime target: 99.99% over a 30-day month, for example, leaves roughly 4.3 minutes of allowable downtime (43,200 min × 0.0001 ≈ 4.3 min).
# sla/sla-portal-values.yaml
ingress:
host: sla.yourdomain.com
apiKey: "your-sla-portal-key"
slaTargets:
payment-system:
uptime: 99.99
label: "Payment System SLA"
infrastructure:
uptime: 99.9
label: "Infrastructure SLA"
helm upgrade --install sla-portal itops/sla-portal -n sla-portal --create-namespace -f sla/sla-portal-values.yaml
5. Template Export / Import (workflows, catalog, SLA defs)
Workflows, ticket catalog items and SLA definitions are created via the admin UI (not Helm) but can be round-tripped through YAML for GitOps backup. The core exposes GET /api/v1/templates/export and POST /api/v1/templates/import. Use these in CI so the admin-UI state is reproducible:
# Nightly export to Git (CronJob or CI job)
curl -sX GET https://api.yourdomain.com/api/v1/templates/export \
-H "X-API-Key: $API_KEY" \
-o ./gitops/templates/itops-config-$(date +%F).yaml
# Then git add / commit / push
# Restore after rebuild (disaster recovery or fresh env bootstrap)
curl -sX POST https://api.yourdomain.com/api/v1/templates/import \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/yaml" \
--data-binary @./gitops/templates/itops-config.yaml
This is the GitOps "escape hatch" for admin-UI state — everything the backend creates at runtime (workflow templates, catalog items, SLA definitions with custom tiers) can be exported and re-applied declaratively.
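A minimal nightly CI job wrapping the export; repo layout, branch, and commit message are illustrative:
#!/bin/sh
set -e
curl -s https://api.yourdomain.com/api/v1/templates/export \
  -H "X-API-Key: $API_KEY" \
  -o gitops/templates/itops-config-$(date +%F).yaml
cd gitops
git add templates/
# commit only when the export actually changed
git diff --cached --quiet || git commit -m "chore: nightly ITOps template export $(date +%F)"
git push origin main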
Data Flow Summary
| Data | Source | Destination | Method |
|---|---|---|---|
| Service registration | ConfigMap (Git) | Agent → Core API | Auto-discovery |
| Service health (K8s) | K8s API | Agent → Core API | Agent sync (30s) |
| Service health (bare metal) | CronJob (Git) | Health webhook → Core API | Push (configurable) |
| Storage metrics | CronJob (Git) | Storage webhook → Core API | Push (15 min) |
| Backup status | Backup script (Git) | Backup webhook → Core API | Push (after backup) |
| SLA targets | Helm values (Git) | SLA Portal env var | Helm install |
| SLA reports | Core API | SLA Portal | Daily push (07:00) |
ArgoCD Setup
Four ArgoCD Applications, one per concern. Note the $values/... prefix in valueFiles: it only resolves in a multi-source Application, so each Helm app below pairs the chart repo with a second source that points at the GitOps repo and carries ref: values.
# App 1: Platform
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: itops-platform
spec:
  sources:
    - repoURL: https://charts.mlops.hu
      chart: itops
      targetRevision: "1.13.0"   # >= 1.13.0 for the GitOps-managed auth providers above
      helm:
        valueFiles: ["$values/platform/itops-values.yaml"]
    - repoURL: https://github.com/myorg/infra-gitops.git
      targetRevision: main
      ref: values
# App 2: Agent (one per K8s cluster — install in each cluster's ArgoCD)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: itops-agent
spec:
  sources:
    - repoURL: https://charts.mlops.hu
      chart: itops-agent
      targetRevision: "1.2.0"
      helm:
        valueFiles: ["$values/platform/itops-agent-values.yaml"]
    - repoURL: https://github.com/myorg/infra-gitops.git
      targetRevision: main
      ref: values
# App 3: Service ConfigMaps + Monitoring CronJobs
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: itops-services
spec:
source:
repoURL: https://github.com/myorg/infra-gitops.git
path: services/
targetRevision: main
# App 4: SLA Portal
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sla-portal
spec:
  sources:
    - repoURL: https://charts.mlops.hu
      chart: sla-portal
      targetRevision: "1.3.1"
      helm:
        valueFiles: ["$values/sla/sla-portal-values.yaml"]
    - repoURL: https://github.com/myorg/infra-gitops.git
      targetRevision: main
      ref: values