# Cortex Integration

### Overview

[Cortex](https://cortexmetrics.io/) is a CNCF incubating project that provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus metrics. It is compatible with the Prometheus API and underpins production services such as Amazon Managed Service for Prometheus (AMP).

Since Cortex uses the same Alertmanager webhook format as Prometheus, the integration flow is identical: Cortex evaluates alert rules and routes firing alerts through Prometheus Alertmanager, which delivers structured webhook payloads to the platform.

This integration supports automatic alert creation on firing events and automatic resolution when Alertmanager sends a resolved notification.

### Integration Flow

1. Prometheus (or another remote-write-compatible agent) scrapes metrics from your targets and ships them to Cortex via the remote write API.
2. The Cortex ruler evaluates alert rules at the configured interval. When a rule's condition is met (and has held for its `for` duration), the ruler sends the alert to Alertmanager.
3. Alertmanager groups the alerts and delivers a webhook POST request to the platform endpoint.
4. When the alert condition clears, Alertmanager sends a resolved notification and the platform automatically closes the alert.

The payload delivered to the platform follows the standard Prometheus Alertmanager webhook format (version 4).

### Webhook Payload Schema

```json
{
  "receiver": "string",
  "status": "firing | resolved",
  "alerts": [
    {
      "status": "firing | resolved",
      "labels": {
        "alertname": "string",
        "severity": "string"
      },
      "annotations": {
        "summary": "string",
        "description": "string"
      },
      "startsAt": "ISO8601 timestamp",
      "endsAt": "ISO8601 timestamp",
      "generatorURL": "string",
      "fingerprint": "string"
    }
  ],
  "groupLabels": {},
  "commonLabels": {
    "alertname": "string",
    "severity": "string"
  },
  "commonAnnotations": {
    "summary": "string",
    "description": "string"
  },
  "externalURL": "string",
  "version": "4",
  "groupKey": "string",
  "truncatedAlerts": 0
}
```

***

### Setup

#### Step 1 — Create an Alert Source on the Platform

1. Navigate to **Sources → Add Source**.
2. Search for **Cortex** and select it.
3. Give the source a name and click **Save**.
4. Copy the **ITOC360 URL** and **Token**.
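
Both values are needed in the next step, where the token is appended to the webhook URL as a query parameter (`?token=<your-source-token>`).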

#### Step 2 — Install and Configure Alertmanager

Cortex does not deliver alerts directly to external systems. You must run a Prometheus Alertmanager instance and point the Cortex ruler at it.

Install Alertmanager using your preferred method (binary, Docker, Helm). Then configure it to forward alerts to the platform:

```yaml
# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  receiver: itoc360-webhook
  group_wait: 10s
  group_interval: 1m
  repeat_interval: 4h

receivers:
  - name: itoc360-webhook
    webhook_configs:
      - url: "https://api.itoc360.app/functions/v1/events?token=<your-source-token>"
        send_resolved: true
```

`send_resolved: true` is required for automatic alert resolution on the platform.
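
For example, a minimal Docker Compose sketch for running Alertmanager with this configuration (assuming the file above is saved as `alertmanager.yml` next to the compose file; the image tag and paths are illustrative) might look like:

```yaml
# docker-compose.yml (illustrative sketch; adjust image tag, ports, and paths to your environment)
services:
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      # the prom/alertmanager image reads /etc/alertmanager/alertmanager.yml by default
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
```

Any installation method works, as long as Alertmanager loads the configuration above and is reachable from the Cortex ruler (Step 3).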

***

#### Step 3 — Configure Cortex Ruler

Point the Cortex ruler at your Alertmanager and configure where it loads rule files from:

```yaml
# cortex-config.yaml
ruler:
  alertmanager_url: http://alertmanager:9093
  enable_api: true

ruler_storage:
  backend: local
  local:
    directory: /etc/cortex/rules
```
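
Note: depending on your Cortex version and tenant configuration, the local rule store typically expects rule files under a per-tenant subdirectory of `directory` (the tenant ID is commonly `fake` when multi-tenancy is disabled), along these lines:

```
/etc/cortex/rules/
└── <tenant-id>/        # e.g. "fake" when auth/multi-tenancy is disabled
    └── production.yaml
```

If the ruler does not pick up your rules, check the rule-storage layout documented for your Cortex release.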

***

#### Step 4 — Create Alert Rules

Create rule files in the configured rules directory. Each file defines one or more alert groups.

Example: `rules/production.yaml`

```yaml
groups:
  - name: production-critical
    interval: 1m
    rules:
      - alert: HighCPUUsage
        expr: |
          100 - (avg by (instance)
          (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage detected"
          description: "Instance {{ $labels.instance }} CPU usage is above 90% for more than 5 minutes."
```

The `severity` label in `labels` is used by the platform for priority mapping (see table below).

***

#### Step 5 — Verify the Integration

After starting Cortex and Alertmanager, wait for (or deliberately trigger) one of your alert rules to fire, then:

1. Open the Alertmanager UI at `http://<alertmanager-host>:9093` and confirm the alert is routed to the `itoc360-webhook` receiver.
2. Confirm the alert appears on the platform under the source you created in Step 1.

***

### Sample Payload

**ALERT (firing):**

```json
{
  "receiver": "itoc360-webhook",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighCPUUsage",
        "instance": "server-01:9100",
        "severity": "critical"
      },
      "annotations": {
        "summary": "High CPU usage detected",
        "description": "Instance server-01:9100 CPU usage is above 90% for more than 5 minutes."
      },
      "startsAt": "2026-04-07T09:46:06.815Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://cortex:9090/graph?g0.expr=...",
      "fingerprint": "34e164e9af873ac1"
    }
  ],
  "groupLabels": {},
  "commonLabels": {
    "alertname": "HighCPUUsage",
    "severity": "critical"
  },
  "commonAnnotations": {
    "summary": "High CPU usage detected"
  },
  "externalURL": "http://alertmanager:9093",
  "version": "4",
  "groupKey": "{}:{}",
  "truncatedAlerts": 0
}
```

**RESOLVE (resolved):**

```json
{
  "receiver": "itoc360-webhook",
  "status": "resolved",
  "alerts": [
    {
      "status": "resolved",
      "labels": {
        "alertname": "HighCPUUsage",
        "severity": "critical"
      },
      "annotations": {
        "summary": "High CPU usage detected"
      },
      "startsAt": "2026-04-07T09:46:06.815Z",
      "endsAt": "2026-04-07T10:01:00.000Z",
      "fingerprint": "34e164e9af873ac1"
    }
  ],
  "status": "resolved",
  "version": "4"
}
```

***

### Field Mapping Reference

| Payload Field                       | Description                                                           |
| ----------------------------------- | --------------------------------------------------------------------- |
| `status`                            | Top-level event type: `firing` → ALERT, `resolved` → RESOLVE          |
| `alerts[0].fingerprint`             | Unique identifier per alert label set — used for fingerprint matching |
| `alerts[0].labels.alertname`        | Name of the alert rule that fired                                     |
| `alerts[0].labels.severity`         | Severity label from the rule definition — used for priority mapping   |
| `alerts[0].annotations.summary`     | Short human-readable alert title                                      |
| `alerts[0].annotations.description` | Detailed description of the alert condition                           |
| `alerts[0].startsAt`                | ISO 8601 timestamp when the alert started firing                      |
| `alerts[0].endsAt`                  | ISO 8601 timestamp when resolved (`0001-...` means still active)      |
| `commonLabels`                      | Labels shared across all alerts in this group                         |
| `commonAnnotations`                 | Annotations shared across all alerts in this group                    |
| `groupKey`                          | Alertmanager group key used for deduplication                         |

### Priority Mapping

| Cortex `severity` Label | Platform Priority |
| ----------------------- | ----------------- |
| `critical`              | CRITICAL          |
| `error`                 | HIGH              |
| `warning`               | MEDIUM            |
| `info`                  | LOW               |
| (not set)               | MEDIUM (default)  |

You control the `severity` label in your alert rule definitions. Use consistent values across your rule files for predictable priority routing.
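
For instance, a lower-urgency rule can carry `severity: warning` so it maps to MEDIUM on the platform. The rule below is purely illustrative (it assumes node_exporter filesystem metrics and a hypothetical `LowDiskSpace` alert name):

```yaml
groups:
  - name: production-warning
    interval: 1m
    rules:
      - alert: LowDiskSpace
        expr: |
          (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 < 15
        for: 10m
        labels:
          severity: warning   # maps to MEDIUM per the table above
        annotations:
          summary: "Low available disk space"
          description: "Instance {{ $labels.instance }} has less than 15% filesystem space available."
```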

### RESOLVE Detection

The platform automatically resolves an alert when Alertmanager sends a payload with `"status": "resolved"`. This requires `send_resolved: true` in your Alertmanager webhook configuration (set in Step 2).

The resolved event is matched to the original alert using the `fingerprint` field, which Alertmanager generates deterministically from the alert's label set. As long as the labels do not change between firing and resolution, the fingerprint will match and the alert will be closed.
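
You can see this in the sample payloads above: the firing and resolved events carry the same fingerprint (`34e164e9af873ac1`), which is what lets the platform correlate the resolved notification with the open alert and close it.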
