Event Distribution Guide

The Event Distribution system enables automatic publishing of incident timeline events to external systems via HTTP webhooks. Events are delivered in CloudEvents v1.0 format with optional HMAC signature verification for security.

Overview

When incidents are created, updated, or resolved, the platform can automatically notify external systems in real time. This enables integrations with:

  • Custom monitoring dashboards
  • Slack/Teams bots
  • ITSM platforms (ServiceNow, Jira)
  • Analytics and reporting systems
  • Security information and event management (SIEM) tools

Key Features

  • CloudEvents v1.0 Format: Industry-standard event format for interoperability
  • HMAC Signature Verification: Secure webhooks with SHA-256 signatures
  • Event Type Filtering: Subscribe only to relevant event types using glob patterns
  • Automatic Retry: Failed deliveries retry with exponential backoff
  • Health Monitoring: Track delivery success rates per subscriber

Getting Started

Prerequisites

  • Incidents platform installed and running
  • Event publishing enabled (ENABLE_EVENT_PUBLISHING=true)
  • CLI access for subscriber management

Enable Event Publishing

Event publishing is opt-in. Enable it by setting the environment variable:

# Enable event publishing
export ENABLE_EVENT_PUBLISHING=true

# Start the server
im serve

When enabled, the platform:

  1. Creates outbox entries when timeline events are written
  2. Runs a background worker to deliver events to subscribers
  3. Tracks delivery status and handles retries automatically

Register Your First Subscriber

Use the CLI to register a webhook endpoint:

# Register a subscriber
im subscriber add \
  --name my-webhook \
  --url https://example.com/webhook \
  --types "im.incident.**" \
  --secret "your-webhook-secret"

# Verify registration
im subscriber list

Output:

NAME         URL                          TYPES           ENABLED
my-webhook   https://example.com/webhook  im.incident.**  true

Managing Subscribers

Add a Subscriber

im subscriber add \
  --name <name> \
  --url <webhook-url> \
  --types <event-patterns> \
  --secret <hmac-secret>

Parameters:

Flag      Required  Description
--name    Yes       Unique identifier for the subscriber
--url     Yes       HTTPS webhook URL (HTTP allowed for localhost)
--types   No        Comma-separated event type patterns to receive (default: all events)
--secret  No        Secret key used to sign requests with HMAC-SHA256

Examples:

# Receive all incident lifecycle events
im subscriber add --name ops-team --url https://ops.example.com/webhook --types "im.incident.**"

# Receive only declaration and resolution events
im subscriber add --name alerts --url https://alerts.example.com/hook --types "im.incident.declared.v1,im.incident.resolved.v1"

# Receive all events (no filter)
im subscriber add --name audit-log --url https://audit.example.com/events

List Subscribers

# Table format (default)
im subscriber list

# JSON format
im subscriber list --format json

Update a Subscriber

# Update the webhook URL
im subscriber update my-webhook --url https://new-endpoint.example.com/webhook

# Update event type filters
im subscriber update my-webhook --types "im.incident.declared.v1,im.incident.resolved.v1"

# Update the HMAC secret
im subscriber update my-webhook --secret "new-secret-key"

# Disable a subscriber temporarily
im subscriber update my-webhook --enabled=false

Remove a Subscriber

im subscriber remove my-webhook

Event Type Filtering

Subscribers can filter which events they receive using glob patterns. This reduces unnecessary webhook traffic and simplifies event handling.

Pattern Syntax

Pattern  Matches            Example
*        Single segment     im.incident.* matches im.incident.declared but not im.incident.declared.v1
**       Multiple segments  im.** matches all im. events at any depth
Exact    Specific type      im.incident.declared.v1 matches only that exact type
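
The matching rules in the table can be modeled with a short Python sketch. This is an illustration of the documented semantics, not the platform's own matcher: the function name matches is hypothetical, and the sketch assumes ** consumes one or more segments.

```python
def matches(pattern: str, event_type: str) -> bool:
    """Return True if a dot-separated event type matches a glob pattern.

    Assumed semantics: '*' matches exactly one segment,
    '**' matches one or more segments.
    """
    def match(p: list, t: list) -> bool:
        if not p:
            # Pattern exhausted: match only if the type is exhausted too.
            return not t
        head, rest = p[0], p[1:]
        if head == "**":
            # Let '**' consume 1..len(t) segments and try the remainder.
            return any(match(rest, t[i:]) for i in range(1, len(t) + 1))
        if not t:
            return False
        if head == "*" or head == t[0]:
            return match(rest, t[1:])
        return False

    return match(pattern.split("."), event_type.split("."))


def matches_any(types: str, event_type: str) -> bool:
    """Check a comma-separated --types value against one event type."""
    return any(matches(p.strip(), event_type) for p in types.split(","))
```

Under these assumptions, im.incident.* does not match im.incident.declared.v1 because the trailing v1 segment is left over, while im.incident.** does.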

Common Patterns

# All incident lifecycle events
--types "im.incident.**"

# Only declaration and resolution events, any version
--types "im.incident.declared.*,im.incident.resolved.*"

# All events in the im. namespace
--types "im.**"

# Multiple specific types
--types "im.incident.declared.v1,im.incident.acknowledged.v1,im.incident.resolved.v1"

Event Types Reference

The platform emits these event types:

Event Type                   Description
im.incident.declared.v1      New incident created
im.incident.acknowledged.v1  Incident acknowledged by responder
im.incident.mitigated.v1     Incident impact mitigated
im.incident.resolved.v1      Incident resolved
im.incident.closed.v1        Incident formally closed
im.incident.note_added.v1    Note added to incident timeline

CloudEvents Payload

Events are delivered as HTTP POST requests in CloudEvents v1.0 structured format.

HTTP Request

POST /your-webhook-endpoint HTTP/1.1
Host: example.com
Content-Type: application/cloudevents+json
User-Agent: incidents-platform/1.0
Ce-Id: evt_550e8400-e29b-41d4-a716-446655440000
Ce-Source: /incidents/INC-12345
Ce-Specversion: 1.0
Ce-Type: im.incident.declared.v1
Ce-Time: 2025-12-26T14:30:00Z
X-CloudEvents-Signature: sha256=5d4...abc
Content-Length: 456

{
  "specversion": "1.0",
  "id": "evt_550e8400-e29b-41d4-a716-446655440000",
  "source": "/incidents/INC-12345",
  "type": "im.incident.declared.v1",
  "time": "2025-12-26T14:30:00Z",
  "datacontenttype": "application/json",
  "data": {
    "incident_id": "INC-12345",
    "title": "Production API latency spike",
    "severity": "SEV-2",
    "actor": "oncall@example.com"
  },
  "imincidentid": "INC-12345"
}
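
A receiver will typically parse the body and check the CloudEvents required attributes before acting on the event. The following Python sketch shows one way to do that; parse_event is a hypothetical helper name, and the check covers only the four attributes that CloudEvents v1.0 marks as required (specversion, id, source, type).

```python
import json

REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


def parse_event(raw: bytes) -> dict:
    """Decode a structured-mode CloudEvents body and validate required attributes."""
    event = json.loads(raw)
    missing = [name for name in REQUIRED_ATTRIBUTES if name not in event]
    if missing:
        raise ValueError("missing CloudEvents attributes: " + ", ".join(missing))
    if event["specversion"] != "1.0":
        raise ValueError("unsupported specversion: " + str(event["specversion"]))
    return event


# Sample body taken from the request shown above.
sample = b"""{
  "specversion": "1.0",
  "id": "evt_550e8400-e29b-41d4-a716-446655440000",
  "source": "/incidents/INC-12345",
  "type": "im.incident.declared.v1",
  "time": "2025-12-26T14:30:00Z",
  "datacontenttype": "application/json",
  "data": {"incident_id": "INC-12345", "severity": "SEV-2"},
  "imincidentid": "INC-12345"
}"""
event = parse_event(sample)
```

Verify the signature on the raw bytes first, then parse; parsing before verification changes nothing about the bytes, but acting on unverified data defeats the purpose of signing.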

Response Expectations

Your webhook should respond within 10 seconds:

Response               Result
2xx                    Success - event marked as delivered
4xx (except 408, 429)  Permanent failure - no retry
5xx, 408, 429          Transient failure - will retry
Timeout (>10s)         Transient failure - will retry
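
When choosing which status code your endpoint should return, it can help to see the table as a decision function. The Python sketch below models the documented outcomes; classify_response and its return labels are hypothetical names, and the handling of 1xx and 3xx responses is an assumption because the table does not cover them.

```python
from typing import Optional

RETRYABLE_4XX = {408, 429}


def classify_response(status: Optional[int]) -> str:
    """Map a webhook response to 'delivered', 'retry', or 'failed'.

    status is None when no response arrived within the 10 second limit.
    """
    if status is None:
        return "retry"
    if 200 <= status < 300:
        return "delivered"
    if status >= 500 or status in RETRYABLE_4XX:
        return "retry"
    # Remaining 4xx codes are permanent failures per the table.
    # Assumption: 1xx and 3xx, which the table omits, are also treated as failed.
    return "failed"
```

In practice this means an endpoint that cannot process an event right now should answer with a 5xx or 429 rather than a 400, or the event will not be redelivered.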

HMAC Signature Verification

When a subscriber has a configured secret, the platform signs each webhook request using HMAC-SHA256.

Signature Header

X-CloudEvents-Signature: sha256=<64-character lowercase hex HMAC-SHA256 digest>

Verification Process

  1. Extract the signature from the X-CloudEvents-Signature header
  2. Compute HMAC-SHA256 of the raw request body using your secret
  3. Compare signatures using constant-time comparison

Example Verification Code

Go:

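import (
    "bytes"
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "io"
    "net/http"
    "strings"
)

func verifySignature(r *http.Request, secret string) bool {
    // Read the raw request body; the signature covers these exact bytes
    body, err := io.ReadAll(r.Body)
    if err != nil {
        return false
    }
    // Restore the body so the handler can still read it after verification
    r.Body = io.NopCloser(bytes.NewReader(body))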

    // Get the signature header
    sigHeader := r.Header.Get("X-CloudEvents-Signature")
    if !strings.HasPrefix(sigHeader, "sha256=") {
        return false
    }
    expectedSig := strings.TrimPrefix(sigHeader, "sha256=")

    // Compute HMAC-SHA256
    mac := hmac.New(sha256.New, []byte(secret))
    mac.Write(body)
    actualSig := hex.EncodeToString(mac.Sum(nil))

    // Constant-time comparison
    return hmac.Equal([]byte(expectedSig), []byte(actualSig))
}

Python:

import hmac
import hashlib

def verify_signature(body: bytes, secret: str, signature_header: str) -> bool:
    if not signature_header.startswith("sha256="):
        return False

    expected_sig = signature_header[7:]  # Remove "sha256=" prefix

    actual_sig = hmac.new(
        secret.encode('utf-8'),
        body,
        hashlib.sha256
    ).hexdigest()

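    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input
    return hmac.compare_digest(expected_sig.encode('utf-8'), actual_sig.encode('utf-8'))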

Node.js:

const crypto = require("crypto");

function verifySignature(body, secret, signatureHeader) {
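  // Reject requests with a missing or malformed signature header
  if (!signatureHeader || !signatureHeader.startsWith("sha256=")) {
    return false;
  }

  const expectedSig = Buffer.from(signatureHeader.slice(7));
  const actualSig = Buffer.from(crypto.createHmac("sha256", secret).update(body).digest("hex"));

  // timingSafeEqual throws if the buffers differ in length, so check that first
  return expectedSig.length === actualSig.length && crypto.timingSafeEqual(expectedSig, actualSig);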
}
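
To exercise a verifier without waiting for a real event, you can produce a test signature the same way the verification process describes: HMAC-SHA256 over the raw body, hex-encoded, with a sha256= prefix. The Python sketch below does this; sign_body is a hypothetical helper, and the secret and body are placeholder values.

```python
import hashlib
import hmac
import json


def sign_body(body: bytes, secret: str) -> str:
    """Return a header value in the form 'sha256=<hex digest>'."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


# Placeholder payload and secret for a local test request.
body = json.dumps({"specversion": "1.0", "type": "im.incident.declared.v1"}).encode("utf-8")
header = sign_body(body, "your-webhook-secret")

# Any change to the body, even whitespace, yields a different signature.
tampered_header = sign_body(body + b" ", "your-webhook-secret")
```

Send body as the request payload and header as the X-CloudEvents-Signature value; a correct verifier accepts the pair and rejects it once either the body or the secret changes.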

Retry Logic and Failure Handling

The platform automatically retries failed deliveries using exponential backoff.

Retry Schedule

Attempt  Delay       Cumulative Time
1        1 second    1s
2        2 seconds   3s
3        4 seconds   7s
4        8 seconds   15s
5        16 seconds  31s

After 5 failed attempts, the event is marked as permanently failed.
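
The schedule follows a doubling rule: the delay for attempt n is 2^(n-1) seconds. The short Python sketch below reproduces the table from that rule; retry_delays is a hypothetical name, and the base delay and attempt limit are simply the values listed above.

```python
def retry_delays(max_attempts: int = 5, base_seconds: int = 1) -> list:
    """Delay for each attempt: base_seconds * 2 ** (attempt - 1)."""
    return [base_seconds * 2 ** (attempt - 1) for attempt in range(1, max_attempts + 1)]


delays = retry_delays()
# Running total of the delays, matching the Cumulative Time column.
cumulative = [sum(delays[: i + 1]) for i in range(len(delays))]
```

With the defaults, delays is [1, 2, 4, 8, 16] and cumulative is [1, 3, 7, 15, 31], so an endpoint that is down for longer than about half a minute will see events marked as permanently failed.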

What Gets Retried

Failure Type                       Retried?
HTTP 5xx errors                    Yes
HTTP 408 (Request Timeout)         Yes
HTTP 429 (Too Many Requests)       Yes
Connection timeout                 Yes
DNS resolution failure             Yes
HTTP 4xx errors (except 408, 429)  No
Invalid response                   No

Monitoring Retries

Failed deliveries are logged at WARN level:

level=WARN msg="Webhook delivery failed, scheduling retry"
  subscriber_id=my-webhook
  event_id=evt_123
  attempt=2
  next_retry_at=2025-12-26T14:30:05Z
  error="HTTP 503 Service Unavailable"

Permanent failures are logged at ERROR level:

level=ERROR msg="Webhook delivery permanently failed"
  subscriber_id=my-webhook
  event_id=evt_123
  attempts=5
  error="Max retries exceeded: HTTP 503"

Health Monitoring

Monitor subscriber health to identify problematic integrations.

View Health Statistics

# View health for all subscribers
im subscriber health

# JSON output for programmatic access
im subscriber health --format json

Example Output:

SUBSCRIBER    DELIVERED  FAILED  SUCCESS_RATE  STATUS     LAST_DELIVERED
my-webhook    156        4       97.5%         healthy    2m ago
audit-log     892        0       100%          healthy    30s ago
flaky-system  45         23      66.2%         healthy    5m ago
broken-hook   0          50      0%            unhealthy  1h ago

Health Status Criteria

Status     Criteria
healthy    Success rate >= 50%
unhealthy  Success rate < 50%
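
The success rate in the example output is delivered / (delivered + failed), and the status follows from the 50% threshold. A minimal Python sketch of that calculation is shown below; the function names are hypothetical, and reporting 100% for a subscriber with no delivery attempts is an assumption, since the criteria do not say how that case is handled.

```python
def success_rate(delivered: int, failed: int) -> float:
    """Percentage of delivery attempts that succeeded."""
    total = delivered + failed
    if total == 0:
        # Assumption: no attempts yet is reported as 100%.
        return 100.0
    return 100.0 * delivered / total


def health_status(delivered: int, failed: int) -> str:
    """Apply the documented threshold: healthy at 50% or above."""
    return "healthy" if success_rate(delivered, failed) >= 50.0 else "unhealthy"
```

Note that a subscriber such as flaky-system, failing roughly one delivery in three, still counts as healthy under this threshold, so alerting on the raw success rate may be more useful than alerting on status alone.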

Programmatic Health Access

Use the REST API for integration with monitoring systems:

# Get all subscriber health
curl https://your-server/api/v1/subscribers/health \
  -H "Authorization: Bearer $TOKEN"

# Get specific subscriber health
curl https://your-server/api/v1/subscribers/my-webhook \
  -H "Authorization: Bearer $TOKEN"

Troubleshooting

Events Not Being Delivered

  1. Check if publishing is enabled:

    echo $ENABLE_EVENT_PUBLISHING
    # Should be "true"
  2. Verify subscriber is registered and enabled:

    im subscriber list
  3. Check event type filters match:

    • Ensure your --types pattern matches the events being generated
    • Use im.incident.** to receive all incident events
  4. Check server logs:

    im logs --component outbox-worker

Signature Verification Failures

  1. Verify you’re using the correct secret:

    • Secrets cannot be retrieved after creation
    • Update with a new secret if needed: im subscriber update <name> --secret <new-secret>
  2. Ensure you’re verifying the raw request body:

    • Don’t parse JSON before verification
    • The signature is computed on the exact bytes sent
  3. Check for encoding issues:

    • Signature is lowercase hexadecimal
    • Compare using constant-time functions to prevent timing attacks

High Failure Rate

  1. Check subscriber health:

    im subscriber health
  2. Verify webhook endpoint is accessible:

    curl -X POST https://your-webhook/endpoint \
      -H "Content-Type: application/json" \
      -d '{"test": true}'
  3. Check for rate limiting:

    • If your endpoint returns 429, the system will retry
    • Consider increasing rate limits on your webhook server
  4. Review failure reasons in logs:

    im logs --component outbox-worker --level error

Performance Issues

  1. Events delayed:

    • Check OUTBOX_POLL_INTERVAL setting (default 1s)
    • Verify database performance
  2. Worker not processing:

    • Ensure server was started with ENABLE_EVENT_PUBLISHING=true
    • Check for worker startup errors in logs

Best Practices

Security

  • Always use HTTPS for production webhook endpoints
  • Configure HMAC secrets for signature verification
  • Rotate secrets periodically (update via im subscriber update)
  • Validate signatures before processing events
  • Use dedicated service accounts for webhook endpoints

Reliability

  • Handle duplicates: Retries can deliver the same event more than once, so make handlers idempotent by deduplicating on the event ID
  • Respond quickly: Keep webhook processing under 10 seconds
  • Return appropriate status codes: Use 2xx for success, 5xx for transient failures
  • Log event IDs: Include the event ID (the Ce-Id header, or the id field in the body) in logs for debugging
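
Idempotent handling can be as simple as remembering which event IDs have already been processed. The Python sketch below illustrates the idea with an in-memory set; EventDeduplicator and handle_event are hypothetical names, and a real deployment would more likely use a shared store with expiry so that duplicates are caught across processes and restarts.

```python
class EventDeduplicator:
    """Remember processed event IDs so a retried delivery is handled once."""

    def __init__(self) -> None:
        self._seen = set()

    def first_delivery(self, event_id: str) -> bool:
        """Return True the first time an ID is seen, False for duplicates."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True


def handle_event(event: dict, dedup: EventDeduplicator, processed: list) -> int:
    """Process an event at most once and return the HTTP status to send back."""
    if dedup.first_delivery(event["id"]):
        processed.append(event["id"])  # stand-in for the real processing step
    # Duplicates are acknowledged with 2xx as well, so the platform stops retrying.
    return 200


dedup = EventDeduplicator()
processed = []
event = {"id": "evt_123", "type": "im.incident.declared.v1"}
first = handle_event(event, dedup, processed)
second = handle_event(event, dedup, processed)  # simulated redelivery
```

Both calls return 200, but the event is processed only once.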

Performance

  • Process asynchronously: Queue webhook events for background processing
  • Filter event types: Only subscribe to events you need
  • Monitor health: Set up alerts on subscriber health degradation
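
Asynchronous processing means the webhook handler only enqueues the event and returns, while a background worker does the slow part. The Python sketch below shows the pattern with the standard library queue and a thread; receive and worker are hypothetical names, and the in-process queue stands in for whatever durable queue you would use in production.

```python
import queue
import threading

events = queue.Queue()
handled = []


def worker() -> None:
    """Drain the queue in the background; None is a shutdown sentinel."""
    while True:
        event = events.get()
        if event is None:
            events.task_done()
            break
        handled.append(event["id"])  # stand-in for slow processing
        events.task_done()


def receive(event: dict) -> int:
    """Webhook entry point: enqueue and acknowledge immediately."""
    events.put(event)
    return 202


thread = threading.Thread(target=worker, daemon=True)
thread.start()
status = receive({"id": "evt_1", "type": "im.incident.declared.v1"})
events.join()      # wait until everything queued so far has been processed
events.put(None)   # ask the worker to stop
thread.join()
```

One trade-off to keep in mind: once the handler has returned a 2xx, the platform considers the event delivered, so a failure in the background worker will not trigger a retry. A non-durable queue like this one can therefore lose events on a crash.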

Next Steps