Event Publishing Configuration

This reference documents all configuration options for the HTTP event distribution system, which uses the transactional outbox pattern to reliably deliver incident events to webhook subscribers.

Quick Start

Enable event publishing with minimal configuration:

# Enable event publishing
export ENABLE_EVENT_PUBLISHING=true

# Start the server
im serve

Environment Variables

Core Settings

ENABLE_EVENT_PUBLISHING

Enables or disables the event distribution system.

Property Value
Type Boolean
Default false
Required No

Behavior:

  • When true: Outbox entries are created for timeline events, and the background worker delivers them to subscribers
  • When false: No outbox entries are created, no webhook deliveries occur

Example:

export ENABLE_EVENT_PUBLISHING=true

Notes:

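  • Changes take effect only after a server restart; the value is not reloaded while the server is running
  • Entries already in the outbox are kept while publishing is disabled and are processed once it is re-enabled
  • Events that occur while publishing is disabled get no outbox entry, so they are never delivered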
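The behavior above can be illustrated with a minimal in-memory sketch of the transactional outbox pattern. The types `TimelineEvent`, `OutboxEntry`, and `Store` are hypothetical, and the mutex only stands in for the database transaction that would write a timeline event and its outbox entry atomically in a real deployment.

```go
package main

import (
	"fmt"
	"sync"
)

// TimelineEvent and OutboxEntry are hypothetical types used only for illustration.
type TimelineEvent struct {
	ID      string
	Message string
}

type OutboxEntry struct {
	EventID string
	Status  string // "pending", "delivered", or "failed"
}

// Store is a hypothetical in-memory stand-in for the database. Holding the
// mutex plays the role of a transaction: both writes happen together or not at all.
type Store struct {
	mu       sync.Mutex
	timeline []TimelineEvent
	outbox   []OutboxEntry
}

// RecordEvent always writes the timeline event, and writes an outbox entry
// in the same critical section only when publishing is enabled.
func (s *Store) RecordEvent(ev TimelineEvent, publishingEnabled bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.timeline = append(s.timeline, ev)
	if publishingEnabled {
		s.outbox = append(s.outbox, OutboxEntry{EventID: ev.ID, Status: "pending"})
	}
}

func main() {
	s := &Store{}
	s.RecordEvent(TimelineEvent{ID: "evt_1", Message: "incident opened"}, true)
	s.RecordEvent(TimelineEvent{ID: "evt_2", Message: "status changed"}, false)
	// Both events are on the timeline, but only evt_1 has an outbox entry,
	// so evt_2 is never handed to the delivery worker.
	fmt.Println(len(s.timeline), len(s.outbox))
}
```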

OUTBOX_POLL_INTERVAL

How frequently the outbox worker checks for pending events to deliver.

Property Value
Type Duration
Default 1s
Required No
Minimum 100ms
Maximum 1m

Format: Go duration string (e.g., 1s, 500ms, 2s)

Example:

# Check every 500 milliseconds for lower latency
export OUTBOX_POLL_INTERVAL=500ms

# Check every 5 seconds for reduced database load
export OUTBOX_POLL_INTERVAL=5s

Trade-offs:

  • Lower interval: Faster event delivery, higher database load
  • Higher interval: Lower database load, increased delivery latency

Recommendations:

  • Production: 1s (default) balances latency and load
  • High-throughput: 500ms if the database can handle the increased query rate
  • Low-resource: 5s if delivery latency is not critical

OUTBOX_BATCH_SIZE

Maximum number of outbox entries to process per polling cycle.

Property Value
Type Integer
Default 100
Required No
Minimum 1
Maximum 1000

Example:

# Process up to 50 entries per cycle
export OUTBOX_BATCH_SIZE=50

# Process up to 500 entries for high-throughput scenarios
export OUTBOX_BATCH_SIZE=500

Notes:

  • Larger batches improve throughput but increase memory usage
  • Entries within a batch are delivered in parallel, up to OUTBOX_CONCURRENT_DELIVERIES at a time
  • Consider database connection limits when increasing

Retry Configuration

OUTBOX_MAX_RETRIES

Maximum number of delivery attempts before marking an event as permanently failed.

Property Value
Type Integer
Default 5
Required No
Minimum 1
Maximum 10

Example:

# Allow up to 10 retry attempts
export OUTBOX_MAX_RETRIES=10

# Fail fast after 3 attempts
export OUTBOX_MAX_RETRIES=3

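Retry Schedule:

Exponential backoff with the default 1s base interval and a 2x multiplier produces the delays below. The uncapped columns assume OUTBOX_RETRY_MAX_INTERVAL is at least 512s; the capped columns use the default 16s cap, so every delay from attempt 5 onward is 16s. With the default OUTBOX_MAX_RETRIES of 5, only the first five rows apply.

Attempt  Delay (uncapped)  Cumulative (uncapped)  Delay (16s cap)  Cumulative (16s cap)
1        1s                1s                     1s               1s
2        2s                3s                     2s               3s
3        4s                7s                     4s               7s
4        8s                15s                    8s               15s
5        16s               31s                    16s              31s
6        32s               63s                    16s              47s
7        64s               ~2m                    16s              63s
8        128s              ~4m                    16s              79s
9        256s              ~8.5m                  16s              95s
10       512s              ~17m                   16s              111s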

OUTBOX_RETRY_BASE_INTERVAL

Initial retry delay for exponential backoff.

Property Value
Type Duration
Default 1s
Required No
Minimum 100ms
Maximum 1m

Example:

# Start retries at 2 seconds
export OUTBOX_RETRY_BASE_INTERVAL=2s

OUTBOX_RETRY_MAX_INTERVAL

Maximum delay between retry attempts (caps exponential growth).

Property Value
Type Duration
Default 16s
Required No
Minimum 1s
Maximum 1h

Example:

# Cap retry delay at 5 minutes
export OUTBOX_RETRY_MAX_INTERVAL=5m

Notes:

  • Prevents extremely long waits between retries
  • After reaching max interval, all subsequent retries use this value
  • Consider SLA requirements when setting

HTTP Delivery Settings

OUTBOX_DELIVERY_TIMEOUT

Timeout for individual webhook delivery attempts.

Property Value
Type Duration
Default 10s
Required No
Minimum 1s
Maximum 5m

Example:

# Wait up to 30 seconds for slow endpoints
export OUTBOX_DELIVERY_TIMEOUT=30s

# Fail fast after 5 seconds
export OUTBOX_DELIVERY_TIMEOUT=5s

Notes:

  • Timeouts are treated as transient failures and are retried
  • Set based on expected webhook processing time
  • Consider network latency to webhook endpoints

OUTBOX_CONCURRENT_DELIVERIES

Maximum number of concurrent webhook deliveries.

Property Value
Type Integer
Default 10
Required No
Minimum 1
Maximum 100

Example:

# Allow 20 concurrent deliveries
export OUTBOX_CONCURRENT_DELIVERIES=20

Notes:

  • Higher concurrency improves throughput for multiple subscribers
  • Consider system resources and rate limits on webhook endpoints
  • Each delivery uses one HTTP connection

Retention Settings

OUTBOX_RETENTION_DAYS

Number of days to retain delivered outbox entries before cleanup.

Property Value
Type Integer
Default 30
Required No
Minimum 1
Maximum 365

Example:

# Keep entries for 7 days
export OUTBOX_RETENTION_DAYS=7

# Keep entries for 90 days (compliance requirements)
export OUTBOX_RETENTION_DAYS=90

Notes:

  • Only affects delivered entries
  • Entries with status pending or failed are never deleted automatically
  • Cleanup runs periodically (hourly by default)

OUTBOX_CLEANUP_INTERVAL

How frequently the retention cleanup job runs.

Property Value
Type Duration
Default 1h
Required No
Minimum 1m
Maximum 24h

Example:

# Run cleanup every 6 hours
export OUTBOX_CLEANUP_INTERVAL=6h

Logging Settings

OUTBOX_LOG_LEVEL

Log level for the outbox worker component.

Property Value
Type String
Default info
Required No
Values debug, info, warn, error

Example:

# Enable debug logging for troubleshooting
export OUTBOX_LOG_LEVEL=debug

Log Output Examples:

Info level (default):

level=INFO msg="Event delivered" subscriber=my-webhook event_id=evt_123 duration=45ms
level=INFO msg="Worker started" poll_interval=1s batch_size=100

Debug level:

level=DEBUG msg="Polling for pending entries"
level=DEBUG msg="Found pending entries" count=3
level=DEBUG msg="Delivering event" subscriber=my-webhook event_id=evt_123
level=DEBUG msg="Signature generated" algorithm=sha256

Warn level:

level=WARN msg="Delivery failed, scheduling retry" subscriber=my-webhook event_id=evt_123 attempt=2 error="HTTP 503"
level=WARN msg="Subscriber marked unhealthy" subscriber=my-webhook success_rate=45%

Error level:

level=ERROR msg="Delivery permanently failed" subscriber=my-webhook event_id=evt_123 attempts=5 error="Max retries exceeded"

Configuration Examples

Production Configuration

Recommended settings for production environments:

# Enable event publishing
export ENABLE_EVENT_PUBLISHING=true

# Standard polling and batch settings
export OUTBOX_POLL_INTERVAL=1s
export OUTBOX_BATCH_SIZE=100

# Retry configuration
export OUTBOX_MAX_RETRIES=5
export OUTBOX_RETRY_BASE_INTERVAL=1s
export OUTBOX_RETRY_MAX_INTERVAL=16s

# HTTP settings
export OUTBOX_DELIVERY_TIMEOUT=10s
export OUTBOX_CONCURRENT_DELIVERIES=10

# Retention (30 days)
export OUTBOX_RETENTION_DAYS=30
export OUTBOX_CLEANUP_INTERVAL=1h

# Logging
export OUTBOX_LOG_LEVEL=info

High-Throughput Configuration

For environments with high event volume:

export ENABLE_EVENT_PUBLISHING=true

# Faster polling, larger batches
export OUTBOX_POLL_INTERVAL=500ms
export OUTBOX_BATCH_SIZE=500

# More concurrent deliveries
export OUTBOX_CONCURRENT_DELIVERIES=50

# Shorter retention
export OUTBOX_RETENTION_DAYS=7

Low-Latency Configuration

For minimal delivery latency:

export ENABLE_EVENT_PUBLISHING=true

# Very fast polling
export OUTBOX_POLL_INTERVAL=100ms
export OUTBOX_BATCH_SIZE=50

# Quick timeout
export OUTBOX_DELIVERY_TIMEOUT=5s

Development Configuration

For local development:

export ENABLE_EVENT_PUBLISHING=true

# Debug logging
export OUTBOX_LOG_LEVEL=debug

# Faster retries for testing
export OUTBOX_RETRY_BASE_INTERVAL=100ms
export OUTBOX_RETRY_MAX_INTERVAL=1s
export OUTBOX_MAX_RETRIES=3

Docker Compose Example

version: "3.8"

services:
  incidents:
    image: incidents:latest
    ports:
      - "8080:8080"
    environment:
      # Database
      DATABASE_URL: postgres://user:pass@db:5432/incidents

      # Event Publishing
      ENABLE_EVENT_PUBLISHING: "true"
      OUTBOX_POLL_INTERVAL: "1s"
      OUTBOX_BATCH_SIZE: "100"
      OUTBOX_MAX_RETRIES: "5"
      OUTBOX_DELIVERY_TIMEOUT: "10s"
      OUTBOX_CONCURRENT_DELIVERIES: "10"
      OUTBOX_RETENTION_DAYS: "30"
      OUTBOX_LOG_LEVEL: "info"

Kubernetes ConfigMap Example

apiVersion: v1
kind: ConfigMap
metadata:
  name: incidents-config
data:
  ENABLE_EVENT_PUBLISHING: "true"
  OUTBOX_POLL_INTERVAL: "1s"
  OUTBOX_BATCH_SIZE: "100"
  OUTBOX_MAX_RETRIES: "5"
  OUTBOX_DELIVERY_TIMEOUT: "10s"
  OUTBOX_CONCURRENT_DELIVERIES: "10"
  OUTBOX_RETENTION_DAYS: "30"
  OUTBOX_LOG_LEVEL: "info"

Monitoring

Key Metrics

Monitor these aspects of the event publishing system:

Metric            Description                                   Alert Threshold
Pending entries   Outbox entries awaiting delivery              > 1000
Delivery latency  Time from event creation to delivery          p95 > 5s
Failure rate      Percentage of deliveries that fail            > 10%
Worker health     Whether the worker is polling successfully    Any failure

Health Checks

# Check subscriber health
im subscriber health

# View worker logs
im logs --component outbox-worker

# Check pending entry count
im subscriber stats

Troubleshooting

Events Not Being Published

  1. Verify publishing is enabled:

    echo $ENABLE_EVENT_PUBLISHING
    # Should output: true
  2. Check that the worker is running:

    im logs --component outbox-worker
  3. Verify subscribers exist:

    im subscriber list

High Delivery Latency

  1. Check poll interval:

    echo $OUTBOX_POLL_INTERVAL
    # Decrease if too high
  2. Check batch size:

    • Too small: each poll handles too few entries, so a backlog builds up
    • Too large: each cycle uses more memory
  3. Check concurrent deliveries:

    • Increase when events fan out to many subscribers
    • Confirm the host has enough CPU, memory, and network capacity for the extra connections

Retry Storm

If many events are being retried simultaneously:

  1. Increase retry intervals:

    export OUTBOX_RETRY_BASE_INTERVAL=5s
    export OUTBOX_RETRY_MAX_INTERVAL=5m
  2. Check subscriber health:

    im subscriber health
  3. Consider temporarily disabling problematic subscribers:

    im subscriber update problem-webhook --enabled=false