Event Publishing Configuration
This reference documents all configuration options for the HTTP event distribution system, which uses the transactional outbox pattern to reliably deliver incident events to webhook subscribers.
Quick Start
Enable event publishing with minimal configuration:
# Enable event publishing
export ENABLE_EVENT_PUBLISHING=true
# Start the server
im serve
Environment Variables
Core Settings
ENABLE_EVENT_PUBLISHING
Enables or disables the event distribution system.
| Property | Value |
|---|---|
| Type | Boolean |
| Default | false |
| Required | No |
Behavior:
- When true: Outbox entries are created for timeline events, and the background worker delivers them to subscribers
- When false: No outbox entries are created and no webhook deliveries occur
Example:
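export ENABLE_EVENT_PUBLISHING=true
Notes:
- Changing the value requires a server restart to take effect
- Outbox entries that already exist are processed when publishing is re-enabled
- Entries already in the outbox are not lost while publishing is temporarily disabled; events that occur while it is disabled create no outbox entries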
OUTBOX_POLL_INTERVAL
How frequently the outbox worker checks for pending events to deliver.
| Property | Value |
|---|---|
| Type | Duration |
| Default | 1s |
| Required | No |
| Minimum | 100ms |
| Maximum | 1m |
Format: Go duration string (e.g., 1s, 500ms, 2s)
Example:
# Check every 500 milliseconds for lower latency
export OUTBOX_POLL_INTERVAL=500ms
# Check every 5 seconds for reduced database load
export OUTBOX_POLL_INTERVAL=5s
Trade-offs:
- Lower interval: Faster event delivery, higher database load
- Higher interval: Lower database load, increased delivery latency
Recommendations:
- Production: 1s (default) balances latency and load
- High-throughput: 500ms if the database can handle the increased query rate
- Low-resource: 5s if delivery latency is not critical
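A value outside the documented 100ms to 1m range is easy to catch before the server starts. The following is a minimal sketch of such a pre-flight check, not the server's own parser: the helper names duration_to_ms and duration_in_range are hypothetical, and only simple durations (an integer plus one unit) are handled, whereas Go duration strings can also be compound or fractional.

```bash
# Convert a simple duration (integer plus one unit: ms, s, m, h) to milliseconds.
# Go also accepts compound ("1m30s") and fractional ("1.5s") durations; this
# sketch rejects them to stay short.
duration_to_ms() {
  local d n factor
  d=$1
  case $d in
    *ms) n=${d%ms}; factor=1 ;;
    *s)  n=${d%s};  factor=1000 ;;
    *m)  n=${d%m};  factor=60000 ;;
    *h)  n=${d%h};  factor=3600000 ;;
    *)   return 1 ;;
  esac
  # Reject empty, non-numeric, or zero-padded values (avoids octal parsing).
  case $n in
    ''|*[!0-9]*|0[0-9]*) return 1 ;;
  esac
  echo $(( n * factor ))
}

# Succeed only if duration $1 lies within [$2, $3], all given as duration strings.
duration_in_range() {
  local ms min_ms max_ms
  ms=$(duration_to_ms "$1") || return 1
  min_ms=$(duration_to_ms "$2") || return 1
  max_ms=$(duration_to_ms "$3") || return 1
  [ "$ms" -ge "$min_ms" ] && [ "$ms" -le "$max_ms" ]
}

# Documented bounds for OUTBOX_POLL_INTERVAL: 100ms to 1m, default 1s.
if duration_in_range "${OUTBOX_POLL_INTERVAL:-1s}" 100ms 1m; then
  echo "OUTBOX_POLL_INTERVAL ok"
else
  echo "OUTBOX_POLL_INTERVAL out of range or unparseable" >&2
fi
```

The same helper can be reused for the other duration settings by passing their documented minimum and maximum.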
OUTBOX_BATCH_SIZE
Maximum number of outbox entries to process per polling cycle.
| Property | Value |
|---|---|
| Type | Integer |
| Default | 100 |
| Required | No |
| Minimum | 1 |
| Maximum | 1000 |
Example:
# Process up to 50 entries per cycle
export OUTBOX_BATCH_SIZE=50
# Process up to 500 entries for high-throughput scenarios
export OUTBOX_BATCH_SIZE=500
Notes:
- Larger batches improve throughput but increase memory usage
- Entries are processed in parallel within a batch
- Consider database connection limits when increasing
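The integer settings all document a minimum and a maximum, so a small range check can flag a bad value early. This is a sketch under assumptions: int_in_range is a hypothetical helper, and this reference does not say whether the server itself rejects or clamps out-of-range values.

```bash
# Succeed only if $1 is a plain decimal integer within [$2, $3].
int_in_range() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Documented bounds for OUTBOX_BATCH_SIZE: 1 to 1000, default 100.
batch_size=${OUTBOX_BATCH_SIZE:-100}
if int_in_range "$batch_size" 1 1000; then
  echo "OUTBOX_BATCH_SIZE=$batch_size ok"
else
  echo "OUTBOX_BATCH_SIZE=$batch_size is outside 1..1000" >&2
fi
```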
Retry Configuration
OUTBOX_MAX_RETRIES
Maximum number of delivery attempts before marking an event as permanently failed.
| Property | Value |
|---|---|
| Type | Integer |
| Default | 5 |
| Required | No |
| Minimum | 1 |
| Maximum | 10 |
Example:
# Allow up to 10 retry attempts
export OUTBOX_MAX_RETRIES=10
# Fail fast after 3 attempts
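export OUTBOX_MAX_RETRIES=3
Retry Schedule:
With exponential backoff (1s base, 2x multiplier) and no cap applied. With the default OUTBOX_RETRY_MAX_INTERVAL of 16s, the delay from attempt 6 onward stays at 16s instead of the values shown: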
| Attempt | Delay | Cumulative Time |
|---|---|---|
| 1 | 1s | 1s |
| 2 | 2s | 3s |
| 3 | 4s | 7s |
| 4 | 8s | 15s |
| 5 | 16s | 31s |
| 6 | 32s | 63s |
| 7 | 64s | ~2m |
| 8 | 128s | ~4m |
| 9 | 256s | ~8m |
| 10 | 512s | ~17m |
OUTBOX_RETRY_BASE_INTERVAL
Initial retry delay for exponential backoff.
| Property | Value |
|---|---|
| Type | Duration |
| Default | 1s |
| Required | No |
| Minimum | 100ms |
| Maximum | 1m |
Example:
# Start retries at 2 seconds
export OUTBOX_RETRY_BASE_INTERVAL=2s
OUTBOX_RETRY_MAX_INTERVAL
Maximum delay between retry attempts (caps exponential growth).
| Property | Value |
|---|---|
| Type | Duration |
| Default | 16s |
| Required | No |
| Minimum | 1s |
| Maximum | 1h |
Example:
# Cap retry delay at 5 minutes
export OUTBOX_RETRY_MAX_INTERVAL=5m
Notes:
- Prevents extremely long waits between retries
- After reaching max interval, all subsequent retries use this value
- Consider SLA requirements when setting
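The interaction between the base interval and the cap can be sketched as a small calculation. This assumes the delay for attempt n is the base interval doubled n-1 times and then limited to the maximum interval, with no jitter, which matches the schedule described in this section but is not confirmed against the worker's implementation. The function name retry_delay_ms is hypothetical, and values are in milliseconds.

```bash
# Delay in milliseconds for a given attempt: base * 2^(attempt-1), capped at max.
#   $1 attempt number (1-based), $2 base interval in ms, $3 max interval in ms
retry_delay_ms() {
  local attempt base_ms max_ms delay i
  attempt=$1; base_ms=$2; max_ms=$3
  delay=$base_ms
  i=1
  # Stop doubling once the cap is reached so the value cannot overflow.
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$max_ms" ]; do
    delay=$(( delay * 2 ))
    i=$(( i + 1 ))
  done
  if [ "$delay" -gt "$max_ms" ]; then
    delay=$max_ms
  fi
  echo "$delay"
}

# Schedule with the defaults: 1s base, 16s cap.
for attempt in 1 2 3 4 5 6 7; do
  echo "attempt $attempt: $(retry_delay_ms "$attempt" 1000 16000)ms"
done
```

With the defaults, the delay reaches the 16s cap at attempt 5 and stays there for every later attempt.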
HTTP Delivery Settings
OUTBOX_DELIVERY_TIMEOUT
Timeout for individual webhook delivery attempts.
| Property | Value |
|---|---|
| Type | Duration |
| Default | 10s |
| Required | No |
| Minimum | 1s |
| Maximum | 5m |
Example:
# Wait up to 30 seconds for slow endpoints
export OUTBOX_DELIVERY_TIMEOUT=30s
# Fail fast after 5 seconds
export OUTBOX_DELIVERY_TIMEOUT=5s
Notes:
- Timeouts are treated as transient failures and will retry
- Set based on expected webhook processing time
- Consider network latency to webhook endpoints
OUTBOX_CONCURRENT_DELIVERIES
Maximum number of concurrent webhook deliveries.
| Property | Value |
|---|---|
| Type | Integer |
| Default | 10 |
| Required | No |
| Minimum | 1 |
| Maximum | 100 |
Example:
# Allow 20 concurrent deliveries
export OUTBOX_CONCURRENT_DELIVERIES=20
Notes:
- Higher concurrency improves throughput for multiple subscribers
- Consider system resources and rate limits on webhook endpoints
- Each delivery uses one HTTP connection
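When sizing batch size, poll interval, and concurrency together, a back-of-envelope ceiling can help spot which setting is the bottleneck. The sketch below is a simplified model, not a description of the worker's scheduling: it assumes one delivery per outbox entry, a full batch on every poll, and a known average delivery time. The function name max_deliveries_per_sec and the sample delivery times are illustrative assumptions.

```bash
# Rough upper bound on deliveries per second (integer arithmetic, rounds down).
#   $1 batch size, $2 poll interval in ms, $3 concurrency, $4 average delivery time in ms
max_deliveries_per_sec() {
  local poll_limit conc_limit
  poll_limit=$(( $1 * 1000 / $2 ))   # entries fetched per second
  conc_limit=$(( $3 * 1000 / $4 ))   # deliveries the HTTP workers can complete per second
  if [ "$poll_limit" -lt "$conc_limit" ]; then
    echo "$poll_limit"
  else
    echo "$conc_limit"
  fi
}

# Defaults (100 entries, 1s poll, 10 concurrent) with an assumed 45ms average delivery:
max_deliveries_per_sec 100 1000 10 45     # polling is the limit
# Same defaults against a slow endpoint averaging 2s per delivery:
max_deliveries_per_sec 100 1000 10 2000   # concurrency is the limit
```

If the concurrency term is the smaller one, raising OUTBOX_BATCH_SIZE or lowering OUTBOX_POLL_INTERVAL is unlikely to help on its own.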
Retention Settings
OUTBOX_RETENTION_DAYS
Number of days to retain delivered outbox entries before cleanup.
| Property | Value |
|---|---|
| Type | Integer |
| Default | 30 |
| Required | No |
| Minimum | 1 |
| Maximum | 365 |
Example:
# Keep entries for 7 days
export OUTBOX_RETENTION_DAYS=7
# Keep entries for 90 days (compliance requirements)
export OUTBOX_RETENTION_DAYS=90
Notes:
- Only affects delivered entries
- pending and failed entries are never automatically deleted
- Cleanup runs periodically (hourly by default)
OUTBOX_CLEANUP_INTERVAL
How frequently the retention cleanup job runs.
| Property | Value |
|---|---|
| Type | Duration |
| Default | 1h |
| Required | No |
| Minimum | 1m |
| Maximum | 24h |
Example:
# Run cleanup every 6 hours
export OUTBOX_CLEANUP_INTERVAL=6h
Logging Settings
OUTBOX_LOG_LEVEL
Log level for the outbox worker component.
| Property | Value |
|---|---|
| Type | String |
| Default | info |
| Required | No |
| Values | debug, info, warn, error |
Example:
# Enable debug logging for troubleshooting
export OUTBOX_LOG_LEVEL=debug
Log Output Examples:
Info level (default):
level=INFO msg="Event delivered" subscriber=my-webhook event_id=evt_123 duration=45ms
level=INFO msg="Worker started" poll_interval=1s batch_size=100
Debug level:
level=DEBUG msg="Polling for pending entries"
level=DEBUG msg="Found pending entries" count=3
level=DEBUG msg="Delivering event" subscriber=my-webhook event_id=evt_123
level=DEBUG msg="Signature generated" algorithm=sha256
Warn level:
level=WARN msg="Delivery failed, scheduling retry" subscriber=my-webhook event_id=evt_123 attempt=2 error="HTTP 503"
level=WARN msg="Subscriber marked unhealthy" subscriber=my-webhook success_rate=45%
Error level:
level=ERROR msg="Delivery permanently failed" subscriber=my-webhook event_id=evt_123 attempts=5 error="Max retries exceeded"
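Because the log lines are key=value pairs, standard text tools can summarize them. The sketch below counts retry warnings per subscriber. It assumes real worker output matches the sample lines in this section; count_retries_by_subscriber is a hypothetical helper and orders-hook is a made-up subscriber name.

```bash
# Count "Delivery failed" warnings per subscriber from key=value log lines on stdin.
# Output: one "<subscriber> <count>" line per subscriber, sorted by name.
count_retries_by_subscriber() {
  grep 'msg="Delivery failed' \
    | sed -n 's/.*subscriber=\([^ ]*\).*/\1/p' \
    | sort | uniq -c | awk '{ print $2, $1 }'
}

# Sample input modeled on the log lines in this section.
count_retries_by_subscriber <<'EOF'
level=WARN msg="Delivery failed, scheduling retry" subscriber=my-webhook event_id=evt_123 attempt=2 error="HTTP 503"
level=INFO msg="Event delivered" subscriber=my-webhook event_id=evt_124 duration=45ms
level=WARN msg="Delivery failed, scheduling retry" subscriber=my-webhook event_id=evt_123 attempt=3 error="HTTP 503"
level=WARN msg="Delivery failed, scheduling retry" subscriber=orders-hook event_id=evt_125 attempt=1 error="HTTP 500"
EOF
```

For the sample input this prints my-webhook 2 and orders-hook 1. The same function can read saved worker logs from a file or a pipe.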
Configuration Examples
Production Configuration
Recommended settings for production environments:
# Enable event publishing
export ENABLE_EVENT_PUBLISHING=true
# Standard polling and batch settings
export OUTBOX_POLL_INTERVAL=1s
export OUTBOX_BATCH_SIZE=100
# Retry configuration
export OUTBOX_MAX_RETRIES=5
export OUTBOX_RETRY_BASE_INTERVAL=1s
export OUTBOX_RETRY_MAX_INTERVAL=16s
# HTTP settings
export OUTBOX_DELIVERY_TIMEOUT=10s
export OUTBOX_CONCURRENT_DELIVERIES=10
# Retention (30 days)
export OUTBOX_RETENTION_DAYS=30
export OUTBOX_CLEANUP_INTERVAL=1h
# Logging
export OUTBOX_LOG_LEVEL=info
High-Throughput Configuration
For environments with high event volume:
export ENABLE_EVENT_PUBLISHING=true
# Faster polling, larger batches
export OUTBOX_POLL_INTERVAL=500ms
export OUTBOX_BATCH_SIZE=500
# More concurrent deliveries
export OUTBOX_CONCURRENT_DELIVERIES=50
# Shorter retention
export OUTBOX_RETENTION_DAYS=7
Low-Latency Configuration
For minimal delivery latency:
export ENABLE_EVENT_PUBLISHING=true
# Very fast polling
export OUTBOX_POLL_INTERVAL=100ms
export OUTBOX_BATCH_SIZE=50
# Quick timeout
export OUTBOX_DELIVERY_TIMEOUT=5s
Development Configuration
For local development:
export ENABLE_EVENT_PUBLISHING=true
# Debug logging
export OUTBOX_LOG_LEVEL=debug
# Faster retries for testing
export OUTBOX_RETRY_BASE_INTERVAL=100ms
export OUTBOX_RETRY_MAX_INTERVAL=1s
export OUTBOX_MAX_RETRIES=3
Docker Compose Example
version: "3.8"
services:
  incidents:
    image: incidents:latest
    ports:
      - "8080:8080"
    environment:
      # Database
      DATABASE_URL: postgres://user:pass@db:5432/incidents
      # Event Publishing
      ENABLE_EVENT_PUBLISHING: "true"
      OUTBOX_POLL_INTERVAL: "1s"
      OUTBOX_BATCH_SIZE: "100"
      OUTBOX_MAX_RETRIES: "5"
      OUTBOX_DELIVERY_TIMEOUT: "10s"
      OUTBOX_CONCURRENT_DELIVERIES: "10"
      OUTBOX_RETENTION_DAYS: "30"
      OUTBOX_LOG_LEVEL: "info"
Kubernetes ConfigMap Example
apiVersion: v1
kind: ConfigMap
metadata:
  name: incidents-config
data:
  ENABLE_EVENT_PUBLISHING: "true"
  OUTBOX_POLL_INTERVAL: "1s"
  OUTBOX_BATCH_SIZE: "100"
  OUTBOX_MAX_RETRIES: "5"
  OUTBOX_DELIVERY_TIMEOUT: "10s"
  OUTBOX_CONCURRENT_DELIVERIES: "10"
  OUTBOX_RETENTION_DAYS: "30"
  OUTBOX_LOG_LEVEL: "info"
Monitoring
Key Metrics
Monitor these aspects of the event publishing system:
| Metric | Description | Alert Threshold |
|---|---|---|
| Pending entries | Outbox entries awaiting delivery | > 1000 |
| Delivery latency | Time from event creation to delivery | > 5s p95 |
| Failure rate | Percentage of failed deliveries | > 10% |
| Worker health | Worker polling successfully | Any failure |
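The numeric thresholds in the table can be turned into a simple check. This is a sketch only: check_threshold is a hypothetical helper, the metric names and sample readings are made up, and how you obtain the real values depends on your monitoring setup, which this reference does not specify.

```bash
# Print an alert line and return non-zero when an integer value exceeds its limit.
check_threshold() {
  local name value limit
  name=$1; value=$2; limit=$3
  if [ "$value" -gt "$limit" ]; then
    echo "ALERT: $name=$value exceeds $limit"
    return 1
  fi
  return 0
}

# Hypothetical sample readings checked against the thresholds in the table.
alerts=0
check_threshold pending_entries 1500 1000 || alerts=$(( alerts + 1 ))
check_threshold p95_latency_ms 3200 5000  || alerts=$(( alerts + 1 ))
check_threshold failure_rate_pct 12 10    || alerts=$(( alerts + 1 ))
echo "$alerts alert(s)"
```

With these sample readings, the pending entry count and the failure rate trigger alerts while the p95 latency does not.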
Health Checks
# Check subscriber health
im subscriber health
# View worker logs
im logs --component outbox-worker
# Check pending entry count
im subscriber stats
Troubleshooting
Events Not Being Published
- Verify publishing is enabled:
  echo $ENABLE_EVENT_PUBLISHING # Should output: true
- Check that the worker is running:
  im logs --component outbox-worker
- Verify subscribers exist:
  im subscriber list
High Delivery Latency
- Check poll interval:
  echo $OUTBOX_POLL_INTERVAL # Decrease if too high
- Check batch size:
  - Too small: Not processing fast enough
  - Too large: Memory issues
- Check concurrent deliveries:
  - Increase for more subscribers
  - Check system resources
Retry Storm
If many events are being retried simultaneously:
- Increase retry intervals:
  export OUTBOX_RETRY_BASE_INTERVAL=5s
  export OUTBOX_RETRY_MAX_INTERVAL=5m
- Check subscriber health:
  im subscriber health
- Consider temporarily disabling problematic subscribers:
  im subscriber update problem-webhook --enabled=false
Related Documentation
- Event Distribution Guide - User guide for event distribution
- Webhook Integration - Building webhook consumers
- CLI Reference - Command-line interface documentation