ITSM Bi-Directional Sync Guide

This guide covers the concepts, configuration, and best practices for bi-directional synchronization between the Incidents platform and external ITSM systems.

Overview

Bi-directional sync keeps incidents synchronized in near real time between the Incidents platform and external ITSM systems such as ServiceNow, Jira Service Management, and PagerDuty.

Key Features

  • Real-time Updates: Changes propagate within 5 seconds
  • Field Ownership: Designate authoritative sources per field
  • Conflict Resolution: Handle simultaneous edits gracefully
  • Drift Detection: Reconciliation catches missed updates
  • Dead Letter Queue: Failed syncs are retried automatically, then held for review

Supported Platforms

Platform      Outbound   Inbound   Bi-directional
ServiceNow    Yes        Yes       Yes
Jira SM       Yes        Yes       Yes
PagerDuty     Yes        Yes       Yes

Architecture

Data Flow

┌─────────────────┐                    ┌─────────────────┐
│    Incidents    │◄──── Webhook ─────│   External      │
│    Platform     │                    │   ITSM System   │
│                 │──── Outbox ───────►│                 │
└─────────────────┘                    └─────────────────┘
        │                                      │
        │    Field Ownership Rules             │
        │    Conflict Detection                │
        │    Drift Reconciliation              │
        └──────────────────────────────────────┘

Components

  1. Integration Configuration: Stores connection settings and credentials
  2. Field Ownership Engine: Determines authoritative source per field
  3. Outbound Sync Service: Pushes changes to external systems
  4. Inbound Webhook Handler: Processes updates from external systems
  5. Conflict Detector: Identifies simultaneous updates
  6. Drift Detector: Reconciles state between systems
  7. Dead Letter Queue: Manages failed sync attempts

Sync Directions

Outbound Only

Platform changes are pushed to the external system; incoming webhooks are ignored:

im integration configure prod-snow --direction outbound

Inbound Only

Webhooks update the platform; platform changes are not synced out:

im integration configure prod-snow --direction inbound

Bi-directional

Full two-way sync with conflict detection:

im integration configure prod-snow --direction bidirectional
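
The three directions can be read as two independent switches: one for the outbound path and one for the inbound path. The sketch below illustrates that reading. The names Direction, should_push_outbound, and should_accept_webhook are hypothetical and are not part of the im CLI or the platform API.

```python
from enum import Enum


class Direction(Enum):
    # Values mirror the --direction flag values shown above.
    OUTBOUND = "outbound"
    INBOUND = "inbound"
    BIDIRECTIONAL = "bidirectional"


def should_push_outbound(direction: Direction) -> bool:
    """Platform changes are pushed out for outbound and bidirectional."""
    return direction in (Direction.OUTBOUND, Direction.BIDIRECTIONAL)


def should_accept_webhook(direction: Direction) -> bool:
    """Incoming webhooks are processed for inbound and bidirectional."""
    return direction in (Direction.INBOUND, Direction.BIDIRECTIONAL)
```

Under this reading, an outbound-only integration returns False from should_accept_webhook, which corresponds to "webhooks are ignored".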

Field Ownership

Field ownership determines which system is authoritative for each incident field.

Ownership Types

Owner      Behavior
platform   Platform changes sync out; inbound updates are ignored
external   External changes sync in; outbound updates are skipped
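
As a rough illustration of the two ownership types, the sketch below filters a set of field changes by their source. The function name filter_update is hypothetical, and passing through fields that have no ownership rule is an assumption; this guide does not say how unowned fields are treated.

```python
from typing import Dict


def filter_update(
    changes: Dict[str, object],
    ownership: Dict[str, str],
    source: str,
) -> Dict[str, object]:
    """Keep only the changes that `source` is allowed to make.

    `source` is "platform" for outbound changes and "external" for
    inbound webhook updates. Fields with no ownership rule are kept
    (assumption, not confirmed by the guide).
    """
    allowed = {}
    for field, value in changes.items():
        owner = ownership.get(field)
        if owner is None or owner == source:
            allowed[field] = value
    return allowed


# Ownership map modelled on the ServiceNow defaults listed in this guide.
SERVICENOW_DEFAULTS = {
    "severity": "platform",
    "title": "platform",
    "description": "platform",
    "assigned_to": "external",
    "work_notes": "external",
}
```

For example, filter_update({"severity": "P1", "assigned_to": "alice"}, SERVICENOW_DEFAULTS, "external") keeps only assigned_to, because severity is owned by the platform and the inbound update to it is ignored.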

Configuration

# Set platform as owner of severity (priority 10)
im integration field-ownership set prod-snow severity --owner platform --priority 10

# Set external as owner of assigned_to (priority 5)
im integration field-ownership set prod-snow assigned_to --owner external --priority 5

# List all ownership rules
im integration field-ownership list prod-snow

Default Ownership Rules

Each integration type ships with default ownership rules:

ServiceNow:

  • Platform owns: severity, title, description
  • External owns: assigned_to, work_notes

Jira SM:

  • Platform owns: severity, priority
  • External owns: assignee, comments

PagerDuty:

  • Platform owns: title, description
  • External owns: escalation_policy, assigned_to

Conflict Resolution

A conflict is detected when both systems update the same field within 5 seconds of each other.
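
The detection rule can be sketched as a comparison of two timestamped updates. FieldUpdate and is_conflict are hypothetical names, and treating a gap of exactly 5 seconds as a conflict is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Window taken from the rule above.
CONFLICT_WINDOW = timedelta(seconds=5)


@dataclass(frozen=True)
class FieldUpdate:
    field: str
    value: str
    source: str  # "platform" or "external"
    at: datetime


def is_conflict(a: FieldUpdate, b: FieldUpdate) -> bool:
    """True when two different systems touched the same field
    within the conflict window (inclusive bound is an assumption)."""
    return (
        a.field == b.field
        and a.source != b.source
        and abs(a.at - b.at) <= CONFLICT_WINDOW
    )
```

Two updates to severity that arrive 3 seconds apart from different systems would count as a conflict here; the same pair 9 seconds apart would be treated as two ordinary sequential updates.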

Resolution Strategies

Last-Write-Wins (Default)

The most recent update takes precedence:

im integration configure prod-snow --conflict-strategy last-write-wins

Ownership-Priority

The system with higher ownership priority for the field wins:

im integration configure prod-snow --conflict-strategy ownership-priority

Use this when field ownership should always be respected, even during simultaneous edits.

Manual Review

Conflicts are queued for human review:

im integration configure prod-snow --conflict-strategy manual-review
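
The three strategies differ only in how a winner is picked once a conflict exists. The sketch below is illustrative: Update and resolve are hypothetical names, ties go to the platform by assumption, and a larger priority number is assumed to mean higher priority, which this guide does not specify.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Update:
    source: str       # "platform" or "external"
    value: str
    timestamp: float  # seconds since the epoch


def resolve(
    strategy: str,
    platform: Update,
    external: Update,
    priorities: Dict[str, int],
) -> Optional[Update]:
    """Return the winning update, or None when a human must decide."""
    if strategy == "last-write-wins":
        # Most recent update wins; a tie goes to the platform (assumption).
        return platform if platform.timestamp >= external.timestamp else external
    if strategy == "ownership-priority":
        # Larger number = higher priority (assumption).
        if priorities["platform"] >= priorities["external"]:
            return platform
        return external
    if strategy == "manual-review":
        # Nothing is applied; the conflict waits in the review queue.
        return None
    raise ValueError(f"unknown conflict strategy: {strategy}")
```

Returning None for manual-review models the idea that neither value is applied until someone resolves the conflict.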

Managing Conflicts

# List pending conflicts
im conflict list --status pending

# View conflict details
im conflict show conflict-123

# Resolve using platform value
im conflict resolve conflict-123 --use-platform

# Resolve using external value
im conflict resolve conflict-123 --use-external

# Ignore the conflict (keep current state)
im conflict resolve conflict-123 --ignore

Drift Detection

Drift occurs when the two systems fall out of sync because of missed webhooks, network issues, or direct database edits.
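
Conceptually, drift detection is a field-by-field comparison of the two records. The sketch below assumes both records have already been mapped to the same field names; find_drift is a hypothetical name and not the platform's implementation.

```python
from typing import Dict, Optional, Tuple


def find_drift(
    platform: Dict[str, str],
    external: Dict[str, str],
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {field: (platform_value, external_value)} for every field
    whose values differ. A field missing on one side shows up as None."""
    drift = {}
    for field in sorted(platform.keys() | external.keys()):
        p_value = platform.get(field)
        e_value = external.get(field)
        if p_value != e_value:
            drift[field] = (p_value, e_value)
    return drift
```

A dry run would only report this dictionary, while auto-heal would go on to overwrite one side for each entry, presumably guided by the field ownership rules.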

On-Demand Reconciliation

# Check for drift without making changes
im integration reconcile prod-snow --dry-run

# Auto-heal detected drift
im integration reconcile prod-snow --auto-heal

# Show what would change, as JSON
im integration reconcile prod-snow --dry-run --format json

Scheduled Reconciliation

Configure automatic reconciliation:

curl -X POST http://localhost:8080/api/v1/integrations/{id}/reconcile/schedule \
  -H "Content-Type: application/json" \
  -d '{
    "interval": "1h",
    "auto_heal": true,
    "enabled": true
  }'

Reconciliation History

im integration reconcile history prod-snow

Dead Letter Queue

A sync that fails 5 consecutive attempts is moved to the DLQ.

Managing the DLQ

# List failed entries
im integration dlq prod-snow

# Retry a specific entry
im integration dlq-retry prod-snow entry-123

# Discard a permanently failed entry
im integration dlq-discard prod-snow entry-456

Automatic Retry

The system retries failed syncs automatically with exponential backoff:

  • Attempt 1: Immediate
  • Attempt 2: After 30 seconds
  • Attempt 3: After 2 minutes
  • Attempt 4: After 8 minutes
  • Attempt 5: After 30 minutes
  • After 5 failed attempts: Move to DLQ
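
The schedule above is consistent with a 30-second base delay that quadruples on each retry and is capped at 30 minutes (30 s × 4³ would otherwise be 32 minutes). That formula is an inference from the listed values, not a documented one, and the names below are hypothetical. The guide also does not say whether each delay is measured from the previous failure or from the first attempt.

```python
from typing import Optional

MAX_ATTEMPTS = 5
BASE_DELAY_SECONDS = 30
MULTIPLIER = 4                 # inferred: 30 s -> 2 min -> 8 min
MAX_DELAY_SECONDS = 30 * 60    # inferred cap matching "30 minutes"


def delay_before_attempt(attempt: int) -> Optional[int]:
    """Seconds to wait before the given 1-based attempt.

    Returns None when the attempt budget is exhausted, meaning the
    entry should be moved to the dead letter queue instead.
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt > MAX_ATTEMPTS:
        return None
    if attempt == 1:
        return 0  # first attempt is immediate
    delay = BASE_DELAY_SECONDS * MULTIPLIER ** (attempt - 2)
    return min(delay, MAX_DELAY_SECONDS)
```

Calling delay_before_attempt for attempts 1 through 5 yields 0, 30, 120, 480, and 1800 seconds, which matches the list; attempt 6 returns None.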

Monitoring

Sync Status

# View integration health
im integration sync-status prod-snow

# Get detailed metrics
im integration sync-status prod-snow --format json

Metrics Available

Metric                Description
outbound_success      Successful outbound syncs
outbound_failed       Failed outbound syncs
inbound_success       Successful inbound syncs
inbound_failed        Failed inbound syncs
conflicts_detected    Total conflicts detected
conflicts_resolved    Conflicts auto-resolved
drift_discrepancies   Drift items found
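
These counters are most useful as ratios. The sketch below computes a failure rate per direction, assuming the metrics arrive as a flat mapping keyed by the metric names above. That shape is an assumption; the actual structure of the --format json output is not documented here, and failure_rate is a hypothetical helper.

```python
from typing import Mapping


def failure_rate(metrics: Mapping[str, int], direction: str) -> float:
    """Fraction of syncs that failed for "outbound" or "inbound".

    Missing counters are treated as zero, and a direction with no
    traffic reports a rate of 0.0 rather than dividing by zero.
    """
    succeeded = metrics.get(f"{direction}_success", 0)
    failed = metrics.get(f"{direction}_failed", 0)
    total = succeeded + failed
    return failed / total if total else 0.0
```

A rising outbound failure rate is an early signal that entries are about to start landing in the dead letter queue.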

OpenTelemetry Tracing

All sync operations emit OpenTelemetry spans with the following attributes:

  • sync.integration_id: Integration identifier
  • sync.integration_type: servicenow, jira, pagerduty
  • sync.direction: outbound, inbound
  • sync.incident_id: Platform incident ID
  • sync.external_id: External system ID

Security

Credential Storage

Credentials are encrypted at rest and never returned in API responses.

Webhook Validation

Inbound webhooks are authenticated with a platform-specific method:

Platform     Method
ServiceNow   HMAC-SHA256
Jira         JWT (HS256)
PagerDuty    HMAC-SHA256
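
For the HMAC-SHA256 case, validation generally means recomputing the signature over the raw request body with the shared secret and comparing it in constant time. The sketch below uses only the Python standard library. Hex encoding of the signature is an assumption, as are the function names; the header that carries the signature and its exact format vary by platform and are not covered here.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Compare the received signature against a freshly computed one.

    hmac.compare_digest runs in constant time, which avoids leaking
    how many leading characters of the signature matched.
    """
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), received_hex.encode())
```

The signature must be computed over the exact bytes received, before any JSON parsing or re-serialization, since even a whitespace change alters the digest.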

RBAC

Access is controlled via OPA policies:

Role                     Permissions
platform.administrator   Full access to all integrations
integration.manager      Manage integrations in assigned scope
incident_commander       Read sync status, resolve conflicts for assigned incidents
viewer                   Read-only access to sync status

Best Practices

1. Start with Outbound Only

Test outbound sync before enabling bi-directional to verify field mappings.

2. Define Clear Ownership

Explicitly set field ownership before enabling bi-directional sync.

3. Monitor Conflicts

Start with the manual-review strategy to understand your conflict patterns.

4. Schedule Regular Reconciliation

Run drift detection at least hourly to catch missed updates.

5. Handle DLQ Promptly

Monitor the dead letter queue and investigate failures before they accumulate.

Troubleshooting

Sync Not Working

  1. Check that the integration is enabled: im integration get prod-snow
  2. Verify that the webhook is configured in the external system
  3. Test connectivity: im integration test prod-snow
  4. Check the logs for errors

Field Not Syncing

  1. Verify field ownership: im integration field-ownership list prod-snow
  2. Check that a field mapping exists
  3. Ensure the field is included in the webhook payload

Rate Limiting

  1. Check current rate limit settings
  2. Reduce sync frequency
  3. Enable rate limit backoff

Webhook Failures

  1. Verify signature/JWT secret
  2. Check network connectivity
  3. Validate payload format