PagerDuty Troubleshooting

Webhook Issues

Signature Validation Errors (401 Unauthorized)

Symptoms:

  • Webhooks return HTTP 401 Unauthorized
  • Logs show INVALID_SIGNATURE or MISSING_SIGNATURE errors

Common Causes and Solutions:

  1. Incorrect Webhook Secret

    # Verify the configured secret matches PagerDuty
    ./im connector pagerduty status
    
    # Rotate to a new secret if needed
    ./im connector pagerduty rotate-secret
  2. Secret Not Yet Active

    • After creating a webhook secret in PagerDuty, wait 1-2 minutes
    • Check that the secret is marked as active in your configuration
  3. Multiple Secrets During Rotation

    • During key rotation, both old and new secrets are temporarily valid
    • Ensure the rotation completes before removing the old secret
  4. Payload Tampering

    • Proxies or WAFs that modify request bodies will break signatures
    • Ensure webhooks bypass content-modifying middleware
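
Each cause above is a failure of the same check. A minimal sketch of that check in Python, assuming PagerDuty's V3 signing scheme (an X-PagerDuty-Signature header carrying comma-separated v1=<hex HMAC-SHA256> values computed over the raw request body). The function name and the way secrets are passed in are hypothetical; the integration's own implementation may differ:

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, header_value: str, secrets: list[str]) -> bool:
    """Return True if any v1 signature in the header matches any active secret."""
    # The header may carry several signatures, e.g. "v1=aaa,v1=bbb" during rotation.
    candidates = [
        part.strip()[len("v1="):]
        for part in header_value.split(",")
        if part.strip().startswith("v1=")
    ]
    for secret in secrets:
        expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
        # compare_digest performs a constant-time comparison.
        if any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates):
            return True
    return False
```

The signature covers the exact bytes received, so it must be computed before the body is parsed. A proxy that re-serializes the JSON changes those bytes even if the content is equivalent, which is why cause 4 produces INVALID_SIGNATURE.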

V2 Webhook Format Errors (400 Bad Request)

Symptoms:

  • Webhooks return HTTP 400 Bad Request
  • Logs show V2_WEBHOOK_UNSUPPORTED error

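Solution:

  • The webhook endpoint (/webhooks/pagerduty/v3) accepts V3 webhooks only
  • If this error appears, the sender is most likely a legacy V2 webhook; replace it in PagerDuty with a V3 webhook subscription that points at the same endpoint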

Webhooks Not Received

Symptoms:

  • PagerDuty shows webhook deliveries, but no events appear
  • No entries in timeline for PagerDuty events

Debugging Steps:

  1. Check Webhook Endpoint Accessibility

    # Test endpoint from external network
    curl -X POST https://your-domain/webhooks/pagerduty/v3 \
      -H "Content-Type: application/json" \
      -d '{}'
    # Should return 401 (not 404 or connection timeout)
  2. Verify Firewall Rules

    • PagerDuty publishes the IP ranges its webhooks originate from in its API documentation
    • Ensure inbound traffic from these ranges is allowed through your firewall
  3. Check TLS/SSL

    • PagerDuty requires valid TLS certificates
    • Self-signed certificates will cause webhook delivery failures
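
The outcome of the unsigned probe in step 1 can be read as a small decision table. The sketch below is illustrative only: the function name is hypothetical, and only the 401, 404, and timeout cases come from the steps above; the remaining branches are assumptions about common deployments.

```python
from typing import Optional


def diagnose_probe(status: Optional[int]) -> str:
    """Map the result of an unsigned POST to the webhook endpoint to a likely cause.

    `status` is the HTTP status code, or None if the connection failed or timed out.
    """
    if status is None:
        return "unreachable: check DNS, firewall rules, and the TLS certificate"
    if status == 401:
        return "reachable: the endpoint is up and rejects the unsigned request as expected"
    if status == 404:
        return "wrong path: the webhook route is not registered at this URL"
    if 500 <= status < 600:
        # Assumption: a 5xx means the request reached the server but processing failed.
        return "server error: check the integration's logs"
    # Assumption: anything else usually comes from a proxy or WAF in front of the endpoint.
    return f"unexpected status {status}: check any proxy or WAF in front of the endpoint"
```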

Incident Correlation Issues

Events Not Correlating to Existing Incidents

Symptoms:

  • Each PagerDuty event creates a new platform incident
  • Events from the same PagerDuty incident end up in separate platform incidents instead of sharing one

Causes:

  1. Missing or Changed Incident Key

    • PagerDuty uses incident_key for deduplication
    • Events without incident keys cannot be correlated by fingerprint
  2. Different Service IDs

    • Correlation uses service ID + incident key as fingerprint
    • Events from different services create separate incidents
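
To make both causes concrete, here is a minimal sketch of a fingerprint built from service ID + incident key. The function name, the ":" separator, and the use of SHA-256 are assumptions for illustration; the platform's actual fingerprint format may differ.

```python
import hashlib
from typing import Optional


def fingerprint(service_id: str, incident_key: Optional[str]) -> Optional[str]:
    """Return a correlation fingerprint, or None when the event has no incident key."""
    if not incident_key:
        # Without an incident key there is nothing stable to correlate on.
        return None
    raw = f"{service_id}:{incident_key}"
    return hashlib.sha256(raw.encode()).hexdigest()
```

Two events correlate only when both inputs are identical, so a changed incident key or a different service ID yields a different fingerprint and therefore a separate incident.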

Debugging:

# Check event fingerprints in logs
grep "fingerprint" /var/log/incidents/pagerduty.log

Auto-Declaration Not Working

Symptoms:

  • PagerDuty incident.triggered events don’t create platform incidents

Solutions:

  1. Check Auto-Declare Configuration

    ./im connector pagerduty status
    # Verify auto_declare: true
  2. Check Auto-Declare Filter

    • If filters are configured, verify the event matches the criteria
    • Filters can restrict by urgency or service ID
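
A sketch of how such a filter might be evaluated. The class and field names are hypothetical, and two behaviours are assumed rather than documented: an empty criterion places no restriction, and an event must satisfy every configured criterion.

```python
from dataclasses import dataclass, field


@dataclass
class AutoDeclareFilter:
    # Empty sets mean "no restriction" (assumption).
    urgencies: set[str] = field(default_factory=set)
    service_ids: set[str] = field(default_factory=set)

    def matches(self, urgency: str, service_id: str) -> bool:
        """Return True if the event passes every configured criterion."""
        if self.urgencies and urgency not in self.urgencies:
            return False
        if self.service_ids and service_id not in self.service_ids:
            return False
        return True
```

Under these assumptions, a filter restricted to high urgency silently drops a low-urgency incident.triggered event, which looks the same as auto-declaration being disabled.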

Metrics and Observability

Finding PagerDuty Metrics

The integration exposes these Prometheus metrics:

  Metric                                          Description
  pagerduty_webhooks_received_total               Total webhooks received, by event type
  pagerduty_webhook_processing_duration_seconds   Processing time histogram
  pagerduty_incidents_auto_declared_total         Count of auto-declared incidents

# Query metrics endpoint
curl -s http://localhost:9090/metrics | grep pagerduty_
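
For scripted checks, the same output can be parsed rather than grepped. The sketch below reads the Prometheus text exposition format and keeps only pagerduty_ samples; the function name is hypothetical, and the event_type label in the usage example is an assumption about how the integration labels its counters.

```python
def pagerduty_samples(metrics_text: str) -> dict[str, float]:
    """Extract pagerduty_* samples from Prometheus text-format output."""
    samples: dict[str, float] = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blanks, "# HELP"/"# TYPE" comments, and unrelated metrics.
        if not line or line.startswith("#") or not line.startswith("pagerduty_"):
            continue
        if "}" in line:
            # name{labels} value [timestamp]
            head, _, tail = line.rpartition("}")
            key = head + "}"
        else:
            # name value [timestamp]
            key, _, tail = line.partition(" ")
        samples[key] = float(tail.split()[0])
    return samples
```

For example, pagerduty_samples(text).get('pagerduty_webhooks_received_total{event_type="incident.triggered"}') returns the counter for triggered events, or None if no such webhook has been received.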

Tracing Webhook Processing

Enable debug logging for detailed traces:

export LOG_LEVEL=debug
./im server start

Look for trace IDs in logs:

trace_id=abc123 span_id=def456 pagerduty.event_id=P123ABC
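
To follow one webhook through the logs, it helps to split such lines into fields and group them by trace_id. A minimal sketch, assuming the whitespace-separated key=value layout shown above with no quoted or space-containing values (the function name is hypothetical):

```python
def parse_trace_fields(line: str) -> dict[str, str]:
    """Split a log line of space-separated key=value tokens into a dict."""
    fields: dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        # Ignore tokens that are not key=value pairs (e.g. free-text message words).
        if sep and key:
            fields[key] = value
    return fields
```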

Connection Issues

API Connection Failures

Symptoms:

  • ./im connector test prod-pagerduty fails
  • Logs show connection timeout or refused errors

Solutions:

  1. Verify API Key

    # Test API key directly
    curl -X GET 'https://api.pagerduty.com/abilities' \
      -H 'Authorization: Token token=YOUR_API_KEY' \
      -H 'Content-Type: application/json'
  2. Check Network Access

    • Ensure outbound HTTPS (port 443) is allowed
    • Check proxy settings if behind corporate firewall
  3. API Key Permissions

    • Some operations require Admin or Manager role
    • Read operations work with any valid API key

Getting Help

If issues persist:

  1. Enable debug logging and collect relevant log sections
  2. Search the project's GitHub Issues for similar problems
  3. Open a new issue with:
    • Error messages
    • Configuration (redact secrets!)
    • Steps to reproduce