escalation Package
Overview
Package escalation provides automated incident escalation based on SLA breaches and configurable rules.
The package monitors active incidents for SLA breaches, evaluates them against configurable escalation rules, and executes the matching escalation actions. This keeps incidents on track for timely resolution and visible to management. It relies on the SLA, push notification, and timeline services to detect breaches, deliver alerts, and record escalation events.
Key Features:
- Automated SLA breach monitoring with configurable thresholds and time windows
- Rule-based escalation with flexible conditions and multiple action types
- Multiple escalation actions: notifications, assignments, severity increases, and paging
- Comprehensive escalation history tracking and audit trails
- Integration with push notification system for immediate alert delivery
- Timeline integration for escalation event logging and incident context
- Periodic escalation monitoring with a configurable check interval
Architecture:
The escalation system follows a monitoring and action-based architecture:
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ SLA Monitoring  │───►│    Escalation    │───►│     Action      │
│   (Threshold)   │    │   Rules Engine   │    │    Execution    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                      │
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│     Active      │    │       Rule       │    │  Notifications  │
│    Incidents    │    │    Evaluation    │    │   & Timeline    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
Escalation Types:
- Time-based: Escalation triggered by SLA time thresholds (acknowledge, resolve)
- Threshold-based: Escalation at configurable percentages of SLA time limits
- Rule-based: Complex escalation conditions with multiple criteria and actions
- Severity-based: Different escalation paths based on incident severity levels
Escalation Actions:
- severity_increase: Automatically increase incident severity to draw attention
- notify: Send push notifications to specific users, teams, or all responders
- assign: Reassign incidents to different users or escalation teams
- page: Trigger paging systems for critical escalations requiring immediate attention
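An executor can dispatch on the action type. The sketch below shows a minimal version of this. Action and Incident are local stand-ins, not the package's types. The integer severity scale, where 1 means SEV-1 and is the most severe, is an assumption. The notify and page branches only return a description here; they do not call any real delivery service.

```go
package main

import (
	"errors"
	"fmt"
)

// Action mirrors the Type/Target/Parameters shape used in the package example.
type Action struct {
	Type       string
	Target     string
	Parameters map[string]interface{}
}

// Incident is a hypothetical, minimal incident record.
type Incident struct {
	ID       string
	Severity int // assumed scale: 1 = SEV-1 (most severe)
	Assignee string
}

// execute applies one action to the incident and returns a description of what it did.
func execute(inc *Incident, a Action) (string, error) {
	switch a.Type {
	case "severity_increase":
		if inc.Severity > 1 {
			inc.Severity-- // e.g. SEV-3 -> SEV-2; SEV-1 is the ceiling
		}
		return fmt.Sprintf("%s severity now SEV-%d", inc.ID, inc.Severity), nil
	case "notify":
		return fmt.Sprintf("notify %s about %s", a.Target, inc.ID), nil
	case "assign":
		inc.Assignee = a.Target
		return fmt.Sprintf("%s assigned to %s", inc.ID, a.Target), nil
	case "page":
		return fmt.Sprintf("page %s for %s", a.Target, inc.ID), nil
	default:
		return "", errors.New("unknown action type: " + a.Type)
	}
}

func main() {
	inc := &Incident{ID: "INC-123", Severity: 2}
	actions := []Action{{Type: "severity_increase"}, {Type: "assign", Target: "escalation-team"}}
	for _, a := range actions {
		msg, err := execute(inc, a)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(msg)
	}
}
```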
Rule Configuration: Escalation rules support flexible configuration including:
- SLA type targeting (acknowledge or resolve SLAs)
- Percentage thresholds for escalation timing (e.g., 80% of SLA time)
- Multiple escalation actions per rule with different targets
- Rule activation/deactivation for dynamic escalation management
- Historical tracking to prevent duplicate escalations
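One way to prevent duplicate escalations is to key the history on the (incident, rule) pair. Each rule then fires at most once per incident, while other rules can still fire for the same incident. This in-memory sketch is hypothetical; the package itself persists escalation history in the database.

```go
package main

import "fmt"

// historyKey identifies one rule firing for one incident (hypothetical design).
type historyKey struct {
	IncidentID string
	RuleID     string
}

type history map[historyKey]bool

// markIfNew records the escalation and reports whether it had not happened before.
func (h history) markIfNew(incidentID, ruleID string) bool {
	k := historyKey{incidentID, ruleID}
	if h[k] {
		return false
	}
	h[k] = true
	return true
}

func main() {
	h := history{}
	fmt.Println(h.markIfNew("INC-123", "sev1-ack"))     // true: first firing
	fmt.Println(h.markIfNew("INC-123", "sev1-ack"))     // false: duplicate suppressed
	fmt.Println(h.markIfNew("INC-123", "sev1-resolve")) // true: different rule
}
```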
Example usage:
// db, slaService, timelineService, and pushService are the dependencies
// described under NewService; they must be constructed beforehand.

// Create the escalation service with its dependencies
escalationService := escalation.NewService(db, slaService, timelineService, pushService)

// Start escalation monitoring with a 5-minute check interval
err := escalationService.Start(5 * time.Minute)
if err != nil {
	log.Fatal(err)
}

// Add an escalation rule for SEV-1 incidents
rule := &escalation.EscalationRule{
	Name:             "SEV-1 Acknowledge Escalation",
	Description:      "Escalate SEV-1 incidents at 80% of acknowledge SLA",
	SLAType:          "acknowledge",
	ThresholdPercent: 80.0,
	EscalationActions: []escalation.EscalationAction{
		{
			Type:   "severity_increase",
			Target: "",
		},
		{
			Type:   "notify",
			Target: "incident-manager",
			Parameters: map[string]interface{}{
				"priority": "urgent",
				"message":  "SEV-1 incident requires immediate attention",
			},
		},
		{
			Type:   "assign",
			Target: "escalation-team",
		},
	},
	Active: true,
}
err = escalationService.AddRule(rule)
if err != nil {
	log.Fatal(err)
}

// Get the escalation history for an incident
history, err := escalationService.GetEscalationHistory("INC-123")
if err != nil {
	log.Fatal(err)
}
for _, event := range history {
	fmt.Printf("Escalation: %s - %s (%s)\n",
		event.RuleName, event.Reason, event.ExecutedAt)
}
Integration Points: The escalation service integrates with multiple platform components:
- SLA Service: Monitors SLA status and breach conditions for escalation triggers
- Push Notification Service: Delivers escalation alerts to mobile devices and teams
- Timeline Service: Records all escalation events in incident timelines for audit trails
- Database: Stores escalation rules, history, and configuration for persistence
- Incident Management: Updates incident severity and assignments based on escalation actions
Monitoring and Analytics: The escalation system provides operational visibility into:
- Escalation frequency and effectiveness across incident types
- Rule performance, to tune escalation thresholds and actions
- Correlation with SLA breaches, to identify systemic issues and improvement opportunities
- Escalation load per user and team, to help balance incident distribution
- Historical trends, to refine escalation policies and processes
Performance and Reliability: The escalation service is designed for reliable operation:
- Incident polling at a configurable check interval
- Rule evaluation designed to minimize database load
- Concurrent-safe rule management with read-write locking
- Error handling that isolates individual failures so they do not bring down the escalation system
- Comprehensive logging and monitoring for operational visibility
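Concurrent-safe rule management with read-write locking usually follows the pattern in the sketch below. A sync.RWMutex lets the periodic checker read rules while writers add or toggle them. The ruleStore type and its methods are illustrative assumptions, not the package's API.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Rule is a hypothetical, trimmed-down stand-in for EscalationRule.
type Rule struct {
	ID     string
	Name   string
	Active bool
}

// ruleStore guards the rule map: readers take RLock, writers take Lock.
type ruleStore struct {
	mu    sync.RWMutex
	rules map[string]Rule
}

func newRuleStore() *ruleStore {
	return &ruleStore{rules: make(map[string]Rule)}
}

func (s *ruleStore) AddRule(r Rule) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rules[r.ID] = r
}

// SetActive toggles a rule and reports whether it exists.
func (s *ruleStore) SetActive(id string, active bool) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	r, ok := s.rules[id]
	if !ok {
		return false
	}
	r.Active = active
	s.rules[id] = r
	return true
}

// ActiveRules returns a sorted snapshot so callers can iterate without holding the lock.
func (s *ruleStore) ActiveRules() []Rule {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]Rule, 0, len(s.rules))
	for _, r := range s.rules {
		if r.Active {
			out = append(out, r)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	s := newRuleStore()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.AddRule(Rule{ID: fmt.Sprintf("r%d", i), Active: true})
			_ = s.ActiveRules() // concurrent reads alongside writes are safe
		}(i)
	}
	wg.Wait()
	s.SetActive("r3", false)
	fmt.Println(len(s.ActiveRules())) // 9
}
```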
Import Path: github.com/systmms/incidents/internal/escalation
Types
CentrifugePublisher
CentrifugePublisher is an interface for publishing real-time escalation events.
EscalationAction
EscalationAction defines specific actions to execute during incident escalation.
Each action specifies an action type, a target, and optional action-specific parameters. A rule can carry multiple actions, so a single escalation can combine notifications, reassignments, paging, and severity increases.
Action Types:
- “notify”: Send push notifications or alerts to specified targets
- “assign”: Reassign incident to different users, teams, or escalation groups
- “page”: Trigger paging systems for urgent escalations requiring immediate attention
- “severity_increase”: Automatically increase incident severity level
Target Specification: The meaning of the target depends on the action type:
- User IDs for individual notifications or assignments
- Team names for group notifications and team assignments
- “all” for broadcast notifications to all responders
- Service endpoints for paging system integration
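The target conventions above could be resolved as in the sketch below. It assumes that team names are looked up in a map and that any unrecognized target is a user ID. This documentation does not say how the package actually tells users and teams apart or how paging endpoints are handled, so resolveRecipients is hypothetical.

```go
package main

import "fmt"

// resolveRecipients expands a notify/assign target into user IDs.
// Assumed convention (hypothetical): "all" means every responder, a key in
// teams means that team's members, and anything else is a single user ID.
func resolveRecipients(target string, teams map[string][]string, responders []string) []string {
	if target == "" {
		return nil
	}
	if target == "all" {
		return append([]string(nil), responders...)
	}
	if members, ok := teams[target]; ok {
		return append([]string(nil), members...)
	}
	return []string{target} // assume a bare user ID
}

func main() {
	teams := map[string][]string{"escalation-team": {"alice", "bob"}}
	responders := []string{"alice", "bob", "carol"}
	fmt.Println(resolveRecipients("escalation-team", teams, responders)) // [alice bob]
	fmt.Println(resolveRecipients("all", teams, responders))             // [alice bob carol]
	fmt.Println(resolveRecipients("dave", teams, responders))            // [dave]
}
```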
EscalationEvent
EscalationEvent records a completed escalation and its context.
It captures the triggering rule, the execution timestamp, the reason for the escalation, and every action that was executed. These records form the audit trail used for escalation analysis, policy tuning, and compliance documentation.
Event Tracking: An event is created automatically for every escalation and records:
- Rule identification and escalation reasoning
- Complete action list with execution details
- Precise execution timestamps for timing analysis
- Incident association for historical correlation
Analytics and Reporting: Events support escalation analytics such as:
- Escalation frequency and effectiveness tracking
- Rule performance analysis and optimization insights
- Time-to-escalation metrics for SLA policy tuning
- User and team escalation load distribution analysis
EscalationRule
EscalationRule defines when an incident should be escalated and which actions to take.
A rule combines an SLA type, a threshold expressed as a percentage of the SLA time, and one or more actions. Rules can be tailored to different incident types, severity levels, and organizational response requirements.
Rule Evaluation: Rules are continuously evaluated against active incidents to determine escalation needs:
- SLA threshold monitoring based on configurable percentage of SLA time
- Historical escalation tracking to prevent duplicate escalations
- Active/inactive rule management for dynamic escalation policy control
- Rule priority and ordering for complex escalation scenarios
Action Execution: When a rule's conditions are met, all of its configured actions are executed:
- Notification delivery to specified users, teams, or all responders
- Incident reassignment to escalation teams or managers
- Severity increases to draw attention and prioritize response
- Paging system integration for critical escalations requiring immediate response
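Rule evaluation for a single incident can be sketched as a filter. Inactive rules and rules that have already fired are skipped. Any remaining rule fires once its SLA type has consumed at least its threshold percentage. The Rule fields mirror the package example; ID, slaUsage, and dueRules are hypothetical.

```go
package main

import "fmt"

// Rule mirrors fields from the package example; ID is a hypothetical key.
type Rule struct {
	ID               string
	Name             string
	SLAType          string // "acknowledge" or "resolve"
	ThresholdPercent float64
	Active           bool
}

// slaUsage maps an SLA type to the percentage of that SLA window consumed (hypothetical).
type slaUsage map[string]float64

// dueRules returns, in rule order, the rules that should fire for one incident.
func dueRules(rules []Rule, usage slaUsage, alreadyFired map[string]bool) []Rule {
	var due []Rule
	for _, r := range rules {
		if !r.Active || alreadyFired[r.ID] {
			continue
		}
		if pct, tracked := usage[r.SLAType]; tracked && pct >= r.ThresholdPercent {
			due = append(due, r)
		}
	}
	return due
}

func main() {
	rules := []Rule{
		{ID: "ack80", Name: "Ack at 80%", SLAType: "acknowledge", ThresholdPercent: 80, Active: true},
		{ID: "res50", Name: "Resolve at 50%", SLAType: "resolve", ThresholdPercent: 50, Active: true},
		{ID: "ack50", Name: "Ack at 50% (disabled)", SLAType: "acknowledge", ThresholdPercent: 50, Active: false},
	}
	usage := slaUsage{"acknowledge": 85, "resolve": 30}
	for _, r := range dueRules(rules, usage, map[string]bool{}) {
		fmt.Println(r.Name) // Ack at 80%
	}
}
```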
Methods
CreateEscalationRuleFromTemplate
CreateEscalationRuleFromTemplate creates a customizable rule from a template
GetDefaultRules
GetDefaultRules returns a set of common escalation rules
PushNotifier
PushNotifier is an interface for sending push notifications during escalation actions.
It abstracts the push notification backend, so escalation alerts are delivered the same way whichever backend is used. It supports both notifications targeted at specific users and broadcast alerts.
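Because PushNotifier is an interface, escalation logic can be tested against an in-memory fake. The method names below (SendToUser, Broadcast) are assumptions for illustration; this documentation does not list the real interface's signatures.

```go
package main

import "fmt"

// Notifier is a hypothetical shape for a PushNotifier-style dependency.
type Notifier interface {
	SendToUser(userID, title, body string) error
	Broadcast(title, body string) error
}

// recordingNotifier is an in-memory fake that records what would have been sent.
type recordingNotifier struct {
	sent []string
}

func (r *recordingNotifier) SendToUser(userID, title, body string) error {
	r.sent = append(r.sent, "user:"+userID+":"+title)
	return nil
}

func (r *recordingNotifier) Broadcast(title, body string) error {
	r.sent = append(r.sent, "all:"+title)
	return nil
}

// notify routes the "all" target to Broadcast and everything else to SendToUser.
func notify(n Notifier, target, title, body string) error {
	if target == "all" {
		return n.Broadcast(title, body)
	}
	return n.SendToUser(target, title, body)
}

func main() {
	rec := &recordingNotifier{}
	var n Notifier = rec // compile-time check that the fake satisfies Notifier
	_ = notify(n, "incident-manager", "SEV-1 escalation", "INC-123 needs attention")
	_ = notify(n, "all", "SEV-1 escalation", "INC-123 needs attention")
	fmt.Println(rec.sent) // [user:incident-manager:SEV-1 escalation all:SEV-1 escalation]
}
```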
RealtimeEscalationService
RealtimeEscalationService extends the base escalation service with real-time capabilities
Methods
NewRealtimeEscalationService
NewRealtimeEscalationService creates a new real-time escalation service
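In Go, a service is usually extended by embedding. The sketch below assumes that RealtimeEscalationService embeds the base service and publishes an event after each escalation. Publisher stands in for CentrifugePublisher. Its Publish(channel, data) method and the incidents:<id> channel name are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Publisher is a hypothetical stand-in for CentrifugePublisher.
type Publisher interface {
	Publish(channel string, data []byte) error
}

// baseService stands in for the escalation Service.
type baseService struct{}

func (baseService) Escalate(incidentID, rule string) string {
	return incidentID + " escalated by " + rule
}

// realtimeService embeds baseService; its own Escalate shadows the embedded
// method, calls it explicitly, and then publishes an event.
type realtimeService struct {
	baseService
	pub Publisher
}

func (s *realtimeService) Escalate(incidentID, rule string) string {
	msg := s.baseService.Escalate(incidentID, rule)
	payload, err := json.Marshal(map[string]string{"incident": incidentID, "rule": rule})
	if err != nil {
		return msg // escalation still happened; only the event is lost
	}
	if err := s.pub.Publish("incidents:"+incidentID, payload); err != nil {
		fmt.Println("publish failed:", err)
	}
	return msg
}

// printPublisher records the last published message.
type printPublisher struct{ last string }

func (p *printPublisher) Publish(channel string, data []byte) error {
	p.last = channel + " " + string(data)
	return nil
}

func main() {
	pub := &printPublisher{}
	svc := &realtimeService{pub: pub}
	fmt.Println(svc.Escalate("INC-123", "sev1-ack")) // INC-123 escalated by sev1-ack
	fmt.Println(pub.last)                            // incidents:INC-123 {"incident":"INC-123","rule":"sev1-ack"}
}
```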
Service
Service handles automated incident escalation based on SLA breaches and configurable rules.
The Service combines SLA monitoring, rule-based escalation execution, and integration with notification systems. At each check interval it evaluates active incidents against the configured escalation rules and executes the matching actions when a rule's conditions are met.
Core Responsibilities:
- Continuous monitoring of active incidents for SLA breach conditions
- Rule-based escalation evaluation with configurable thresholds and conditions
- Multi-action escalation execution including notifications, assignments, and system updates
- Comprehensive escalation history tracking and audit trail maintenance
- Integration with push notification system for immediate alert delivery
- Timeline integration for escalation event logging and incident context
Service Architecture: The service operates as a background monitoring system with:
- Configurable check intervals for incident SLA evaluation
- Thread-safe rule management with concurrent read/write access
- Integrated notification delivery through push notification service
- Comprehensive logging and error handling for operational visibility
- Database persistence for rule storage and escalation history
Concurrency and Safety: The service is designed for safe concurrent operation:
- Read-write mutex protection for rule management operations
- Thread-safe incident evaluation and escalation execution
- Graceful shutdown handling with proper resource cleanup
- Isolated error handling to prevent cascade failures
Methods
NewService
NewService creates a new escalation service wired to its dependencies.
This constructor initializes the escalation service with all required dependencies for SLA monitoring, escalation rule management, notification delivery, and audit logging. The service is immediately ready for rule configuration and monitoring startup.
Dependencies:
- db: Database connection for rule storage and escalation history persistence
- slaService: SLA monitoring service for breach detection and threshold evaluation
- timelineService: Timeline service for escalation event logging and audit trails
- pushService: Push notification service for escalation alert delivery
Initialization: The constructor performs the following setup:
- Empty rule map creation with thread-safe access control
- Stop channel creation for graceful service shutdown
- Structured logger initialization with escalation service identification
- Service state preparation for rule loading and monitoring startup
Service Lifecycle: After creation, the service follows this lifecycle:
- Rule loading from database with loadRules()
- Monitoring startup with Start() and configured check interval
- Continuous operation with automatic escalation evaluation
- Graceful shutdown with Stop() and resource cleanup
Returns a fully initialized Service ready for rule configuration and monitoring.
Functions
ValidateEscalationRule
ValidateEscalationRule validates that an escalation rule is properly configured
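The sketch below shows checks a validator like this might perform. The specific rules are assumptions drawn from the documentation above, not the package's actual validation logic: the SLA type must be acknowledge or resolve, the threshold must fall in (0, 100], and every action except severity_increase needs a target.

```go
package main

import (
	"errors"
	"fmt"
)

type Action struct {
	Type   string
	Target string
}

type Rule struct {
	Name             string
	SLAType          string
	ThresholdPercent float64
	Actions          []Action
}

var validActionTypes = map[string]bool{
	"notify": true, "assign": true, "page": true, "severity_increase": true,
}

func validateAction(a Action) error {
	if !validActionTypes[a.Type] {
		return fmt.Errorf("unknown action type %q", a.Type)
	}
	// Assumption: every action except severity_increase needs a target.
	if a.Type != "severity_increase" && a.Target == "" {
		return fmt.Errorf("action %q requires a target", a.Type)
	}
	return nil
}

func validateRule(r Rule) error {
	if r.Name == "" {
		return errors.New("rule name is required")
	}
	if r.SLAType != "acknowledge" && r.SLAType != "resolve" {
		return fmt.Errorf("unsupported SLA type %q", r.SLAType)
	}
	if r.ThresholdPercent <= 0 || r.ThresholdPercent > 100 {
		return fmt.Errorf("threshold %.1f%% must be in (0, 100]", r.ThresholdPercent)
	}
	if len(r.Actions) == 0 {
		return errors.New("at least one action is required")
	}
	for i, a := range r.Actions {
		if err := validateAction(a); err != nil {
			return fmt.Errorf("action %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	r := Rule{
		Name:             "SEV-1 Ack",
		SLAType:          "acknowledge",
		ThresholdPercent: 80,
		Actions:          []Action{{Type: "severity_increase"}, {Type: "notify"}},
	}
	fmt.Println(validateRule(r)) // action 1: action "notify" requires a target
}
```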
calculateUrgency
getSuggestedSeverity
getWarningLevel
validateEscalationAction
validateEscalationAction validates a single escalation action
Generated automatically from Go source code. Last updated: 2025-08-25T07:51:05-04:00