Docker Deployment

Overview

Docker deployment provides a fast, consistent way to run the incident management platform across different environments. This guide covers both single-container deployment for testing and multi-service deployment for production use.

Deployment Options:

  • Single container with SQLite (fastest for testing)
  • Docker Compose with PostgreSQL and Redis (recommended for production)
  • Multi-stage build with optimized images
  • Monitoring and observability stack included

Quick Start

Option 1: Single Container (SQLite)

Perfect for development and testing:

# Pull the latest image
docker pull incidents:latest

# Run with SQLite database
docker run -d \
  --name incident-server \
  -p 8080:8080 \
  incidents:latest

# Access at http://localhost:8080

Option 2: Full Stack (Docker Compose)

Full production-ready stack:

# Clone repository (if not already done)
git clone https://github.com/incidents/incidents.git
cd incidents

# Start full stack
docker compose up -d

# Services available:
# - Incident Management: http://localhost:8080
# - PostgreSQL: localhost:5432
# - Redis: localhost:6379
# - Grafana: http://localhost:3100
# - Prometheus: http://localhost:9090

Option 3: Core Services Only

Start just the essential services:

# Start incident server, database, and cache
docker compose up -d incident-server postgres redis

# Verify services are running
docker compose ps

Building the Image

Build from Source

# Build the Docker image
docker build -t incidents:latest .

# Build with specific version tag
docker build -t incidents:v1.0.0 .

# Build for multiple architectures
# (add --push with a registry-qualified tag to publish the result)
docker buildx build --platform linux/amd64,linux/arm64 -t incidents:latest .

Multi-stage Build

The Dockerfile uses multi-stage builds for optimization:

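# Stage 1: Build the Go application
FROM golang:1.26-alpine AS builder
# CGO_ENABLED=1 needs a C toolchain, which the Alpine Go image does not include
RUN apk add --no-cache gcc musl-dev
WORKDIR /app
COPY . .
RUN CGO_ENABLED=1 GOOS=linux go build -ldflags="-s -w" -o incident-server ./cmd/im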

# Stage 2: Runtime image
FROM alpine:latest
RUN apk --no-cache add ca-certificates tzdata
WORKDIR /app
COPY --from=builder /app/incident-server .
COPY --from=builder /app/web ./web
COPY --from=builder /app/policies ./policies
EXPOSE 8080
CMD ["./incident-server", "serve"]

Docker Compose Configuration

Basic docker-compose.yml

version: "3.8"

services:
  incident-server:
    build: .
    ports:
      - "8080:8080"
    environment:
      - PORT=8080
      - DATABASE_URL=postgres://incidents:incidents_password@postgres:5432/incidents?sslmode=disable
      - REDIS_URL=redis://redis:6379
      - JWT_SECRET=your-jwt-secret-here
      - LOG_LEVEL=info
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
    restart: unless-stopped

  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=incidents
      - POSTGRES_USER=incidents
      - POSTGRES_PASSWORD=incidents_password
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U incidents -d incidents"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:

Production docker-compose.yml

Enhanced configuration with monitoring and security:

version: "3.8"

services:
  incident-server:
    build: .
    ports:
      - "8080:8080"
    environment:
      - PORT=8080
      - DATABASE_URL=postgres://incidents:${POSTGRES_PASSWORD}@postgres:5432/incidents?sslmode=disable
      - REDIS_URL=redis://redis:6379
      - JWT_SECRET=${JWT_SECRET}
      - LOG_LEVEL=info
      - PROMETHEUS_ENABLED=true
      # Jaeger accepts OTLP on 4318 (HTTP) or 4317 (gRPC); 14268 is its Thrift collector port
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "1"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 256M

  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=incidents
      - POSTGRES_USER=incidents
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    ports:
      - "127.0.0.1:5432:5432" # Reachable from the host only, not from the network
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./scripts/init.sql:/docker-entrypoint-initdb.d/init.sql:ro
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U incidents -d incidents"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    ports:
      - "127.0.0.1:6379:6379" # Reachable from the host only, not from the network
    volumes:
      - redis_data:/data
      - ./config/redis.conf:/usr/local/etc/redis/redis.conf:ro
    command: redis-server /usr/local/etc/redis/redis.conf
    restart: unless-stopped

  # Monitoring Stack
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--web.console.libraries=/etc/prometheus/console_libraries"
      - "--web.console.templates=/etc/prometheus/consoles"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3100:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
      - GF_INSTALL_PLUGINS=grafana-piechart-panel
    volumes:
      - grafana_data:/var/lib/grafana
      - ./config/grafana/dashboards:/var/lib/grafana/dashboards:ro
      - ./config/grafana/datasources:/etc/grafana/provisioning/datasources:ro
    restart: unless-stopped

  # Push Notification Service
  gorush:
    image: appleboy/gorush:latest
    ports:
      - "8088:8088"
    volumes:
      - ./config/gorush.yml:/config/gorush.yml:ro
    command: -c /config/gorush.yml
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:
  prometheus_data:
  grafana_data:

Environment Variables

Core Configuration

# Server Configuration
PORT=8080                    # Server port
LOG_LEVEL=info              # Logging level: debug, info, warn, error

# Database Configuration
DATABASE_URL=postgres://incidents:password@localhost:5432/incidents?sslmode=disable
REDIS_URL=redis://localhost:6379

# Security
JWT_SECRET=your-secret-key-here              # JWT signing key (required)
SESSION_KEY=your-session-key-here            # Session encryption key

# External Integrations
GORUSH_URL=http://gorush:8088               # Push notification service
WEBHOOK_SECRET=your-webhook-secret          # Webhook validation secret

# Observability
PROMETHEUS_ENABLED=true                     # Enable Prometheus metrics
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318  # OTLP over HTTP (4317 for gRPC)

Environment File (.env)

Create a .env file for consistent configuration:

# Database
POSTGRES_PASSWORD=secure_password_here
DATABASE_URL=postgres://incidents:${POSTGRES_PASSWORD}@postgres:5432/incidents?sslmode=disable

# Redis
REDIS_URL=redis://redis:6379

# Application
JWT_SECRET=your-strong-jwt-secret-key-here
SESSION_KEY=your-session-encryption-key-here
LOG_LEVEL=info

# Monitoring
GRAFANA_PASSWORD=admin_password_here
PROMETHEUS_ENABLED=true

# Push Notifications
GORUSH_URL=http://gorush:8088

# External Services
PAGERDUTY_API_KEY=your-pagerduty-key
SLACK_BOT_TOKEN=your-slack-token
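
A missing or empty value in .env usually surfaces only after the stack has started, as an authentication failure. The following is a minimal sketch of a pre-flight check, assuming the variable names used in this guide; the function name missing_env_keys is hypothetical and not part of the project. Compose can enforce the same thing inline with the ${VARIABLE:?message} form of interpolation.

```shell
#!/bin/sh
# Print each required key that is missing or empty in an env file.
# Usage: missing_env_keys FILE KEY...
# (hypothetical helper, not shipped with the platform)
missing_env_keys() (
  file=$1
  shift
  for key in "$@"; do
    # A key counts as set only if a line "KEY=<non-empty value>" exists.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "$key"
    fi
  done
)

# Example: fail fast before starting the stack.
# missing=$(missing_env_keys .env POSTGRES_PASSWORD JWT_SECRET GRAFANA_PASSWORD)
# [ -z "$missing" ] || { echo "Missing in .env: $missing" >&2; exit 1; }
```

Commented-out assignments are treated as missing, which is the intended behavior for a file copied from a template.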

Service Configuration

PostgreSQL Configuration

Create scripts/init.sql for database initialization:

-- Create extensions
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pg_stat_statements";

-- Create application user with limited privileges
CREATE USER app_user WITH PASSWORD 'app_password';
GRANT CONNECT ON DATABASE incidents TO app_user;
GRANT USAGE ON SCHEMA public TO app_user;
GRANT CREATE ON SCHEMA public TO app_user;

-- Performance tuning
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
ALTER SYSTEM SET max_connections = '200';
ALTER SYSTEM SET shared_buffers = '256MB';
ALTER SYSTEM SET effective_cache_size = '1GB';
ALTER SYSTEM SET maintenance_work_mem = '64MB';
ALTER SYSTEM SET random_page_cost = '1.1';

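-- pg_reload_conf() applies reloadable settings only. shared_preload_libraries,
-- max_connections, and shared_buffers take effect at the next server start.
SELECT pg_reload_conf();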

Redis Configuration

Create config/redis.conf:

# Basic configuration
port 6379
bind 127.0.0.1 0.0.0.0
protected-mode no

# Persistence
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec

# Memory management
maxmemory 256mb
maxmemory-policy allkeys-lru

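# Security (clients must authenticate once this is set,
# e.g. REDIS_URL=redis://:your_redis_password_here@redis:6379)
requirepass your_redis_password_here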

# Logging
loglevel notice
logfile ""

# Performance
tcp-backlog 511
tcp-keepalive 300

Prometheus Configuration

Create config/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "incidents_rules.yml"

scrape_configs:
  - job_name: "incident-management"
    static_configs:
      - targets: ["incident-server:8080"]
    metrics_path: "/metrics"
    scrape_interval: 10s

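  # PostgreSQL and Redis do not serve Prometheus metrics themselves. These jobs
  # assume postgres_exporter and redis_exporter run as services named
  # "postgres-exporter" and "redis-exporter" (not defined in the compose files above).
  - job_name: "postgres"
    static_configs:
      - targets: ["postgres-exporter:9187"]

  - job_name: "redis"
    static_configs:
      - targets: ["redis-exporter:9121"]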

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

Volume Management

Data Persistence

Ensure data persistence across container restarts:

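# Compose creates its named volumes on first start and prefixes them with the
# project name (for example incidents_postgres_data). Create volumes by hand
# only when running containers with plain docker run:
docker volume create postgres_data
docker volume create redis_data
docker volume create grafana_data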

# Verify volumes
docker volume ls

# Inspect volume details
docker volume inspect postgres_data

Backup and Restore

PostgreSQL Backup

# Backup database (-T disables the pseudo-TTY so the dump can be redirected)
docker compose exec -T postgres pg_dump -U incidents incidents > backup.sql

# Backup with compression
docker compose exec -T postgres pg_dump -U incidents -Z 9 incidents > backup.sql.gz

# Automated backup script
cat > backup.sh << 'EOF'
#!/bin/bash
set -euo pipefail  # Abort if pg_dump fails rather than continuing to the cleanup step
DATE=$(date +"%Y%m%d_%H%M%S")
docker compose exec -T postgres pg_dump -U incidents incidents | gzip > "backup_${DATE}.sql.gz"
find . -name "backup_*.sql.gz" -mtime +7 -delete
EOF
chmod +x backup.sh
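
A backup is only useful if the archive can be read back. As a sketch, the hypothetical helper below (verify_backup is an illustrative name, not part of the project) rejects files that are missing, empty, or fail gzip's integrity test. It does not prove the dump is complete, so it complements a periodic restore test rather than replacing one.

```shell
#!/bin/sh
# Return 0 only if FILE exists, is non-empty, and passes gzip's integrity test.
# (hypothetical helper for the backup_*.sql.gz files created by backup.sh)
verify_backup() (
  file=$1
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; exit 1; }
  gzip -t "$file" 2>/dev/null || { echo "corrupt archive: $file" >&2; exit 1; }
  echo "ok: $file"
)

# Example: check every backup in the current directory.
# for f in backup_*.sql.gz; do verify_backup "$f" || exit 1; done
```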

PostgreSQL Restore

# Restore from backup
docker compose exec -T postgres psql -U incidents -d incidents < backup.sql

# Restore from compressed backup
gunzip -c backup.sql.gz | docker compose exec -T postgres psql -U incidents -d incidents

Networking

Port Configuration

Service          Internal Port  External Port  Description
Incident Server  8080           8080           Main API and web interface
PostgreSQL       5432           5432           Database (dev only)
Redis            6379           6379           Cache (dev only)
Prometheus       9090           9090           Metrics collection
Grafana          3000           3100           Monitoring dashboards
Gorush           8088           8088           Push notifications

Custom Networks

Create custom networks for service isolation:

version: "3.8"

services:
  incident-server:
    # ... service configuration
    networks:
      - app-network
      - db-network
      - cache-network # Needed to reach Redis

  postgres:
    # ... service configuration
    networks:
      - db-network

  redis:
    # ... service configuration
    networks:
      - cache-network

networks:
  app-network:
    driver: bridge
  db-network:
    driver: bridge
    internal: true # Database not accessible from outside
  cache-network:
    driver: bridge
    internal: true # Cache not accessible from outside

Health Checks

Application Health Check

services:
  incident-server:
    # ... other configuration
    healthcheck:
      # The Alpine runtime image ships BusyBox wget; curl is not installed there
      test: ["CMD", "wget", "-q", "-O", "/dev/null", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

Custom Health Check Script

Create scripts/healthcheck.sh:

#!/bin/sh
# Exit on the first failed check.
# wget is used because the Alpine runtime image ships BusyBox wget but not curl.
set -e

# Check main API endpoint
wget -q -O /dev/null http://localhost:8080/health

# Check database connectivity
wget -q -O /dev/null http://localhost:8080/health/db

# Check Redis connectivity
wget -q -O /dev/null http://localhost:8080/health/redis

echo "All health checks passed"
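
Deployment scripts that run on the host often need to wait until the health endpoint answers before continuing. The sketch below shows one way to do that with a generic retry loop. The function name retry and the chosen attempt count and delay are assumptions for illustration, not part of the platform.

```shell
#!/bin/sh
# Run a command until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() (
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      exit 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
)

# Example (run on the host): wait about a minute for the API to report healthy.
# retry 30 2 curl -fsS -o /dev/null http://localhost:8080/health
```

A fixed delay keeps the sketch short; a longer start_period on the container health check serves the same purpose inside Compose.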

Monitoring & Observability

Metrics Collection

The incident server exposes Prometheus metrics at /metrics:

# Add to docker-compose.yml
services:
  incident-server:
    environment:
      - PROMETHEUS_ENABLED=true
      - METRICS_PATH=/metrics
      - METRICS_INTERVAL=15s

Log Management

Configure structured logging:

services:
  incident-server:
    environment:
      - LOG_FORMAT=json
      - LOG_LEVEL=info
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

Grafana Dashboards

Import pre-built dashboards:

# The production compose file bind-mounts config/grafana/dashboards into the
# Grafana container, so place dashboard JSON files in that directory on the host
ls config/grafana/dashboards/

# Restart Grafana to load dashboards
docker compose restart grafana

Security Considerations

Container Security

services:
  incident-server:
    # Run as non-root user
    user: "1000:1000"

    # Read-only filesystem
    read_only: true

    # Temporary filesystem for writable areas
    tmpfs:
      - /tmp
      - /var/cache

    # Security options
    security_opt:
      - no-new-privileges:true

    # Capabilities
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE # Only needed for ports below 1024; can be omitted for 8080

Secrets Management

Use Docker secrets for sensitive data:

version: "3.8"

services:
  incident-server:
    secrets:
      - postgres_password
      - jwt_secret
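    environment:
      - JWT_SECRET_FILE=/run/secrets/jwt_secret
    # Compose does not run shell substitutions such as $(cat ...) in environment
    # values, so the database URL is assembled by a wrapper shell at startup.
    # "$$" passes a literal "$" through Compose's variable interpolation.
    entrypoint:
      - /bin/sh
      - -c
      - |
        export DATABASE_URL="postgres://incidents:$$(cat /run/secrets/postgres_password)@postgres:5432/incidents"
        exec ./incident-server serve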

secrets:
  postgres_password:
    file: ./secrets/postgres_password.txt
  jwt_secret:
    file: ./secrets/jwt_secret.txt
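
The secret files referenced above have to exist before the stack starts. As a rough sketch, the hypothetical helper below (make_secret is an illustrative name) writes a random hex string to a file that only the owner can read. The byte counts are assumptions; pick lengths that match your own policy.

```shell
#!/bin/sh
# Write a random hex secret of NBYTES bytes to PATH, readable by the owner only.
# Refuses to overwrite an existing file so a rerun cannot rotate secrets by accident.
# Usage: make_secret PATH [NBYTES]   (hypothetical helper)
make_secret() (
  path=$1
  nbytes=${2:-32}
  if [ -e "$path" ]; then
    echo "exists, not overwriting: $path" >&2
    exit 1
  fi
  umask 077                      # new files get mode 600, new directories 700
  mkdir -p "$(dirname "$path")"
  # Hex-encode random bytes; no trailing newline is written to the file.
  head -c "$nbytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n' > "$path"
)

# Example, matching the file paths in the compose snippet above:
# make_secret ./secrets/postgres_password.txt
# make_secret ./secrets/jwt_secret.txt 48
```

Hex output avoids characters that would need escaping inside a connection URL. Keep the secrets directory out of version control.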

Network Security

# Restrict external access
services:
  postgres:
    ports: [] # No external ports
    expose:
      - "5432" # Internal access only

  redis:
    ports: [] # No external ports
    expose:
      - "6379" # Internal access only

Troubleshooting

Common Issues

Container Won’t Start

# Check container logs
docker logs incident-server

# Check container status
docker ps -a

# Inspect container configuration
docker inspect incident-server

Database Connection Issues

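# Test database readiness from inside the PostgreSQL container
# (pg_isready is not installed in the application image)
docker compose exec postgres pg_isready -U incidents -d incidents

# Test that the application container can reach the database port
docker compose exec incident-server nc -zv postgres 5432

# Check PostgreSQL logs
docker compose logs postgres

# Verify environment variables
docker compose exec incident-server env | grep DATABASE_URL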

Performance Issues

# Check resource usage
docker stats

# Monitor container logs in real-time
docker logs -f incident-server

# Check system resources
docker system df
docker system prune # Clean up unused resources

Debug Commands

# Access container shell
docker exec -it incident-server /bin/sh

# Check running processes
docker exec incident-server ps aux

# Verify file permissions
docker exec incident-server ls -la /app

# Test network connectivity (BusyBox nc replaces telnet, which Alpine does not ship)
docker exec incident-server ping -c 3 postgres
docker exec incident-server nc -zv redis 6379

Log Analysis

# View recent logs
docker compose logs --tail=50 incident-server

# Follow logs in real-time
docker compose logs -f incident-server

# Search logs for errors
docker compose logs incident-server | grep ERROR

# Export logs for analysis
docker compose logs --no-color incident-server > incident-server.log
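
Before reading individual lines, a per-level count gives a quick picture of how noisy a log is. The sketch below assumes each log line mentions its level as a whole word (for example level=ERROR or "level":"error"); the actual log layout is not specified in this guide, and count_levels is a hypothetical name. A level word that appears in a message body is counted too, so treat the numbers as approximate.

```shell
#!/bin/sh
# Read log lines on stdin and print how many lines mention each level.
# Matching is case-insensitive and whole-word, so "ERROR" and "error" both count.
# (hypothetical helper; assumes the level appears as a separate word)
count_levels() (
  input=$(cat)
  for level in error warn info debug; do
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    n=$(printf '%s\n' "$input" | grep -ciw "$level" || true)
    printf '%s=%s\n' "$level" "$n"
  done
)

# Example:
# docker compose logs --no-color incident-server | count_levels
```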

Best Practices

Image Management

  • Use specific tags instead of latest for production
  • Regularly update base images for security patches
  • Use multi-stage builds to minimize image size
  • Scan images for vulnerabilities before deployment
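
The first point in the list above can be checked mechanically. The following sketch lists image references in a compose file that are untagged or tagged latest, and skips references pinned by digest. The function name floating_images is hypothetical, and the parsing is line-based rather than a real YAML parse, so it only handles the simple "image: name:tag" form used in this guide.

```shell
#!/bin/sh
# Print every "image:" reference in a compose file that is untagged or tagged
# "latest". Exit status is 1 when any is found, so the function can gate a CI step.
# Usage: floating_images FILE   (hypothetical helper)
floating_images() (
  file=$1
  found=$(sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$file" \
    | tr -d "\"'" \
    | awk '{
        ref = $1
        if (ref ~ /@sha256:/) next   # pinned by digest
        name = ref
        sub(/^.*\//, "", name)       # drop registry and namespace (may hold a port)
        if (name !~ /:/ || name ~ /:latest$/) print ref
      }')
  [ -z "$found" ] && exit 0
  printf '%s\n' "$found"
  exit 1
)

# Example:
# floating_images docker-compose.yml || echo "pin the images listed above" >&2
```

Run against the production file in this guide, the check would report the Prometheus, Grafana, and Gorush images, which all use latest.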

Resource Management

  • Set resource limits for all containers
  • Use health checks for automatic recovery
  • Monitor resource usage and adjust limits accordingly
  • Implement log rotation to prevent disk space issues

Data Management

  • Use named volumes for persistent data
  • Implement regular backups with automated scripts
  • Test restore procedures regularly
  • Monitor database performance and tune as needed

Security

  • Never expose databases to external networks in production
  • Use secrets management for sensitive configuration
  • Run containers as non-root users when possible
  • Regularly update images and apply security patches

Together, the single-container setup covers local testing, and the Compose stack with PostgreSQL, Redis, and monitoring covers production-style deployments of the incident management platform.