Kubernetes Deployment

Overview

Kubernetes deployment provides a production-ready, scalable solution for the incident management platform. This guide covers deployment using Kustomize and Helm, with comprehensive monitoring, security, and high availability configurations.

Key Features:

  • High availability with multiple replicas
  • Auto-scaling based on CPU and memory usage
  • Persistent storage for PostgreSQL and Redis
  • Ingress configuration with TLS termination
  • Monitoring with Prometheus and Grafana
  • Service mesh integration ready

Prerequisites

Before deploying to Kubernetes:

Cluster Requirements

  • Kubernetes version: 1.21 or later
  • Node resources: Minimum 4 CPU cores, 8GB RAM
  • Storage: Dynamic provisioning support (recommended)
  • Ingress controller: nginx, traefik, or similar
  • cert-manager: For automatic TLS certificate management

Required Tools

  • kubectl configured for your cluster
  • helm 3.5+ (for Helm deployment method)
  • kustomize (for Kustomize deployment method)
  • Container registry access for image pulls

Verify Prerequisites

# Check cluster access
kubectl cluster-info

# Verify node resources
kubectl get nodes -o wide

# Check for ingress controller
kubectl get pods -n ingress-nginx

# Verify cert-manager (if using)
kubectl get pods -n cert-manager

# Check storage classes
kubectl get storageclass
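
The manifests below assume a storage class named fast-ssd; if your cluster uses a different name, or relies on the default class, adjust the PersistentVolumeClaims accordingly. The default class is flagged in the listing:

```shell
# The default storage class is marked "(default)" next to its name
kubectl get storageclass | grep '(default)'
```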

Quick Deployment

Option 1: Kustomize (Recommended)

# Clone repository
git clone https://github.com/incidents/incidents.git
cd incidents

# Build and push image (update with your registry)
docker build -t your-registry/incidents:v1.0.0 .
docker push your-registry/incidents:v1.0.0

# Update image reference in kustomization.yaml
cd k8s
vim kustomization.yaml  # Update image tag

# Deploy all components
kubectl apply -k .

# Verify deployment
kubectl get pods -n incident-management

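After applying, it can help to block until the rollout completes before inspecting pods (a sketch; resource names match the manifests in this guide):

```shell
kubectl rollout status deployment/incident-server -n incident-management --timeout=300s
kubectl rollout status statefulset/postgres -n incident-management --timeout=300s
```
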
Option 2: Individual Resources

# Deploy components in order
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/postgres.yaml
kubectl apply -f k8s/redis.yaml
kubectl apply -f k8s/incident-server.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/monitoring.yaml

Configuration Files

Namespace

k8s/namespace.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: incident-management
  labels:
    name: incident-management
    environment: production

Secrets

k8s/secret.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: incident-management-secrets
  namespace: incident-management
type: Opaque
data:
  postgres-password: <base64-encoded-password>
  jwt-secret: <base64-encoded-jwt-secret>
  redis-password: <base64-encoded-redis-password>
---
apiVersion: v1
kind: Secret
metadata:
  name: registry-secret
  namespace: incident-management
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded-docker-config>

Generate secrets:

# Generate base64 encoded secrets
echo -n "your-strong-password" | base64
echo -n "your-jwt-secret-key" | base64

# Create registry secret for private registries
kubectl create secret docker-registry registry-secret \
  --docker-server=your-registry.com \
  --docker-username=your-username \
  --docker-password=your-password \
  --docker-email=your-email@company.com \
  -n incident-management
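
Before pasting a value into secret.yaml, it is worth checking that the encoding round-trips cleanly. Note that `echo -n` (or `printf '%s'`) matters: a trailing newline would silently become part of the secret.

```shell
encoded=$(printf '%s' "your-strong-password" | base64)
echo "$encoded"                      # eW91ci1zdHJvbmctcGFzc3dvcmQ=
printf '%s' "$encoded" | base64 -d   # your-strong-password
```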

ConfigMap

k8s/configmap.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: incident-management-config
  namespace: incident-management
data:
  LOG_LEVEL: "info"
  PORT: "8080"
  PROMETHEUS_ENABLED: "true"
  REDIS_URL: "redis://redis:6379"
  # Note: $(POSTGRES_PASSWORD) is NOT expanded when this value is consumed
  # via configMapKeyRef; define DATABASE_URL inline in the container env.
  DATABASE_URL: "postgres://incidents:$(POSTGRES_PASSWORD)@postgres:5432/incidents?sslmode=disable"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  namespace: incident-management
data:
  POSTGRES_DB: incidents
  POSTGRES_USER: incidents

Application Deployment

Incident Server Deployment

k8s/incident-server.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: incident-server
  namespace: incident-management
  labels:
    app: incident-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: incident-server
  template:
    metadata:
      labels:
        app: incident-server
    spec:
      imagePullSecrets:
        - name: registry-secret
      containers:
        - name: incident-server
          image: your-registry/incidents:v1.0.0
          ports:
            - containerPort: 8080
          env:
            - name: PORT
              valueFrom:
                configMapKeyRef:
                  name: incident-management-config
                  key: PORT
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: incident-management-config
                  key: LOG_LEVEL
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: postgres-password
            - name: DATABASE_URL
              # Set inline rather than via configMapKeyRef so that Kubernetes
              # expands $(POSTGRES_PASSWORD) from the variable defined above
              value: "postgres://incidents:$(POSTGRES_PASSWORD)@postgres:5432/incidents?sslmode=disable"
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: jwt-secret
            - name: REDIS_URL
              valueFrom:
                configMapKeyRef:
                  name: incident-management-config
                  key: REDIS_URL
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /var/cache
      volumes:
        - name: tmp
          emptyDir: {}
        - name: cache
          emptyDir: {}
      securityContext:
        fsGroup: 1000
---
apiVersion: v1
kind: Service
metadata:
  name: incident-server
  namespace: incident-management
  labels:
    app: incident-server
spec:
  selector:
    app: incident-server
  ports:
    - name: http
      port: 80
      targetPort: 8080
  type: ClusterIP

Database Deployment

PostgreSQL

k8s/postgres.yaml:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: incident-management
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15-alpine
          ports:
            - containerPort: 5432
          envFrom:
            - configMapRef:
                name: postgres-config
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: postgres-password
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
              subPath: postgres
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - pg_isready -U incidents -d incidents
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - pg_isready -U incidents -d incidents
            initialDelaySeconds: 5
            periodSeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "fast-ssd" # Update with your storage class
        resources:
          requests:
            storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: incident-management
spec:
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
  type: ClusterIP

Redis

k8s/redis.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: incident-management
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
          command:
            - redis-server
            - --appendonly
            - "yes"
            - --requirepass
            - "$(REDIS_PASSWORD)"
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: redis-password
          volumeMounts:
            - name: redis-data
              mountPath: /data
          resources:
            requests:
              memory: "64Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "250m"
          livenessProbe:
            exec:
              command:
                - redis-cli
                - ping
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - redis-cli
                - ping
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: redis-data
          persistentVolumeClaim:
            claimName: redis-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-pvc
  namespace: incident-management
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: "fast-ssd"
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: incident-management
spec:
  selector:
    app: redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
  type: ClusterIP

Ingress Configuration

nginx Ingress

k8s/ingress.yaml:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: incident-management-ingress
  namespace: incident-management
  annotations:
    # rewrite-target is omitted: with path / the app is served as-is, and a
    # blanket rewrite to / would break every sub-path
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/limit-rpm: "100" # ~100 requests/minute per client IP
spec:
  tls:
    - hosts:
        - incidents.yourdomain.com
      secretName: incident-management-tls
  rules:
    - host: incidents.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: incident-server
                port:
                  number: 80

TLS Certificate (cert-manager)

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx

Auto-scaling Configuration

Horizontal Pod Autoscaler

k8s/hpa.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: incident-server-hpa
  namespace: incident-management
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: incident-server
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Max

Vertical Pod Autoscaler (Optional)

Caution: do not run VPA in "Auto" mode alongside the CPU/memory-based HPA above on the same deployment; the two controllers will fight over replica counts and resource requests. Either set updateMode to "Off" (recommendations only) or drive the HPA from custom metrics.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: incident-server-vpa
  namespace: incident-management
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: incident-server
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: incident-server
        maxAllowed:
          cpu: 1000m
          memory: 2Gi
        minAllowed:
          cpu: 100m
          memory: 128Mi

Monitoring Configuration

ServiceMonitor for Prometheus

k8s/monitoring.yaml:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: incident-server-metrics
  namespace: incident-management
  labels:
    app: incident-server
spec:
  selector:
    matchLabels:
      app: incident-server
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: incident-management-alerts
  namespace: incident-management
spec:
  groups:
    - name: incident-management
      rules:
        - alert: IncidentServerDown
          expr: up{job="incident-server"} == 0 # job defaults to the name of the scraped Service
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Incident Management Server is down"
            description: "Incident management server has been down for more than 2 minutes"

        - alert: HighResponseTime
          expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High response time for incident management API"
            description: "95th percentile response time is above 1 second"

        - alert: DatabaseConnectionFailure
          expr: pg_up == 0 # requires a PostgreSQL exporter; pg_up is the postgres_exporter metric
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "PostgreSQL database is down"
            description: "PostgreSQL database has been unreachable for more than 1 minute"

Grafana Dashboard ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: incident-management-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  incident-management.json: |
    {
      "dashboard": {
        "title": "Incidents",
        "panels": [
          {
            "title": "Request Rate",
            "type": "graph",
            "targets": [
              {
                "expr": "sum(rate(http_requests_total[5m])) by (method, endpoint)"
              }
            ]
          },
          {
            "title": "Response Time",
            "type": "graph", 
            "targets": [
              {
                "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))"
              }
            ]
          }
        ]
      }
    }

Security Configuration

NetworkPolicy

k8s/network-policy.yaml:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: incident-management-network-policy
  namespace: incident-management
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
    - from:
        - podSelector:
            matchLabels:
              app: incident-server
      ports:
        - protocol: TCP
          port: 5432 # PostgreSQL
        - protocol: TCP
          port: 6379 # Redis
  egress:
    - to: []
      ports:
        - protocol: TCP
          port: 53 # DNS
        - protocol: UDP
          port: 53 # DNS
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
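
Whether the policy behaves as intended can be spot-checked from a throwaway pod: a pod carrying the app=incident-server label should reach PostgreSQL, while the same command from an unlabeled pod should time out (a sketch; busybox's nc flags vary between builds):

```shell
kubectl run netpol-test --rm -it --restart=Never --image=busybox:1.36 \
  --labels=app=incident-server -n incident-management -- \
  nc -w 2 postgres 5432
```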

Pod Security Policy (clusters below 1.25 only)

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25; on newer clusters, enforce equivalent controls with Pod Security Admission namespace labels.

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: incident-management-psp
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - "configMap"
    - "emptyDir"
    - "projected"
    - "secret"
    - "downwardAPI"
    - "persistentVolumeClaim"
  runAsUser:
    rule: "MustRunAsNonRoot"
  seLinux:
    rule: "RunAsAny"
  fsGroup:
    rule: "RunAsAny"
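
On clusters running 1.25 or later, where PSP no longer exists, a comparable baseline can be enforced with Pod Security Admission labels on the namespace (a sketch; the restricted profile roughly matches the PSP above):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: incident-management
  labels:
    name: incident-management
    environment: production
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
```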

Backup Configuration

Database Backup CronJob

k8s/backup.yaml:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: incident-management
spec:
  schedule: "0 2 * * *" # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: postgres-backup
              image: postgres:15-alpine
              command:
                - /bin/sh # alpine-based postgres images ship sh, not bash
                - -c
                - |
                  TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
                  pg_dump -h postgres -U incidents incidents | gzip > /backup/incidents_${TIMESTAMP}.sql.gz
                  find /backup -name "incidents_*.sql.gz" -mtime +7 -delete
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: incident-management-secrets
                      key: postgres-password
              volumeMounts:
                - name: backup-storage
                  mountPath: /backup
          restartPolicy: OnFailure
          volumes:
            - name: backup-storage
              persistentVolumeClaim:
                claimName: backup-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
  namespace: incident-management
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: "standard"
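
Restoring from one of these dumps is roughly the reverse (a sketch; the pod and file names are examples, so adjust to your cluster):

```shell
# Copy a dump off the backup volume, then stream it into PostgreSQL
kubectl cp incident-management/<backup-pod>:/backup/incidents_20240101_020000.sql.gz ./dump.sql.gz
gunzip -c ./dump.sql.gz | \
  kubectl exec -i postgres-0 -n incident-management -- psql -U incidents -d incidents
```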

Deployment Management

Kustomization Configuration

k8s/kustomization.yaml:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: incident-management

resources:
  - namespace.yaml
  - secret.yaml
  - configmap.yaml
  - postgres.yaml
  - redis.yaml
  - incident-server.yaml
  - ingress.yaml
  - monitoring.yaml
  - hpa.yaml
  - network-policy.yaml
  - backup.yaml

images:
  - name: your-registry/incidents
    newTag: v1.0.0

patches: # patchesStrategicMerge is deprecated in Kustomize v5
  - path: patches/production-patch.yaml

# Note: these generators emit hash-suffixed ConfigMap/Secret objects and
# rewrite references to them, superseding the static configmap.yaml and
# secret.yaml above (keep registry-secret in secret.yaml either way).
configMapGenerator:
  - name: incident-management-config
    literals:
      - LOG_LEVEL=info
      - ENVIRONMENT=production

secretGenerator:
  - name: incident-management-secrets
    literals:
      - postgres-password=your-secure-password
      - jwt-secret=your-jwt-secret
      - redis-password=your-redis-password

Production Patches

k8s/patches/production-patch.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: incident-server
spec:
  replicas: 5
  template:
    spec:
      containers:
        - name: incident-server
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          env:
            - name: LOG_LEVEL
              value: "warn"
            - name: ENVIRONMENT
              value: "production"
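
Before applying production patches, the fully rendered manifests can be previewed, and the change against the live cluster diffed:

```shell
# Render without applying
kubectl kustomize k8s/

# Show what would change on the cluster
kubectl diff -k k8s/
```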

Operations & Maintenance

Deployment Commands

# Deploy new version
kubectl set image deployment/incident-server incident-server=your-registry/incidents:v1.1.0 -n incident-management

# Rollback deployment
kubectl rollout undo deployment/incident-server -n incident-management

# Check rollout status
kubectl rollout status deployment/incident-server -n incident-management

# Scale deployment
kubectl scale deployment incident-server --replicas=5 -n incident-management

Monitoring Commands

# Check pod status
kubectl get pods -n incident-management

# View pod logs
kubectl logs -f deployment/incident-server -n incident-management

# Check resource usage
kubectl top pods -n incident-management

# View events
kubectl get events -n incident-management --sort-by='.lastTimestamp'

Troubleshooting Commands

# Describe problematic resources
kubectl describe pod <pod-name> -n incident-management
kubectl describe deployment incident-server -n incident-management

# Access pod shell for debugging
kubectl exec -it <pod-name> -n incident-management -- /bin/sh

# Port forward for local access
kubectl port-forward svc/incident-server 8080:80 -n incident-management

# Check ingress configuration
kubectl describe ingress incident-management-ingress -n incident-management

Best Practices

Resource Management

  • Set resource requests and limits for all containers
  • Use horizontal pod autoscaling for application pods
  • Implement proper health checks for reliability
  • Configure pod disruption budgets for availability
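
The pod disruption budget mentioned above is not included in the manifests in this guide; a minimal sketch for the incident-server deployment:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: incident-server-pdb
  namespace: incident-management
spec:
  minAvailable: 2 # keep at least 2 of 3 replicas during voluntary disruptions
  selector:
    matchLabels:
      app: incident-server
```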

Security

  • Run containers as non-root users
  • Use read-only root filesystems where possible
  • Implement network policies to restrict traffic
  • Regularly update container images and scan for vulnerabilities

High Availability

  • Deploy across multiple nodes using pod anti-affinity
  • Use persistent storage with replication
  • Implement proper backup strategies
  • Configure monitoring and alerting
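
The anti-affinity suggestion can be added to the incident-server pod template spec (a sketch using preferred rather than required scheduling, so pods still start on small clusters):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: incident-server
          topologyKey: kubernetes.io/hostname
```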

Performance

  • Monitor resource usage and adjust as needed
  • Use appropriate storage classes for different workloads
  • Implement caching strategies with Redis
  • Optimize database queries and connections

This Kubernetes deployment provides a robust, scalable, and secure foundation for running the incident management platform in production environments.