Kubernetes Deployment

Overview

Kubernetes deployment provides a production-ready, scalable solution for the incident management platform. This guide covers deployment using Kustomize and Helm, with comprehensive monitoring, security, and high availability configurations.

Key Features:

  • High availability with multiple replicas
  • Auto-scaling based on CPU and memory usage
  • Persistent storage for PostgreSQL and Redis
  • Ingress configuration with TLS termination
  • Monitoring with Prometheus and Grafana
  • Service mesh integration ready

Prerequisites

Before deploying to Kubernetes:

Cluster Requirements

  • Kubernetes version: 1.21 or later
  • Node resources: Minimum 4 CPU cores, 8GB RAM
  • Storage: Dynamic provisioning support (recommended)
  • Ingress controller: nginx, traefik, or similar
  • cert-manager: For automatic TLS certificate management

Required Tools

  • kubectl configured for your cluster
  • helm 3.5+ (for Helm deployment method)
  • kustomize (for Kustomize deployment method)
  • Container registry access for image pulls

Verify Prerequisites

# Check cluster access
kubectl cluster-info

# Verify node resources
kubectl get nodes -o wide

# Check for ingress controller
kubectl get pods -n ingress-nginx

# Verify cert-manager (if using)
kubectl get pods -n cert-manager

# Check storage classes
kubectl get storageclass
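
The manifests below assume a storage class named fast-ssd; if your cluster uses a different name, or relies on the default class, adjust the PersistentVolumeClaims accordingly. The default class is flagged in the listing:

```shell
# The default storage class is marked "(default)" next to its name
kubectl get storageclass | grep '(default)'
```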

Quick Deployment

Option 1: Kustomize (Recommended)

# Clone repository
git clone https://github.com/incidents/incidents.git
cd incidents

# Build and push image (update with your registry)
docker build -t your-registry/incidents:v1.0.0 .
docker push your-registry/incidents:v1.0.0

# Update image reference in kustomization.yaml
cd k8s
vim kustomization.yaml  # Update image tag

# Deploy all components
kubectl apply -k .

# Verify deployment
kubectl get pods -n incident-management

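After applying, it can help to block until the rollout completes before inspecting pods (a sketch; resource names match the manifests in this guide):

```shell
kubectl rollout status deployment/incident-server -n incident-management --timeout=300s
kubectl rollout status statefulset/postgres -n incident-management --timeout=300s
```
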
Option 2: Individual Resources

# Deploy components in order
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/postgres.yaml
kubectl apply -f k8s/redis.yaml
kubectl apply -f k8s/incident-server.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/monitoring.yaml

Configuration Files

Namespace

k8s/namespace.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: incident-management
  labels:
    name: incident-management
    environment: production

Secrets

k8s/secret.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: incident-management-secrets
  namespace: incident-management
type: Opaque
data:
  postgres-password: <base64-encoded-password>
  jwt-secret: <base64-encoded-jwt-secret>
  redis-password: <base64-encoded-redis-password>
---
apiVersion: v1
kind: Secret
metadata:
  name: registry-secret
  namespace: incident-management
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded-docker-config>

Generate secrets:

# Generate base64 encoded secrets
echo -n "your-strong-password" | base64
echo -n "your-jwt-secret-key" | base64

# Create registry secret for private registries
kubectl create secret docker-registry registry-secret \
  --docker-server=your-registry.com \
  --docker-username=your-username \
  --docker-password=your-password \
  --docker-email=your-email@company.com \
  -n incident-management
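
Before pasting a value into secret.yaml, it is worth checking that the encoding round-trips cleanly. Note that `echo -n` (or `printf '%s'`) matters: a trailing newline would silently become part of the secret.

```shell
encoded=$(printf '%s' "your-strong-password" | base64)
echo "$encoded"                      # eW91ci1zdHJvbmctcGFzc3dvcmQ=
printf '%s' "$encoded" | base64 -d   # your-strong-password
```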

ConfigMap

k8s/configmap.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: incident-management-config
  namespace: incident-management
data:
  LOG_LEVEL: "info"
  PORT: "8080"
  PROMETHEUS_ENABLED: "true"
  REDIS_URL: "redis://redis:6379"
  # Note: $(POSTGRES_PASSWORD) is NOT expanded when this value is consumed
  # via configMapKeyRef; define DATABASE_URL inline in the container env.
  DATABASE_URL: "postgres://incidents:$(POSTGRES_PASSWORD)@postgres:5432/incidents?sslmode=disable"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  namespace: incident-management
data:
  POSTGRES_DB: incidents
  POSTGRES_USER: incidents

Application Deployment

Incident Server Deployment

k8s/incident-server.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: incident-server
  namespace: incident-management
  labels:
    app: incident-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: incident-server
  template:
    metadata:
      labels:
        app: incident-server
    spec:
      imagePullSecrets:
        - name: registry-secret
      containers:
        - name: incident-server
          image: your-registry/incidents:v1.0.0
          ports:
            - containerPort: 8080
          env:
            - name: PORT
              valueFrom:
                configMapKeyRef:
                  name: incident-management-config
                  key: PORT
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: incident-management-config
                  key: LOG_LEVEL
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: postgres-password
            - name: DATABASE_URL
              # Set inline rather than via configMapKeyRef so that Kubernetes
              # expands $(POSTGRES_PASSWORD) from the variable defined above
              value: "postgres://incidents:$(POSTGRES_PASSWORD)@postgres:5432/incidents?sslmode=disable"
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: jwt-secret
            - name: REDIS_URL
              valueFrom:
                configMapKeyRef:
                  name: incident-management-config
                  key: REDIS_URL
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /var/cache
      volumes:
        - name: tmp
          emptyDir: {}
        - name: cache
          emptyDir: {}
      securityContext:
        fsGroup: 1000
---
apiVersion: v1
kind: Service
metadata:
  name: incident-server
  namespace: incident-management
  labels:
    app: incident-server
spec:
  selector:
    app: incident-server
  ports:
    - name: http
      port: 80
      targetPort: 8080
  type: ClusterIP

Database Deployment

PostgreSQL

k8s/postgres.yaml:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: incident-management
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15-alpine
          ports:
            - containerPort: 5432
          envFrom:
            - configMapRef:
                name: postgres-config
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: postgres-password
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
              subPath: postgres
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - pg_isready -U incidents -d incidents
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - pg_isready -U incidents -d incidents
            initialDelaySeconds: 5
            periodSeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "fast-ssd" # Update with your storage class
        resources:
          requests:
            storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: incident-management
spec:
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
  type: ClusterIP

Redis

k8s/redis.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: incident-management
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
          command:
            - redis-server
            - --appendonly
            - "yes"
            - --requirepass
            - "$(REDIS_PASSWORD)"
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: redis-password
          volumeMounts:
            - name: redis-data
              mountPath: /data
          resources:
            requests:
              memory: "64Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "250m"
          livenessProbe:
            exec:
              command:
                - redis-cli
                - ping
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - redis-cli
                - ping
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: redis-data
          persistentVolumeClaim:
            claimName: redis-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-pvc
  namespace: incident-management
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: "fast-ssd"
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: incident-management
spec:
  selector:
    app: redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
  type: ClusterIP

Ingress Configuration

nginx Ingress

k8s/ingress.yaml:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: incident-management-ingress
  namespace: incident-management
  annotations:
    # rewrite-target is omitted: with path / the app is served as-is, and a
    # blanket rewrite to / would break every sub-path
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/limit-rpm: "100" # ~100 requests/minute per client IP
spec:
  tls:
    - hosts:
        - incidents.yourdomain.com
      secretName: incident-management-tls
  rules:
    - host: incidents.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: incident-server
                port:
                  number: 80

TLS Certificate (cert-manager)

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx

Auto-scaling Configuration

Horizontal Pod Autoscaler

k8s/hpa.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: incident-server-hpa
  namespace: incident-management
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: incident-server
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Max

Vertical Pod Autoscaler (Optional)

Caution: do not run VPA in "Auto" mode alongside the CPU/memory-based HPA above on the same deployment; the two controllers will fight over replica counts and resource requests. Either set updateMode to "Off" (recommendations only) or drive the HPA from custom metrics.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: incident-server-vpa
  namespace: incident-management
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: incident-server
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: incident-server
        maxAllowed:
          cpu: 1000m
          memory: 2Gi
        minAllowed:
          cpu: 100m
          memory: 128Mi

Monitoring Configuration

ServiceMonitor for Prometheus

k8s/monitoring.yaml:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: incident-server-metrics
  namespace: incident-management
  labels:
    app: incident-server
spec:
  selector:
    matchLabels:
      app: incident-server
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: incident-management-alerts
  namespace: incident-management
spec:
  groups:
    - name: incident-management
      rules:
        - alert: IncidentServerDown
          expr: up{job="incident-server"} == 0 # job defaults to the name of the scraped Service
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Incident Management Server is down"
            description: "Incident management server has been down for more than 2 minutes"

        - alert: HighResponseTime
          expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High response time for incident management API"
            description: "95th percentile response time is above 1 second"

        - alert: DatabaseConnectionFailure
          expr: pg_up == 0 # requires a PostgreSQL exporter; pg_up is the postgres_exporter metric
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "PostgreSQL database is down"
            description: "PostgreSQL database has been unreachable for more than 1 minute"

Grafana Dashboard ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: incident-management-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  incident-management.json: |
    {
      "dashboard": {
        "title": "Incidents",
        "panels": [
          {
            "title": "Request Rate",
            "type": "graph",
            "targets": [
              {
                "expr": "sum(rate(http_requests_total[5m])) by (method, endpoint)"
              }
            ]
          },
          {
            "title": "Response Time",
            "type": "graph", 
            "targets": [
              {
                "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))"
              }
            ]
          }
        ]
      }
    }

Security Configuration

NetworkPolicy

k8s/network-policy.yaml:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: incident-management-network-policy
  namespace: incident-management
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
    - from:
        - podSelector:
            matchLabels:
              app: incident-server
      ports:
        - protocol: TCP
          port: 5432 # PostgreSQL
        - protocol: TCP
          port: 6379 # Redis
  egress:
    - to: []
      ports:
        - protocol: TCP
          port: 53 # DNS
        - protocol: UDP
          port: 53 # DNS
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
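
Whether the policy behaves as intended can be spot-checked from a throwaway pod: a pod carrying the app=incident-server label should reach PostgreSQL, while the same command from an unlabeled pod should time out (a sketch; busybox's nc flags vary between builds):

```shell
kubectl run netpol-test --rm -it --restart=Never --image=busybox:1.36 \
  --labels=app=incident-server -n incident-management -- \
  nc -w 2 postgres 5432
```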

Pod Security Policy (clusters below 1.25 only)

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25; on newer clusters, enforce equivalent controls with Pod Security Admission namespace labels.

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: incident-management-psp
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - "configMap"
    - "emptyDir"
    - "projected"
    - "secret"
    - "downwardAPI"
    - "persistentVolumeClaim"
  runAsUser:
    rule: "MustRunAsNonRoot"
  seLinux:
    rule: "RunAsAny"
  fsGroup:
    rule: "RunAsAny"
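
On clusters running 1.25 or later, where PSP no longer exists, a comparable baseline can be enforced with Pod Security Admission labels on the namespace (a sketch; the restricted profile roughly matches the PSP above):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: incident-management
  labels:
    name: incident-management
    environment: production
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
```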

Backup Configuration

Database Backup CronJob

k8s/backup.yaml:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: incident-management
spec:
  schedule: "0 2 * * *" # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: postgres-backup
              image: postgres:15-alpine
              command:
                - /bin/sh # alpine-based postgres images ship sh, not bash
                - -c
                - |
                  TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
                  pg_dump -h postgres -U incidents incidents | gzip > /backup/incidents_${TIMESTAMP}.sql.gz
                  find /backup -name "incidents_*.sql.gz" -mtime +7 -delete
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: incident-management-secrets
                      key: postgres-password
              volumeMounts:
                - name: backup-storage
                  mountPath: /backup
          restartPolicy: OnFailure
          volumes:
            - name: backup-storage
              persistentVolumeClaim:
                claimName: backup-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
  namespace: incident-management
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: "standard"
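
Restoring from one of these dumps is roughly the reverse (a sketch; the pod and file names are examples, so adjust to your cluster):

```shell
# Copy a dump off the backup volume, then stream it into PostgreSQL
kubectl cp incident-management/<backup-pod>:/backup/incidents_20240101_020000.sql.gz ./dump.sql.gz
gunzip -c ./dump.sql.gz | \
  kubectl exec -i postgres-0 -n incident-management -- psql -U incidents -d incidents
```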

Deployment Management

Kustomization Configuration

k8s/kustomization.yaml:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: incident-management

resources:
  - namespace.yaml
  - secret.yaml
  - configmap.yaml
  - postgres.yaml
  - redis.yaml
  - incident-server.yaml
  - ingress.yaml
  - monitoring.yaml
  - hpa.yaml
  - network-policy.yaml
  - backup.yaml

images:
  - name: your-registry/incidents
    newTag: v1.0.0

patches: # patchesStrategicMerge is deprecated in Kustomize v5
  - path: patches/production-patch.yaml

# Note: these generators emit hash-suffixed ConfigMap/Secret objects and
# rewrite references to them, superseding the static configmap.yaml and
# secret.yaml above (keep registry-secret in secret.yaml either way).
configMapGenerator:
  - name: incident-management-config
    literals:
      - LOG_LEVEL=info
      - ENVIRONMENT=production

secretGenerator:
  - name: incident-management-secrets
    literals:
      - postgres-password=your-secure-password
      - jwt-secret=your-jwt-secret
      - redis-password=your-redis-password

Production Patches

k8s/patches/production-patch.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: incident-server
spec:
  replicas: 5
  template:
    spec:
      containers:
        - name: incident-server
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          env:
            - name: LOG_LEVEL
              value: "warn"
            - name: ENVIRONMENT
              value: "production"
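
Before applying production patches, the fully rendered manifests can be previewed, and the change against the live cluster diffed:

```shell
# Render without applying
kubectl kustomize k8s/

# Show what would change on the cluster
kubectl diff -k k8s/
```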

Operations & Maintenance

Deployment Commands

# Deploy new version
kubectl set image deployment/incident-server incident-server=your-registry/incidents:v1.1.0 -n incident-management

# Rollback deployment
kubectl rollout undo deployment/incident-server -n incident-management

# Check rollout status
kubectl rollout status deployment/incident-server -n incident-management

# Scale deployment
kubectl scale deployment incident-server --replicas=5 -n incident-management

Monitoring Commands

# Check pod status
kubectl get pods -n incident-management

# View pod logs
kubectl logs -f deployment/incident-server -n incident-management

# Check resource usage
kubectl top pods -n incident-management

# View events
kubectl get events -n incident-management --sort-by='.lastTimestamp'

Troubleshooting Commands

# Describe problematic resources
kubectl describe pod <pod-name> -n incident-management
kubectl describe deployment incident-server -n incident-management

# Access pod shell for debugging
kubectl exec -it <pod-name> -n incident-management -- /bin/sh

# Port forward for local access
kubectl port-forward svc/incident-server 8080:80 -n incident-management

# Check ingress configuration
kubectl describe ingress incident-management-ingress -n incident-management

Best Practices

Resource Management

  • Set resource requests and limits for all containers
  • Use horizontal pod autoscaling for application pods
  • Implement proper health checks for reliability
  • Configure pod disruption budgets for availability
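
The pod disruption budget mentioned above is not included in the manifests in this guide; a minimal sketch for the incident-server deployment:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: incident-server-pdb
  namespace: incident-management
spec:
  minAvailable: 2 # keep at least 2 of 3 replicas during voluntary disruptions
  selector:
    matchLabels:
      app: incident-server
```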

Security

  • Run containers as non-root users
  • Use read-only root filesystems where possible
  • Implement network policies to restrict traffic
  • Regularly update container images and scan for vulnerabilities

High Availability

  • Deploy across multiple nodes using pod anti-affinity
  • Use persistent storage with replication
  • Implement proper backup strategies
  • Configure monitoring and alerting
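
The anti-affinity suggestion can be added to the incident-server pod template spec (a sketch using preferred rather than required scheduling, so pods still start on small clusters):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: incident-server
          topologyKey: kubernetes.io/hostname
```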

Performance

  • Monitor resource usage and adjust as needed
  • Use appropriate storage classes for different workloads
  • Implement caching strategies with Redis
  • Optimize database queries and connections

This Kubernetes deployment provides a robust, scalable, and secure foundation for running the incident management platform in production environments.