# Kubernetes Deployment

## Overview
Kubernetes deployment provides a production-ready, scalable solution for the incident management platform. This guide covers deployment using Kustomize and Helm, with comprehensive monitoring, security, and high availability configurations.
Key Features:
- High availability with multiple replicas
- Auto-scaling based on CPU and memory usage
- Persistent storage for PostgreSQL and Redis
- Ingress configuration with TLS termination
- Monitoring with Prometheus and Grafana
- Ready for service mesh integration
## Prerequisites

Before deploying to Kubernetes:

### Cluster Requirements
- Kubernetes version: 1.21 or later
- Node resources: Minimum 4 CPU cores, 8GB RAM
- Storage: Dynamic provisioning support (recommended)
- Ingress controller: nginx, traefik, or similar
- cert-manager: For automatic TLS certificate management
### Required Tools

- `kubectl` configured for your cluster
- `helm` 3.5+ (for the Helm deployment method)
- `kustomize` (for the Kustomize deployment method)
- Container registry access for image pulls
### Verify Prerequisites

```bash
# Check cluster access
kubectl cluster-info

# Verify node resources
kubectl get nodes -o wide

# Check for an ingress controller
kubectl get pods -n ingress-nginx

# Verify cert-manager (if using)
kubectl get pods -n cert-manager

# Check storage classes
kubectl get storageclass
```

## Quick Deployment
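Before running either option below, a small helper can fail fast when a required CLI is missing. This is a sketch; the helper name and the demonstration tool list are illustrative — substitute `kubectl helm kustomize` for a real pre-flight check.

```bash
#!/bin/sh
# Hypothetical helper: return non-zero if any listed CLI is absent from PATH.
require_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      return 1
    fi
  done
}

# For this deployment you would call:
#   require_tools kubectl helm kustomize
require_tools sh ls && echo "all tools present"
```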
### Option 1: Kustomize (Recommended)

```bash
# Clone repository
git clone https://github.com/incidents/incidents.git
cd incidents

# Build and push the image (update with your registry)
docker build -t your-registry/incidents:v1.0.0 .
docker push your-registry/incidents:v1.0.0

# Update the image reference in kustomization.yaml
cd k8s
vim kustomization.yaml  # Update image tag

# Deploy all components
kubectl apply -k .

# Verify deployment
kubectl get pods -n incident-management
```

### Option 2: Individual Resources
```bash
# Deploy components in order
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/postgres.yaml
kubectl apply -f k8s/redis.yaml
kubectl apply -f k8s/incident-server.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/monitoring.yaml
```

## Configuration Files
### Namespace

`k8s/namespace.yaml`:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: incident-management
  labels:
    name: incident-management
    environment: production
```

### Secrets
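Before hand-encoding values, note that the Secret below uses the base64 `data` field, but Kubernetes also accepts a `stringData` field containing plain text, which the API server encodes on write. This is often less error-prone; a minimal sketch with placeholder values:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: incident-management-secrets
  namespace: incident-management
type: Opaque
stringData:                 # plain text; converted to base64 data on admission
  postgres-password: your-strong-password
  jwt-secret: your-jwt-secret-key
  redis-password: your-redis-password
```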
`k8s/secret.yaml`:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: incident-management-secrets
  namespace: incident-management
type: Opaque
data:
  postgres-password: <base64-encoded-password>
  jwt-secret: <base64-encoded-jwt-secret>
  redis-password: <base64-encoded-redis-password>
---
apiVersion: v1
kind: Secret
metadata:
  name: registry-secret
  namespace: incident-management
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded-docker-config>
```

Generate secrets:
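One pitfall with the commands below: `echo` without `-n` appends a trailing newline, which gets encoded into the secret and causes hard-to-debug authentication failures later. A quick demonstration of the difference:

```bash
# echo adds a newline; the encoded value silently differs from the intended one
printf '%s' "secret" | base64   # c2VjcmV0
echo "secret" | base64          # c2VjcmV0Cg==  (trailing \n encoded as "Cg==")
```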
```bash
# Generate base64-encoded secrets
echo -n "your-strong-password" | base64
echo -n "your-jwt-secret-key" | base64

# Create a registry secret for private registries
kubectl create secret docker-registry registry-secret \
  --docker-server=your-registry.com \
  --docker-username=your-username \
  --docker-password=your-password \
  --docker-email=your-email@company.com \
  -n incident-management
```

### ConfigMap
`k8s/configmap.yaml`:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: incident-management-config
  namespace: incident-management
data:
  LOG_LEVEL: "info"
  PORT: "8080"
  PROMETHEUS_ENABLED: "true"
  REDIS_URL: "redis://redis:6379"
  # Caution: $(VAR) references are NOT expanded inside ConfigMap data;
  # the password must be interpolated in the Pod spec instead.
  DATABASE_URL: "postgres://incidents:$(POSTGRES_PASSWORD)@postgres:5432/incidents?sslmode=disable"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  namespace: incident-management
data:
  POSTGRES_DB: incidents
  POSTGRES_USER: incidents
```

## Application Deployment
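A pattern worth knowing for the Deployment that follows: when most ConfigMap keys are consumed verbatim as environment variables, `envFrom` can replace the per-key `configMapKeyRef` entries. A sketch (names match the ConfigMap and Secret defined above; keys become env vars as-is):

```yaml
# Pull every key of the ConfigMap into the container environment at once.
envFrom:
  - configMapRef:
      name: incident-management-config
env:
  - name: POSTGRES_PASSWORD     # secrets are still referenced individually
    valueFrom:
      secretKeyRef:
        name: incident-management-secrets
        key: postgres-password
```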
### Incident Server Deployment

`k8s/incident-server.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: incident-server
  namespace: incident-management
  labels:
    app: incident-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: incident-server
  template:
    metadata:
      labels:
        app: incident-server
    spec:
      imagePullSecrets:
        - name: registry-secret
      securityContext:
        fsGroup: 1000
      containers:
        - name: incident-server
          image: your-registry/incidents:v1.0.0
          ports:
            - containerPort: 8080
          env:
            - name: PORT
              valueFrom:
                configMapKeyRef:
                  name: incident-management-config
                  key: PORT
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: incident-management-config
                  key: LOG_LEVEL
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: postgres-password
            # $(POSTGRES_PASSWORD) is expanded here because the variable is
            # defined earlier in this env list; a value pulled from a
            # ConfigMap via valueFrom would NOT be expanded.
            - name: DATABASE_URL
              value: "postgres://incidents:$(POSTGRES_PASSWORD)@postgres:5432/incidents?sslmode=disable"
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: jwt-secret
            - name: REDIS_URL
              valueFrom:
                configMapKeyRef:
                  name: incident-management-config
                  key: REDIS_URL
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /var/cache
      volumes:
        - name: tmp
          emptyDir: {}
        - name: cache
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: incident-server
  namespace: incident-management
  labels:
    app: incident-server
spec:
  selector:
    app: incident-server
  ports:
    - name: http
      port: 80
      targetPort: 8080
  type: ClusterIP
```

## Database Deployment
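A detail worth knowing about the StatefulSet below: its `serviceName` conventionally points at a headless Service, which gives each pod a stable DNS identity (`postgres-0.postgres...`). With a single replica the plain ClusterIP Service shown works fine, but a headless variant is a common companion; a sketch (the name is hypothetical — set `serviceName` to match whichever Service you use):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless       # hypothetical; pair with serviceName: postgres-headless
  namespace: incident-management
spec:
  clusterIP: None               # headless: DNS resolves directly to pod IPs
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
```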
### PostgreSQL

`k8s/postgres.yaml`:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: incident-management
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15-alpine
          ports:
            - containerPort: 5432
          envFrom:
            - configMapRef:
                name: postgres-config
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: postgres-password
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
              subPath: postgres
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - pg_isready -U incidents -d incidents
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - pg_isready -U incidents -d incidents
            initialDelaySeconds: 5
            periodSeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "fast-ssd"  # Update with your storage class
        resources:
          requests:
            storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: incident-management
spec:
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
  type: ClusterIP
```

### Redis
`k8s/redis.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: incident-management
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
          command:
            - redis-server
            - --appendonly
            - "yes"
            - --requirepass
            - "$(REDIS_PASSWORD)"
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: redis-password
          volumeMounts:
            - name: redis-data
              mountPath: /data
          resources:
            requests:
              memory: "64Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "250m"
          # Once requirepass is set, an unauthenticated "redis-cli ping"
          # receives NOAUTH yet still exits 0, so the probes authenticate
          # and check for PONG explicitly.
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - redis-cli -a "$REDIS_PASSWORD" ping | grep -q PONG
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - redis-cli -a "$REDIS_PASSWORD" ping | grep -q PONG
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: redis-data
          persistentVolumeClaim:
            claimName: redis-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-pvc
  namespace: incident-management
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: "fast-ssd"
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: incident-management
spec:
  selector:
    app: redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
  type: ClusterIP
```

## Ingress Configuration
### nginx Ingress

`k8s/ingress.yaml`:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: incident-management-ingress
  namespace: incident-management
  annotations:
    # Note: rewrite-target is intentionally omitted; with a plain "/" prefix
    # path it would rewrite every request URI to "/" and break sub-paths.
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/limit-rpm: "100"  # 100 requests/minute per client IP
spec:
  tls:
    - hosts:
        - incidents.yourdomain.com
      secretName: incident-management-tls
  rules:
    - host: incidents.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: incident-server
                port:
                  number: 80
```

### TLS Certificate (cert-manager)
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
```

## Auto-scaling Configuration
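The utilization targets in the HPA below are measured against each container's resource *requests*, and the controller computes `desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)`. A small helper (name is illustrative) to sanity-check that arithmetic:

```bash
#!/bin/sh
# ceil(current_replicas * current_utilization / target_utilization)
desired_replicas() {
  awk -v r="$1" -v cur="$2" -v tgt="$3" \
    'BEGIN { d = r * cur / tgt; printf "%d\n", (d == int(d)) ? d : int(d) + 1 }'
}

desired_replicas 3 90 70   # 3 pods at 90% CPU, target 70% -> 4
desired_replicas 3 35 70   # underutilized                 -> 2
```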
### Horizontal Pod Autoscaler

`k8s/hpa.yaml`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: incident-server-hpa
  namespace: incident-management
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: incident-server
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Max
```

### Vertical Pod Autoscaler (Optional)
Note: VPA in `Auto` mode should not be combined with an HPA that scales on the same CPU/memory metrics; use one or the other, or run VPA in recommendation-only mode.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: incident-server-vpa
  namespace: incident-management
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: incident-server
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: incident-server
        maxAllowed:
          cpu: 1000m
          memory: 2Gi
        minAllowed:
          cpu: 100m
          memory: 128Mi
```

## Monitoring Configuration
### ServiceMonitor for Prometheus

`k8s/monitoring.yaml`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: incident-server-metrics
  namespace: incident-management
  labels:
    app: incident-server
spec:
  selector:
    matchLabels:
      app: incident-server
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: incident-management-alerts
  namespace: incident-management
spec:
  groups:
    - name: incident-management
      rules:
        - alert: IncidentServerDown
          # the job label defaults to the scraped Service's name
          expr: up{job="incident-server"} == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Incident Management Server is down"
            description: "Incident management server has been down for more than 2 minutes"
        - alert: HighResponseTime
          expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High response time for incident management API"
            description: "95th percentile response time is above 1 second"
        - alert: DatabaseConnectionFailure
          expr: postgresql_up == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "PostgreSQL database is down"
            description: "PostgreSQL database has been unreachable for more than 1 minute"
```

### Grafana Dashboard ConfigMap
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: incident-management-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  incident-management.json: |
    {
      "dashboard": {
        "title": "Incidents",
        "panels": [
          {
            "title": "Request Rate",
            "type": "graph",
            "targets": [
              {
                "expr": "sum(rate(http_requests_total[5m])) by (method, endpoint)"
              }
            ]
          },
          {
            "title": "Response Time",
            "type": "graph",
            "targets": [
              {
                "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))"
              }
            ]
          }
        ]
      }
    }
```

## Security Configuration
### NetworkPolicy

`k8s/network-policy.yaml`:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: incident-management-network-policy
  namespace: incident-management
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
    - from:
        - podSelector:
            matchLabels:
              app: incident-server
      ports:
        - protocol: TCP
          port: 5432  # PostgreSQL
        - protocol: TCP
          port: 6379  # Redis
  egress:
    - to: []
      ports:
        - protocol: TCP
          port: 53  # DNS
        - protocol: UDP
          port: 53  # DNS
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
```

### Pod Security Policy
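On clusters where PodSecurityPolicy is unavailable (it was removed in Kubernetes 1.25), Pod Security Admission namespace labels provide comparable guardrails. A sketch at the `restricted` level, applied to the platform's namespace:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: incident-management
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
```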
Note: PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. Use it only on older clusters; otherwise prefer Pod Security Admission or a policy engine such as Kyverno or Gatekeeper.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: incident-management-psp
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - "configMap"
    - "emptyDir"
    - "projected"
    - "secret"
    - "downwardAPI"
    - "persistentVolumeClaim"
  runAsUser:
    rule: "MustRunAsNonRoot"
  seLinux:
    rule: "RunAsAny"
  fsGroup:
    rule: "RunAsAny"
```

## Backup Configuration
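Backups are only as useful as their restore path. A one-off Job can replay a dump produced by the scheduled backup below; this is a sketch — the Job name and the dump filename are illustrative:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-restore        # hypothetical one-off restore job
  namespace: incident-management
spec:
  template:
    spec:
      containers:
        - name: postgres-restore
          image: postgres:15-alpine
          command:
            - /bin/sh
            - -c
            # replace the filename with the dump you want to restore
            - gunzip -c /backup/incidents_20240101_020000.sql.gz | psql -h postgres -U incidents incidents
          env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: incident-management-secrets
                  key: postgres-password
          volumeMounts:
            - name: backup-storage
              mountPath: /backup
      restartPolicy: Never
      volumes:
        - name: backup-storage
          persistentVolumeClaim:
            claimName: backup-pvc
```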
### Database Backup CronJob

`k8s/backup.yaml`:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: incident-management
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: postgres-backup
              image: postgres:15-alpine
              command:
                - /bin/sh   # alpine images ship sh, not bash
                - -c
                - |
                  TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
                  pg_dump -h postgres -U incidents incidents | gzip > /backup/incidents_${TIMESTAMP}.sql.gz
                  find /backup -name "incidents_*.sql.gz" -mtime +7 -delete
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: incident-management-secrets
                      key: postgres-password
              volumeMounts:
                - name: backup-storage
                  mountPath: /backup
          restartPolicy: OnFailure
          volumes:
            - name: backup-storage
              persistentVolumeClaim:
                claimName: backup-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
  namespace: incident-management
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: "standard"
```

## Deployment Management
### Kustomization Configuration

`k8s/kustomization.yaml`:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: incident-management
resources:
  - namespace.yaml
  # If you use the generators below, omit secret.yaml and configmap.yaml
  # here so the same objects are not defined twice.
  - secret.yaml
  - configmap.yaml
  - postgres.yaml
  - redis.yaml
  - incident-server.yaml
  - ingress.yaml
  - monitoring.yaml
  - hpa.yaml
  - network-policy.yaml
  - backup.yaml
images:
  - name: your-registry/incidents
    newTag: v1.0.0
patches:  # patchesStrategicMerge is deprecated in recent kustomize releases
  - path: patches/production-patch.yaml
configMapGenerator:
  - name: incident-management-config
    literals:
      - LOG_LEVEL=info
      - ENVIRONMENT=production
secretGenerator:
  - name: incident-management-secrets
    literals:
      - postgres-password=your-secure-password
      - jwt-secret=your-jwt-secret
      - redis-password=your-redis-password
```

### Production Patches
`k8s/patches/production-patch.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: incident-server
spec:
  replicas: 5
  template:
    spec:
      containers:
        - name: incident-server
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          env:
            - name: LOG_LEVEL
              value: "warn"
              valueFrom: null  # clear the base's ConfigMap reference so
                               # value and valueFrom are not both set
            - name: ENVIRONMENT
              value: "production"
```

## Operations & Maintenance
### Deployment Commands

Note that while the HPA is active, a manual `kubectl scale` is soon overridden by the autoscaler.

```bash
# Deploy a new version
kubectl set image deployment/incident-server incident-server=your-registry/incidents:v1.1.0 -n incident-management

# Roll back the deployment
kubectl rollout undo deployment/incident-server -n incident-management

# Check rollout status
kubectl rollout status deployment/incident-server -n incident-management

# Scale the deployment
kubectl scale deployment incident-server --replicas=5 -n incident-management
```

### Monitoring Commands
```bash
# Check pod status
kubectl get pods -n incident-management

# View pod logs
kubectl logs -f deployment/incident-server -n incident-management

# Check resource usage
kubectl top pods -n incident-management

# View events
kubectl get events -n incident-management --sort-by='.lastTimestamp'
```

### Troubleshooting Commands
```bash
# Describe problematic resources
kubectl describe pod <pod-name> -n incident-management
kubectl describe deployment incident-server -n incident-management

# Access a pod shell for debugging
kubectl exec -it <pod-name> -n incident-management -- /bin/sh

# Port-forward for local access
kubectl port-forward svc/incident-server 8080:80 -n incident-management

# Check ingress configuration
kubectl describe ingress incident-management-ingress -n incident-management
```

## Best Practices
### Resource Management
- Set resource requests and limits for all containers
- Use horizontal pod autoscaling for application pods
- Implement proper health checks for reliability
- Configure pod disruption budgets for availability
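The pod disruption budget mentioned above can be sketched for the stateless API tier (names follow the incident-server Deployment defined earlier):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: incident-server-pdb
  namespace: incident-management
spec:
  minAvailable: 2              # keep at least two replicas during node drains
  selector:
    matchLabels:
      app: incident-server
```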
### Security
- Run containers as non-root users
- Use read-only root filesystems where possible
- Implement network policies to restrict traffic
- Regularly update container images and scan for vulnerabilities
### High Availability
- Deploy across multiple nodes using pod anti-affinity
- Use persistent storage with replication
- Implement proper backup strategies
- Configure monitoring and alerting
### Performance
- Monitor resource usage and adjust as needed
- Use appropriate storage classes for different workloads
- Implement caching strategies with Redis
- Optimize database queries and connections
This Kubernetes deployment provides a robust, scalable, and secure foundation for running the incident management platform in production environments.