Monitoring Configuration Reference
This reference collects configuration examples for the metrics, alerting, logging, and tracing components of a KubeZero environment.
Core Monitoring Stack
Prometheus Configuration
Basic Setup
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kubezero-prometheus
  namespace: monitoring
spec:
  replicas: 2
  retention: "30d"
  storage:
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
        storageClassName: fast-ssd
Service Discovery
spec:
  serviceMonitorSelector:
    matchLabels:
      team: platform
  podMonitorSelector:
    matchLabels:
      monitoring: enabled
  ruleSelector:
    matchLabels:
      prometheus: kubezero
Grafana Configuration
Dashboard Provisioning
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboards
  namespace: monitoring
data:
  kubernetes-overview.json: |
    {
      "dashboard": {
        "id": null,
        "title": "Kubernetes Overview",
        "tags": ["kubernetes"],
        "panels": []
      }
    }
Data Source Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus:9090
        access: proxy
        isDefault: true
Alerting Configuration
AlertManager Setup
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: kubezero-alertmanager
spec:
  replicas: 3
  storage:
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
Alert Rules
Cluster Health Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-health
spec:
  groups:
    - name: cluster.rules
      rules:
        - alert: NodeDown
          expr: up{job="node-exporter"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} is down"
Application Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: application-alerts
spec:
  groups:
    - name: apps.rules
      rules:
        - alert: HighErrorRate
          expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High error rate detected"
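Note that the expression above compares the per-series rate of 5xx responses against 0.1 requests per second; it does not measure the share of failing requests. If a ratio is what you want, a rule along the following lines may be closer. This is a sketch only: the alert name `HighErrorRatio` and the 10% threshold are assumptions, and it presumes `http_requests_total` carries a `status` label as in the rule above.

```yaml
# Hypothetical variant: alert on the fraction of 5xx responses, not the raw rate.
- alert: HighErrorRatio
  expr: |
    sum(rate(http_requests_total{status=~"5.."}[5m]))
      /
    sum(rate(http_requests_total[5m])) > 0.1
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "More than 10% of HTTP requests are returning 5xx"
```

The entry slots into the `rules:` list of a PrometheusRule group. Aggregating with `sum(...)` on both sides keeps the label sets aligned so the division yields a single series.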
Notification Channels
Slack Integration
global:
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'slack-notifications'
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        title: 'KubeZero Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
Email Notifications
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: '[email protected]'
        from: '[email protected]'
        smarthost: 'smtp.gmail.com:587'
        auth_username: '[email protected]'
        auth_password: 'app-password'
        # Alertmanager sets the subject through the headers map.
        headers:
          Subject: 'KubeZero Alert: {{ .GroupLabels.alertname }}'
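The email receiver above is only a definition; alerts reach it once a route points to it. A minimal routing sketch, assuming both receivers live in the same Alertmanager configuration and that critical alerts should go to email in addition to Slack (that severity split is an assumption, not something this reference prescribes):

```yaml
route:
  receiver: 'slack-notifications'   # default receiver for everything
  group_by: ['alertname']
  routes:
    # Assumed policy: critical alerts are also delivered by email.
    - matchers:
        - 'severity="critical"'
      receiver: 'email-notifications'
      continue: true                # keep evaluating later sibling routes
```

With `continue: true` an alert that matches this child route is still evaluated against any sibling routes that follow; drop it if email should be the only destination for critical alerts.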
Logging Configuration
Fluent Bit Setup
apiVersion: logging.coreos.com/v1
kind: FluentBit
metadata:
  name: kubezero-fluent-bit
spec:
  inputs:
    - name: tail
      path: /var/log/containers/*.log
      parser: cri
      tag: kubernetes.*
  filters:
    - name: kubernetes
      match: kubernetes.*
      kube_url: https://kubernetes.default.svc:443
  outputs:
    - name: elasticsearch
      match: '*'
      host: elasticsearch.logging.svc.cluster.local
      port: 9200
Loki Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config
data:
  loki.yaml: |
    server:
      http_listen_port: 3100
    ingester:
      lifecycler:
        ring:
          kvstore:
            store: inmemory
    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
Tracing Configuration
Jaeger Setup
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: kubezero-jaeger
spec:
  strategy: production
  storage:
    type: elasticsearch
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
OpenTelemetry Collector
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: kubezero-otel-collector
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
    exporters:
      jaeger:
        endpoint: jaeger-collector:14250
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [jaeger]
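Newer OpenTelemetry Collector releases have dropped the dedicated `jaeger` exporter in favour of OTLP, which Jaeger can ingest directly. If your collector image rejects the configuration above, the exporter section could be swapped roughly as follows. This is a sketch under assumptions: it presumes the Jaeger collector has its OTLP gRPC receiver enabled on port 4317 and is reachable as `jaeger-collector`; the `otlp/jaeger` exporter name is arbitrary.

```yaml
# Replacement for the exporters and service sections of the collector config.
exporters:
  otlp/jaeger:
    endpoint: jaeger-collector:4317   # assumed OTLP gRPC port on the Jaeger collector
    tls:
      insecure: true                  # plaintext, as in the original example
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```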
Custom Metrics
ServiceMonitor Examples
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
spec:
  selector:
    matchLabels:
      app: my-application
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
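For the ServiceMonitor above to be picked up by the Prometheus resource from the Service Discovery section, it needs to carry the `team: platform` label that `serviceMonitorSelector` matches on, and the target Service needs a port named `metrics`. A sketch of the matching pair, assuming a hypothetical Service called `my-application` that serves metrics on port 8080 (namespace placement is left out and depends on how your namespace selectors are set):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
  labels:
    team: platform            # matched by serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-application     # matched against Service labels
  endpoints:
    - port: metrics           # refers to the Service port *name*
      interval: 30s
      path: /metrics
---
apiVersion: v1
kind: Service
metadata:
  name: my-application        # hypothetical Service
  labels:
    app: my-application
spec:
  selector:
    app: my-application
  ports:
    - name: metrics
      port: 8080              # assumed metrics port
      targetPort: 8080
```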
PodMonitor Examples
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: pod-metrics
spec:
  selector:
    matchLabels:
      monitoring: enabled
  podMetricsEndpoints:
    - port: metrics
      interval: 15s
Performance Tuning
Prometheus Optimization
spec:
  prometheus:
    prometheusSpec:
      resources:
        requests:
          memory: "4Gi"
          cpu: "2"
        limits:
          memory: "8Gi"
          cpu: "4"
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: fast-ssd
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 200Gi
Grafana Optimization
spec:
  grafana:
    resources:
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "2Gi"
        cpu: "1"
    persistence:
      enabled: true
      size: 10Gi
      storageClassName: fast-ssd
Security Configuration
RBAC for Monitoring
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  # Ingress is served from networking.k8s.io; the extensions group was removed in Kubernetes 1.22.
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: monitoring-network-policy
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/part-of: monitoring
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: monitoring
  egress:
    - to:
        - namespaceSelector: {}
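The egress rule above matches pods in every namespace, but a `namespaceSelector` does not cover addresses outside the cluster. If Alertmanager pods fall under this policy, the Slack webhook and SMTP relay from the notification examples would likely be unreachable. A possible extension of the `egress` section, sketched under the assumption that outbound HTTPS (443) and SMTP submission (587) to any address is acceptable in your environment:

```yaml
egress:
  # In-cluster traffic to pods in any namespace (as in the policy above).
  - to:
      - namespaceSelector: {}
  # Assumed addition: allow outbound HTTPS and SMTP submission to external hosts.
  - to:
      - ipBlock:
          cidr: 0.0.0.0/0
    ports:
      - protocol: TCP
        port: 443
      - protocol: TCP
        port: 587
```

Narrowing the `cidr` to known provider ranges, where those are published, would tighten this further.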