Monitoring Configuration Reference

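This reference collects configuration examples for monitoring and observability in KubeZero environments: metrics (Prometheus, Grafana), alerting (Alertmanager), logging (Fluent Bit, Loki), and tracing (Jaeger, OpenTelemetry).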

Core Monitoring Stack

Prometheus Configuration

Basic Setup

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kubezero-prometheus
  namespace: monitoring
spec:
  replicas: 2
  retention: "30d"
  storage:
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
        storageClassName: fast-ssd

Service Discovery

spec:
  serviceMonitorSelector:
    matchLabels:
      team: platform
  podMonitorSelector:
    matchLabels:
      monitoring: enabled
  ruleSelector:
    matchLabels:
      prometheus: kubezero
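
The selectors above filter objects by label only. In the Prometheus custom resource, a namespace selector that is left unset restricts discovery to the namespace the Prometheus object itself runs in, while an empty selector matches all namespaces. A minimal sketch, assuming cluster-wide discovery is wanted (whether KubeZero sets these fields by default is not stated here):

```yaml
spec:
  # {} matches every namespace; omitting a field limits discovery
  # to the namespace of the Prometheus object itself
  serviceMonitorNamespaceSelector: {}
  podMonitorNamespaceSelector: {}
  ruleNamespaceSelector: {}
```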

Grafana Configuration

Dashboard Provisioning

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboards
  namespace: monitoring
data:
  kubernetes-overview.json: |
    {
      "dashboard": {
        "id": null,
        "title": "Kubernetes Overview",
        "tags": ["kubernetes"],
        "panels": []
      }
    }

Data Source Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus:9090
        access: proxy
        isDefault: true

Alerting Configuration

Alertmanager Setup

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: kubezero-alertmanager
spec:
  replicas: 3
  storage:
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
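
Prometheus only sends alerts to an Alertmanager it has been pointed at. A minimal sketch of the corresponding fragment for the Prometheus resource, assuming the Alertmanager above is deployed in the monitoring namespace (the manifest does not set one) and is reached through the alertmanager-operated service that the Prometheus Operator creates:

```yaml
spec:
  alerting:
    alertmanagers:
      - namespace: monitoring        # assumption: namespace of kubezero-alertmanager
        name: alertmanager-operated  # governing service created by the operator
        port: web                    # named port on that service
```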

Alert Rules

Cluster Health Alerts

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-health
  labels:
    prometheus: kubezero  # matched by the ruleSelector shown under Service Discovery
spec:
  groups:
    - name: cluster.rules
      rules:
        - alert: NodeDown
          expr: up{job="node-exporter"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} is down"

Application Alerts

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: application-alerts
  labels:
    prometheus: kubezero  # matched by the ruleSelector shown under Service Discovery
spec:
  groups:
    - name: apps.rules
      rules:
        - alert: HighErrorRate
          expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High error rate detected"
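
The HighErrorRate expression above compares the absolute rate of 5xx responses per series against 0.1 requests per second, so the same threshold behaves very differently for low-traffic and high-traffic services. A sketch of a ratio-based variant follows. The rule name, the 5% threshold, and grouping by job are assumptions chosen for illustration; http_requests_total and its status label are taken from the rule above.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: application-error-ratio   # hypothetical name
  labels:
    prometheus: kubezero          # so the ruleSelector under Service Discovery picks it up
spec:
  groups:
    - name: apps.ratio.rules
      rules:
        - alert: HighErrorRatio
          # share of 5xx responses among all responses, per job
          expr: |
            sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
              /
            sum by (job) (rate(http_requests_total[5m]))
              > 0.05
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "More than 5% of requests to {{ $labels.job }} are failing"
```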

Notification Channels

Slack Integration

global:
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'slack-notifications'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        title: 'KubeZero Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

Email Notifications

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'YOUR_RECIPIENT_ADDRESS'
        from: 'YOUR_SENDER_ADDRESS'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'YOUR_SMTP_USERNAME'
        auth_password: 'app-password'
        # email_configs has no subject field; the subject is set as a header
        headers:
          Subject: 'KubeZero Alert: {{ .GroupLabels.alertname }}'
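
The Slack and email receivers are defined separately above, and only the Slack receiver is referenced by a route. A minimal sketch of a single routing tree that uses both, assuming critical alerts should go to email and everything else to Slack (this split by severity is an assumption, not something the original configuration states). The matchers syntax requires Alertmanager 0.22 or later.

```yaml
route:
  receiver: 'slack-notifications'    # default when no child route matches
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  routes:
    - matchers:
        - severity="critical"        # label set by the NodeDown rule
      receiver: 'email-notifications'
```

Both receiver definitions then need to appear in the same receivers list of the Alertmanager configuration.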

Logging Configuration

Fluent Bit Setup

apiVersion: logging.coreos.com/v1
kind: FluentBit
metadata:
  name: kubezero-fluent-bit
spec:
  inputs:
    - name: tail
      path: /var/log/containers/*.log
      parser: cri
      tag: kubernetes.*
  filters:
    - name: kubernetes
      match: kubernetes.*
      kube_url: https://kubernetes.default.svc:443
  outputs:
    - name: elasticsearch
      match: '*'
      host: elasticsearch.logging.svc.cluster.local
      port: 9200

Loki Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config
data:
  loki.yaml: |
    server:
      http_listen_port: 3100
    ingester:
      lifecycler:
        ring:
          kvstore:
            store: inmemory
    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          # boltdb-shipper requires an index period of 24h
          index:
            prefix: index_
            period: 24h

Tracing Configuration

Jaeger Setup

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: kubezero-jaeger
spec:
  strategy: production
  storage:
    type: elasticsearch
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy

OpenTelemetry Collector

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: kubezero-otel-collector
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
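    exporters:
      # Recent Collector releases no longer ship the jaeger exporter;
      # Jaeger accepts OTLP directly on its collector (gRPC port 4317)
      otlp:
        endpoint: jaeger-collector:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]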

Custom Metrics

ServiceMonitor Examples

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
  labels:
    team: platform  # matched by the serviceMonitorSelector shown under Service Discovery
spec:
  selector:
    matchLabels:
      app: my-application
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics

PodMonitor Examples

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: pod-metrics
  labels:
    monitoring: enabled  # matched by the podMonitorSelector shown under Service Discovery
spec:
  selector:
    matchLabels:
      monitoring: enabled
  podMetricsEndpoints:
    - port: metrics
      interval: 15s

Performance Tuning

Prometheus Optimization

spec:
  prometheus:
    prometheusSpec:
      resources:
        requests:
          memory: "4Gi"
          cpu: "2"
        limits:
          memory: "8Gi"
          cpu: "4"
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: fast-ssd
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 200Gi

Grafana Optimization

spec:
  grafana:
    resources:
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "2Gi"
        cpu: "1"
    persistence:
      enabled: true
      size: 10Gi
      storageClassName: fast-ssd

Security Configuration

RBAC for Monitoring

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: monitoring-network-policy
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/part-of: monitoring
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
  egress:
    - to:
        - namespaceSelector: {}
