Troubleshooting

Common issues and solutions for the KubeZero platform.

Overview

This guide helps you diagnose and resolve common issues with KubeZero deployments.

Diagnostic Tools

Use these tools for troubleshooting:

# Check platform status
kubezero status

# Validate configuration
kubezero validate

# Get detailed information
kubezero describe --component argocd

# Check logs
kubezero logs --component istio --follow
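
When a problem needs to be shared or compared over time, it can help to capture the output of these commands in one place. The following is a minimal sketch, assuming the `kubezero` subcommands listed above; the `collect_diagnostics` function name and the file layout are illustrative and not part of KubeZero.

```shell
# collect_diagnostics DIR: save the output of the diagnostic commands above
# into DIR, one file per command. Hypothetical helper, not a KubeZero feature.
collect_diagnostics() {
  out_dir="$1"
  mkdir -p "$out_dir" || return 1

  # A failing command should not abort the snapshot; record its exit code instead.
  kubezero status > "$out_dir/status.txt" 2>&1 \
    || echo "status exited with $?" >> "$out_dir/errors.txt"
  kubezero validate > "$out_dir/validate.txt" 2>&1 \
    || echo "validate exited with $?" >> "$out_dir/errors.txt"
  kubezero describe --component argocd > "$out_dir/describe-argocd.txt" 2>&1 \
    || echo "describe exited with $?" >> "$out_dir/errors.txt"

  # `kubezero logs --follow` is left out on purpose: it streams until interrupted.
  echo "Diagnostics written to $out_dir"
}

# Usage:
#   collect_diagnostics "./kubezero-diag-$(date +%Y%m%d-%H%M%S)"
```

Recording failures in `errors.txt` rather than stopping keeps the snapshot useful even when one component is unreachable.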

Common Issues

Platform Installation

Issue: Bootstrap fails with timeout error

Solution:

  1. Check network connectivity
  2. Verify cloud credentials
  3. Increase timeout values
  4. Check resource quotas

# Increase timeout
kubezero bootstrap --timeout 30m

# Check logs
kubezero logs --component bootstrap
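
If the timeout comes from a transient network or cloud API problem, retrying may be enough. The sketch below is a generic retry wrapper, not a KubeZero feature; the `retry` name is hypothetical, and the usage line assumes that re-running `kubezero bootstrap` after a failed attempt is safe in your environment.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...: run COMMAND until it succeeds or
# ATTEMPTS runs have failed. Generic helper, not part of the KubeZero CLI.
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempt(s): $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage (assumes re-running bootstrap is safe):
#   retry 3 60 kubezero bootstrap --timeout 30m
```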

Application Deployment

Issue: Application stuck in pending state

Solution:

  1. Check resource availability
  2. Verify image pull secrets
  3. Check node selectors
  4. Review security policies

# Check pod events
kubectl describe pod <pod-name>

# Check resource usage
kubectl top nodes
kubectl top pods
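
On a busy cluster it helps to list every pending pod before describing them one by one. A minimal sketch, assuming the default `kubectl get pods` column order; the `pending_pods` function name is illustrative.

```shell
# pending_pods: read `kubectl get pods --all-namespaces --no-headers` output on
# stdin and print "namespace/name" for each pod whose STATUS column is Pending.
# Column order assumed: NAMESPACE NAME READY STATUS RESTARTS AGE.
pending_pods() {
  awk '$4 == "Pending" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get pods --all-namespaces --no-headers | pending_pods
#
# The API server can filter by pod phase instead:
#   kubectl get pods --all-namespaces --field-selector=status.phase=Pending
```

The two approaches can differ slightly: the field selector matches the pod phase, which also covers pods whose STATUS column shows a more specific reason such as `ContainerCreating` or `ImagePullBackOff`.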

GitOps Sync Issues

Issue: ArgoCD applications out of sync

Solution:

  1. Check Git repository access
  2. Verify webhook configuration
  3. Run a manual sync if needed
  4. Check application health

# Force sync
kubezero app sync <app-name>

# Check ArgoCD status
kubectl get applications -n argocd
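
To see only the applications that need attention, the sync and health fields can be printed as columns and filtered. This is a sketch under the assumption that the Argo CD `Application` resources expose `.status.sync.status` and `.status.health.status`; the `out_of_sync_apps` function name is illustrative.

```shell
# out_of_sync_apps: read three whitespace-separated columns (NAME SYNC HEALTH)
# on stdin and print the rows whose SYNC column is not "Synced".
out_of_sync_apps() {
  awk '$2 != "Synced" { print $1 ": sync=" $2 ", health=" $3 }'
}

# Usage:
#   kubectl get applications -n argocd --no-headers \
#     -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status \
#     | out_of_sync_apps
```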

Networking Issues

Issue: Service mesh communication failures

Solution:

  1. Check service mesh configuration
  2. Verify mTLS settings
  3. Review network policies
  4. Check DNS resolution

# Check Istio configuration
istioctl analyze

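# Verify mTLS status (older Istio releases only; newer releases dropped
# `authn tls-check` in favor of `istioctl x describe pod <pod-name>`)
istioctl authn tls-check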

# Test connectivity
kubezero network test --from pod1 --to pod2
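
The DNS step can be checked from inside the affected pod. A minimal sketch, assuming the container image ships `nslookup` (many minimal images do not); the `check_dns` function name and the default namespace are illustrative.

```shell
# check_dns POD HOST [NAMESPACE]: try to resolve HOST from inside POD.
# A failure can also mean that `kubectl exec` itself failed (for example,
# because the image has no nslookup binary), so treat it as a hint only.
check_dns() {
  pod="$1"; host="$2"; ns="${3:-default}"
  if kubectl exec -n "$ns" "$pod" -- nslookup "$host" >/dev/null 2>&1; then
    echo "OK: $host resolves from $ns/$pod"
  else
    echo "FAIL: could not resolve $host from $ns/$pod" >&2
    return 1
  fi
}

# Usage:
#   check_dns pod1 pod2-service.default.svc.cluster.local
```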

Certificate Issues

Issue: TLS certificate errors

Solution:

  1. Check cert-manager status
  2. Verify DNS configuration
  3. Review certificate requests
  4. Check rate limits

# Check certificate status
kubectl get certificates

# Debug certificate issues
kubectl describe certificaterequest

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager
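
To narrow the search to certificates that are not ready, the `kubectl get certificates` output can be filtered. A minimal sketch, assuming the default cert-manager column order; the `not_ready_certs` function name is illustrative.

```shell
# not_ready_certs: read `kubectl get certificates --all-namespaces --no-headers`
# output on stdin and print "namespace/name" for each certificate whose READY
# column is not "True". Column order assumed: NAMESPACE NAME READY SECRET AGE.
not_ready_certs() {
  awk '$3 != "True" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get certificates --all-namespaces --no-headers | not_ready_certs
# Then inspect one of the results:
#   kubectl describe certificate <name> -n <namespace>
```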

Performance Issues

High Resource Usage

Symptoms:

  • Slow response times
  • Pod evictions
  • Out of memory errors

Diagnosis:

# Check resource usage
kubectl top nodes
kubectl top pods --all-namespaces

# Check metrics
kubezero metrics --component platform
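
The `kubectl top nodes` output can be filtered to flag nodes that are close to their memory capacity. A minimal sketch, assuming the default column order; the `hot_nodes` function name and the threshold value are illustrative.

```shell
# hot_nodes THRESHOLD: read `kubectl top nodes --no-headers` output on stdin and
# print nodes whose memory use is at or above THRESHOLD percent.
# Column order assumed: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%.
hot_nodes() {
  awk -v limit="$1" '{
    pct = $5
    sub(/%$/, "", pct)    # strip the trailing percent sign, e.g. "87%" -> "87"
    if (pct + 0 >= limit + 0) print $1 " memory=" $5
  }'
}

# Usage:
#   kubectl top nodes --no-headers | hot_nodes 80
# To rank pods by memory instead:
#   kubectl top pods --all-namespaces --sort-by=memory
```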

Solutions:

  1. Scale resources up
  2. Optimize application code
  3. Implement resource limits
  4. Add more nodes

Storage Issues

Symptoms:

  • Persistent volume errors
  • Database connection failures
  • Slow I/O performance

Diagnosis:

# Check storage classes
kubectl get storageclass

# Check persistent volumes
kubectl get pv,pvc --all-namespaces

# Monitor storage metrics
kubezero metrics --component storage
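
Persistent volume errors often show up first as claims that never bind. A minimal sketch that filters the `kubectl get pvc` output, assuming the default column order; the `unbound_pvcs` function name is illustrative.

```shell
# unbound_pvcs: read `kubectl get pvc --all-namespaces --no-headers` output on
# stdin and print claims whose STATUS column is not "Bound".
# Leading columns assumed: NAMESPACE NAME STATUS.
unbound_pvcs() {
  awk '$3 != "Bound" { print $1 "/" $2 " status=" $3 }'
}

# Usage:
#   kubectl get pvc --all-namespaces --no-headers | unbound_pvcs
# Then inspect one of the results:
#   kubectl describe pvc <name> -n <namespace>
```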

Debugging Workflows

Step-by-Step Debugging

  1. Identify the problem

    • Gather symptoms
    • Check error messages
    • Review recent changes
  2. Collect information

    • Platform status
    • Application logs
    • Resource usage
    • Network connectivity
  3. Isolate the issue

    • Test components individually
    • Check dependencies
    • Verify configurations
  4. Apply solutions

    • Make targeted fixes
    • Test changes
    • Monitor results

Log Analysis

Pull logs from each layer when analyzing a problem:

# Platform logs
kubezero logs --component all --since 1h

# Application logs
kubectl logs deployment/my-app --follow

# System logs
journalctl -u kubelet --since "1 hour ago"
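
Once the logs are collected, grouping repeated error lines shows which failures dominate. A minimal sketch using standard Unix tools; the `top_errors` function name and the keyword list are illustrative, and the grouping works best on logs without per-line timestamps, since otherwise every line is unique.

```shell
# top_errors [N]: read log lines on stdin and print the N most frequent lines
# that mention an error, failure, or timeout (default N=10), with counts.
top_errors() {
  grep -iE 'error|fail|timeout' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Usage:
#   kubectl logs deployment/my-app --since=1h | top_errors 5
```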

Getting Help

When you need additional support:

  1. Documentation: Check relevant docs
  2. Community: Ask in discussions
  3. Issues: Create GitHub issues
  4. Support: Contact support team

Creating Good Bug Reports

Include this information:

  • KubeZero version
  • Cloud provider and region
  • Error messages and logs
  • Steps to reproduce
  • Expected vs actual behavior

Prevention

Prevent issues before they occur:

  1. Regular Updates: Keep platform updated
  2. Monitoring: Implement comprehensive monitoring
  3. Testing: Test changes in staging
  4. Documentation: Keep runbooks current
  5. Training: Train team on troubleshooting

For detailed troubleshooting procedures, see the troubleshooting reference.