Troubleshooting
Common issues and solutions for the KubeZero platform.
Overview
This guide helps you diagnose and resolve common issues with KubeZero deployments.
Diagnostic Tools
Use these tools for troubleshooting:
# Check platform status
kubezero status
# Validate configuration
kubezero validate
# Get detailed information
kubezero describe --component argocd
# Check logs
kubezero logs --component istio --follow
Common Issues
Platform Installation
Issue: Bootstrap fails with timeout error
Solution:
- Check network connectivity
- Verify cloud credentials
- Increase timeout values
- Check resource quotas
# Increase timeout
kubezero bootstrap --timeout 30m
# Check logs
kubezero logs --component bootstrap
Application Deployment
Issue: Application stuck in pending state
Solution:
- Check resource availability
- Verify image pull secrets
- Check node selectors
- Review security policies
# Check pod events
kubectl describe pod <pod-name>
# Check resource usage
kubectl top nodes
kubectl top pods
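To find every affected pod rather than inspecting them one at a time, the checks above can be combined into a small filter. This is a minimal sketch: `pending_pods` is a hypothetical helper name, and it assumes the default column layout of `kubectl get pods --all-namespaces --no-headers` (namespace, name, ready, status).

```shell
# Hypothetical helper: print "namespace/name" for each Pending pod.
# Expects `kubectl get pods --all-namespaces --no-headers` output on stdin,
# where the fourth column is STATUS.
pending_pods() {
  awk '$4 == "Pending" { print $1 "/" $2 }'
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers | pending_pods
#   kubectl get events --all-namespaces --field-selector reason=FailedScheduling
```

The `FailedScheduling` events usually state why the scheduler could not place a pod, for example insufficient CPU or a node selector that matches no node.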
GitOps Sync Issues
Issue: ArgoCD applications out of sync
Solution:
- Check Git repository access
- Verify webhook configuration
- Trigger a manual sync if needed
- Check application health
# Force sync
kubezero app sync <app-name>
# Check ArgoCD status
kubectl get applications -n argocd
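When many applications are installed, it can help to list only the ones that are not in sync. The sketch below assumes the standard Argo CD `Application` status fields (`.status.sync.status` and `.status.health.status`); `out_of_sync_apps` is a hypothetical helper name.

```shell
# Hypothetical helper: print the name of every application whose sync
# status is not "Synced". Expects "NAME SYNC HEALTH" rows on stdin.
out_of_sync_apps() {
  awk '$2 != "Synced" { print $1 }'
}

# Usage against a live cluster:
#   kubectl get applications -n argocd --no-headers \
#     -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status \
#     | out_of_sync_apps
```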
Networking Issues
Issue: Service mesh communication failures
Solution:
- Check service mesh configuration
- Verify mTLS settings
- Review network policies
- Check DNS resolution
# Check Istio configuration
istioctl analyze
# Verify mTLS status (istioctl authn tls-check has been removed from recent Istio releases)
istioctl x describe pod <pod-name>
# Test connectivity
kubezero network test --from pod1 --to pod2
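DNS resolution can also be checked with plain `kubectl` by running a lookup from a short-lived pod. This is a sketch under assumptions: the helper name `check_dns`, the pod name `dns-test`, and the `busybox:1.36` image are illustrative choices, not part of KubeZero.

```shell
# Hypothetical helper: resolve a name from inside the cluster using a
# temporary busybox pod that is deleted when the lookup finishes.
# With no argument it resolves the Kubernetes API service.
check_dns() {
  kubectl run dns-test --rm -i --restart=Never --image=busybox:1.36 \
    -- nslookup "${1:-kubernetes.default.svc.cluster.local}"
}

# Usage:
#   check_dns                       # resolve the API server service
#   check_dns my-svc.my-namespace   # resolve a specific service
```

If the lookup fails for every name, the problem is more likely in cluster DNS than in the mesh configuration.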
Certificate Issues
Issue: TLS certificate errors
Solution:
- Check cert-manager status
- Verify DNS configuration
- Review certificate requests
- Check rate limits
# Check certificate status across all namespaces
kubectl get certificates --all-namespaces
# Debug certificate issues
kubectl describe certificaterequest
# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager
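A certificate that never becomes ready is often blocked further down the issuance chain. The sketch below lists the cert-manager resources for one namespace in a single call; `cert_chain` is a hypothetical helper name, and the `order` and `challenge` resources are only populated when an ACME issuer is in use.

```shell
# Hypothetical helper: list the cert-manager resources behind the
# certificates in one namespace (defaults to "default").
cert_chain() {
  kubectl get certificate,certificaterequest,order,challenge -n "${1:-default}"
}

# Usage:
#   cert_chain my-namespace
#   kubectl describe challenge -n my-namespace   # DNS or HTTP validation errors
#   kubectl describe order -n my-namespace       # ACME errors such as rate limits
```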
Performance Issues
High Resource Usage
Symptoms:
- Slow response times
- Pod evictions
- Out of memory errors
Diagnosis:
# Check resource usage
kubectl top nodes
kubectl top pods --all-namespaces
# Check metrics
kubezero metrics --component platform
Solutions:
- Scale resources up
- Optimize application code
- Implement resource limits
- Add more nodes
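Before scaling up or setting limits, it helps to know which pods account for the usage. The following is a minimal sketch: `pods_over_memory` is a hypothetical helper, and it assumes `kubectl top pods` reports memory in `Mi`, as it typically does.

```shell
# Hypothetical helper: print pods using more than LIMIT MiB of memory.
# Expects `kubectl top pods --all-namespaces --no-headers` output on stdin
# (namespace, name, CPU, memory).
pods_over_memory() {
  awk -v limit="$1" '{
    mem = $4
    sub(/Mi$/, "", mem)              # strip the unit suffix
    if (mem + 0 > limit + 0) print $1 "/" $2, $4
  }'
}

# Usage against a live cluster:
#   kubectl top pods --all-namespaces --no-headers | pods_over_memory 512
```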
Storage Issues
Symptoms:
- Persistent volume errors
- Database connection failures
- Slow I/O performance
Diagnosis:
# Check storage classes
kubectl get storageclass
# Check persistent volumes
kubectl get pv,pvc --all-namespaces
# Monitor storage metrics
kubezero metrics --component storage
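Provisioning and attach failures are normally recorded as events on the claim. As a sketch, the hypothetical helper `pvc_events` below narrows the event list to persistent volume claims using a standard field selector.

```shell
# Hypothetical helper: show events recorded against PersistentVolumeClaims
# in every namespace, oldest first.
pvc_events() {
  kubectl get events --all-namespaces \
    --field-selector involvedObject.kind=PersistentVolumeClaim \
    --sort-by=.lastTimestamp
}

# Usage:
#   pvc_events
```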
Debugging Workflows
Step-by-Step Debugging
1. Identify the problem
   - Gather symptoms
   - Check error messages
   - Review recent changes
2. Collect information
   - Platform status
   - Application logs
   - Resource usage
   - Network connectivity
3. Isolate the issue
   - Test components individually
   - Check dependencies
   - Verify configurations
4. Apply solutions
   - Make targeted fixes
   - Test changes
   - Monitor results
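The "collect information" step can be scripted so the same snapshot is taken every time. This is a sketch only: `collect_diagnostics` and the file names are hypothetical, and it uses standard `kubectl` commands rather than any KubeZero-specific tooling.

```shell
# Hypothetical helper: save a snapshot of cluster state into a directory
# (default "diagnostics") for later review or for a bug report.
collect_diagnostics() {
  out_dir="${1:-diagnostics}"
  mkdir -p "$out_dir" || return 1
  kubectl version > "$out_dir/version.txt" 2>&1
  kubectl get nodes -o wide > "$out_dir/nodes.txt" 2>&1
  kubectl get pods --all-namespaces -o wide > "$out_dir/pods.txt" 2>&1
  kubectl get events --all-namespaces --sort-by=.lastTimestamp > "$out_dir/events.txt" 2>&1
  kubectl top nodes > "$out_dir/top-nodes.txt" 2>&1
  echo "Diagnostics written to $out_dir"
}

# Usage:
#   collect_diagnostics ./diag-before-fix
```

Taking one snapshot before a fix and another after makes it easier to confirm that the change had the intended effect.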
Log Analysis
Start with platform logs, then narrow down to the application and the node:
# Platform logs
kubezero logs --component all --since 1h
# Application logs
kubectl logs deployment/my-app --follow
# System logs
journalctl -u kubelet --since "1 hour ago"
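Long logs are easier to read once repeated errors are grouped. The sketch below counts the most frequent matching lines; `top_errors` is a hypothetical helper, and the keyword list is an assumption to adjust for your applications.

```shell
# Hypothetical helper: print the N most frequent log lines that mention
# an error keyword (N defaults to 10). Reads the log on stdin.
top_errors() {
  grep -iE 'error|fatal|panic' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Usage:
#   kubectl logs deployment/my-app --since=1h | top_errors 5
```

Lines that begin with a timestamp are all unique, so grouping works best on logs without per-line timestamps or after the timestamp has been stripped.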
Getting Help
When you need additional support:
- Documentation: Check relevant docs
- Community: Ask in discussions
- Issues: Create GitHub issues
- Support: Contact support team
Creating Good Bug Reports
Include this information:
- KubeZero version
- Cloud provider and region
- Error messages and logs
- Steps to reproduce
- Expected vs actual behavior
Prevention
Prevent issues before they occur:
- Regular Updates: Keep platform updated
- Monitoring: Implement comprehensive monitoring
- Testing: Test changes in staging
- Documentation: Keep runbooks current
- Training: Train team on troubleshooting
For detailed troubleshooting procedures, see the troubleshooting reference.