Private Deployment

Status & Troubleshooting

This guide covers checking deployment status, viewing logs, and troubleshooting common issues.

Checking Deployment Status

Using the CLI

Get comprehensive deployment status:

rulebricks status

This displays:

  • Infrastructure: Cluster endpoint, node status, resource usage
  • Kubernetes: Node count, pod distribution, health
  • Database: Availability, endpoints, connection status
  • Application: Deployment status, replicas, versions
  • Services: Endpoints, versions, health
  • Certificates: Validity, expiration dates

Status Output Example

Infrastructure:
  Cluster: my-rulebricks-cluster
  Endpoint: https://abc123.region.eks.amazonaws.com
  Nodes: 3/3 healthy

Kubernetes:
  Nodes: 3
  CPU Usage: 45%
  Memory Usage: 62%

Database:
  Type: self-hosted
  Status: Running
  Endpoint: postgresql.my-namespace.svc.cluster.local:5432

Application:
  Status: Running
  Replicas: 2/2
  Version: 1.2.3

Services:
  App: https://app.example.com
  Grafana: https://grafana.example.com
  Supabase: https://supabase.example.com

Certificates:
  app.example.com: Valid (expires in 89 days)

Viewing Logs

Component Logs

View logs from specific components:

# View app logs
rulebricks logs app
 
# Follow logs in real-time
rulebricks logs app -f
 
# View last N lines
rulebricks logs app --tail 500
 
# View all components
rulebricks logs all -f

Available Components

  • app - Main Rulebricks application
  • hps - HPS service (rule processing)
  • workers - Worker pods
  • database - PostgreSQL database
  • supabase - All Supabase services
  • traefik - Ingress controller
  • prometheus - Metrics collection
  • grafana - Monitoring dashboards
  • all - Combined logs from all components

Using kubectl

You can also use kubectl directly:

# List all pods
kubectl get pods --all-namespaces
 
# View logs from a pod
kubectl logs <pod-name> -n <namespace> -f
 
# View logs from all pods in a deployment
kubectl logs -l app=rulebricks-app -n <namespace> -f
 
# View previous container logs (if pod restarted)
kubectl logs <pod-name> -n <namespace> --previous
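
kubectl output combines well with standard shell tools. As a sketch (the sample output below is illustrative, not from a real cluster), this flags any pods that have restarted:

```shell
# Flag pods with non-zero restart counts. Demonstrated on sample output;
# against a live cluster, pipe in the real thing instead:
#   kubectl get pods --all-namespaces --no-headers | awk '$5 > 0 {print $2, "restarts:", $5}'
sample_pods() {
cat <<'EOF'
default   rulebricks-app-7d9f   1/1   Running            0   2d
default   hps-5c2a              0/1   CrashLoopBackOff   7   1h
default   workers-9k1m          1/1   Running            2   3d
EOF
}

# Columns: namespace, name, ready, status, restarts, age
sample_pods | awk '$5 > 0 {print $2, "restarts:", $5}'
```

A frequently restarting pod is usually the first place to run `kubectl logs --previous`.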

Common Issues and Solutions

Infrastructure Issues

Cluster Creation Fails

Symptoms:

  • Terraform errors during deployment
  • Timeout errors
  • Resource quota errors

Solutions:

  1. Check cloud provider quotas:

    • AWS: Check service quotas in AWS Console
    • GCP: Verify billing is enabled and quotas are sufficient
    • Azure: Check subscription quotas
  2. Verify credentials:

    # AWS
    aws sts get-caller-identity
     
    # GCP
    gcloud auth list
     
    # Azure
    az account show
  3. Check for existing resources:

    • Verify no conflicting resource names
    • Check for existing clusters with the same name
    • Review Terraform state
  4. Review Terraform logs:

    rulebricks deploy --verbose

Cluster Not Accessible

Symptoms:

  • kubectl commands fail
  • Cannot connect to cluster

Solutions:

  1. Verify kubectl context:

    kubectl config current-context
    kubectl config get-contexts
  2. Update kubeconfig:

    # AWS
    aws eks update-kubeconfig --name <cluster-name> --region <region>
     
    # GCP
    gcloud container clusters get-credentials <cluster-name> --region <region>
     
    # Azure
    az aks get-credentials --resource-group <rg> --name <cluster-name>
  3. Check network connectivity:

    • Verify firewall rules
    • Check security groups
    • Test network connectivity
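
One quick way to test raw connectivity is bash's built-in /dev/tcp, sketched below (the helper function is ours, not part of the CLI; substitute your cluster API endpoint's host and port):

```shell
# Report whether a TCP port accepts connections, using bash's /dev/tcp
# redirection and coreutils `timeout`. No extra tools required.
port_open() {
  if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example against a cluster endpoint (hostname from the status output above):
#   port_open abc123.region.eks.amazonaws.com 443
```

If the API endpoint's port 443 reports closed, look at firewall rules and security groups before debugging kubectl itself.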

Application Issues

Pods Not Starting

Symptoms:

  • Pods in Pending or CrashLoopBackOff state
  • Pods not reaching Running state

Solutions:

  1. Check pod status:

    kubectl get pods --all-namespaces
    kubectl describe pod <pod-name> -n <namespace>
  2. Review pod events:

    kubectl get events --all-namespaces --sort-by='.lastTimestamp'
  3. Check resource constraints:

    kubectl top nodes
    kubectl describe nodes
  4. Review pod logs:

    kubectl logs <pod-name> -n <namespace>
    kubectl logs <pod-name> -n <namespace> --previous
  5. Common causes:

    • Insufficient node resources
    • Image pull errors
    • Configuration errors
    • Resource quota limits
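
The status check above can be narrowed to surface only unhealthy pods. A sketch against sample output (on a live cluster, pipe in `kubectl get pods --all-namespaces --no-headers` instead):

```shell
# Print only pods that are not Running or Completed.
# Columns: namespace, name, ready, status, restarts, age
unhealthy() { awk '$4 != "Running" && $4 != "Completed" {print $1"/"$2": "$4}'; }

cat <<'EOF' | unhealthy
default     rulebricks-app-7d9f   1/1   Running            0   2d
default     db-0                  0/1   Pending            0   5m
monitoring  grafana-6b8c          0/1   ImagePullBackOff   0   3m
EOF
```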

Application Not Responding

Symptoms:

  • 502/503 errors
  • Timeout errors
  • Service unavailable

Solutions:

  1. Check service status:

    kubectl get svc --all-namespaces
    kubectl describe svc <service-name> -n <namespace>
  2. Verify ingress:

    kubectl get ingress --all-namespaces
    kubectl describe ingress <ingress-name> -n <namespace>
  3. Check pod health:

    kubectl get pods -n <namespace>
    rulebricks status
  4. Review application logs:

    rulebricks logs app -f
  5. Test connectivity:

    # Test service endpoint
    kubectl port-forward svc/<service-name> 8080:80 -n <namespace>
    curl http://localhost:8080
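
The connectivity test can be wrapped in a small retry helper so a transient startup delay doesn't read as an outage (the helper is ours, not part of the CLI; the URL is whatever you port-forwarded locally):

```shell
# Poll a URL until it returns an HTTP 2xx status, up to N attempts.
wait_for_2xx() {
  local url=$1 tries=${2:-10} code
  for _ in $(seq "$tries"); do
    # curl prints "000" when the connection itself fails
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
    case "$code" in 2*) echo "OK ($code)"; return 0;; esac
    sleep 1
  done
  echo "gave up (last code: ${code:-none})"
  return 1
}

# Example after the port-forward above:
#   wait_for_2xx http://localhost:8080
```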

Database Issues

Database Connection Failures

Symptoms:

  • Application cannot connect to database
  • Database connection errors in logs

Solutions:

  1. Check database status:

    rulebricks status
    kubectl get pods -n <database-namespace>
  2. Verify database is running:

    kubectl logs <db-pod> -n <database-namespace>
  3. Test database connectivity:

    kubectl exec -it <db-pod> -n <database-namespace> -- psql -U postgres -c "SELECT version();"
  4. Check database credentials:

    kubectl get secret <db-secret> -n <namespace> -o yaml
  5. Review connection string:

    • Verify database host/port
    • Check credentials
    • Verify network policies
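
Note that values in a secret's data field are base64-encoded, so decode them before comparing against your connection string. A minimal sketch (the encoded value here is a stand-in for one pulled with `kubectl get secret ... -o jsonpath`):

```shell
# Kubernetes secret data is base64-encoded; decode before use.
# Stand-in for: kubectl get secret <db-secret> -n <namespace> \
#               -o jsonpath='{.data.password}'
ENCODED='cG9zdGdyZXM='
echo "$ENCODED" | base64 -d   # -> postgres
```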

Database Migration Failures

Symptoms:

  • Migration errors in logs
  • Database schema not updated

Solutions:

  1. Review migration logs:

    rulebricks logs database
  2. Check migration status:

    kubectl logs <app-pod> -n <namespace> | grep -i migration
  3. Manual migration (if needed):

    • Connect to database
    • Review migration files
    • Run migrations manually if necessary

Certificate Issues

TLS Certificate Not Generated

Symptoms:

  • Certificate not issued
  • HTTPS not working
  • Certificate errors in browser

Solutions:

  1. Check cert-manager:

    kubectl get pods -n cert-manager
    kubectl logs -n cert-manager -l app=cert-manager
  2. Verify certificate requests:

    kubectl get certificaterequests --all-namespaces
    kubectl describe certificaterequest <name> -n <namespace>
  3. Check DNS configuration:

    dig your-domain.com
    nslookup your-domain.com
  4. Verify domain ownership:

    • Ensure DNS points to load balancer
    • Check DNS propagation (can take 5-30 minutes)
    • Verify ports 80 and 443 are accessible
  5. Review ACME challenges:

    kubectl get challenges --all-namespaces
    kubectl describe challenge <name> -n <namespace>
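
The DNS check from step 3 can be scripted to compare the resolved address against the expected load balancer IP (the function name, domain, and IP below are placeholders, not part of the CLI):

```shell
# Compare a hostname's IPv4 resolution against an expected address.
check_dns() {
  local domain=$1 expected=$2 resolved
  resolved=$(getent ahostsv4 "$domain" | awk '{print $1; exit}')
  if [ "$resolved" = "$expected" ]; then
    echo "DNS OK: $domain -> $resolved"
  else
    echo "DNS MISMATCH: $domain -> ${resolved:-<none>} (expected $expected)"
  fi
}

# Example with placeholder values:
#   check_dns app.example.com 203.0.113.10
```

A mismatch here usually means DNS has not finished propagating or the record points at an old load balancer.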

Certificate Expired

Symptoms:

  • Certificate expiration warnings
  • HTTPS errors

Solutions:

  1. Check certificate status:

    kubectl get certificates --all-namespaces
    kubectl describe certificate <name> -n <namespace>
  2. Force renewal (if needed):

    kubectl delete certificaterequest <name> -n <namespace>
    # cert-manager will automatically create a new request
  3. Verify automatic renewal:

    • cert-manager automatically renews certificates
    • Check renewal schedule in certificate spec
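
You can also read expiry straight off a certificate with openssl. Sketched below on a throwaway self-signed cert so it runs anywhere; the same x509 parsing applies to a cert fetched from a live endpoint:

```shell
# Generate a short-lived self-signed cert, then print its expiry date.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -keyout /tmp/tls-demo.key -out /tmp/tls-demo.crt \
  -subj "/CN=app.example.com" 2>/dev/null

openssl x509 -in /tmp/tls-demo.crt -noout -enddate   # prints notAfter=<date>
```

Against a live deployment, fetch the served certificate instead: `echo | openssl s_client -connect app.example.com:443 -servername app.example.com 2>/dev/null | openssl x509 -noout -enddate`.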

Resource Issues

Out of Resources

Symptoms:

  • Pods in Pending state
  • InsufficientCPU or InsufficientMemory events
  • Node resource exhaustion

Solutions:

  1. Check node resources:

    kubectl top nodes
    kubectl describe nodes
  2. Check pod resources:

    kubectl top pods --all-namespaces
  3. Scale up cluster:

    • Increase node_count in configuration
    • Enable autoscaling
    • Deploy with updated configuration
  4. Optimize resource requests:

    • Review resource requests in configuration
    • Adjust based on actual usage
    • Consider using larger instance types
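
For reference, resource requests and limits use the standard Kubernetes stanza below (field names are the Kubernetes API's, not Rulebricks-specific; the values are illustrative, so base yours on `kubectl top pods` output):

```yaml
resources:
  requests:
    cpu: 500m        # guaranteed share, used by the scheduler
    memory: 512Mi
  limits:
    cpu: "1"         # hard ceiling; the container is throttled above this
    memory: 1Gi      # exceeding this gets the container OOM-killed
```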

High Resource Usage

Symptoms:

  • High CPU/memory usage
  • Performance degradation
  • Pod evictions

Solutions:

  1. Identify resource consumers:

    kubectl top pods --all-namespaces --sort-by=cpu
    kubectl top pods --all-namespaces --sort-by=memory
  2. Review resource limits:

    kubectl describe pod <pod-name> -n <namespace>
  3. Scale services:

    • Increase replicas for stateless services
    • Enable autoscaling
    • Adjust resource limits
  4. Optimize configuration:

    • Review performance settings
    • Adjust the configured volume level to match actual traffic
    • Tune Kafka and worker settings

Debug Mode

Enable verbose logging for detailed debugging:

rulebricks deploy --verbose
rulebricks status --verbose
rulebricks logs app -v

Getting Help

Collecting Debug Information

Before seeking help, collect:

  1. Configuration:

    cat rulebricks.yaml
  2. Status:

    rulebricks status > status.txt
  3. Logs:

    rulebricks logs all > logs.txt
  4. Kubernetes state:

    kubectl get all --all-namespaces > k8s-state.txt
    kubectl describe nodes > nodes.txt
  5. Events:

    kubectl get events --all-namespaces > events.txt
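
The collection steps above can be bundled into one script that archives everything for support (the script is ours, not part of the CLI; `--dry-run` prints the commands instead of executing them, which is handy for checking it without a cluster):

```shell
# Gather debug artifacts into a timestamped tarball for support.
collect_debug() {
  local out="debug-$(date +%Y%m%d-%H%M%S)" run="eval"
  [ "${1:-}" = "--dry-run" ] && run="echo"
  mkdir -p "$out"
  # Remember to redact secrets from the config copy before sharing.
  $run "cp rulebricks.yaml $out/rulebricks.yaml || true"
  $run "rulebricks status > $out/status.txt"
  $run "rulebricks logs all > $out/logs.txt"
  $run "kubectl get all --all-namespaces > $out/k8s-state.txt"
  $run "kubectl describe nodes > $out/nodes.txt"
  $run "kubectl get events --all-namespaces > $out/events.txt"
  tar czf "$out.tar.gz" "$out" && echo "wrote $out.tar.gz"
}
```

Note that a dry run still creates the (empty) tarball, since only the collection commands are echoed rather than executed.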

Support Resources

When contacting support, include:

  • Configuration file (without secrets)
  • Status output
  • Relevant logs
  • Error messages
  • Steps to reproduce

Best Practices

  1. Monitor regularly: Check status and logs on a consistent schedule
  2. Set up alerts: Configure monitoring alerts
  3. Keep backups: Regular database and configuration backups
  4. Test changes: Test in development before production
  5. Document customizations: Keep notes on manual changes
  6. Review logs: Regularly review logs for issues
  7. Update regularly: Keep CLI and deployments updated

Next Steps