Private Deployment

Status & Troubleshooting

This guide covers checking deployment status, viewing logs, and troubleshooting common issues.

Checking Deployment Status

Using the CLI

Get comprehensive deployment status:

rulebricks status

This displays:

  • Infrastructure: Cluster endpoint, node status, resource usage
  • Kubernetes: Node count, pod distribution, health
  • Database: Availability, endpoints, connection status
  • Application: Deployment status, replicas, versions
  • Services: Endpoints, versions, health
  • Certificates: Validity, expiration dates

Status Output Example

Infrastructure:
  Cluster: my-rulebricks-cluster
  Endpoint: https://abc123.region.eks.amazonaws.com
  Nodes: 3/3 healthy

Kubernetes:
  Nodes: 3
  CPU Usage: 45%
  Memory Usage: 62%

Database:
  Type: self-hosted
  Status: Running
  Endpoint: postgresql.my-namespace.svc.cluster.local:5432

Application:
  Status: Running
  Replicas: 2/2
  Version: 1.2.3

Services:
  App: https://app.example.com
  Grafana: https://grafana.example.com
  Supabase: https://supabase.example.com

Certificates:
  app.example.com: Valid (expires in 89 days)

Viewing Logs

Component Logs

View logs from specific components:

# View app logs
rulebricks logs app
 
# Follow logs in real-time
rulebricks logs app -f
 
# View last N lines
rulebricks logs app --tail 500
 
# View all components
rulebricks logs all -f

Available Components

  • app - Main Rulebricks application
  • hps - HPS service (rule processing)
  • workers - Worker pods
  • database - PostgreSQL database
  • supabase - All Supabase services
  • traefik - Ingress controller
  • prometheus - Metrics collection
  • grafana - Monitoring dashboards
  • all - Combined logs from all components

Using kubectl

You can also use kubectl directly:

# List all pods
kubectl get pods --all-namespaces
 
# View logs from a pod
kubectl logs <pod-name> -n <namespace> -f
 
# View logs from all pods in a deployment
kubectl logs -l app=rulebricks-app -n <namespace> -f
 
# View previous container logs (if pod restarted)
kubectl logs <pod-name> -n <namespace> --previous
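
kubectl output combines well with standard shell tools. As a sketch (the sample output below is illustrative, not from a real cluster), this flags any pods that have restarted:

```shell
# Flag pods with non-zero restart counts. Demonstrated on sample output;
# against a live cluster, pipe in the real thing instead:
#   kubectl get pods --all-namespaces --no-headers | awk '$5 > 0 {print $2, "restarts:", $5}'
sample_pods() {
cat <<'EOF'
default   rulebricks-app-7d9f   1/1   Running            0   2d
default   hps-5c2a              0/1   CrashLoopBackOff   7   1h
default   workers-9k1m          1/1   Running            2   3d
EOF
}

# Columns: namespace, name, ready, status, restarts, age
sample_pods | awk '$5 > 0 {print $2, "restarts:", $5}'
```

A frequently restarting pod is usually the first place to run `kubectl logs --previous`.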

Common Issues and Solutions

Infrastructure Issues

Cluster Creation Fails

Symptoms:

  • Terraform errors during deployment
  • Timeout errors
  • Resource quota errors

Solutions:

  1. Check cloud provider quotas:

    • AWS: Check service quotas in AWS Console
    • GCP: Verify billing is enabled and quotas are sufficient
    • Azure: Check subscription quotas
  2. Verify credentials:

    # AWS
    aws sts get-caller-identity
     
    # GCP
    gcloud auth list
     
    # Azure
    az account show
  3. Check for existing resources:

    • Verify no conflicting resource names
    • Check for existing clusters with the same name
    • Review Terraform state
  4. Review Terraform logs:

    rulebricks deploy --verbose

Cluster Not Accessible

Symptoms:

  • kubectl commands fail
  • Cannot connect to cluster

Solutions:

  1. Verify kubectl context:

    kubectl config current-context
    kubectl config get-contexts
  2. Update kubeconfig:

    # AWS
    aws eks update-kubeconfig --name <cluster-name> --region <region>
     
    # GCP
    gcloud container clusters get-credentials <cluster-name> --region <region>
     
    # Azure
    az aks get-credentials --resource-group <rg> --name <cluster-name>
  3. Check network connectivity:

    • Verify firewall rules
    • Check security groups
    • Test network connectivity
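
One quick way to test raw connectivity is bash's built-in /dev/tcp, sketched below (the helper function is ours, not part of the CLI; substitute your cluster API endpoint's host and port):

```shell
# Report whether a TCP port accepts connections, using bash's /dev/tcp
# redirection and coreutils `timeout`. No extra tools required.
port_open() {
  if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example against a cluster endpoint (hostname from the status output above):
#   port_open abc123.region.eks.amazonaws.com 443
```

If the API endpoint's port 443 reports closed, look at firewall rules and security groups before debugging kubectl itself.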

Application Issues

Pods Not Starting

Symptoms:

  • Pods in Pending or CrashLoopBackOff state
  • Pods not reaching Running state

Solutions:

  1. Check pod status:

    kubectl get pods --all-namespaces
    kubectl describe pod <pod-name> -n <namespace>
  2. Review pod events:

    kubectl get events --all-namespaces --sort-by='.lastTimestamp'
  3. Check resource constraints:

    kubectl top nodes
    kubectl describe nodes
  4. Review pod logs:

    kubectl logs <pod-name> -n <namespace>
    kubectl logs <pod-name> -n <namespace> --previous
  5. Common causes:

    • Insufficient node resources
    • Image pull errors
    • Configuration errors
    • Resource quota limits
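
The status check above can be narrowed to surface only unhealthy pods. A sketch against sample output (on a live cluster, pipe in `kubectl get pods --all-namespaces --no-headers` instead):

```shell
# Print only pods that are not Running or Completed.
# Columns: namespace, name, ready, status, restarts, age
unhealthy() { awk '$4 != "Running" && $4 != "Completed" {print $1"/"$2": "$4}'; }

cat <<'EOF' | unhealthy
default     rulebricks-app-7d9f   1/1   Running            0   2d
default     db-0                  0/1   Pending            0   5m
monitoring  grafana-6b8c          0/1   ImagePullBackOff   0   3m
EOF
```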

Application Not Responding

Symptoms:

  • 502/503 errors
  • Timeout errors
  • Service unavailable

Solutions:

  1. Check service status:

    kubectl get svc --all-namespaces
    kubectl describe svc <service-name> -n <namespace>
  2. Verify ingress:

    kubectl get ingress --all-namespaces
    kubectl describe ingress <ingress-name> -n <namespace>
  3. Check pod health:

    kubectl get pods -n <namespace>
    rulebricks status
  4. Review application logs:

    rulebricks logs app -f
  5. Test connectivity:

    # Test service endpoint
    kubectl port-forward svc/<service-name> 8080:80 -n <namespace>
    curl http://localhost:8080
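
The connectivity test can be wrapped in a small retry helper so a transient startup delay doesn't read as an outage (the helper is ours, not part of the CLI; the URL is whatever you port-forwarded locally):

```shell
# Poll a URL until it returns an HTTP 2xx status, up to N attempts.
wait_for_2xx() {
  local url=$1 tries=${2:-10} code
  for _ in $(seq "$tries"); do
    # curl prints "000" when the connection itself fails
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
    case "$code" in 2*) echo "OK ($code)"; return 0;; esac
    sleep 1
  done
  echo "gave up (last code: ${code:-none})"
  return 1
}

# Example after the port-forward above:
#   wait_for_2xx http://localhost:8080
```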

Database Issues

Database Connection Failures

Symptoms:

  • Application cannot connect to database
  • Database connection errors in logs

Solutions:

  1. Check database status:

    rulebricks status
    kubectl get pods -n <database-namespace>
  2. Verify database is running:

    kubectl logs <db-pod> -n <database-namespace>
  3. Test database connectivity:

    kubectl exec -it <db-pod> -n <database-namespace> -- psql -U postgres -c "SELECT version();"
  4. Check database credentials:

    kubectl get secret <db-secret> -n <namespace> -o yaml
  5. Review connection string:

    • Verify database host/port
    • Check credentials
    • Verify network policies
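
Note that values in a secret's data field are base64-encoded, so decode them before comparing against your connection string. A minimal sketch (the encoded value here is a stand-in for one pulled with `kubectl get secret ... -o jsonpath`):

```shell
# Kubernetes secret data is base64-encoded; decode before use.
# Stand-in for: kubectl get secret <db-secret> -n <namespace> \
#               -o jsonpath='{.data.password}'
ENCODED='cG9zdGdyZXM='
echo "$ENCODED" | base64 -d   # -> postgres
```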

Database Migration Failures

Symptoms:

  • Migration errors in logs
  • Database schema not updated

Solutions:

  1. Review migration logs:

    rulebricks logs database
  2. Check migration status:

    kubectl logs <app-pod> -n <namespace> | grep -i migration
  3. Manual migration (if needed):

    • Connect to database
    • Review migration files
    • Run migrations manually if necessary

Certificate Issues

TLS Certificate Not Generated

Symptoms:

  • Certificate not issued
  • HTTPS not working
  • Certificate errors in browser

Solutions:

  1. Check cert-manager:

    kubectl get pods -n cert-manager
    kubectl logs -n cert-manager -l app=cert-manager
  2. Verify certificate requests:

    kubectl get certificaterequests --all-namespaces
    kubectl describe certificaterequest <name> -n <namespace>
  3. Check DNS configuration:

    dig your-domain.com
    nslookup your-domain.com
  4. Verify domain ownership:

    • Ensure DNS points to load balancer
    • Check DNS propagation (can take 5-30 minutes)
    • Verify ports 80 and 443 are accessible
  5. Review ACME challenges:

    kubectl get challenges --all-namespaces
    kubectl describe challenge <name> -n <namespace>
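
The DNS check from step 3 can be scripted to compare the resolved address against the expected load balancer IP (the function name, domain, and IP below are placeholders, not part of the CLI):

```shell
# Compare a hostname's IPv4 resolution against an expected address.
check_dns() {
  local domain=$1 expected=$2 resolved
  resolved=$(getent ahostsv4 "$domain" | awk '{print $1; exit}')
  if [ "$resolved" = "$expected" ]; then
    echo "DNS OK: $domain -> $resolved"
  else
    echo "DNS MISMATCH: $domain -> ${resolved:-<none>} (expected $expected)"
  fi
}

# Example with placeholder values:
#   check_dns app.example.com 203.0.113.10
```

A mismatch here usually means DNS has not finished propagating or the record points at an old load balancer.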

Certificate Expired

Symptoms:

  • Certificate expiration warnings
  • HTTPS errors

Solutions:

  1. Check certificate status:

    kubectl get certificates --all-namespaces
    kubectl describe certificate <name> -n <namespace>
  2. Force renewal (if needed):

    kubectl delete certificaterequest <name> -n <namespace>
    # cert-manager will automatically create a new request
  3. Verify automatic renewal:

    • cert-manager automatically renews certificates
    • Check renewal schedule in certificate spec
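
You can also read expiry straight off a certificate with openssl. Sketched below on a throwaway self-signed cert so it runs anywhere; the same x509 parsing applies to a cert fetched from a live endpoint:

```shell
# Generate a short-lived self-signed cert, then print its expiry date.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -keyout /tmp/tls-demo.key -out /tmp/tls-demo.crt \
  -subj "/CN=app.example.com" 2>/dev/null

openssl x509 -in /tmp/tls-demo.crt -noout -enddate   # prints notAfter=<date>
```

Against a live deployment, fetch the served certificate instead: `echo | openssl s_client -connect app.example.com:443 -servername app.example.com 2>/dev/null | openssl x509 -noout -enddate`.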

Resource Issues

Out of Resources

Symptoms:

  • Pods in Pending state
  • InsufficientCPU or InsufficientMemory events
  • Node resource exhaustion

Solutions:

  1. Check node resources:

    kubectl top nodes
    kubectl describe nodes
  2. Check pod resources:

    kubectl top pods --all-namespaces
  3. Scale up cluster:

    • Increase node_count in configuration
    • Enable autoscaling
    • Deploy with updated configuration
  4. Optimize resource requests:

    • Review resource requests in configuration
    • Adjust based on actual usage
    • Consider using larger instance types
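
For reference, resource requests and limits use the standard Kubernetes stanza below (field names are the Kubernetes API's, not Rulebricks-specific; the values are illustrative, so base yours on `kubectl top pods` output):

```yaml
resources:
  requests:
    cpu: 500m        # guaranteed share, used by the scheduler
    memory: 512Mi
  limits:
    cpu: "1"         # hard ceiling; the container is throttled above this
    memory: 1Gi      # exceeding this gets the container OOM-killed
```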

High Resource Usage

Symptoms:

  • High CPU/memory usage
  • Performance degradation
  • Pod evictions

Solutions:

  1. Identify resource consumers:

    kubectl top pods --all-namespaces --sort-by=cpu
    kubectl top pods --all-namespaces --sort-by=memory
  2. Review resource limits:

    kubectl describe pod <pod-name> -n <namespace>
  3. Scale services:

    • Increase replicas for stateless services
    • Enable autoscaling
    • Adjust resource limits
  4. Optimize configuration:

    • Review performance settings
    • Adjust the configured volume level to match actual traffic
    • Tune Kafka and worker settings

Debug Mode

Enable verbose logging for detailed debugging:

rulebricks deploy --verbose
rulebricks status --verbose
rulebricks logs app -v

Getting Help

Collecting Debug Information

Before seeking help, collect:

  1. Configuration:

    cat rulebricks.yaml
  2. Status:

    rulebricks status > status.txt
  3. Logs:

    rulebricks logs all > logs.txt
  4. Kubernetes state:

    kubectl get all --all-namespaces > k8s-state.txt
    kubectl describe nodes > nodes.txt
  5. Events:

    kubectl get events --all-namespaces > events.txt
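
The collection steps above can be bundled into one script that archives everything for support (the script is ours, not part of the CLI; `--dry-run` prints the commands instead of executing them, which is handy for checking it without a cluster):

```shell
# Gather debug artifacts into a timestamped tarball for support.
collect_debug() {
  local out="debug-$(date +%Y%m%d-%H%M%S)" run="eval"
  [ "${1:-}" = "--dry-run" ] && run="echo"
  mkdir -p "$out"
  # Remember to redact secrets from the config copy before sharing.
  $run "cp rulebricks.yaml $out/rulebricks.yaml || true"
  $run "rulebricks status > $out/status.txt"
  $run "rulebricks logs all > $out/logs.txt"
  $run "kubectl get all --all-namespaces > $out/k8s-state.txt"
  $run "kubectl describe nodes > $out/nodes.txt"
  $run "kubectl get events --all-namespaces > $out/events.txt"
  tar czf "$out.tar.gz" "$out" && echo "wrote $out.tar.gz"
}
```

Note that a dry run still creates the (empty) tarball, since only the collection commands are echoed rather than executed.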

Support Resources

When contacting support, include:

  • Configuration file (without secrets)
  • Status output
  • Relevant logs
  • Error messages
  • Steps to reproduce

Best Practices

  1. Monitor regularly: Check status and logs on a consistent schedule
  2. Set up alerts: Configure monitoring alerts
  3. Keep backups: Regular database and configuration backups
  4. Test changes: Test in development before production
  5. Document customizations: Keep notes on manual changes
  6. Review logs: Regularly review logs for issues
  7. Update regularly: Keep CLI and deployments updated

Next Steps