Troubleshooting Duplo IAC Pipeline Failures Due to Service Unavailability

If your Duplo IAC pipeline is failing with API endpoints returning 404 or 503 errors, this is typically caused by services being in a stopped state or scaled down to zero replicas.

Common Symptoms

- Pipeline fails with errors like "503 Server Error: Service Unavailable"
- API endpoints return 404 responses
- Services appear as "Status: Stopped" in the Kubernetes dashboard

Root Causes

The most common causes include:

- Services scaled to zero replicas: Auto-scaling policies may scale services down to 0 when there's no traffic
- Database connectivity issues: Services may fail to start due to database connection timeouts or storage exhaustion
- Resource allocation differences: Staging environments often have smaller resource limits compared to production

Resolution Steps

  1. Check Service Status
     First, verify that your services are running by checking the Kubernetes dashboard. Look for services with "Status: Stopped".

  2. Update Minimum Replicas
     For services (not jobs), ensure the minimum replica count is set to 1, not 0:
     - Services should have min-replicas: 1
     - Jobs can have min-replicas: 0

  3. Verify Database Connectivity
     If services fail to start with database connection errors:
     - Check if the database instance is running
     - Verify there's sufficient storage space
     - Confirm the database connection parameters are correct

  4. Monitor Resource Allocation
     Staging environments may have different resource constraints than production:
     - Check memory limits and CPU allocation
     - Verify HPA (Horizontal Pod Autoscaler) settings
     - Ensure ingress-gateway has adequate replicas

Prevention

To prevent similar issues:

- Set appropriate minimum replica counts for all services
- Monitor database storage and set up alerts for low disk space
- Review auto-scaling policies to ensure they don't scale critical services to zero
- Regularly test your staging environment to catch configuration drift
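
The minimum-replica and database-storage checks described above could be automated as a pre-flight step before the pipeline runs. The following Python is a minimal sketch under stated assumptions, not a Duplo or Kubernetes API: the `Workload` class, its fields, the function names, and the 90% storage threshold are all hypothetical, and the input data would have to come from your own tooling.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical, simplified view of a workload's scaling settings."""
    name: str
    kind: str          # assumed to be "service" or "job"
    min_replicas: int


def services_scaled_to_zero(workloads):
    """Return the names of services whose minimum replica count is below 1.

    Jobs are skipped, because a minimum of 0 is acceptable for them.
    """
    return [
        w.name
        for w in workloads
        if w.kind == "service" and w.min_replicas < 1
    ]


def storage_is_low(used_gb, total_gb, threshold=0.9):
    """Return True when used storage reaches the alert threshold.

    The default threshold of 90% is an assumption for illustration only.
    """
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return used_gb / total_gb >= threshold


if __name__ == "__main__":
    # Example data; in practice this would be read from your environment.
    workloads = [
        Workload("api", "service", 0),
        Workload("web", "service", 1),
        Workload("nightly-report", "job", 0),
    ]
    print("Services with min replicas below 1:", services_scaled_to_zero(workloads))
    print("Database storage low:", storage_is_low(used_gb=95, total_gb=100))
```

Failing the pipeline early when `services_scaled_to_zero` returns a non-empty list, or when `storage_is_low` is true, would surface the misconfiguration directly instead of as a later 404 or 503 from the API.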
