Autoscale your Host workloads in DuploCloud
DuploCloud supports various ways to scale Host workloads, depending on the underlying AWS services being used.
Scale to or from zero when creating Autoscaling Groups in DuploCloud
DuploCloud allows you to scale to or from zero in Amazon EKS clusters by enabling the Scale from Zero option within the Advanced Options when creating an Autoscaling Group. This feature intelligently adjusts the number of instances in your cluster, dynamically scaling up when demand increases and down to zero when resources are not in use. Reducing resource allocation during idle periods leads to significant cost savings.
Autoscaling to zero is ideal for Kubernetes workloads that don’t always require 100% availability, such as:
Non-Critical Workloads: Batch processing jobs, data analysis tasks, and other non-customer-facing services that can be scaled down to zero during off-peak hours (e.g., nights or weekends).
Dev/Test Environments: Development and testing environments that can be scaled up when developers need them and scaled down when not in use.
Background Jobs: Workloads with background jobs running in Kubernetes that are only needed intermittently, such as those triggered by specific events or scheduled at certain times.
Autoscaling to zero is not suitable for all workloads. Avoid using this feature for:
Customer-Facing Applications: Frontend web applications that must always be available should not use autoscaling to zero, as it can cause downtime and negatively impact user experience.
Workloads Outside Kubernetes: If background jobs or other processes are not running in Kubernetes, autoscaling to zero will not apply. Different scaling strategies are required for these environments.
Scaling to or from zero with AWS Autoscaling Groups (ASG) offers several advantages depending on the context and requirements of your application:
Cost Savings: By scaling down to zero instances during periods of low demand, you minimize costs associated with running and maintaining instances. This pay-as-you-go model ensures you only pay for resources when they are actively being used.
Resource Efficiency: Scaling to zero ensures that resources are not wasted during periods of low demand. By terminating instances when they are not needed, you optimize resource utilization and prevent over-provisioning, leading to improved efficiency and reduced infrastructure costs.
Flexibility: Scaling to zero provides the flexibility to dynamically adjust your infrastructure in response to changes in workload. It allows you to efficiently allocate resources based on demand, ensuring that your application can scale up or down seamlessly to meet varying levels of traffic.
Simplified Management: With automatic scaling to zero, you can streamline management tasks associated with provisioning and de-provisioning instances. The ASG handles scaling operations automatically, reducing the need for manual intervention and simplifying infrastructure management.
Rapid Response to Increased Demand: Scaling from zero allows your infrastructure to quickly respond to spikes in traffic or sudden increases in workload. By automatically launching instances as needed, you ensure that your application can handle surges in demand without experiencing performance degradation or downtime.
Improved Availability: Scaling from zero helps maintain optimal availability and performance for your application by ensuring that sufficient resources are available to handle incoming requests. This proactive approach to scaling helps prevent resource constraints and ensures a consistent user experience even during peak usage periods.
Enhanced Scalability: Scaling from zero enables your infrastructure to scale out horizontally, adding additional instances as demand grows. This horizontal scalability allows you to seamlessly handle increases in workload and accommodate a growing user base without experiencing bottlenecks or performance issues.
Elasticity: Scaling from zero provides elasticity to your infrastructure, allowing it to expand and contract based on demand. This elasticity ensures that you can efficiently allocate resources to match changing workload patterns, resulting in optimal resource utilization and cost efficiency.
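The cost argument above can be made concrete with a little arithmetic. The following is a minimal sketch only: the hourly price, node count, and active hours are hypothetical numbers chosen for illustration, not DuploCloud or AWS pricing.

```python
HOURS_PER_MONTH = 730  # approximate average; an assumption for illustration


def monthly_node_cost(hourly_price: float, node_count: int, active_hours: float) -> float:
    """Return the cost of running node_count instances for active_hours hours."""
    return hourly_price * node_count * active_hours


# Hypothetical workload: 3 nodes at $0.10/hour, needed 8 hours on 22 workdays.
always_on = monthly_node_cost(0.10, 3, HOURS_PER_MONTH)
scale_to_zero = monthly_node_cost(0.10, 3, 8 * 22)
savings = always_on - scale_to_zero

print(f"always on:     ${always_on:.2f}")
print(f"scale to zero: ${scale_to_zero:.2f}")
print(f"saved:         ${savings:.2f}")
```

The savings grow with the share of the month the workload sits idle, which is why batch jobs and dev/test environments benefit most.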
Create Autoscaling groups to scale EC2 instances to your workload
Configure Autoscaling Groups (ASG) so that the number of EC2 instances scales with your application load. Autoscaling detects unhealthy instances and launches new EC2 instances to replace them. An ASG is also cost-effective because EC2 instances are created dynamically, as the application requires, within the minimum and maximum count limits.
The Use for Cluster Autoscaling option will not be available until you enable the Cluster Autoscaler option in your Infrastructure.
In the DuploCloud Portal, navigate to Cloud Services -> Hosts.
In the ASG tab, click Add. The Add ASG page is displayed.
In the Friendly Name field, enter the name of the ASG.
Select Availability Zone and Instance Type.
In the Instance Count field, enter the desired capacity for the Autoscaling group.
In the Minimum Instances field, enter the minimum number of instances. The Autoscaling group ensures that the total number of instances is always greater than or equal to the minimum number of instances.
In the Maximum Instances field, enter the maximum number of instances. The Autoscaling group ensures that the total number of instances is always less than or equal to the maximum number of instances.
Select Use for Cluster Autoscaling.
Select Advanced Options.
Select the appropriate Image ID.
From the Agent Platform list box, select Linux Docker/Native to run a Docker service or select EKS Linux to run services using EKS. Fill in additional fields as needed for your ASG.
Optionally, enable Spot Instances.
Optionally, for EKS only, enable Scale from zero.
Click Add. Your ASG is added and displayed in the ASG tab.
View the Hosts created as part of ASG creation from the ASG Hosts tab.
Refer to AWS Documentation for detailed steps on creating Scaling policies for the Autoscaling Group.
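The relationship between Instance Count, Minimum Instances, and Maximum Instances in the steps above can be sketched as a small validation helper. This is an illustration only: `AsgCapacity` and `clamp_instance_count` are hypothetical names, not part of any DuploCloud or AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsgCapacity:
    """Capacity settings for an ASG (hypothetical helper type)."""

    minimum: int
    desired: int
    maximum: int

    def __post_init__(self) -> None:
        if self.minimum < 0:
            raise ValueError("minimum must be >= 0")
        if not (self.minimum <= self.desired <= self.maximum):
            raise ValueError("expected minimum <= desired <= maximum")


def clamp_instance_count(requested: int, capacity: AsgCapacity) -> int:
    """Keep a requested instance count inside the group's min/max bounds."""
    return max(capacity.minimum, min(requested, capacity.maximum))


capacity = AsgCapacity(minimum=0, desired=2, maximum=5)
print(clamp_instance_count(9, capacity))   # request above the maximum
print(clamp_instance_count(-1, capacity))  # request below the minimum
```

A minimum of 0 corresponds to a group that is allowed to scale to zero.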
The DuploCloud Portal provides the ability to configure Services based on the platforms EKS Linux and Linux Docker/Native. Select the ASG based on the platform used when creating services and Autoscaling groups. Optionally, if you previously enabled Spot Instances in the ASG, you can configure the Service to use Spot Instances by selecting Tolerate spot instances.
ECS Autoscaling scales the desired count of tasks for the ECS Service configured in your Infrastructure. The average CPU or memory utilization of your tasks determines whether the desired count is increased or decreased.
In the DuploCloud Portal, navigate to Cloud Services -> ECS. Select the ECS Task Definition for which you want to enable Autoscaling, and select Add Scaling Target.
Set the MinCapacity (minimum value 2) and MaxCapacity to complete the configuration.
Once the Scaling Target is configured, add a Scaling Policy.
Provide the following details:
Policy Name - The name of the scaling policy.
Policy Dimension - The metric type tracked by the target tracking scaling policy. Select a value from the dropdown.
Target Value - The target value for the metric.
ScaleIn Cooldown - The amount of time, in seconds, after a scale-in activity completes before another scale-in activity can start.
ScaleOut Cooldown - The amount of time, in seconds, after a scale-out activity completes before another scale-out activity can start.
Disable ScaleIn - Disabling scale-in ensures that this target tracking scaling policy is never used to scale in the scaling target.
This step creates the target tracking scaling policy and attaches it to the scaling target.
View the Scaling Target and Policy details from the DuploCloud Portal. Update and Delete operations are also supported from this view.
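For readers who want to see how the fields above relate to AWS, here is a minimal sketch that builds request payloads in the shape of the AWS Application Auto Scaling `RegisterScalableTarget` and `PutScalingPolicy` operations. How the DuploCloud Portal maps its fields onto these operations is an assumption, `build_ecs_scaling_requests` is a hypothetical helper, and the cluster, service, and policy names are placeholders. The code only builds dictionaries and does not call AWS.

```python
def build_ecs_scaling_requests(
    cluster: str,
    service: str,
    min_capacity: int,
    max_capacity: int,
    policy_name: str,
    metric_type: str,
    target_value: float,
    scale_in_cooldown: int = 300,
    scale_out_cooldown: int = 300,
    disable_scale_in: bool = False,
) -> tuple[dict, dict]:
    """Return (scalable target request, scaling policy request) as dicts."""
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")

    # Fields that identify the ECS Service being scaled.
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    target = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    policy = {
        **common,
        "PolicyName": policy_name,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_value),
            "PredefinedMetricSpecification": {"PredefinedMetricType": metric_type},
            "ScaleInCooldown": scale_in_cooldown,
            "ScaleOutCooldown": scale_out_cooldown,
            "DisableScaleIn": disable_scale_in,
        },
    }
    return target, policy


# Placeholder names; keep average CPU utilization near 70 percent.
target_request, policy_request = build_ecs_scaling_requests(
    cluster="demo-cluster",
    service="web",
    min_capacity=2,
    max_capacity=6,
    policy_name="cpu-target-70",
    metric_type="ECSServiceAverageCPUUtilization",
    target_value=70,
)
print(target_request)
print(policy_request)
```

With `boto3`, dictionaries like these could be passed as keyword arguments to the `application-autoscaling` client's `register_scalable_target` and `put_scaling_policy` methods.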
Create Autoscaling Groups (ASG) with Spot Instances in the DuploCloud platform
Spot Instances are spare capacity priced at a significant discount compared to On-Demand Instances. Users specify the maximum price (bid) they will pay per hour for a Spot Instance. The instance is launched if the current Spot price is below the user's bid. Since Spot Instances can be interrupted when spare capacity is unavailable, applications using Spot Instances must be fault-tolerant and able to handle interruptions.
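The launch rule described above can be sketched as a one-line check. This is a simplified illustration of the rule as stated in this section, not AWS's actual allocation logic; `spot_request_fulfilled` and the prices are hypothetical.

```python
def spot_request_fulfilled(
    current_spot_price: float,
    max_price: float,
    capacity_available: bool = True,
) -> bool:
    """Return True when a Spot Instance would launch under the stated rule.

    The instance launches only if spare capacity exists and the current
    Spot price is below the maximum price (bid).
    """
    return capacity_available and current_spot_price < max_price


print(spot_request_fulfilled(current_spot_price=0.031, max_price=0.05))
print(spot_request_fulfilled(current_spot_price=0.062, max_price=0.05))
print(spot_request_fulfilled(0.031, 0.05, capacity_available=False))
```

The third case is why workloads on Spot Instances must tolerate interruption: a sufficient bid does not help when spare capacity is unavailable.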
Spot Instances are only supported for Autoscaling Groups (ASG) with EKS.
Follow the steps in the section Creating Autoscaling Groups (ASG). Before clicking Add, select the box to access Advanced Options. Enable Use Spot Instances and enter your bid, in dollars, in the Maximum Spot Price field.
Follow the steps in Creating Services using Autoscaling Groups. In the Basic Options of the Add Service page, select Tolerate spot instances.
Tolerations are entered by default in the Other Container Config field, in the Advanced Options of the Add Service page.
Autoscale your DuploCloud Kubernetes deployment
Before autoscaling can be configured for your Kubernetes service, make sure that:
Horizontal Pod Autoscaler (HPA) automatically scales the Deployment and its ReplicaSet. HPA checks the configured metrics at regular intervals and then scales the replicas up or down accordingly.
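The scaling decision HPA makes at each interval follows the formula in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0 and bounded by the configured minimum and maximum. The sketch below is a simplified illustration of that formula; it ignores details such as pod readiness and stabilization windows, and `desired_replicas` is a hypothetical name.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,  # Kubernetes default tolerance
) -> int:
    """Simplified HPA replica calculation for a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go outside the configured replica bounds.
    return max(min_replicas, min(desired, max_replicas))


# Average CPU utilization is 160% of requests against an 80% target.
print(desired_replicas(2, 160, 80, min_replicas=2, max_replicas=5))
```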
You can configure HPA while creating a Deployment Service from the DuploCloud Portal.
In the DuploCloud Portal, navigate to Kubernetes -> Services. The Services page displays.
In Add Service - Basic Options, from the Replication Strategy list box, select Horizontal Pod Scheduler.
In the Horizontal Pod Autoscaler Config field, add your HPA configuration. Set the minimum and maximum replica counts and the resource attributes based on your requirements.
Click Next to navigate to Advanced Options.
In Advanced Options, in the Other Container Config field, ensure your resource attributes, such as Limits and Requests, are set to work with your HPA configuration.
At the bottom of the Advanced Options page, click Create.
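As an illustration of the two configuration fields used in the steps above, the sketch below builds a sample HPA configuration and a matching container resources fragment, then prints them as JSON. The field names follow the Kubernetes `autoscaling/v2` HPA spec and the container `resources` spec. The exact format and key casing that the DuploCloud fields expect are assumptions, and all values are placeholders to adjust for your workload.

```python
import json


def build_hpa_config(min_replicas: int, max_replicas: int, cpu_target_percent: int) -> dict:
    """Sample HPA settings: replica bounds and a CPU utilization target."""
    if not (1 <= min_replicas <= max_replicas):
        raise ValueError("expected 1 <= min_replicas <= max_replicas")
    return {
        "minReplicas": min_replicas,
        "maxReplicas": max_replicas,
        "metrics": [
            {
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_target_percent,
                    },
                },
            }
        ],
    }


def build_resources(cpu_request: str, memory_request: str,
                    cpu_limit: str, memory_limit: str) -> dict:
    """Sample container resources; HPA utilization is measured against requests."""
    return {
        "resources": {
            "requests": {"cpu": cpu_request, "memory": memory_request},
            "limits": {"cpu": cpu_limit, "memory": memory_limit},
        }
    }


hpa_config = build_hpa_config(min_replicas=2, max_replicas=5, cpu_target_percent=80)
container_config = build_resources("250m", "256Mi", "500m", "512Mi")
print(json.dumps(hpa_config, indent=2))
print(json.dumps(container_config, indent=2))
```

A CPU utilization target only works when the container declares a CPU request, because utilization is computed as a percentage of the requested amount.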
For Services configured with HPA, Replicas is set to Auto in the DuploCloud Portal.
When your services are running, Replicas: Auto is displayed on the Service page.
If a Kubernetes Service is running with a Horizontal Pod Autoscaler (HPA), you cannot stop the Service by clicking Stop in the Service's Actions menu in the DuploCloud Portal.
Instead, do the following to stop the service from running:
In the DuploCloud Portal, navigate to Kubernetes -> Containers and select the Service you want to stop.
From the Actions menu, select Edit.
From the Replication Strategy list box, select Static Count.
In the Replicas field, enter 0 (zero).
Click Next to navigate to the Advanced Options page.
Click Update to update the service.
When the Cluster Autoscaler flag is set and a Tenant has one or more ASGs, an unschedulable-pod alert is delayed by five (5) minutes to allow for autoscaling. You can configure the Infrastructure settings to bypass the delay and send the alerts in real time.
From the DuploCloud portal, navigate to Administrator -> Infrastructure.
In the Name list, click the Infrastructure you want to configure.
Select the Settings tab.
Click the Add button. The Infra - Set Custom Data pane displays.
In the Setting Name list box, select Enables faults prior to autoscaling Kubernetes nodes.
Set the Enable toggle switch to enable the setting.
Click Set. DuploCloud will now generate faults for unschedulable K8s nodes immediately (before autoscaling).
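The alert timing described in this section can be sketched as follows. This is a hypothetical model of the behavior as stated here, not DuploCloud's implementation; `alert_time` and its parameters are illustrative names.

```python
from datetime import datetime, timedelta

AUTOSCALING_GRACE = timedelta(minutes=5)  # delay stated in this section


def alert_time(
    unschedulable_since: datetime,
    cluster_autoscaler_enabled: bool,
    tenant_asg_count: int,
    faults_before_autoscaling: bool = False,
) -> datetime:
    """Return when an unschedulable-pod alert would be raised."""
    delay_applies = (
        cluster_autoscaler_enabled
        and tenant_asg_count > 0
        and not faults_before_autoscaling  # the setting bypasses the delay
    )
    return unschedulable_since + (AUTOSCALING_GRACE if delay_applies else timedelta(0))


start = datetime(2024, 1, 1, 12, 0)
print(alert_time(start, cluster_autoscaler_enabled=True, tenant_asg_count=1))
print(alert_time(start, True, 1, faults_before_autoscaling=True))
```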
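Prerequisites for autoscaling your Kubernetes Service:
An Autoscaling Group (ASG) is set up in the DuploCloud Tenant.
Cluster Autoscaler is enabled for your DuploCloud Infrastructure.
To add the Service on the Services page, create a new Service by clicking Add.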