LogoLogo
HomePlatformAsk DuploCloudPricing
  • Overview
  • Product Updates
  • Workshops
    • DuploCloud 101 for AWS
      • Create Your Infrastructure and Application
        • 1. Log in to the DuploCloud Portal
        • 2. Create a DuploCloud Infrastructure
        • 3. Create a DuploCloud Tenant
        • 4. Create an EKS Worker Node
        • 5. Deploy an Application
        • 6. Create a Load Balancer
        • 7. Deploy an S3 Bucket
        • 8. Deploy a Database
        • 9. Create an Alarm
      • Daily Operations using DuploCloud
        • 1. Host, Container, and Kubectl Shell
        • 2. Logging
        • 3. Metrics
        • 4. Billing and Cost Management
        • 5. Audit Logs
        • 6 - Tenant and Admin Just-In-Time (JIT) AWS Access
        • 7. CI/CD
        • 8. Security Hub and Dashboard
        • 9. Terraform Mode of Operations
      • Post-workshop Reference Guide
        • Post-Workshop Testing and Documentation Links
        • Connect With Us
        • DuploCloud Whitepapers
        • DuploCloud Terraform Provider
        • DuploCloud AWS Demo Video
  • Getting Started with DuploCloud
    • What DuploCloud Does
    • DuploCloud Onboarding
    • Application Focused Interface: DuploCloud Architecture
      • DuploCloud Tenancy Models
      • DuploCloud Common Components
        • Infrastructure
        • Plan
        • Tenant
        • Hosts
        • Services
        • Diagnostics
      • Management Portal Scope
    • GRC Tools and DuploCloud
    • Public Cloud Tutorials
    • Getting Help with DuploCloud
  • Container Orchestrators
    • Terminologies in Container Orchestration
  • DuploCloud Prerequisites
    • DNS Configuration
  • AWS User Guide
    • Prerequisites
      • Route 53 Hosted Zone
      • ACM Certificate
      • Shell Access for Containers
      • VPN Setup
      • Connect to the VPN
    • AWS Quick Start
      • Step 1: Create Infrastructure and Plan
      • Step 2: Create a Tenant
      • Step 3: Create an RDS Database (Optional)
      • Creating an EKS Service
        • Step 4: Create a Host
        • Step 5: Create a Service
        • Step 6: Create a Load Balancer
        • Step 7: Enable Additional Load Balancer Options (Optional)
        • Step 8: Create a Custom DNS Name (Optional)
        • Step 9: Test the Application
      • Creating an ECS Service
        • Step 4: Create a Task Definition for an Application
        • Step 5: Create the ECS Service and Load Balancer
        • Step 6: Test the Application
      • Creating a Native Docker Service
        • Step 4: Create an EC2 Host
        • Step 5: Create a Service
        • Step 6: Create a Load Balancer
        • Step 7: Test the Application
    • AWS Use Cases
      • Creating an Infrastructure and Plan for AWS
        • EKS Setup
          • Enable EKS endpoints
          • Enable EKS logs
          • Enable Cluster Autoscaler
        • ECS Setup
          • Enable ECS logging
        • Add VPC endpoints
        • Security Group rules
        • Upgrading the EKS version
      • Creating a Tenant (Environment)
        • Setting Tenant session duration
        • Setting Tenant expiration
        • Tenant Config settings
      • Hosts (VMs)
        • Adding Hosts
        • Connect EC2 instance
        • Adding Shared Hosts
        • Adding Dedicated Hosts
        • Autoscaling Hosts
          • Autoscaling Groups (ASG)
            • Launch Templates
            • Instance Refresh for ASG
            • Scale to or from Zero
            • Spot Instances for AWS
          • ECS Autoscaling
          • Autoscaling in Kubernetes
        • Configure Auto-reboot
        • Create Amazon Machine Image (AMI)
        • Hibernate an EC2 Host
        • Snapshots
        • Taints for EKS Nodes
        • Disable Source Destination Check
      • Auditing
      • Logs
        • Enable Default-Tenant logging
        • Enable Non-Default Tenant logging
        • Configure Logging per Tenant
        • Display logs
        • Create custom logs
      • Diagnostics and Metrics
        • Metrics Setup
        • Metrics Dashboard
        • Kubernetes Administrator dashboard
      • Faults and Alerts
        • Alert notifications
        • Automatic alert creation
        • Automatic fault healing
        • SNS Topic Alerts
        • System Settings Flags
      • AWS Console link
      • Just-in-Time (JIT) Access
      • Billing and Cost management
        • Enable billing data
        • View billing data
        • Apply cost allocation tags
        • DuploCloud License Usage
        • Configure Billing Alerts
      • Resource Quotas
      • Big Data and ETL
      • Custom Resource tags
    • AWS Services
      • Containers and Services
        • EKS Containers and Services
          • Allocation Tagging
        • ECS Containers, Task Definitions and Services
        • Passing Configs and Secrets
        • Container Rollback
        • Docker Registry credentials
      • Load Balancers
        • Target Groups
        • EKS Load Balancers
        • ECS Services and Load Balancers
        • Native Docker Load Balancers
      • Storage
        • Storage Class and PVCs
        • GP3 Storage Class
      • API Gateway
      • Batch
      • CloudFront
      • Databases
        • AWS ElastiCache
        • AWS DynamoDB database
        • AWS Timestream database
        • RDS database
          • IAM authentication
          • Backup and restore
          • Sharing encrypted database
          • Manage RDS Snapshots
          • Add and manage RDS read replicas
            • Add Aurora RDS replicas
          • Add monitoring interval
          • Enable or disable RDS logging
          • Restrict RDS instance size
          • Add parameters in Parameter Groups
          • Manage Performance Insights
      • Data Pipeline
      • Elastic Container Registry (ECR)
        • Sharing ECR Repos
      • Elastic File System (EFS)
        • Mount an EFS in an EC2 instance
      • EMR Serverless
      • EventBridge
      • IoT (Internet of Things)
      • Kafka Cluster
      • Kinesis Stream
      • Lambda Functions
        • Configure Lambda with Container Images
        • Lambda Layers
      • Managed Airflow
      • NAT Gateway for HA
      • OpenSearch
      • Probes and Health Check
      • S3 Bucket
      • SNS Topic
      • SQS Queue
      • Virtual Private Cloud (VPC) Peering
      • Web App Firewall (WAF)
    • AWS FAQ
    • AWS Systems Settings
      • AWS Infrastructure Settings
      • AWS Tenant Settings
    • AWS Security Settings
      • Tenant Security settings
      • Infrastructure Security settings
      • System Security settings
      • AWS Account Security settings
      • Vanta Compliance Controls
  • GCP User Guide
    • Container deployments
      • Container orchestration features
      • Key DuploCloud concepts
    • Prerequisites
      • Docker Registry
      • Service Account Setup
      • Cloud DNS Zone
      • Certificates for Load Balancer and Ingress
      • Initial Infrastructure Setup
      • Tools Tenant
        • Enable Kubectl Shell
      • Docker
        • Docker Registry Credentials (Optional)
        • Shell Access for Docker (Optional)
      • VPN
        • VPN Setup
        • Connect to the VPN
      • Managed SSL Certificates with Certificate Manager (Optional)
    • GCP Quick Start
      • Step 1: Create Infrastructure and Plan
      • Step 2: Create a Tenant
      • Create a Service with GKE Autopilot
        • Step 3: Create a Service
        • Step 4: Create a Load Balancer
        • Step 5: Test the Application
      • Create a Service with GKE Standard
        • Step 3: Create a Node Pool
        • Step 4: Create a Service
        • Step 5: Create a Load Balancer
        • Step 6: Test the Application
    • GCP Use Cases
      • Creating an Infrastructure and Plan for GCP
        • Creating a GKE Autopilot Cluster
        • Creating GKE Standard Cluster
        • Kubectl token and config
        • Upgrading the GKE version
      • Creating a Tenant (Environment)
        • Tenant expiry
        • Tenant Config settings
      • Hosts (VMs)
      • Cost management for billing
        • Export Billing to BigQuery
        • Manage cross project billing in GCP
    • GCP Services
      • Containers and Services
      • GKE Containers and Services
        • Allocation Tagging
        • Docker Registry credentials
        • Container Rollback
        • Passing Config and Secrets
      • GCP Databases
        • Cloud SQL
        • Firestore Database
        • Managed Redis
      • Load Balancers
      • Cloud Armour
      • Cloud Credentials
      • Cloud Functions
      • Cloud Scheduler
      • Cloud Storage
      • Node Pools
      • Pub/Sub
    • GCP FAQs
    • GCP Systems Settings
      • GCP Infrastructure Settings
      • GCP Tenant Settings
    • GCP Security Settings
      • Infrastructure Security settings
      • GCP Account Security settings
  • Azure User Guide
    • Container deployments
      • Container orchestration features
      • Key DuploCloud concepts
    • Prerequisites
      • Program DNS entries
      • Set the AKS cluster version
      • Import SSL certificates
      • Provision the VPN
      • Connect to the VPN
      • Managed Identity Setup
    • Azure Quick Start
      • Step 1: Create Infrastructure and Plan
      • Step 2: Create a Tenant
      • Step 3: Create Agent Pools
      • Step 4: Create a Service
      • Step 5: Create a Load Balancer
      • Step 6: Test the Application
    • Azure Use Cases
      • Creating an Infrastructure and Plan for Azure
        • AKS initial setup
        • Kubectl token and config
        • Encrypted storage account
        • Upgrading the AKS version
      • Creating a Tenant (Environment)
        • Tenant expiry
        • Tenant Config settings
      • Hosts (VMs)
        • Autoscaling for Hosts
          • Autoscaling Azure Agent Pools
        • Shared Hosts
        • Availability Sets
        • Snapshots
      • Logs
      • Metrics
      • Faults and alerts
        • Alert notifications
      • Azure Portal link
      • Billing and Cost management
        • Enable billing data
        • Viewing billing data
    • Azure Services
      • Containers and Services
        • AKS Containers and Services
          • Allocation Tagging
        • Docker Registry Credentials
        • Container Rollback
        • Passing Configs and Secrets
      • Agent Pools
        • Spot Instances for AKS Agent Pools
      • Azure Container Registry (ACR)
      • Databases
        • MSSQL Server database
        • PostgreSQL database
        • PostgreSQL Flexible Server
        • MySQL Server database
          • Azure Managed SQL Instances
        • MySQL Flexible Server
        • Redis database
      • Docker Web Application
      • Databricks
      • Data Factory
      • Infra Secrets
      • Key Vault
      • Load Balancers
      • Public IP Address Prefix
      • Serverless
        • App Service Plans and Web Apps
        • Function Apps
      • Service Bus
      • Storage Account
      • Subscription
      • VM Scale Sets
    • Azure FAQ
    • Azure Systems Settings
      • Azure Infrastructure Settings
      • Azure Tenant Settings
    • Azure Security Settings
      • Tenant Security Settings
  • Kubernetes User Guide
    • Kubernetes Quick Start
    • Kubectl
      • Local Kubectl Setup
        • Kubectl Shell
      • Kubectl Shell
        • Enable Kubectl Shell for GKE
        • Enable Kubectl Shell for AKS
      • Kubectl Tokens and Access Management
      • Read-only Access in Kubernetes
      • Mirantis Lens
    • Configs and Secrets
      • Setting Kubernetes Secrets
      • Creating a Kubernetes ConfigMap
      • Setting Environment Variables (EVs) from a ConfigMap or Secret
      • Mounting ConfigMaps and Secrets as files
      • Using Kubernetes Secrets with Azure Storage connection data
      • Creating the SecretProviderClass Custom Resource to mount secrets
      • Managing Secrets and ConfigMaps access for readonly users (AWS and GCP)
    • Jobs
    • CronJobs
    • DaemonSet
    • Helm Charts
    • Ingress Loadbalancer
      • EKS Ingress
      • GKE Ingress
      • AKS Shared Application Gateway
        • Using an Azure Application Gateway SSL policy with Ingress
    • InitContainers and Sidecar Containers
    • HPA
    • Pod Toleration
    • Kubernetes Lifecycle Hooks
    • Kubernetes StorageClass and PVC
      • Native Azure Storage Classes
    • Import an External Kubernetes Cluster
    • Managed Service Accounts (RBAC)
    • Create a Diagnostics Application Service
  • Security and Compliance
    • Control Groups
    • Isolation and Firewall
      • Cloud Account
      • Network Segmentation
      • IAM
      • Security Groups
      • VPN
      • WAF
    • Access Management
      • Authentication Methods
      • Cloud Console, API and CLI
      • VM SSH
      • Container Shell
      • Kubernetes Access
      • Permission Sets
    • Encryption
      • At Rest Encryption
      • In Transit encryption
    • Tags and Label
    • Security Monitoring
      • Agent Management
      • SIEM
      • Vulnerabilities
      • Hardening Standards (CIS)
      • File Integrity Monitoring
      • Access Monitoring
      • HIDS
      • NIDS
      • Inventory Monitoring
        • Inventory Reports
      • Antivirus
      • VAPT (Pen Test)
      • AWS Security HUB
      • Alerting and Event Management
    • Compliance Frameworks
    • Security and Compliance Workflow
  • Terraform User Guide
    • DuploCloud Terraform Provider
    • DuploCloud Terraform Exporter
      • Install Terraform Exporter
      • Generate Terraform
      • Using Generated Code
      • Troubleshooting Guide
    • Terraform FAQ
  • Automation and Tools
    • DuploCtl CLI
    • Supported 3rd Party Tools
    • Automation Stacks
      • Clone from a Tenant
      • Create a deploy template
      • Deploy from a template
      • Customize deploy templates
  • CI/CD Overview
    • Service Accounts
    • GitHub Actions
      • Configure GitHub
      • Build a Docker image
      • Update a Kubernetes Service
      • Update an ECS Service
      • Update a Lambda function
      • Update CloudFront
      • Upload to S3 bucket
      • Execute Terraform
    • CircleCI
      • Configure CircleCI
      • Build and Push Docker Image
      • Update Service
    • GitLab CI/CD
      • Configure Gitlab
      • Build a Docker image
      • Update a service
    • Bitbucket Pipelines
      • Configure Bitbucket
      • Build a Docker image
      • Update the Service with Deploy Pipe
    • Azure Pipelines
      • Configure Azure DevOps
      • Build a Docker image from Azure DevOps
      • Update a Service
      • Troubleshooting
    • Katkit
      • Environments
      • Link repository
      • Phases
      • Katkit config
      • Advanced functions
  • User Administration
    • User Logins
    • User access to DuploCloud
    • API tokens
    • Session Timeout
    • Tenant Access for Users
      • Add Tenant access over a VPN
      • Read-only access to a Tenant
      • Cross-tenant Access
      • Deleting a Tenant
    • VPN access for users
    • Database access for users
    • SSO Configuration
      • Azure SSO Configuration
      • Okta Identity Management
    • Login Banner/Button Customization
  • Observability
    • Standard Observability Suite
      • Setup
        • Logging Setup
          • Custom Kibana Logging URL
        • Metrics Setup
        • Auditing
          • Custom Kibana Audit URL
      • Logs
      • Metrics
    • Advanced Observability Suite
      • Architecture
      • Dashboards
        • Administrator Dashboard
        • Tenant Dashboard
        • Customizing Dashboards
      • Logging with Loki
      • Metrics with Mimir
      • Tracing with Tempo
      • Profiles with Pyroscope
      • Alerts with Alert Manager
      • Service Level Objectives (SLOs)
      • OTEL Stack Resource Requirements
      • Application Instrumentation
      • Custom Metrics
      • Terraform
    • Faults and Alerts
      • Alert notifications
      • Automatic alert creation
    • Auditing
    • Web App Firewall (WAF)
  • Runbooks
    • Configuring Egress and Ingress for AKS Ingress Controllers in Private Networks
    • Configuring Retool to SSH into a DuploCloud Host with a Static IP Address for Secure Remote Database
  • FAQs
  • Extras
    • FluxCD
    • Deploying Helm Charts
    • Setting up SCPs (Service Control Policies) for DuploCloud
    • BYOH
    • Delegate Subdomains
    • Video Transcripts
      • DuploCloud AWS Product Demo
      • DuploCloud Azure Product Demo
      • DuploCloud GCP Product Demo
      • DevOps Deep Dive - Abstracting Cloud Complexity
      • DuploCloud Uses Infrastructure-as-Code to Stitch Together DevOps Lifecycle
Powered by GitBook
LogoLogo

Platform

  • Overview
  • Demo Videos
  • Pricing Guide
  • Documentaiton

Solutions

  • DevOps Automation
  • Compliance
  • Platform Engineering
  • Edge Deployments

Resources

  • Blog & News
  • Customer Stories
  • Webinars
  • Privacy Policy

Company

  • Careers
  • Press
  • Events
  • Contact

© DuploCloud, Inc. All rights reserved. DuploCloud trademarks used herein are registered trademarks of DuploCloud and affiliates

On this page
  • Creating an EMR Studio
  • Creating an EMR Serverless application
  • Cloning an application
  • Creating a job
  • Monitoring running jobs

Was this helpful?

Edit on GitHub
Export as PDF
  1. AWS User Guide
  2. AWS Services

EMR Serverless

Run big data applications with open-source frameworks without managing clusters and servers

PreviousMount an EFS in an EC2 instanceNextEventBridge

Last updated 5 months ago

Was this helpful?

Amazon EMR Serverless is a serverless option in that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers. You get all the features and benefits of Amazon EMR without needing experts to plan and manage clusters.

In this procedure, we , , then to run the application with EMR Serverless.

DuploCloud EMR Serverless supports Hive, Spark, and custom ECR images.

Creating an EMR Studio

To create EMR Serverless applications you first need to create an EMR studio.

  1. In the DuploCloud Portal, navigate to Cloud Services -> Analytics.

  2. Click the EMR Serverless tab.

  3. Click EMR Studio.

  4. Click Add. The Add EMR Studio pane displays.

  5. Enter a Description of the Studio for reference.

  6. Select an that you previously defined from the Logs Default S3 Bucket list box.

  7. Optionally, in the Logs Default S3 Folder field, specify the path to which logs are written.

  8. Click Create. The EMR Studio is created and displayed.

  9. Select the EMR Studio name in the Name column. The EMR Studio page displays. View the Details of the EMR Serverless Studio.

  10. Navigate to the EMR Serverless tab and click the menu () icon in the Actions column. Use the Actions Menu to delete the studio if needed, as well as to view the studio in the AWS Console.

Now that the EMR Studio exists, you create an application to run analytics with it. The DuploCloud Portal supports Hive and Spark applications. In this example, we create a Spark Application.

Creating an EMR Serverless application

  1. In the EMR Serverless tab, click Add. A configuration wizard launches with five steps for you to complete.

  2. Enter the EMR Serverless Application Name (app1, in this example) and the EMR Release Label in the Basics step. DuploCloud prepends the string DUPLOSERVICES-TENANT_NAME to your chosen application name, where TENANT_NAME is your Tenant's name. Click Next.

  3. Accept the defaults for the Capacity, Limits, and Configure pages by clicking Next on each page until you reach the Confirm page.

  4. On the Confirm page, click Submit. Your created application instance (DUPLOSERVICES-DEFAULT-APP1, in this example) is displayed in the EMR Serverless tab with the State of CREATED.

Before you begin to create a job to run the application, clone an instance of it to run.

Cloning an application

  1. Make any desired changes while advancing through the Basics, Capacity, Limits, and Configure steps, clicking Next to advance the wizard to the next page. DuploCloud gives your cloned app a unique generated name by default (app1-c-833, in this example).

  2. On the Confirm page, click Submit. In the EMR Serverless tab, you should now have two application instances in the CREATED State: your original application instance (DUPLOSERVICES-DEFAULT-APP1) and the cloned application instance (DUPLOSERVICES-DEFAULT-APP1-C-833).

Creating a job

You have created and cloned the Spark application. Now you must create and clone a job to run it in EMR Serverless. In this example, we create a Spark job.

  1. Select the application instance that you previously cloned. This instance (DUPLOSERVICES-DEFAULT-APP1-C-833, in this example) has a STATE of CREATED.

  2. Click Add. The configuration wizard launches.

  3. In the Basics step, enter the EMR Serverless RunJob Name (jobfromcloneapp, in this example).

  4. Click Next.

  5. In the Job details step, select a previously-defined Spark Script S3 Bucket.

  6. Optionally, in the Spark Scripts field, you can specify an array of arguments passed to your JAR or Python script. Each argument in the array must be separated by a comma (,). In the example below, a single argument of "40000" is entered.

  7. Optionally, in the Spark Submit Parameters field, you can specify Spark --conf parameters. See the example below.

  8. Click Next.

  9. Make any desired changes in the Configure step and click Next to advance the wizard to the Confirm page.

  10. On the Confirm page, click Submit. In the Run Jobs tab for your cloned application, your job JOBFROMCLONEAPP displays.

Monitoring running jobs

Observe the status of your jobs and makes changes, if needed. In this example, we monitor the Spark jobs created and cloned in this procedure.

  1. In the DuploCloud Portal, navigate to Cloud Services -> Analytics.

  2. Click the EMR Serverless tab.

  3. Select the application instance that you want to monitor. The Run Jobs tab displays run jobs connected to the application instance and each job's STATE.

  4. Using the Actions menu, you can view the Console, Start, Stop, Edit, Clone or Delete jobs. You can also click the Details tab to view configuration details.

On the EMR Serverless page, click the menu () icon and select Clone.

If you are new to Spark, use the Info Tips (blue icon) when entering data in the EMR Serverless configuration wizard steps below.

In the Spark Script File field, enter a path to define where your scripts are stored.

S3 Bucket
Amazon EMR
S3 Bucket
create an EMR studio
create and clone a Spark application
create and clone a Spark job
Actions menu with EMR Studio option highlighted on EMR Serverless tab
EMR Studio page with Basic and Details tabs.
EMR Serverless Studio Actions Menu
EMR Serverless configuration wizard Basics step
EMR Serverless tab with CREATED application instance
Actions menu with Clone option on EMR Serverless tab
Original application instance and cloned instance in EMR Serverless tab
EMR Serverless configuration wizard Basics step with EMR Serverless RunJob Name field
EMR Serverless configuration wizard Job details step with Spark Script Arguments and Spark Submit Parameters fields
Run Jobs tab for cloned application instance DUPLOSERVICES-DEFAULT-APP1-C-753
Run Jobs tab with 2 jobs in various STATEs