# EMR Serverless

[Amazon EMR Serverless](https://aws.amazon.com/emr/serverless/) is a serverless option in [Amazon EMR](https://aws.amazon.com/emr/) that makes it easy for data analysts and engineers to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers. You get all the features and benefits of Amazon EMR without needing experts to plan and manage clusters.

In this procedure, we [create an EMR Studio](#creating-an-emr-studio), [create and clone a Spark application](#creating-and-cloning-a-spark-application), then [create and clone a Spark job](#creating-and-cloning-a-spark-job) to run the application with EMR Serverless.

{% hint style="info" %}
DuploCloud EMR Serverless supports Hive, Spark, and custom ECR images.
{% endhint %}

## Creating an EMR Studio

To create EMR Serverless applications, you first need to create an EMR Studio.

1. In the DuploCloud Portal, navigate to **Cloud Services** -> **Analytics**.
2. Click the **EMR Serverless** tab.
3. Click **EMR Studio**.

**Actions** menu with **EMR Studio** option highlighted on **EMR Serverless** tab

4. Click **Add**. The **Add EMR Studio** pane displays.

   ![](/files/UKfiycjI2Xjp50s8ne1p)
5. Enter a **Description** of the **Studio** for reference.
6. Select an [S3 Bucket](/docs/automation-platform/overview/aws-services/s3-bucket.md) that you previously defined from the **Logs Default S3 Bucket** list box.
7. Optionally, in the **Logs Default S3 Folder** field, specify the path to which logs are written.
8. Click **Create**. The EMR Studio is created and displayed.
9. Select the EMR Studio name in the **Name** column. The **EMR Studio** page displays. View the **Details** of the EMR Serverless Studio.

EMR Studio page with **Basic** and **Details** tabs.

10. Navigate to the **EMR Serverless** tab and click the menu icon in the **Actions** column. Use the **Actions** menu to delete the studio, if needed, or to view the studio in the AWS Console.
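The **Logs Default S3 Bucket** and **Logs Default S3 Folder** values together identify one S3 location. The snippet below is a minimal sketch of how such a pair could be combined into an `s3://` URI. The helper name `logs_s3_uri` and the sample bucket and folder are hypothetical, and the exact path DuploCloud builds from these fields is an assumption, not something this page documents.

```python
def logs_s3_uri(bucket: str, folder: str = "") -> str:
    """Join a bucket name and an optional folder into an s3:// URI.

    Hypothetical helper for illustration only; it is not part of
    DuploCloud or of any AWS SDK.
    """
    bucket = bucket.strip().strip("/")
    if not bucket:
        raise ValueError("bucket name is required")
    # Normalize the folder so leading or trailing slashes do not matter.
    folder = folder.strip().strip("/")
    return f"s3://{bucket}/{folder}/" if folder else f"s3://{bucket}/"


# Sample values are placeholders, not real resources.
print(logs_s3_uri("my-logs-bucket", "emr/studio"))
```

Leaving the folder empty points the logs at the bucket root, which mirrors the folder field being optional in the pane.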

Now that the EMR Studio exists, create an application to run analytics with it. The DuploCloud Portal supports `Hive` and `Spark` applications. In this example, we create a Spark application.

## Creating an EMR Serverless application

1. In the **EMR Serverless** tab, click **Add**. A configuration wizard launches with five steps for you to complete.
2. In the **Basics** step, enter the **EMR Serverless Application Name** (`app1`, in this example) and the **EMR Release Label**. DuploCloud prepends the string `DUPLOSERVICES-TENANT_NAME` to your chosen application name, where `TENANT_NAME` is your Tenant's name. Click **Next**.

**EMR Serverless** configuration wizard **Basics** step

3. Accept the defaults for the **Capacity**, **Limits**, and **Configure** pages by clicking **Next** on each page until you reach the **Confirm** page.
4. On the **Confirm** page, click **Submit**. Your created application instance (`DUPLOSERVICES-DEFAULT-APP1`, in this example) is displayed in the **EMR Serverless** tab with the **State** of **CREATED**.

**EMR Serverless** tab with **CREATED** application instance
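The naming rule from the **Basics** step can be sketched in a few lines of Python. The function name `duplo_application_name` is hypothetical, and the hyphen separator and upper-casing are inferred from the single example on this page (`app1` in the `default` Tenant becoming `DUPLOSERVICES-DEFAULT-APP1`); DuploCloud may apply additional rules that are not shown here.

```python
def duplo_application_name(tenant_name: str, app_name: str) -> str:
    """Return the display name DuploCloud is assumed to generate.

    Assumption: the prefix and the chosen name are joined with a hyphen
    and the whole string is upper-cased, as in the example on this page.
    """
    prefix = f"DUPLOSERVICES-{tenant_name}"
    return f"{prefix}-{app_name}".upper()


print(duplo_application_name("default", "app1"))
```

Knowing the final name in advance is useful when you search for the application in the AWS Console, where it appears under the prefixed name rather than the name you typed.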

Before you create a job to run the application, clone an instance of the application to run.

### Cloning an application

1. On the **EMR Serverless** tab, click the menu icon and select **Clone**.

**Actions** menu with **Clone** option on **EMR Serverless** tab

2. Make any desired changes while advancing through the **Basics**, **Capacity**, **Limits**, and **Configure** steps, clicking **Next** to advance the wizard to the next page. DuploCloud gives your cloned app a unique generated name by default (**app1-c-833**, in this example).
3. On the **Confirm** page, click **Submit**. In the **EMR Serverless** tab, you should now have two application instances in the **CREATED** state: your original application instance (**DUPLOSERVICES-DEFAULT-APP1**) and the cloned application instance (**DUPLOSERVICES-DEFAULT-APP1-C-833**).

Original application instance and cloned instance in EMR Serverless tab

## Creating a job

You have created and cloned the Spark application. Now you must create and clone a job to run it in EMR Serverless. In this example, we create a Spark job.

{% hint style="info" %}
If you are new to Spark, use the Info Tips (blue icon) when entering data in the EMR Serverless configuration wizard steps below.
{% endhint %}

1. Select the application instance that you previously cloned. This instance (**DUPLOSERVICES-DEFAULT-APP1-C-833**, in this example) has a **STATE** of **CREATED**.
2. Click **Add**. The configuration wizard launches.
3. In the **Basics** step, enter the **EMR Serverless RunJob Name** (**jobfromcloneapp**, in this example).

**EMR Serverless** configuration wizard **Basics** step with **EMR Serverless RunJob Name** field

4. Click **Next**.
5. In the **Job details** step, select a previously defined **Spark Script S3 Bucket**.
6. In the **Spark Script** [**S3 Bucket**](/docs/automation-platform/overview/aws-services/s3-bucket.md) **File** field, enter the path where your scripts are stored.
7. Optionally, in the **Spark Script Arguments** field, specify an array of arguments to pass to your JAR or Python script. Separate the arguments in the array with commas (**,**). In the example below, a single argument of **"40000"** is entered.
8. Optionally, in the **Spark Submit Parameters** field, specify Spark **`--conf`** parameters. See the example below.

**EMR Serverless** configuration wizard **Job details** step with **Spark Script Arguments** and **Spark Submit Parameters** fields

9. Click **Next**.
10. Make any desired changes in the **Configure** step and click **Next** to advance the wizard to the **Confirm** page.
11. On the **Confirm** page, click **Submit**. In the **Run Jobs** tab for your cloned application, your job **JOBFROMCLONEAPP** displays.

**Run Jobs** tab for cloned application instance **DUPLOSERVICES-DEFAULT-APP1-C-753**
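To make the **Job details** fields more concrete, the sketch below assembles them into a dictionary shaped like the `sparkSubmit` job driver of the AWS EMR Serverless `StartJobRun` API (`entryPoint`, `entryPointArguments`, `sparkSubmitParameters`). How DuploCloud maps its wizard fields onto that API is an assumption here. The helper names, bucket name, script path, and `--conf` values are hypothetical placeholders.

```python
import json


def split_arguments(raw: str) -> list[str]:
    """Split a comma-separated argument string into a list.

    Blank entries are dropped and surrounding double quotes are removed.
    This naive split does not support arguments that contain commas.
    """
    return [part.strip().strip('"') for part in raw.split(",") if part.strip()]


def build_job_driver(bucket: str, script_path: str,
                     raw_arguments: str = "", submit_parameters: str = "") -> dict:
    """Build a sparkSubmit job driver from wizard-style field values.

    Hypothetical helper: the field-to-key mapping is assumed, not
    taken from DuploCloud.
    """
    spark_submit = {"entryPoint": f"s3://{bucket}/{script_path.lstrip('/')}"}
    arguments = split_arguments(raw_arguments)
    if arguments:  # optional field: omit the key when nothing was entered
        spark_submit["entryPointArguments"] = arguments
    if submit_parameters.strip():  # optional field
        spark_submit["sparkSubmitParameters"] = submit_parameters.strip()
    return {"sparkSubmit": spark_submit}


# Placeholder values; only the "40000" argument comes from this page.
driver = build_job_driver(
    "my-scripts-bucket",
    "scripts/pi.py",
    '"40000"',
    "--conf spark.executor.cores=1 --conf spark.executor.memory=4g",
)
print(json.dumps(driver, indent=2))
```

Both optional fields are left out of the result when empty, which matches the wizard treating script arguments and submit parameters as optional.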

## Monitoring running jobs

Observe the status of your jobs and make changes, if needed. In this example, we monitor the Spark jobs created and cloned in this procedure.

1. In the DuploCloud Portal, navigate to **Cloud Services** -> **Analytics**.
2. Click the **EMR Serverless** tab.
3. Select the application instance that you want to monitor. The **Run Jobs** tab displays run jobs connected to the application instance and each job's **STATE**.

**Run Jobs** tab with 2 jobs in various **STATE**s

4. Using the **Actions** menu, you can view the **Console**, **Start**, **Stop**, **Edit**, **Clone** or **Delete** jobs. You can also click the **Details** tab to view configuration details.
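If you prefer to watch a job from a script rather than the **Run Jobs** tab, the monitoring steps above amount to polling a state until it stops changing. The sketch below assumes the **STATE** column mirrors the AWS EMR Serverless job run states, of which `SUCCESS`, `FAILED`, and `CANCELLED` are terminal. The `wait_for_job` function and its `get_state` callback are hypothetical; the callback stands in for whatever lookup you use.

```python
import time
from typing import Callable

# Assumed terminal job run states, following the AWS EMR Serverless API.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def wait_for_job(get_state: Callable[[], str],
                 poll_seconds: float = 30.0,
                 max_polls: int = 120) -> str:
    """Poll get_state() until a terminal state is seen, then return it.

    Raises TimeoutError if no terminal state appears within max_polls.
    """
    state = ""
    for attempt in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if attempt < max_polls - 1:  # no need to sleep after the last poll
            time.sleep(poll_seconds)
    raise TimeoutError(f"job still in state {state!r} after {max_polls} polls")


# Stand-in for a real status lookup: replays a fixed sequence of states.
states = iter(["SUBMITTED", "PENDING", "RUNNING", "SUCCESS"])
print(wait_for_job(lambda: next(states), poll_seconds=0))
```

Passing the lookup in as a callback keeps the loop independent of any SDK. With boto3, for example, the callback could wrap the `emr-serverless` client's `get_job_run` call and return the `state` field of the returned `jobRun`.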