Elastic MapReduce: Import an Existing EMR Cluster

You can import an existing EMR cluster to Elastigroup. Elastigroup then manages the cluster, enabling you to take advantage of numerous optimization features and significant cost savings.

You can choose between two strategies for importing the EMR cluster:

  • Clone: Elastigroup copies the configuration of an existing environment (including terminated environments) and creates a new cluster with this configuration.

  • Wrap: Elastigroup manages scaling of only the task nodes of an existing EMR cluster.

The procedures below describe each import strategy in detail.

Amazon EMR can occasionally get stuck in a Resizing status while the capacity of an instance group is changing. In this case, the actual number of running instances does not match the requested number. Learn how the Elastigroup EMR Auto-Recovery process handles these situations.
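
To make the mismatch concrete, the following minimal sketch flags instance groups that are in a resizing state while their running count differs from the requested count. It assumes input dictionaries shaped like the entries returned by the AWS EMR ListInstanceGroups API (`Id`, `RequestedInstanceCount`, `RunningInstanceCount`, `Status.State`). The function name `find_stuck_groups` and the sample IDs are hypothetical, and this is not a description of how Elastigroup's Auto-Recovery is implemented.

```python
from typing import Dict, List


def find_stuck_groups(instance_groups: List[Dict]) -> List[str]:
    """Return the IDs of groups that are resizing while their running
    instance count differs from the requested count."""
    stuck = []
    for group in instance_groups:
        state = group.get("Status", {}).get("State")
        requested = group.get("RequestedInstanceCount", 0)
        running = group.get("RunningInstanceCount", 0)
        if state == "RESIZING" and requested != running:
            stuck.append(group["Id"])
    return stuck


# Illustrative data only; the IDs below are made up.
groups = [
    {"Id": "ig-TASK", "Status": {"State": "RESIZING"},
     "RequestedInstanceCount": 10, "RunningInstanceCount": 6},
    {"Id": "ig-CORE", "Status": {"State": "RUNNING"},
     "RequestedInstanceCount": 2, "RunningInstanceCount": 2},
]
print(find_stuck_groups(groups))
```

A brief mismatch is normal during any resize, so a real check would also need to consider how long a group has been in this state before treating it as stuck.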

Check out our API Reference and learn how to create an Elastigroup to run your task nodes using RESTful APIs.

Prerequisites

  • A verified Spot account
  • A running EMR cluster

Get Started

  1. In the left menu of the Spot console, click Elastigroup/Groups, and click Create Elastigroup.

  2. In the Use Cases page, click EMR.

  3. Choose Use an Existing Cluster, and click Select.

General

  1. In the General tab, complete the following information:
    • Name: The display name of the cluster
    • Region: The AWS region of the instances in the cluster
    • Description: A few words identifying the purpose of the cluster
  2. Click Next.

Strategy & Compute

Strategy

Choose the import strategy you would like to use and enter the relevant Origin Cluster or Source Cluster.

Instance Groups

If you are cloning an environment, define your instance groups as follows:

Primary

  • Instance Types: Choose one or more preferred instance types for the primary.
  • Life Cycle: Choose Spot or On-Demand.

Core

  • Instance Types: Choose one or more preferred instance types for the core.
  • Life Cycle: Choose Spot or On-Demand.
  • Target
  • Minimum
  • Maximum
  • Capacity Unit

Task

  • Instance Types: Choose one or more preferred instance types for the task instances.
  • Life Cycle: Choose Spot or On-Demand.
  • Target
  • Minimum
  • Maximum
  • Capacity Unit

Tip: EMR Primary and Core instance groups must always have at least one running instance to avoid cluster termination. It is highly recommended not to run either group on a single Spot instance: set the Target capacity to more than one instance, or set the Life Cycle to On-Demand.
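
The recommendation above can be expressed as a small pre-flight check. This is a sketch under stated assumptions: the `GroupConfig` class, its field names, and the `"SPOT"` / `"ON_DEMAND"` values are hypothetical stand-ins for whatever you enter in the console, not an Elastigroup data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroupConfig:
    name: str        # hypothetical labels: "primary", "core", or "task"
    life_cycle: str  # hypothetical values: "SPOT" or "ON_DEMAND"
    target: int      # desired number of instances in the group


def single_spot_risks(groups: List[GroupConfig]) -> List[str]:
    """Return the names of Primary/Core groups that would run on a
    single Spot instance, which risks terminating the cluster."""
    return [
        g.name
        for g in groups
        if g.name in ("primary", "core")
        and g.life_cycle == "SPOT"
        and g.target <= 1
    ]


planned = [
    GroupConfig("primary", "SPOT", 1),     # flagged: single Spot instance
    GroupConfig("core", "ON_DEMAND", 1),   # fine: On-Demand
    GroupConfig("task", "SPOT", 1),        # fine: task nodes are not covered by the tip
]
print(single_spot_risks(planned))
```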

If you are wrapping an environment, you only need to configure the Task Instance Groups.

Tags

If you are cloning an environment, you can define tags in the Tags section.

Scheduling

You can schedule actions for the Task Instance Groups.

Advanced

If you are cloning an environment, you can define Advanced parameters as described below.

  • Set a Root Volume Size (GB)

    Tip: Decreasing the root volume size is not recommended and might prevent the instance group or the cluster from launching properly.

  • Include EMR Steps. This adds any steps configured in the original cluster to the clone in Elastigroup.
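
As a quick summary of the sections described up to this point, the sketch below maps each import strategy to the sections you configure for it. The enum, function name, and section labels are illustrative only and are not part of any Elastigroup API.

```python
from enum import Enum
from typing import List


class ImportStrategy(Enum):
    CLONE = "clone"
    WRAP = "wrap"


def configurable_sections(strategy: ImportStrategy) -> List[str]:
    """List the sections that apply to the chosen import strategy."""
    # Task instance groups and scheduling apply to both strategies.
    sections = ["Task instance groups", "Scheduling"]
    if strategy is ImportStrategy.CLONE:
        # Cloning also exposes Primary/Core groups, Tags, and Advanced.
        sections = (
            ["Primary instance group", "Core instance group"]
            + sections
            + ["Tags", "Advanced"]
        )
    return sections


for strategy in ImportStrategy:
    print(strategy.value, configurable_sections(strategy))
```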

Scaling (Optional)

You can define scaling policies as described in Create a New EMR Cluster.

Review

In the Review tab, you can review your EMR cluster configuration as the JSON that Elastigroup will use to create the group. To make changes, go back to the relevant tab and edit there, or edit the JSON directly. When you have finished reviewing, click Create.
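
If you edit the JSON by hand, it can help to sanity-check the capacity values before you click Create. The sketch below is a hedged illustration: the `instanceGroups` and `capacity` keys are a simplified, hypothetical layout rather than the actual Elastigroup schema, and the rule that Minimum <= Target <= Maximum is an assumption about what a valid configuration looks like.

```python
import json
from typing import List


def capacity_problems(config_json: str) -> List[str]:
    """Return one message per instance group whose capacity values do
    not satisfy minimum <= target <= maximum."""
    config = json.loads(config_json)
    problems = []
    # Hypothetical layout: {"instanceGroups": {"<name>": {"capacity": {...}}}}
    for name, group in config.get("instanceGroups", {}).items():
        capacity = group.get("capacity")
        if capacity is None:
            continue  # group has no capacity block to check
        minimum = capacity["minimum"]
        target = capacity["target"]
        maximum = capacity["maximum"]
        if not (minimum <= target <= maximum):
            problems.append(
                f"{name}: expected minimum <= target <= maximum, "
                f"got {minimum}/{target}/{maximum}"
            )
    return problems


edited = json.dumps({
    "instanceGroups": {
        "core": {"capacity": {"minimum": 1, "target": 2, "maximum": 4}},
        "task": {"capacity": {"minimum": 0, "target": 8, "maximum": 5}},
    }
})
for problem in capacity_problems(edited):
    print(problem)
```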