AWS EMR on EC2

Set Up Permissions for Wherobots through IAM

To link Wherobots to your AWS account, you need an existing AWS account. If you don’t have one, you can sign up for an AWS Free Tier account at https://aws.amazon.com/free/.

Create an IAM user under your AWS account

  1. Navigate to your AWS IAM dashboard (What is an IAM user?)
  2. Click on the Users tab
  3. Click on “Add users”
  4. To grant the IAM user only the minimum permissions Wherobots requires, skip this step and follow the Grant IAM User permissions section instead. Alternatively, to save time, choose Attach policies directly and select the following 2 policies:

    1. AmazonEMRFullAccessPolicy_v2
    2. AmazonS3FullAccess
  5. Review the new user and press “Create user”

  6. Find the new user you created and open its page

  7. Click on the “Security credentials” tab

  8. Scroll down to the “Access keys” section and click on “Create access key”

  9. Select Third-party service and click “Next”

  10. Set a description tag (optional) and click on “Create access key”

  11. Download or copy the access keys you have just created. You will need them in the coming steps

Grant IAM User permissions

For Wherobots to operate in your account, your IAM policies must include at least the permissions listed below.

Permissions to manage EMR clusters

Wherobots requires permissions to list clusters and to describe, clone, and terminate an EMR cluster.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowEMRAPI",
            "Effect": "Allow",
            "Action": [
                "elasticmapreduce:ListClusters",
                "elasticmapreduce:RunJobFlow"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowEMRClusters",
            "Effect": "Allow",
            "Action": [
                "elasticmapreduce:GetManagedScalingPolicy",
                "elasticmapreduce:DescribeStep",
                "elasticmapreduce:ListInstances",
                "elasticmapreduce:ListBootstrapActions",
                "elasticmapreduce:ListSteps",
                "elasticmapreduce:ListInstanceFleets",
                "elasticmapreduce:GetAutoTerminationPolicy",
                "elasticmapreduce:DescribeCluster",
                "elasticmapreduce:TerminateJobFlows",
                "elasticmapreduce:ListInstanceGroups"
            ],
            "Resource": "arn:aws:elasticmapreduce:*:{ACCOUNT_ID}:cluster/*"
        },
        {
            "Sid": "AllowPassRoles",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::{ACCOUNT_ID}:role/*",
            "Condition": {
                "StringLike": {
                    "iam:PassedToService": "*amazonaws.com*"
                }
            }
        }
    ]
}

You can copy these policies directly and create them under your account.

Permissions to upload wherobots binaries

Wherobots will create a bucket in your account with

  1. the tag author = wherobots
  2. a name matching wherobots-*

and use it to upload its binaries and config files. This requires the permissions below.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowInspectBuckets",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:GetBucketTagging"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowManageWherobotsBuckets",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:CreateBucket",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:DeleteBucket",
                "s3:PutBucketTagging"
            ],
            "Resource": "arn:aws:s3:::wherobots-*"
        }
    ]
}
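To see which resources the AllowManageWherobotsBuckets statement reaches, the sketch below approximates IAM wildcard matching with Python's fnmatch module. This is only an illustration: fnmatch also interprets [...] character classes, which IAM does not, and the ARNs listed are hypothetical. It shows that the single pattern covers both the buckets and the objects inside them, because * also matches the / separator.

```python
from fnmatch import fnmatchcase

# Resource pattern copied from the AllowManageWherobotsBuckets statement.
WHEROBOTS_RESOURCE = "arn:aws:s3:::wherobots-*"


def is_covered(arn: str, pattern: str = WHEROBOTS_RESOURCE) -> bool:
    """Approximate IAM wildcard matching: '*' matches any run of characters."""
    return fnmatchcase(arn, pattern)


# Hypothetical ARNs, for illustration only.
for arn in (
    "arn:aws:s3:::wherobots-binaries",
    "arn:aws:s3:::wherobots-binaries/jars/sedona.jar",
    "arn:aws:s3:::my-company-data",
):
    print(arn, "->", "covered" if is_covered(arn) else "not covered")
```

Buckets whose names do not start with wherobots- stay outside the second statement, so the IAM user cannot read or modify their contents through this policy.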

Grant permissions to IAM User

  1. Find your 12-digit AWS Account ID by following the steps on AWS.

  2. In the AWS Console, navigate to Policies under IAM and click Create Policy at the top right. Switch to the JSON view and paste in the policies above. A policy is a single JSON document, so either create one policy per document or combine the Statement entries of both into one document. Replace {ACCOUNT_ID} with your own account ID, then create the new policy.

  3. Navigate to Users under IAM and find the IAM user you just created. Under the Permissions tab, in Permissions policies, click Add permissions. In the edit panel, choose Attach policies directly, search for the IAM policy you just created, and attach it to the new IAM user.
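If you prefer to prepare the policy text before pasting it into the console, the following minimal sketch merges several policy documents into one and substitutes the {ACCOUNT_ID} placeholder. The function name build_policy and the account ID 123456789012 are hypothetical, and the two policy strings are abbreviated stand-ins for the full documents above.

```python
import json
import re


def build_policy(documents: list[str], account_id: str) -> str:
    """Merge several policy documents into one and fill in {ACCOUNT_ID}."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are exactly 12 digits")
    statements = []
    for text in documents:
        doc = json.loads(text.replace("{ACCOUNT_ID}", account_id))
        statements.extend(doc["Statement"])
    return json.dumps({"Version": "2012-10-17", "Statement": statements}, indent=4)


# Abbreviated stand-ins for the two policies; paste the full JSON in practice.
emr_policy = """{"Version": "2012-10-17", "Statement": [
    {"Sid": "AllowPassRoles", "Effect": "Allow", "Action": "iam:PassRole",
     "Resource": "arn:aws:iam::{ACCOUNT_ID}:role/*"}]}"""
s3_policy = """{"Version": "2012-10-17", "Statement": [
    {"Sid": "AllowInspectBuckets", "Effect": "Allow",
     "Action": ["s3:ListAllMyBuckets", "s3:GetBucketTagging"], "Resource": "*"}]}"""

# 123456789012 is a placeholder account ID, not a real one.
print(build_policy([emr_policy, s3_policy], "123456789012"))
```

Merging works here because every Sid in the two documents is distinct; if you add statements of your own, keep the Sid values unique within the combined policy.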

Finally, add the IAM user's access keys to Wherobots:

  1. Navigate to the Wherobots AWS cloud provider page
  2. Enter the access key and secret key of your AWS IAM user into the corresponding fields
  3. Press “Save Changes”

Create the EMR Cluster

Note

Wherobots currently does not support EMR on EKS. Please choose EMR on EC2.

Wherobots deploys to EMR by cloning an existing cluster of your choice and adding the Sedona add-ons. This means you need to create a source cluster to clone first. The steps below create the simplest cluster that Wherobots can clone.

  1. Navigate to the AWS EMR clusters list and click the Create Cluster button.

  2. Under the Application bundle section, choose Custom and pick at least the following applications:

    • Spark
    • Livy
    • JupyterEnterpriseGateway

  3. Choose the Amazon EMR service role and the EC2 instance profile for Amazon EMR. Pick the default roles suggested by AWS if you don't need any special permission control.

  4. Click the Create Cluster button and navigate back to the clusters page, where you will find the cluster provisioning. Click the cluster to see its details.

For more details on how to configure the EMR cluster, please refer to the official AWS documentation.
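The same source cluster can in principle be created programmatically. The sketch below builds the keyword arguments for boto3's EMR run_job_flow call with the three applications from step 2 and the default EMR roles. The cluster name, the release label emr-6.15.0, the m5.xlarge instance type, and the instance counts are assumptions chosen for illustration, as is the choice to keep the cluster alive when it has no steps; none of them come from Wherobots.

```python
def minimal_cluster_request(name: str, release_label: str) -> dict:
    """Build keyword arguments for boto3's EMR run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # The three applications listed in step 2.
        "Applications": [
            {"Name": "Spark"},
            {"Name": "Livy"},
            {"Name": "JupyterEnterpriseGateway"},
        ],
        "Instances": {
            # Instance type and counts are illustrative assumptions.
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Assumption: keep the cluster running instead of terminating
            # it as soon as it has no steps to execute.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default role names used by EMR when no custom roles are needed.
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
    }


def create_cluster(request: dict, region: str) -> str:
    """Submit the request; needs boto3 installed and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


# Hypothetical name and release label.
request = minimal_cluster_request("wherobots-clone-source", "emr-6.15.0")
print([app["Name"] for app in request["Applications"]])
```

Only minimal_cluster_request runs here; calling create_cluster(request, "us-east-1") would actually start a billable cluster, so review the request against your own networking and role setup first.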

Set up your EMR Cluster with Wherobots

  1. Navigate to the Wherobots Third-party Connect page and select 'AWS'.

  2. Find the source cluster you created in the cluster list, then choose clone to start. In the pop-up window, enter the new cluster name and choose the Sedona version to install. Then click 'Proceed'.

  3. The Wherobots service will clone your source cluster and install the Wherobots binaries into the new cluster. You can monitor progress through the notification. Once the notification reports that the installation has succeeded, the EMR cluster begins to spawn. Once the cluster is up, it can be used to run Sedona.
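Besides the Wherobots notification, you can check from your own side whether the cloned cluster is up. The sketch below classifies the state field of an EMR DescribeCluster response and polls until the cluster is either ready or stopped. The function names and the 30-second polling interval are hypothetical; the DescribeCluster permission it relies on is part of the EMR policy above.

```python
# Cluster states reported by the EMR DescribeCluster API.
READY_STATES = {"RUNNING", "WAITING"}
STOPPED_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def classify(describe_response: dict) -> str:
    """Map a DescribeCluster response to 'ready', 'stopped', or 'pending'."""
    state = describe_response["Cluster"]["Status"]["State"]
    if state in READY_STATES:
        return "ready"
    if state in STOPPED_STATES:
        return "stopped"
    return "pending"  # STARTING or BOOTSTRAPPING


def wait_until_settled(cluster_id: str, region: str, poll_seconds: int = 30) -> str:
    """Poll EMR until the cluster is ready or stopped; needs boto3 and credentials."""
    import time

    import boto3  # imported lazily so classify() works without it

    emr = boto3.client("emr", region_name=region)
    while True:
        status = classify(emr.describe_cluster(ClusterId=cluster_id))
        if status != "pending":
            return status
        time.sleep(poll_seconds)


# A hand-written response fragment, for illustration only.
print(classify({"Cluster": {"Status": {"State": "BOOTSTRAPPING"}}}))
```

A result of "ready" means the cluster can accept work such as an attached Workspace; "stopped" means the clone did not survive and the EMR console is the place to look for the reason.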

Spawn an EMR Workspace (JupyterLab) and run notebooks

Tip

We provide many ready-to-use example Python Jupyter notebooks. Please try them out: Wherobots examples.

  1. Navigate to your AWS EMR cluster page, click Workspace in the sidebar, then choose to create a workspace. If there is no Studio yet, create a new one by following the AWS guidance, or use any existing Studio.

    • Make sure that the Studio is in the same VPC and subnet (a multi-select field) as the EMR cluster.

  2. On the create page, under Advanced configuration, choose Attach Workspace to an EMR cluster, then pick the cluster spawned by Wherobots.

  3. Submit the form, then navigate back to the workspace list and wait for the new workspace to reach the attached status. Click the workspace, and AWS will open JupyterLab in a new browser tab.

  4. Import a Jupyter notebook. You can get one from Wherobots examples.

  5. Your notebook is now imported. Open it, choose the PySpark kernel, and execute the cells one by one.


Last update: November 17, 2023 03:11:08