How to configure an Amazon S3 bucket to allow ScrapeHero to write data and images

We assume that you have already created an Amazon Web Services (AWS) account. If not, create one here: http://aws.amazon.com

What we will need from you

At the end of this exercise, we will need two things from you, sent by email or as a reply to your support ticket:


  1. The name of the bucket that you create in Step 1
  2. The CSV file containing the Access Key ID and the Secret Access Key that you download from AWS in Step 3

STEP 1: Create a new Bucket

A bucket is a container that we can write files to and read files from, similar to a Windows or FTP folder.

Go to the S3 console in AWS

Click the Create bucket button

Enter a name for the bucket. For the rest of this article we use the name scrapehero for the bucket, so please use that name if it is available.

NOTE: S3 bucket names must be unique across all AWS accounts, so scrapehero may already be taken. If you use a different name, use that same name wherever scrapehero appears in the sections below.
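If you need to choose your own bucket name, a quick local pre-check against the main S3 naming rules (3 to 63 characters; only lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit; no two adjacent dots; not formatted like an IP address) can save a round trip through the console. The sketch below is a hypothetical helper, not an AWS tool: it checks only that subset of the rules, and it cannot tell you whether a name is already taken by another account.

```python
import ipaddress
import re

# Lowercase letters, digits, dots and hyphens; first and last character must be a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")


def bucket_name_problems(name: str) -> list[str]:
    """Return the reasons `name` breaks a subset of the S3 naming rules (empty list if none found)."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("must be 3 to 63 characters long")
    if not _BUCKET_RE.fullmatch(name):
        problems.append("use only lowercase letters, digits, dots and hyphens, "
                        "and start and end with a letter or digit")
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    try:
        ipaddress.IPv4Address(name)  # raises ValueError unless the name looks like an IPv4 address
        problems.append("must not be formatted as an IP address")
    except ValueError:
        pass
    return problems


if __name__ == "__main__":
    for candidate in ["scrapehero", "ScrapeHero_Data", "192.168.5.4"]:
        print(candidate, bucket_name_problems(candidate) or "looks OK")
```

An empty list only means the name passes these local checks; S3 still rejects a name that another account already owns.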

Keep clicking the NEXT button, and DO NOT change or add any values on the subsequent screens, until you see the Create bucket button at the bottom right. Then click the Create bucket button.

Create a folder (or, optionally, several folders) to organize the data

We need AT LEAST one folder in this bucket; let’s call it data

Click Save to create the folder
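S3 has no true folders: every file is stored under a flat key, and the console shows the part of a key before the first "/" as a folder. Creating the data folder therefore amounts to using the key prefix data/, and a file we write there gets a key such as data/products.csv. The plain-Python sketch below illustrates that idea; the helper names and file names are hypothetical and are not part of any AWS API.

```python
def object_key(folder: str, filename: str) -> str:
    """Build the S3 object key for `filename` inside `folder` (a key prefix such as "data")."""
    return f"{folder.strip('/')}/{filename.lstrip('/')}"


def top_level_folders(keys: list[str]) -> list[str]:
    """Group flat object keys into the top-level 'folders' the S3 console would display."""
    return sorted({key.split("/", 1)[0] + "/" for key in keys if "/" in key})


if __name__ == "__main__":
    keys = ["data/", object_key("data", "products.csv"), "images/1.jpg", "readme.txt"]
    print(keys[1])                  # the key of a file inside the data folder
    print(top_level_folders(keys))  # readme.txt has no "/" so it sits at the bucket root
```

Because folders are only prefixes, additional folders for data and images can be added at any time without changing the policy in Step 2, which covers every key in the bucket.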

Please email us the NAME of this BUCKET (or tell us that you used the default, scrapehero)

STEP 2: Create a new Policy

This policy restricts ScrapeHero’s access to this one S3 bucket, not to all of the S3 buckets in your account

Go to IAM – Policies

Click the Create Policy Button

Select the Create your own policy button

Give the policy a name such as ScrapeHeroS3AccessPolicy and type anything you like in the Description field. If you choose a different name, remember it, because you will search for it in Step 3.

Then copy and paste the following into the Policy Document field

NOTE: If you named your bucket anything other than scrapehero, replace scrapehero with your bucket name in BOTH Resource lines of the policy below

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::scrapehero",
                "arn:aws:s3:::scrapehero/*"
            ]
        }
    ]
}
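If you used a bucket name other than scrapehero, generating the policy text instead of hand-editing it avoids forgetting one of the two Resource lines. A minimal sketch using only Python’s standard json module; the function name and the example bucket name are hypothetical, and the output is simply the policy above with your bucket name substituted.

```python
import json


def scrapehero_policy(bucket: str) -> str:
    """Return the Step 2 policy document with `bucket` in place of scrapehero."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:*"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",  # every object inside the bucket
                ],
            }
        ],
    }
    return json.dumps(policy, indent=4)


if __name__ == "__main__":
    # Hypothetical bucket name; paste the printed JSON into the Policy Document field.
    print(scrapehero_policy("my-company-scrapehero-data"))
```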

Click Validate Policy and, if there are no errors, click the Create Policy button
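The two Resource entries in the policy do different jobs: arn:aws:s3:::scrapehero is the bucket itself, and arn:aws:s3:::scrapehero/* covers every object inside it. The sketch below approximates wildcard Resource matching (a * matches any run of characters, including /) to show which ARNs the policy reaches. It is a simplified illustration using Python’s fnmatch, not AWS’s policy evaluation logic: it ignores the Action element, conditions and explicit Deny statements.

```python
from fnmatch import fnmatchcase

# Resource entries copied from the policy above (default bucket name scrapehero).
POLICY_RESOURCES = ("arn:aws:s3:::scrapehero", "arn:aws:s3:::scrapehero/*")


def is_covered(resource_arn: str, patterns: tuple[str, ...] = POLICY_RESOURCES) -> bool:
    """Approximate IAM Resource matching: '*' matches any characters, including '/'."""
    return any(fnmatchcase(resource_arn, pattern) for pattern in patterns)


if __name__ == "__main__":
    for arn in ["arn:aws:s3:::scrapehero",
                "arn:aws:s3:::scrapehero/data/products.csv",
                "arn:aws:s3:::scrapehero-backup/file.csv",
                "arn:aws:s3:::another-bucket/file.csv"]:
        print(arn, is_covered(arn))
```

Only the first two ARNs are covered. A similarly named bucket such as scrapehero-backup is not, because the pattern requires the literal text scrapehero/ before the wildcard.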

STEP 3: Create a new User

For a secure configuration, you will need to create a new user for ScrapeHero (for example, named scrapehero) with its own access key and secret access key, which you can do in the AWS console’s IAM service. You then attach the policy from Step 2 to that user so it has the permissions it needs for this bucket and nothing more.

In the AWS Identity and Access Management (IAM) console, go to Users and click Add user

Name the user scrapehero, select the Programmatic access check box, and then click Next: Permissions

On the next screen, click the Attach existing policies directly option

Type ScrapeHeroS3 in the filter box to narrow the policy list, then click the CHECKBOX next to the policy you created in Step 2 (DO NOT CLICK THE POLICY NAME LINK ITSELF)

Scroll to the bottom of the page and click Next: Review

On the next page click the Create User button

The next page is your ONLY CHANCE TO SEE THE CREDENTIALS and DOWNLOAD them to send to us: AWS will not show the secret access key again after you leave this page.

Click the Download CSV button to save the file, then email that file to us
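Before emailing the file, you may want to confirm that it really contains a key pair. A minimal sketch using Python’s csv module; it assumes the file has columns headed Access key ID and Secret access key (check the header row of your own file, since the exact layout may differ), and the function name is hypothetical. The sample values below are AWS’s published documentation examples, not real credentials.

```python
import csv
import io

# Assumed column headers; compare them with the first line of the file you downloaded.
KEY_ID_COLUMN = "Access key ID"
SECRET_COLUMN = "Secret access key"


def read_access_keys(csv_text: str) -> tuple[str, str]:
    """Return (access key ID, secret access key) from the first data row of the credentials CSV."""
    # Drop a leading byte-order mark, if any, so the first header name compares cleanly.
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    row = next(reader, None)
    if row is None:
        raise ValueError("credentials file has no data rows")
    missing = [col for col in (KEY_ID_COLUMN, SECRET_COLUMN) if not row.get(col)]
    if missing:
        raise ValueError(f"missing or empty column(s): {', '.join(missing)}")
    return row[KEY_ID_COLUMN], row[SECRET_COLUMN]


if __name__ == "__main__":
    sample = ("User name,Access key ID,Secret access key\n"
              "scrapehero,AKIAIOSFODNN7EXAMPLE,wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\n")
    key_id, _secret = read_access_keys(sample)
    print("Found access key ID", key_id)  # never print the secret itself
```

To check your real file, read its contents with open(path, encoding="utf-8-sig", newline="") and pass the text to read_access_keys; a ValueError means the file is not the one to send.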