S3 - Faraday

In this tutorial, we'll show you how to:

Connect your S3 account to Faraday using a connection.

Let's dive in.

You'll need a Faraday account. Signup is free!

Prerequisites

You'll need the following details to create your connection to S3:

Bucket name (required, text)
AWS region (required, text): S3 buckets exist inside an AWS region, e.g. us-east-1
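Before creating the connection, it can help to sanity-check the two required values. The sketch below is a hypothetical helper, check_prerequisites, not part of Faraday. Its region pattern is a heuristic assumption that matches common codes such as us-east-1 or us-gov-west-1; it is not an official list of AWS regions.

```python
import re

# Heuristic only (assumption): matches region codes shaped like "us-east-1",
# "eu-central-1" or "us-gov-west-1". It is not an official AWS region list.
REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def check_prerequisites(bucket_name: str, aws_region: str) -> list[str]:
    """Return a list of problems; an empty list means both fields look usable."""
    problems = []
    if not bucket_name.strip():
        problems.append("bucket name is required")
    if not aws_region.strip():
        problems.append("AWS region is required")
    elif not REGION_PATTERN.match(aws_region):
        problems.append(f"unexpected AWS region format: {aws_region!r}")
    return problems


print(check_prerequisites("faraday-acme-io2quob", "us-east-1"))  # []
print(check_prerequisites("", "US East"))
```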

Granting access

First, you'll need to give Faraday access to your S3 account.

CSV is a well-known format for transferring data in large batches. Faraday accepts files up to 5 GB in size. If you have more than 5 GB of data, please split it into multiple files.
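One way to split a large CSV so that every part is still a valid CSV is to repeat the header row in each part and start a new file before the current one would exceed the size limit. The sketch below is hypothetical: split_csv and the -partNNNN.csv naming are illustrative choices, and reading "5 GB" as 5 * 10**9 bytes is an assumption. It uses Python's csv module so that quoted fields containing newlines are never cut in half.

```python
import csv
import io
import os
import tempfile

FIVE_GB = 5 * 10**9  # assumption: "5 GB" read as decimal gigabytes


def _encode_row(row: list[str]) -> bytes:
    """Serialize one CSV row to UTF-8 bytes so its size can be measured."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue().encode("utf-8")


def split_csv(src_path: str, out_dir: str, max_bytes: int = FIVE_GB) -> list[str]:
    """Split src_path into parts of at most max_bytes, each starting with the header."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src_path))[0]
    paths: list[str] = []
    out = None
    size = 0
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header_row = next(reader, None)
        if header_row is None:
            return paths  # empty input: nothing to write
        header = _encode_row(header_row)
        try:
            for row in reader:
                data = _encode_row(row)
                # Open a new part if this row would overflow the current one
                # (every part gets at least one row, even an oversized one).
                if out is None or (size + len(data) > max_bytes and size > len(header)):
                    if out is not None:
                        out.close()
                    paths.append(os.path.join(out_dir, f"{base}-part{len(paths) + 1:04d}.csv"))
                    out = open(paths[-1], "wb")
                    out.write(header)
                    size = len(header)
                out.write(data)
                size += len(data)
        finally:
            if out is not None:
                out.close()
    return paths


# Small demo: 10 rows with a tiny 30-byte limit instead of 5 GB.
work = tempfile.mkdtemp()
src = os.path.join(work, "people.csv")
with open(src, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, lineterminator="\n")
    writer.writerow(["id", "name"])
    writer.writerows([str(i), f"user{i}"] for i in range(10))
parts = split_csv(src, os.path.join(work, "parts"), max_bytes=30)
print(len(parts))  # 5
```

Writing the parts into their own folder also fits the folder-based layout described next, since every part shares the same header.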

Faraday's CSV support is based on folders. Each CSV in a folder should have the exact same structure. There should be no more than 5000 files in a single folder.
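A quick pre-upload check for these folder rules might compare each file's header row and count the files. This is a hypothetical helper, check_csv_folder, that assumes the files are available locally. It uses the header (column names and their order) as a proxy for "the exact same structure", which is an approximation: it does not inspect the data rows.

```python
import csv
import tempfile
from pathlib import Path

MAX_FILES_PER_FOLDER = 5000


def check_csv_folder(folder: str) -> list[str]:
    """Return problems found; an empty list means the folder looks consistent."""
    files = sorted(Path(folder).glob("*.csv"))
    problems = []
    if not files:
        problems.append("no CSV files found")
    if len(files) > MAX_FILES_PER_FOLDER:
        problems.append(f"{len(files)} files exceeds the limit of {MAX_FILES_PER_FOLDER}")
    expected = None
    for path in files:
        with path.open(newline="", encoding="utf-8") as f:
            header = next(csv.reader(f), [])
        # Compare column names and order against the first file's header.
        if expected is None:
            expected = header
        elif header != expected:
            problems.append(f"{path.name}: header {header} differs from {expected}")
    return problems


# Demo: two matching files, then one with a different column order.
folder = tempfile.mkdtemp()
for name in ("a.csv", "b.csv"):
    Path(folder, name).write_text("id,email\n1,x@example.com\n", encoding="utf-8")
print(check_csv_folder(folder))  # []
Path(folder, "c.csv").write_text("email,id\nx@example.com,1\n", encoding="utf-8")
print(check_csv_folder(folder))
```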

Amazon Web Services (AWS) Simple Storage Service (S3) can be used to send and receive files. Access is shared using AWS IAM permissions. We suggest that you create a Faraday-only bucket to both send and receive data.

The bucket name should start with "faraday" (e.g. "s3://faraday-acme-io2quob/"). Within this bucket, Faraday should have full read and write access.
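To catch naming mistakes before setting up permissions, a name can be checked against the "faraday" prefix recommendation and the basic S3 naming rules. The sketch below is a hypothetical helper, check_bucket_name. It covers only a core subset of AWS's bucket naming rules (length, allowed characters, first and last character, no adjacent periods) and accepts either a bare name or an s3:// URL; whether the connection itself expects the bare name is not stated here.

```python
import re

# Core subset of the S3 general-purpose bucket naming rules: 3 to 63
# characters of lowercase letters, digits, dots and hyphens, beginning and
# ending with a letter or digit. AWS documents further rules not checked here.
S3_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def check_bucket_name(name: str) -> list[str]:
    """Return problems found; an empty list means the name looks fine."""
    # Accept either "faraday-acme-io2quob" or "s3://faraday-acme-io2quob/".
    name = name.removeprefix("s3://").rstrip("/")
    problems = []
    if not S3_BUCKET_NAME.match(name) or ".." in name:
        problems.append(f"{name!r} is not a valid S3 bucket name")
    if not name.startswith("faraday"):
        problems.append("name does not start with 'faraday'; contact Faraday support")
    return problems


print(check_bucket_name("s3://faraday-acme-io2quob/"))  # []
print(check_bucket_name("Acme_Data"))
```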

Alternatively, you can give Faraday access to certain prefixes in a shared bucket, or use a bucket whose name doesn't start with "faraday". In either case, please contact Faraday support for further assistance.

Which IAM user(s) you grant access to depends on whether you're sending data to Faraday (using Datasets), receiving data from Faraday (using Targets), or both:

Sending data to Faraday (via Datasets)

AWS IAM user to give access to: arn:aws:iam::113233973114:user/stagecraft-download_s3

Permissions:

s3:ListBucket
s3:GetObject
s3:GetObjectAcl
s3:GetObjectVersion
s3:DeleteObject (allows Faraday support to help delete corrupted files)

Receiving data from Faraday (via Targets, also known as Deployments)

AWS IAM user to give access to: arn:aws:iam::113233973114:user/deliver_s3

Permissions:

s3:ListBucket
s3:DeleteObject
s3:PutObject
s3:PutObjectAcl
s3:GetObject
s3:GetObjectAcl
s3:GetObjectVersion
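To double-check an existing bucket policy against these two lists, you could compare the actions each Allow statement grants to the relevant Faraday user. The sketch below is hypothetical (REQUIRED and missing_actions are illustrative names) and deliberately simplified: it ignores wildcards such as s3:*, Deny statements, NotAction, Resource scoping and conditions, so treat an empty result as a hint rather than proof.

```python
# Principals and actions copied from the two lists above.
REQUIRED: dict[str, tuple[str, set[str]]] = {
    "datasets": (
        "arn:aws:iam::113233973114:user/stagecraft-download_s3",
        {"s3:ListBucket", "s3:GetObject", "s3:GetObjectAcl",
         "s3:GetObjectVersion", "s3:DeleteObject"},
    ),
    "targets": (
        "arn:aws:iam::113233973114:user/deliver_s3",
        {"s3:ListBucket", "s3:DeleteObject", "s3:PutObject", "s3:PutObjectAcl",
         "s3:GetObject", "s3:GetObjectAcl", "s3:GetObjectVersion"},
    ),
}


def _as_list(value) -> list:
    """IAM policy fields may hold a single string or a list of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def missing_actions(policy: dict, use: str) -> set[str]:
    """Listed actions for `use` ("datasets" or "targets") that no Allow statement grants."""
    principal, required = REQUIRED[use]
    granted: set[str] = set()
    for stmt in policy.get("Statement", []):
        p = stmt.get("Principal", {})
        aws = p.get("AWS") if isinstance(p, dict) else p
        if stmt.get("Effect") == "Allow" and principal in _as_list(aws):
            granted.update(_as_list(stmt.get("Action")))
    return required - granted


example = {"Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::113233973114:user/deliver_s3"},
    "Action": "s3:ListBucket",
    "Resource": "arn:aws:s3:::faraday-acme-io2quob",
}]}
print(sorted(missing_actions(example, "targets")))
```

Run against the example bucket policy below, the Datasets check reports s3:DeleteObject, because that example grants the Datasets user only s3:ListBucket and the three read actions.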

Example bucket policy

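Here is an example bucket policy. The first three statements grant access to the Targets user (deliver_s3) and the last two grant access to the Datasets user (stagecraft-download_s3); delete the statements you don't need. Replace your-bucket-name with your bucket's name, and replace optional-prefix with your prefix, or remove it (leaving your-bucket-name/*) to cover the whole bucket.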

{
    "Version": "2012-10-17",
    "Id": "FaradayAccessToBucket",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::113233973114:user/deliver_s3"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::your-bucket-name"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::113233973114:user/deliver_s3"
            },
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::your-bucket-name/optional-prefix/*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::113233973114:user/deliver_s3"
            },
            "Action": [
                "s3:GetObject",
                "s3:GetObjectAcl",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::your-bucket-name/optional-prefix/*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::113233973114:user/stagecraft-download_s3"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::your-bucket-name"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::113233973114:user/stagecraft-download_s3"
            },
            "Action": [
                "s3:GetObject",
                "s3:GetObjectAcl",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::your-bucket-name/optional-prefix/*"
        }
    ]
}
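If you manage several buckets, you might generate this policy rather than edit it by hand. The hypothetical build_bucket_policy below reproduces the structure of the example: s3:ListBucket on the bucket ARN, object-level actions on bucket/prefix/* (or bucket/* with no prefix), and read-only object actions for the Datasets user, exactly as in the example. The function and its parameters are illustrative, not part of Faraday.

```python
import json

TARGETS_USER = "arn:aws:iam::113233973114:user/deliver_s3"
DATASETS_USER = "arn:aws:iam::113233973114:user/stagecraft-download_s3"
WRITE_ACTIONS = ["s3:PutObject", "s3:PutObjectAcl", "s3:DeleteObject"]
READ_ACTIONS = ["s3:GetObject", "s3:GetObjectAcl", "s3:GetObjectVersion"]


def _allow(user: str, actions, resource: str) -> dict:
    return {"Effect": "Allow", "Principal": {"AWS": user},
            "Action": actions, "Resource": resource}


def build_bucket_policy(bucket: str, prefix: str = "",
                        targets: bool = True, datasets: bool = True) -> dict:
    """Build a policy shaped like the example above for one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    prefix = prefix.strip("/")
    # Object-level actions apply to keys, so they need the ".../*" form;
    # s3:ListBucket applies to the bucket itself.
    objects_arn = f"{bucket_arn}/{prefix}/*" if prefix else f"{bucket_arn}/*"
    statements = []
    if targets:  # receiving data from Faraday
        statements += [
            _allow(TARGETS_USER, "s3:ListBucket", bucket_arn),
            _allow(TARGETS_USER, list(WRITE_ACTIONS), objects_arn),
            _allow(TARGETS_USER, list(READ_ACTIONS), objects_arn),
        ]
    if datasets:  # sending data to Faraday
        statements += [
            _allow(DATASETS_USER, "s3:ListBucket", bucket_arn),
            _allow(DATASETS_USER, list(READ_ACTIONS), objects_arn),
        ]
    return {"Version": "2012-10-17", "Id": "FaradayAccessToBucket",
            "Statement": statements}


print(json.dumps(build_bucket_policy("faraday-acme-io2quob", "optional-prefix"), indent=4))
```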

Faraday suggests that you use an unguessable string somewhere in the path to your data. This guards against what is known as the confused deputy problem.

For example, instead of naming an S3 bucket s3://faraday-acme/, name it s3://faraday-acme-pwiiprz162ez. Because the random suffix can't practically be guessed, a malicious actor can't discover the name and ask Faraday to import data from it into their own account. The same logic applies to any path used to locate data.
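A random suffix like this can be generated with Python's secrets module, which draws from the operating system's cryptographically secure random source (unlike the random module). The helper name, the 12-character length and the lowercase-plus-digits alphabet are illustrative assumptions, chosen so the result stays a valid S3 bucket name.

```python
import secrets
import string

# Lowercase letters and digits keep the suffix valid in S3 bucket names.
ALPHABET = string.ascii_lowercase + string.digits


def unguessable_bucket_name(base: str = "faraday-acme", length: int = 12) -> str:
    """Append a random suffix drawn from a cryptographically secure source."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{base}-{suffix}"


print(unguessable_bucket_name())  # random each run, e.g. "faraday-acme-" plus 12 characters
```

Twelve characters from a 36-symbol alphabet give about 62 bits of randomness, which is far beyond what anyone could guess by trial.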

Connecting

API via cURL


Use a POST /connections request:

curl https://api.faraday.ai/connections --json '{
  "name": "S3",
  "options": {
    "type": "s3-csv",
    "bucket_name": "...",
    "aws_region": "..."
  }
}'
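For scripting the same call from Python, the request can be assembled with the standard library. This is a sketch that builds, but does not send, the request; build_connection_request is a hypothetical name. The POST method and the Content-Type and Accept headers mirror what curl's --json flag sends. The cURL example shows no authentication, so none is added here; check Faraday's API reference for what your account requires before sending.

```python
import json
import urllib.request

API_URL = "https://api.faraday.ai/connections"  # endpoint from the cURL example


def build_connection_request(bucket_name: str, aws_region: str,
                             name: str = "S3") -> urllib.request.Request:
    """Build the same POST /connections request as the cURL example."""
    payload = {
        "name": name,
        "options": {"type": "s3-csv", "bucket_name": bucket_name, "aws_region": aws_region},
    }
    # curl --json sends a POST with these two headers.
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


req = build_connection_request("faraday-acme-io2quob", "us-east-1")
print(req.get_method(), req.full_url)  # POST https://api.faraday.ai/connections
# Sending it would be urllib.request.urlopen(req), once authentication is in place.
```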

Wait briefly while Faraday establishes your connection. It shouldn't take long.

Your new connection is now ready to use.