GCS
Create a connection between Faraday and Google Cloud Storage (GCS) so that the data behind your predictions is always up to date, and your predictions can seamlessly sync back to your cloud bucket.
In this tutorial, we'll show you how to:
- Connect your GCS account to Faraday using a connection.
Let's dive in.
Prerequisites
- You'll need a Faraday account (signup is free!).
You'll need the following details to create your connection to GCS:
- Bucket name (required, text)
- Project ID (required, text)
Granting access
First, you'll need to grant Faraday access to your GCS bucket.
CSV is a well-known format for transferring data in large batches. Faraday can accept files up to 5 GB in size. If you have more than 5 GB of data, we ask that you split it into multiple files.
Faraday's CSV support is based on folders. Each CSV in a folder should have the exact same structure. There should be no more than 5000 files in a single folder.
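The folder rules above can be checked locally before you upload anything. The following is a minimal sketch, not a Faraday tool: the function name `check_csv_folder` is hypothetical, "same structure" is approximated by comparing header rows only, and 5 GB is assumed to mean 5 × 10^9 bytes.

```python
import csv
from pathlib import Path

# Limits taken from the text above. Treating "5 GB" as decimal gigabytes
# is an assumption; the text does not say which definition applies.
MAX_FILE_BYTES = 5 * 10**9
MAX_FILES_PER_FOLDER = 5000


def check_csv_folder(folder):
    """Return a list of human-readable problems found in a folder of CSVs."""
    problems = []
    files = sorted(Path(folder).glob("*.csv"))

    if len(files) > MAX_FILES_PER_FOLDER:
        problems.append(
            f"{len(files)} files exceeds the {MAX_FILES_PER_FOLDER}-file limit"
        )

    expected_header = None
    for path in files:
        if path.stat().st_size > MAX_FILE_BYTES:
            problems.append(f"{path.name}: larger than 5 GB, split it into multiple files")

        # Only the first row is read, so large files are cheap to check.
        with path.open(newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), None)

        if expected_header is None:
            expected_header = header
        elif header != expected_header:
            problems.append(f"{path.name}: header differs from {files[0].name}")

    return problems
```

An empty list means the folder passed these checks. Comparing headers catches the most common structural mismatch (renamed, missing, or reordered columns), but it does not validate the rows themselves.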
Google Cloud Storage (GCS) can be used to send and receive files. Access is shared using Google Cloud IAM permissions. We suggest that you create a Faraday-only bucket to both send and receive data.
The bucket name should start with "faraday" (e.g. "gcp://faraday-acme-io2quob/"). Within this bucket, Faraday should have full read and write access.
Alternatively, you can give Faraday access to certain prefixes in a shared bucket, or you can use a bucket whose name doesn't start with "faraday". In either case, please contact Faraday support for further assistance.
Which IAM account(s) you give access to depends on whether you're sending data to Faraday (using Datasets) and/or receiving data from Faraday (using Targets):
Sending data to Faraday (via Datasets)
IAM user to give access to: faraday-incoming@production-237317.iam.gserviceaccount.com
- Give the service account the Storage Object Admin role
- If this is a shared bucket, limit access to specific prefixes using an object prefix IAM Condition
Receiving data from Faraday (via Targets, also known as Deployments)
IAM user to give access to: faraday-outgoing@production-237317.iam.gserviceaccount.com
- Give the service account the Storage Object Admin role
- If this is a shared bucket, limit access to specific prefixes using an object prefix IAM Condition
Example prefix condition
Here is an example object prefix condition:
resource.name.startsWith('projects/_/buckets/BUCKET_NAME/objects/OBJECT_PREFIX')
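To show how that expression fits into a role grant, here is a minimal sketch that fills in the placeholders and wraps the result in an IAM policy binding. The helper names and the condition title are hypothetical. The binding layout (`role`, `members`, `condition`) follows the general Google Cloud IAM policy format, and `roles/storage.objectAdmin` is the role ID for Storage Object Admin. We assume your bucket is set up to allow conditional bindings, which you should confirm in the Google Cloud documentation.

```python
def object_prefix_condition(bucket_name, object_prefix):
    """Build the condition expression shown above for one bucket and prefix."""
    return (
        "resource.name.startsWith("
        f"'projects/_/buckets/{bucket_name}/objects/{object_prefix}')"
    )


def prefixed_binding(service_account_email, bucket_name, object_prefix):
    """Return one IAM policy binding limited to objects under the prefix."""
    return {
        "role": "roles/storage.objectAdmin",
        "members": [f"serviceAccount:{service_account_email}"],
        "condition": {
            # Hypothetical title; any short label works here.
            "title": "faraday-prefix-only",
            "expression": object_prefix_condition(bucket_name, object_prefix),
        },
    }


# Example values: the bucket and prefix are placeholders, the service
# account is the one listed above for sending data to Faraday.
binding = prefixed_binding(
    "faraday-incoming@production-237317.iam.gserviceaccount.com",
    "faraday-acme",
    "incoming/",
)
print(binding["condition"]["expression"])
```

Ending the prefix with a slash keeps the condition from also matching sibling prefixes such as "incoming-old/".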
Faraday suggests that you use an unguessable string somewhere in the path to your data. This avoids what is called the confused deputy problem.
For example, instead of naming an S3 bucket s3://faraday-acme/, name it s3://faraday-acme-pwiiprz162ez. This guarantees that malicious actors cannot guess the name and request that Faraday import data from it into their account. The same logic applies to any path that is used to locate data.
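One way to produce such a name is to append a random suffix from a cryptographically secure source. This is a minimal sketch using Python's standard `secrets` module. The function name and the 12-character default are our own choices for illustration, not Faraday requirements.

```python
import secrets
import string

# Lowercase letters and digits only, so the result stays valid in bucket names.
ALPHABET = string.ascii_lowercase + string.digits


def unguessable_name(base, suffix_length=12):
    """Append a random suffix to a base name, e.g. "faraday-acme"."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_length))
    return f"{base}-{suffix}"


print(unguessable_name("faraday-acme"))
```

The `secrets` module is used instead of `random` because its output is designed to be unpredictable, which is the property that matters here.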
Connecting
Use a POST /connections request:
curl https://api.faraday.ai/connections --json '{ "name": "GCS", "options": { "type": "gcp-gcs-csv", "bucket_name": "...", "project_id": "..." } }'
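If you would rather script this step, the same request can be sketched in Python with only the standard library. The URL, field names, and "gcp-gcs-csv" type are taken from the curl command above. The function name is hypothetical, and the optional bearer-token `Authorization` header is an assumption on our part, since the curl example shows no authentication. Check the API reference for how to pass your credentials.

```python
import json
import urllib.request

API_URL = "https://api.faraday.ai/connections"


def build_connection_request(bucket_name, project_id, name="GCS", api_key=None):
    """Build (but do not send) the POST /connections request."""
    payload = {
        "name": name,
        "options": {
            "type": "gcp-gcs-csv",
            "bucket_name": bucket_name,
            "project_id": project_id,
        },
    }
    # curl's --json flag sends these two headers along with the body.
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if api_key is not None:
        # Assumption: bearer-token auth. The curl example shows no auth header.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholder values; replace them with your own bucket and project.
request = build_connection_request("faraday-acme-io2quob", "my-project-id")
print(request.get_method(), request.full_url)
```

Nothing is sent until you pass the request to `urllib.request.urlopen(request)`, so you can inspect `request.data` first.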
- Wait briefly while Faraday establishes your connection. It shouldn't take long.
Your new connection is now ready to use.