Adding data

All predictions start with data — and this is how you'll begin with Faraday as well.

Why does Faraday need my data?

Faraday makes predictions using a combination of customer data you provide and consumer data which we provide. Your data is necessary because it tells us how to recognize desirable outcomes like purchasing, and important groups of people like your customers.

📘Don't have any data?

How to add data

You will add data to Faraday by following three steps with the API:

  1. Upload raw data
  2. Register a dataset
  3. Define a cohort

Uploading raw data

First, you'll send raw customer data to Faraday.

Sending data to Faraday

You will send data to Faraday as CSV files. CSV is the "lingua franca" of data on the internet and the simplest way for developers to exchange machine-readable data.

Structuring your data

Faraday is optimized to work with event data. For example, we prefer a file containing all of your orders over a file containing all of your customers.
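As an illustration, the sketch below writes a tiny event-shaped orders file: one row per order, so a customer who orders twice appears twice. The file contents and column names are hypothetical. The people count treats name plus address as enough to tell people apart, which is only a rough stand-in for Faraday's identity resolution.

```shell
# Hypothetical event-shaped file: one row per order, not one row per customer.
orders_csv="$(mktemp)"
cat > "$orders_csv" <<'CSV'
date,first_name,last_name,address,city,state,total
2021-08-09,Ada,Lovelace,12 Main St,Burlington,VT,120.00
2021-08-10,Grace,Hopper,34 Elm St,Montpelier,VT,75.50
2021-09-01,Ada,Lovelace,12 Main St,Burlington,VT,980.00
CSV

# Events are data rows; people are approximated here by name + address
# (columns 2-4), a simplification of real identity resolution.
event_count="$(tail -n +2 "$orders_csv" | wc -l | tr -d ' ')"
person_count="$(tail -n +2 "$orders_csv" | cut -d, -f2-4 | sort -u | wc -l | tr -d ' ')"
echo "$event_count events from $person_count people"
```

Three rows describe only two people, and that repetition is exactly the behavioral signal a one-row-per-customer file would throw away.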

Data folders

Files are uploaded to "data folders" which are only accessible to your account.

  • Each folder should contain a single type of data, and all files in a folder should share the same column structure. For example, if you have orders from an e-commerce platform, whether they span one file or several, those should all be uploaded to a single folder, perhaps called web_orders.
  • Let's say you have a separate orders database from a point-of-sale or phone system. Files from that system should go into a separate folder, such as phone_orders.
  • Regardless of how they bought, they're all your customers! Faraday lets you merge this data by mapping each folder's dataset to the same stream, that is, the same key under output_to_streams (such as orders).
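A minimal sketch of the merge, assuming both folders share hypothetical email, date and total columns: a shell function prints a dataset registration body for any folder and always maps it to the same orders stream, so web_orders and phone_orders feed one stream. It only prints JSON; the commented curl line shows one way you could send it.

```shell
# Print a dataset registration body for one upload folder. Every folder is
# mapped to the same "orders" stream so its rows merge with the others.
# The column names (email, date, total) are hypothetical placeholders.
dataset_body() {
  folder="$1"
  cat <<JSON
{
  "name": "${folder}_data",
  "options": {
    "type": "hosted_csv",
    "upload_directory": "${folder}"
  },
  "identity_sets": {
    "customer": {
      "email": "email"
    }
  },
  "output_to_streams": {
    "orders": {
      "data_map": {
        "datetime": { "column_name": "date", "format": "date_iso8601" },
        "value": { "column_name": "total", "format": "currency_dollars" }
      }
    }
  }
}
JSON
}

web_body="$(dataset_body web_orders)"
phone_body="$(dataset_body phone_orders)"

# To register one (requires a real API key):
#   dataset_body phone_orders | curl --request POST https://api.faraday.ai/v1/datasets \
#     --header "Authorization: Bearer YOUR_API_KEY" \
#     --header "Content-Type: application/json" --data @-
```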

Incremental updates

Faraday is optimized to accept an ongoing sequence of unique, incremental files (perhaps daily) in your data folders. If your files can overlap, the dataset mapping supports several ways to specify the uniqueness and recency of rows.

Upsert columns

In some cases, an event such as an order may be "updated" in a subsequent file. It's important that Faraday not treat the update as a new event! To prevent this, list in upsert_columns one or more columns that together uniquely identify a row. This could be an explicit "order ID" column or a natural composite key, such as a combination of customer ID and timestamp. Using upsert_columns is the preferred way to specify uniqueness in your dataset.

As a bonus, we will return these columns to you when you retrieve predictions so that joining the data back is straightforward!
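To make upsert semantics concrete, here is a local awk sketch. It illustrates the idea and is not Faraday's implementation. Files are read in upload order, and a row whose key was already seen replaces the earlier row. It assumes a single key column given by position and simple CSV without quoted commas; a composite key would concatenate several fields instead.

```shell
# Local illustration of upsert semantics: rows sharing a key collapse,
# and the most recently uploaded row wins.
upsert() {
  key_col="$1"; shift
  awk -F, -v k="$key_col" '
    FNR == 1 { next }                   # skip the header of every file
    !($k in row) { order[++n] = $k }    # remember first-seen key order
    { row[$k] = $0 }                    # later rows overwrite earlier ones
    END { for (i = 1; i <= n; i++) print row[order[i]] }
  ' "$@"
}

# Two hypothetical daily files; order 1001 is updated on day 2.
day1="$(mktemp)"; day2="$(mktemp)"
printf 'order_id,email,status,total\n1001,ada@example.com,pending,120.00\n1002,grace@example.com,shipped,75.50\n' > "$day1"
printf 'order_id,email,status,total\n1001,ada@example.com,shipped,120.00\n1003,ada@example.com,pending,980.00\n' > "$day2"

result="$(upsert 1 "$day1" "$day2")"
printf '%s\n' "$result"
```

Three orders come out, not four: the day 2 version of order 1001 replaces the day 1 version instead of being counted as a second event.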

Incremental column

If you cannot provide a unique identifier with upsert_columns, the next alternative is to use incremental_column—which must refer to a datetime column. The incremental column will be used to ignore any rows older than the most recent record already ingested. This option is only useful if you 1) don't have a unique ID per row and 2) have to send over your entire data set each time you push to Faraday.
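The sketch below mimics that filter locally, as an illustration rather than Faraday's actual code. It assumes ISO 8601 timestamps written in one consistent format and time zone, so plain string comparison sorts them chronologically. It also reads "older than" literally, so a row stamped exactly at the most recent ingested time is kept. The watermark value and the rows are hypothetical.

```shell
# Keep only data rows whose datetime column ($col) is not older than the
# watermark. Assumes simple CSV with no quoted commas.
filter_incremental() {
  col="$1"; watermark="$2"; file="$3"
  awk -F, -v c="$col" -v w="$watermark" 'FNR > 1 && $c >= w' "$file"
}

# A hypothetical full export, re-sent in its entirety.
full_export="$(mktemp)"
printf 'email,updated_at,total\nada@example.com,2021-08-09T10:00:00Z,120.00\ngrace@example.com,2021-08-10T09:30:00Z,75.50\nada@example.com,2021-09-01T14:15:00Z,980.00\n' > "$full_export"

# Newest datetime ingested from the previous export (hypothetical).
last_ingested="2021-08-10T09:30:00Z"
new_rows="$(filter_incremental 2 "$last_ingested" "$full_export")"
printf '%s\n' "$new_rows"
```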

Every row unique by default

If you specify neither upsert_columns nor incremental_column, Faraday simply treats every row as a unique entry. Rely on this default only if you can guarantee that you never upload the same row twice.
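Because the default counts every row as a new event, an accidental re-upload would double-count orders. The quick local check below lists any data row that appears more than once across your files. It assumes duplicates are byte-for-byte identical lines, and the sample files are hypothetical.

```shell
# Print every data row (headers skipped) that occurs more than once
# across the given files.
find_duplicate_rows() {
  awk 'FNR > 1' "$@" | sort | uniq -d
}

a="$(mktemp)"; b="$(mktemp)"
printf 'date,email,total\n2021-08-09,ada@example.com,120.00\n' > "$a"
printf 'date,email,total\n2021-08-09,ada@example.com,120.00\n2021-08-10,grace@example.com,75.50\n' > "$b"

dupes="$(find_duplicate_rows "$a" "$b")"
printf '%s\n' "$dupes"
```

An empty result means the default is safe for these files. Anything printed would be counted twice unless you remove it or switch to upsert_columns.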

Format

The data you send Faraday should be in standard CSV format with column headers. You will need a minimum of two types of columns:

  1. Timestamp column(s) that represent the point in time when the event described by the row occurred.
  2. Identity column(s) that identify people, such as by email, name, and/or address. See datasets for the full list of available identity fields. The more identity columns you include, the better Faraday's identity resolution will work.
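Before uploading, you might confirm that each file meets these two minimums. The sketch below is a local helper, not part of Faraday's API. Both column-name lists are hypothetical and should be replaced with your own headers (see datasets for the real identity fields). It assumes header names contain no spaces or commas.

```shell
# Hypothetical column names to look for in the header row.
TIMESTAMP_COLUMNS="date updated_at"
IDENTITY_COLUMNS="email first_name last_name address city state phone"

# Succeed (and print "ok") only if the header has at least one timestamp
# column and at least one identity column.
check_header() {
  has_ts=0; has_id=0
  header="$(head -n 1 "$1" | tr -d '\r')"
  for col in $(printf '%s' "$header" | tr ',' ' '); do
    case " $TIMESTAMP_COLUMNS " in *" $col "*) has_ts=1 ;; esac
    case " $IDENTITY_COLUMNS " in *" $col "*) has_id=1 ;; esac
  done
  [ "$has_ts" -eq 1 ] || { echo "missing a timestamp column" >&2; return 1; }
  [ "$has_id" -eq 1 ] || { echo "missing an identity column" >&2; return 1; }
  echo "ok"
}

good="$(mktemp)"; bad="$(mktemp)"
printf 'date,first_name,last_name,address,city,state,total\n' > "$good"
printf 'order_id,total\n' > "$bad"
```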

Other metadata

Beyond these minimum requirements, you can also specify products associated with the event and/or a dollar value associated with the event. Some prediction types may require these metadata.

📘Column names

Example

Here's an example orders file (📂 Sample order data) and the request that registers it as a dataset:

curl --request POST \
     --url https://api.faraday.ai/v1/datasets \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
    "name": "orders_data",
    "identity_sets": {
        "customer": {
            "house_number_and_street": [
                "address"
            ],
            "person_first_name": "first_name",
            "person_last_name": "last_name",
            "city": "city",
            "state": "state"
        }
    },
    "output_to_streams": {
        "orders": {
            "data_map": {
                "datetime": {
                    "column_name": "date",
                    "format": "date_iso8601"
                },
                "value": {
                    "column_name": "total",
                    "format": "currency_dollars"
                }
            }
        }
    },
    "options": {
        "type": "hosted_csv",
        "upload_directory": "orders"
    }
}
'

Registering datasets

Raw CSV data isn't quite enough to make predictions. First, Faraday has to understand what your data means.

Registering a dataset

To do this, you'll use the POST /datasets endpoint. The main part of the request looks like this:

curl --request POST https://api.faraday.ai/v1/datasets \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
// options go here . . .
}
'

Below you'll find details on each of the three major options you need to provide to register a dataset.

Folder reference

As you learned in the previous step (Uploading raw data), you will upload one or more CSVs to a top-level folder within your inbox—this folder contains all the data for your new dataset.

For example, if you uploaded:

  • /shopify/initial_upload.csv
  • /shopify/2021-08-09.csv
  • /shopify/2021-08-10.csv

Then your upload directory is shopify.

{
    "options": {
      "type": "hosted_csv",
      "upload_directory": "shopify"
    },
    ...
}
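The rule is simply "the first path segment of each uploaded file." A tiny shell helper, purely illustrative, makes that explicit for the paths listed above:

```shell
# Return the top-level folder of an uploaded file path.
upload_directory() {
  path="${1#/}"                 # drop a leading slash
  printf '%s\n' "${path%%/*}"   # keep everything before the next slash
}

dirs="$(for f in /shopify/initial_upload.csv /shopify/2021-08-09.csv /shopify/2021-08-10.csv; do
  upload_directory "$f"
done | sort -u)"
printf '%s\n' "$dirs"
```

All three files resolve to the same folder, which is the value that goes in upload_directory.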

Identities

Because your data is about people, Faraday needs to understand how to recognize these people. Typically, you will have columns in your data with information like names, addresses, emails, and phone numbers. This is your opportunity to tell Faraday that, for example, your fn column is a "First name."

In many cases, your data may include multiple identities per person; for example, shipping and billing addresses. That's why identity_sets is an object that can hold several named identity sets (such as customer below), each mapping your columns to identity fields. See the POST /datasets reference for more information.

{
  ...
    "identity_sets": {
        "customer": {
            "person_first_name": "fn"
            ...
        }
    },
  ...
}
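For the shipping-and-billing case, a request body might define two named identity sets. This sketch assumes that identity set names are free-form labels, like customer in the fragment above. The set names, folder name and every column name are hypothetical. The block only builds and prints the JSON; the commented curl line shows how you could send it.

```shell
# Hypothetical dataset body with two named identity sets.
body="$(cat <<'JSON'
{
    "name": "orders_with_two_addresses",
    "options": {
        "type": "hosted_csv",
        "upload_directory": "orders"
    },
    "identity_sets": {
        "shipping": {
            "person_first_name": "ship_first_name",
            "person_last_name": "ship_last_name",
            "house_number_and_street": ["ship_address"],
            "city": "ship_city",
            "state": "ship_state"
        },
        "billing": {
            "person_first_name": "bill_first_name",
            "person_last_name": "bill_last_name",
            "house_number_and_street": ["bill_address"],
            "city": "bill_city",
            "state": "bill_state"
        }
    },
    "output_to_streams": {
        "orders": {
            "data_map": {
                "datetime": { "column_name": "date", "format": "date_iso8601" }
            }
        }
    }
}
JSON
)"
printf '%s\n' "$body"

# To register it (requires a real API key):
#   printf '%s\n' "$body" | curl --request POST https://api.faraday.ai/v1/datasets \
#     --header "Authorization: Bearer YOUR_API_KEY" \
#     --header "Content-Type: application/json" --data @-
```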

Events

Finally, Faraday needs to understand what behaviors are being exhibited in your data. We call these events. For example, if you've uploaded order data, each row represents an order event. And each of these events was experienced by an individual person (see "Identities" above).

When mapping your data to events, you can indicate when the event happened (the datetime key) and the monetary value of the event (the value key). While all fields are optional, we recommend mapping datetime whenever possible to improve the accuracy of your predictions.

{
  ...
    "output_to_streams": {
        "orders": {
            "data_map": {
                "datetime": {
                    "column_name": "updated_at",
                    "format": "date_iso8601"
                }
            }
        }
    }
  ...
}

Putting it all together

curl --request POST https://api.faraday.ai/v1/datasets \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
    "name": "DATASET_NAME",
    "options": {
      "type": "hosted_csv",
      "upload_directory": "shopify"
    },
    "identity_sets": {
        "customer": {
            "email": "account_email"
        }
    },
    "output_to_streams": {
        "orders": {
            "data_map": {
                "datetime": {
                    "column_name": "updated_at",
                    "format": "date_iso8601"
                }
            }
        }
    }
}
'

Defining cohorts

Now that Faraday can understand your data, it's time to use it!

What are cohorts?

Cohorts are the building blocks you will use later to define your prediction objectives. Each one represents a group of people that's important to you; for example, your customers or leads.

Concretely, a cohort is a group of people who have all experienced the same event. You can also add conditions to further restrict who qualifies for your cohort.

Choosing an event

Any event you specified in the previous step (Registering datasets) can be used to define a new cohort.

For example, if your dataset represents orders, you could define a "Customers" cohort: everybody who has experienced an order event.

curl --request POST \
     --url https://api.faraday.ai/v1/cohorts \
     --header 'Authorization: Bearer YOUR_API_KEY' \
     --header 'Accept: application/json' \
     --header 'Content-Type: application/json' \
     --data '
{
     "name": "Customers",
     "stream_name": "orders"
}
'

Recency

You can optionally specify how recent the event must have been for somebody experiencing it to qualify for the cohort. For example, an "Early customers" cohort could require that someone have experienced an order event more than five years ago.

{
  "name": "Early customers",
  "stream_name": "orders",
  "min_days": 1825 // that's 5 years
}
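To preview roughly who a min_days rule would admit, here is a local approximation. It is not Faraday's cohort engine, which works on resolved identities. It lists people with at least one order on or before a cutoff date. Note that 1825 days is 5 × 365, ignoring leap days. The sample rows, the one-email-per-person simplification and the fixed cutoff are all assumptions.

```shell
# List people (by email, column 2) with any order dated on or before the
# cutoff (column 1, YYYY-MM-DD, so string comparison is chronological).
early_customers() {
  cutoff="$1"; file="$2"
  awk -F, -v cut="$cutoff" 'FNR > 1 && $1 <= cut { print $2 }' "$file" | sort -u
}

orders_csv="$(mktemp)"
printf 'date,email,total\n2015-03-02,ada@example.com,120.00\n2021-08-10,grace@example.com,75.50\n2021-09-01,ada@example.com,980.00\n' > "$orders_csv"

# With GNU date, a cutoff 1825 days before today would be:
#   cutoff="$(date -u -d '1825 days ago' +%F)"
# A fixed cutoff keeps this example reproducible.
cutoff="2016-09-01"
early="$(early_customers "$cutoff" "$orders_csv")"
printf '%s\n' "$early"
```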

Frequency

You can also specify how many times a person must have experienced the event to qualify for the cohort. For example, a "Repeat buyers" cohort could require that someone experience an order event at least twice.

{
  "name": "Repeat buyers",
  "stream_name": "orders",
  "min_count": 2
}
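A rough local approximation of min_count, not Faraday's cohort engine: count order events per person and keep those at or above the threshold. Treating each email as one person and the sample rows are simplifying assumptions.

```shell
# Print emails (column 2) with at least min_count data rows.
repeat_buyers() {
  min_count="$1"; file="$2"
  awk -F, -v m="$min_count" '
    FNR > 1 { n[$2]++ }
    END { for (p in n) if (n[p] >= m) print p }
  ' "$file" | sort
}

orders_csv="$(mktemp)"
printf 'date,email,total\n2021-08-09,ada@example.com,120.00\n2021-08-10,grace@example.com,75.50\n2021-09-01,ada@example.com,980.00\n' > "$orders_csv"

repeat="$(repeat_buyers 2 "$orders_csv")"
printf '%s\n' "$repeat"
```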

Monetary

Finally, you can specify value requirements for events. For example, a "Best customers" cohort could require that someone's order events total at least $1,000 to qualify.

{
  "name": "Best customers",
  "stream_name": "orders",
  "min_value": 1000
}
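A rough local approximation of min_value, again not Faraday's cohort engine: sum each person's order values and keep those whose total reaches the threshold. The email-as-person simplification, the value in column 3 and the sample rows are assumptions.

```shell
# Print emails (column 2) whose summed order values (column 3) reach min_value.
best_customers() {
  min_value="$1"; file="$2"
  awk -F, -v m="$min_value" '
    FNR > 1 { total[$2] += $3 }
    END { for (p in total) if (total[p] >= m) print p }
  ' "$file" | sort
}

orders_csv="$(mktemp)"
printf 'date,email,total\n2021-08-09,ada@example.com,120.00\n2021-08-10,grace@example.com,75.50\n2021-09-01,ada@example.com,980.00\n' > "$orders_csv"

best="$(best_customers 1000 "$orders_csv")"
printf '%s\n' "$best"
```

Here ada@example.com qualifies because 120.00 + 980.00 = 1100.00 clears the $1,000 bar, while a single 75.50 order does not.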