Detect churn early

This tutorial uses the Faraday API to predict churn. You upload customer identifiers and first party data, and we provide responsibly sourced third-party data and infrastructure to make predictions.

😅This tutorial seems long, but it's only 9 POST requests to transform raw CSVs of orders and churn into finished predictions. Go forth and conquer!

📘You can't accidentally incur charges

Account & credentials

Create a free account if you haven't already. You will immediately get an API key that works for test data.

Prepare and send your data

You are ready to send some data over to Faraday. This is done by placing your data into a CSV file and sending it through the API.

📘Sample data

Make CSVs

For this tutorial you will need to identify i) your customers, and ii) churn events. The most straightforward way to do this is to have two datasets, one for customers and one for churning customers.

Your main data source for customers may be an export of your orders, for example, but it could also be a list of users from your CRM or other marketing tools. You will need to format your data as a CSV. See Sending data to Faraday for examples and validation details.

You will need another similar dataset specifying who has had the event of interest (who churned).

Here's an example list of columns in a valid CSV:

  • customer ID
  • first name
  • last name
  • address
  • city
  • state

But you could also (or alternatively) include:

  • email
  • phone

For best results (optionally) include date fields for orders and churns:

  • event date
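
The columns above can be sketched as a small CSV-writing script. This is a Python sketch with illustrative column names and sample rows (Faraday doesn't require these exact headers); the acme_orders.csv file name matches the upload example later in this tutorial:

```python
import csv

# A minimal sketch of a valid customers CSV. Column names and rows are
# illustrative; a header row is required.
rows = [
    {"customer_id": "1001", "first_name": "Jane", "last_name": "Doe",
     "address": "123 Main St", "city": "Burlington", "state": "VT",
     "event_date": "2023-01-15"},
    {"customer_id": "1002", "first_name": "John", "last_name": "Smith",
     "address": "9 Elm St", "city": "Montpelier", "state": "VT",
     "event_date": "2023-02-02"},
]

with open("acme_orders.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()  # header row first
    writer.writerows(rows)
```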

🚧️Include a header row

👍Additional columns are OK

Uploading your CSV

After preparing your CSV file, you are going to upload it using the API's upload endpoint.

Note that you will always upload your files to a subfolder underneath uploads. The example below uploads a local file named acme_orders.csv to a folder and file on Faraday at orders/orders1.csv. You can even upload multiple files with the same column structure into the same folder if that's easier; they'll all get merged together. This is especially useful if you want to update your model over time, for example as new orders come in.

curl  --request POST \
      --url https://api.faraday.ai/v1/uploads/orders/orders1.csv \
      --header "Authorization: Bearer YOUR_API_KEY" \
      --header "Content-Type: application/octet-stream" \
      --data-binary "@acme_orders.csv"

Second, we upload a different type of file, acme_churns.csv, to a separate churns folder at churns/churns1.csv.

curl  --request POST \
      --url https://api.faraday.ai/v1/uploads/churns/churns1.csv \
      --header "Authorization: Bearer YOUR_API_KEY" \
      --header "Content-Type: application/octet-stream" \
      --data-binary "@acme_churns.csv"
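
If you script these uploads, the request pieces can be assembled as in this Python sketch; upload_request is a hypothetical helper (not part of any Faraday SDK) that just mirrors the URL and headers of the curl calls above:

```python
# Sketch mirroring the curl upload calls above; upload_request is a
# hypothetical helper, not part of any Faraday SDK.
def upload_request(subfolder, filename, api_key):
    url = f"https://api.faraday.ai/v1/uploads/{subfolder}/{filename}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/octet-stream",  # raw CSV bytes, not multipart
    }
    return url, headers

url, headers = upload_request("orders", "orders1.csv", "YOUR_API_KEY")
```

You would then POST the file's bytes to that URL with your HTTP client of choice.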

Repeated calls to the `/uploads` endpoint

Mapping your data

Once your file has finished uploading, Faraday needs to know how to understand it. You'll use Datasets to define this mapping.

📘All data across an account is used in modeling

If you're using the sample file, check out Testing for an example API call that includes the right field configuration.

curl --request POST \
     --url https://api.faraday.ai/v1/datasets \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer YOUR_API_KEY' \
     --header 'Content-Type: application/json' \
     --data '
{
    "name": "orders_data",
     "identity_sets": {
          "customer": {
               "house_number_and_street": [
                    "address"
               ],
               "person_first_name": "first_name",
               "person_last_name": "last_name",
               "city": "city",
               "state": "state"
          }
     },
     "output_to_streams": {
          "orders": {
               "data_map": {
                    "datetime": {
                         "column_name": "date",
                         "format": "date_iso8601"
                    },
                    "value": {
                         "column_name": "total",
                         "format": "currency_dollars"
                    }
               }
          }
    },
    "options": {
        "type": "hosted_csv",
        "upload_directory": "orders"
    }
}
'
curl --request POST \
     --url https://api.faraday.ai/v1/datasets \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer YOUR_API_KEY' \
     --header 'Content-Type: application/json' \
     --data '
{
    "name": "churns_data",
     "identity_sets": {
          "customer": {
               "house_number_and_street": [
                    "address"
               ],
               "person_first_name": "first_name",
               "person_last_name": "last_name",
               "city": "city",
               "state": "state"
          }
     },
     "output_to_streams": {
          "churns": {
               "data_map": {
                    "datetime": {
                         "column_name": "churn_date",
                         "format": "date_iso8601"
                    }
               }
          }
    },
    "options": {
        "type": "hosted_csv",
        "upload_directory": "churns"
    }
}
'

Let's break down the above example.

  • upload_directory — Here you are telling Faraday which files we're talking about by specifying the subfolder you uploaded your data to, e.g. orders in our above example. If there are multiple files in this folder (and they all have the same structure), they will be merged together.
  • identity_sets — Here's where you specify how Faraday should recognize the people in each of your rows. Your data may have multiple identities per row, especially in lists of orders where you may have separate billing and shipping info. Our example above creates an arbitrary identity named customer and maps the address, first_name, last_name, city, and state columns from our CSV to the fields Faraday expects. If you also have emails or phone numbers, it's important to include them to improve identity resolution. Faraday will always use the best combination of available identifiers to recognize people. Mapping options are available in Datasets.
  • output_to_streams — Here's where you tell Faraday how to recognize events in your data. Here, we're calling our events orders, because that's how many companies record their customers' transactional behavior, but you can use any name you like, and one dataset may represent multiple event types. We recommend (but do not require) that you specify a datetime field; in our sample files this corresponds to the date and churn_date columns of the respective CSVs. You can also include metadata about products involved in the event and a dollar value associated with the event, although that's not always necessary or relevant: we do this for the total column of orders but not for churns.
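
Before creating a dataset, it can help to sanity-check that every column referenced in your identity_sets and data_map actually exists in your CSV header. A Python sketch, with a hypothetical check_mapping_columns helper (not a Faraday API) run against a throwaway demo file:

```python
import csv

# Sketch: confirm every column referenced in the dataset mapping exists in
# the CSV header. check_mapping_columns is a hypothetical helper.
def check_mapping_columns(csv_path, mapped_columns):
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    return [c for c in mapped_columns if c not in header]

# demo against a throwaway header-only file
with open("orders_header_demo.csv", "w", newline="") as f:
    f.write("customer_id,first_name,last_name,address,city,state,date,total\n")

missing = check_mapping_columns(
    "orders_header_demo.csv",
    ["address", "first_name", "last_name", "city", "state", "date", "total"],
)  # empty here; a non-empty result means the mapping references absent columns
```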

Repeated calls to the `/datasets` endpoint

Create your cohorts

Now you're going to use this identity and event data to formally define the groups of people needed for predictions. We need to know who has had the event of interest (churn) and (optionally) when, as well as who is eligible to churn (optionally starting/ending when). Specifying the dates of events allows us to model the attainment event rate as a function of time-varying attributes (such as tenure, recency, and total purchases), and gives you better predictions.

For this tutorial, you want to include all the people in the customers dataset you created by creating a cohort from it. All you have to do is point to the orders stream you created above and give your cohort a name like "Customers." By default, a cohort defined from an event stream captures each person's first date in the stream, which in this case (the first order) is when a customer becomes eligible to churn.

Similarly, you need to create a cohort for churned customers, referencing the churns stream.

curl --request POST \
     --url https://api.faraday.ai/v1/cohorts \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
     "name": "Customer First Orders",
     "stream_name": "orders"
}
'
curl --request POST \
     --url https://api.faraday.ai/v1/cohorts \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
     "name": "Customer Churn",
     "stream_name": "churns"
}
'

You'll need the UUID of the cohorts you just created in the next step, so copy them now!
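
Each successful POST returns the new resource as JSON. A Python sketch of pulling out the UUID programmatically; the response body below is a stand-in, and the "id" field name is an assumption, so check the shape of your actual response:

```python
import json

# Sketch: extract the new cohort's UUID from the POST /cohorts response.
# The body below is a stand-in; the "id" field name is an assumption.
sample_response = '{"id": "3c90c3cc-0d44-4b50-8888-8dd25736052a", "name": "Customer First Orders"}'
cohort_id = json.loads(sample_response)["id"]
```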

Create your outcome

Now that you've formally defined your customer groups, it's time to move on to prediction. For this tutorial, we're going to create an outcome from your customers, which will use ML to build a model that predicts whether a given individual looks more like someone who will have the event of interest within the next 30 days or not.

You will take the cohort UUIDs returned in the previous step and use them to make the following call to create an outcome:

curl --request POST \
     --url https://api.faraday.ai/v1/outcomes \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
     "name": "Explicit Churn, All Customers",
     "attainment_cohort_id": "YOUR_CHURN_COHORT_ID",
     "eligible_cohort_id": "YOUR_CUSTOMERS_COHORT_ID"
}
'

When you create this outcome, Faraday starts building and validating the appropriate ML model behind the scenes. Remember to save the UUID you get back in your response.

Learn about your model

Once the model has finished building, we will generate an outcome model report for you. The report explains how we generated your model, how well your model performed, and more. You can use the call below to view the report:

curl --request GET \
     --url https://api.faraday.ai/v1/outcomes/YOUR_OUTCOME_ID/report.html \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header 'Accept: text/html'

Generate churn predictions

Now that you have a model for customer churn, you can predict what your future churn will look like, and then retrieve those results.

Set up your scope

To do this, you will first create a Scope—this is how you tell Faraday which predictions you may want on which populations. You'll need three UUIDs from resources you created:

  1. The churn outcome (the model)
  2. The customers cohort (to include this population)
  3. The churn cohort (to exclude this population)

Rather than defining a new cohort of non-churned customers to score, you can specify the customers cohort under cohort_ids (inclusion) and the churn cohort under exclusion_cohort_ids (exclusion) in the population.

Here you can also explicitly set preview to true, although it's the default. This puts the scope in a preview mode that limits its output, so you avoid billing charges.

curl --request POST \
     --url https://api.faraday.ai/v1/scopes \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header 'Accept: application/json' \
     --header 'Content-Type: application/json' \
     --data '
{
     "payload": {
          "outcome_ids": [
               "YOUR_OUTCOME_ID"
          ]
     },
     "population": {
          "cohort_ids": [
               "YOUR_CUSTOMERS_COHORT_ID"
          ],
          "exclusion_cohort_ids": [
               "YOUR_CHURN_COHORT_ID"
          ]
     },
     "name": "SCOPE_NAME",
     "preview": false
}
'
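
The population logic can be restated as a Python sketch (UUIDs are placeholders): include everyone in the customers cohort, exclude anyone already in the churn cohort, so only not-yet-churned customers get scored.

```python
import json

# Sketch of the scope request body; UUID strings are placeholders.
scope = {
    "payload": {"outcome_ids": ["YOUR_OUTCOME_ID"]},
    "population": {
        "cohort_ids": ["YOUR_CUSTOMERS_COHORT_ID"],        # include
        "exclusion_cohort_ids": ["YOUR_CHURN_COHORT_ID"],  # exclude
    },
    "name": "churn_scope",
    "preview": True,  # preview mode limits output to avoid billing charges
}
body = json.dumps(scope)  # send as the POST /scopes request body
```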

Checking scope status

Faraday proactively makes and caches the prediction you defined in your scope, which may take some time. To see if your scope is ready, you can fetch https://api.faraday.ai/v1/scopes/{scope_id}.
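
A Python sketch of a polling loop: fetch_status is an injected callable (for example, a function that GETs /scopes/{scope_id} with your API key and returns a status string), and the exact "ready" value is an assumption about the API's wording.

```python
import time

# Sketch: poll until the scope reports ready. fetch_status is injected; the
# "ready" status value is an assumption, not confirmed API behavior.
def wait_until_ready(fetch_status, interval=5.0, max_attempts=120):
    for _ in range(max_attempts):
        if fetch_status() == "ready":
            return True
        time.sleep(interval)  # wait between polls
    return False
```

In production you'd pass a fetcher backed by an HTTP client and keep a polling interval of several seconds.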

Deploying predictions

Now it's time to download the churn prediction results! The simplest way to do this is to retrieve them all in a single CSV file.

Add a target

First you'll add a Target to your scope with type hosted_csv.

curl --request POST \
     --url https://api.faraday.ai/v1/targets \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer YOUR_API_KEY' \
     --header 'Content-Type: application/json' \
     --data '
{
     "name": "churn_csv_target",
     "options": {
          "type": "hosted_csv"
     },
     "representation": {
          "mode": "hashed"
     },
     "scope_id": "YOUR_SCOPE_ID"
}
'

Check whether your target is ready

Prior to trying to download your CSV, check whether the resource (along with its dependencies) is ready:

curl --request GET \
     --url https://api.faraday.ai/v1/targets/YOUR_TARGET_ID \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Accept: application/json"

Retrieve your predictions CSV

Once your deployment is ready, you can download the hosted CSV you created when you added your deploy target.

Looking at the file, you'll see that each one of your customers has been scored for propensity to churn.

In production, you'll generally automate the retrieval of this file and its insertion into your data warehouse and other systems. Faraday supports integration with a wide variety of tools.

curl --request GET \
     --url https://api.faraday.ai/v1/targets/YOUR_TARGET_ID/download.csv \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Accept: application/json" > my_local_file.csv
open my_local_file.csv
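
Once downloaded, the file is ordinary CSV. A Python sketch of loading it and splitting out high scores; the outcome_score_demo column name is made up for illustration (your file's score column is tied to your outcome), so the sketch writes a throwaway demo file to stay self-contained:

```python
import csv

# Sketch: parse a downloaded predictions CSV. "outcome_score_demo" is a
# made-up column name; check your actual file's header. A throwaway demo
# file stands in for the real download.
with open("predictions_demo.csv", "w", newline="") as f:
    f.write("person_id,outcome_score_demo\n")
    f.write("abc123,0.81\n")
    f.write("def456,0.12\n")

with open("predictions_demo.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# e.g. flag likely churners for a retention campaign
high_risk = [r for r in rows if float(r["outcome_score_demo"]) > 0.5]
```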

🚧️Preview mode