Bulk score leads

This tutorial uses the Faraday API to identify which of your future leads are likely to become customers. You upload customer identifiers and lead identifiers; we provide all of the rich consumer data necessary to build and employ predictive models. Open the above recipe for a list of API requests, or keep reading for additional details.

🚧️Lead Conversion & Lead Acquisition

This tutorial is a combination of Find customer lookalikes and Find big spenders. This hybrid approach is often used to score leads received from a third party (lead agency, vendor, etc.).

This approach differs from the Find customer lookalikes recipe as the outcome we are going to create uses a lead cohort as the eligible population instead of the entire US population.

This approach differs from Find big spenders as the scope we are going to create is used to score the entire US population instead of scoring a lead cohort. Note that in this example, we are focusing on all customers, not just the "big spenders".

The result of this approach is a percentile for everyone in the US population, indexed on the lead cohort.

📘You can't accidentally incur charges

The steps in this guide are completely free. You won't be charged until you want to start retrieving predictions at scale.

Account & credentials

Create a free account if you haven't already. You will immediately get an API key that works for test data.

Prepare and send your data

You are ready to send some data over to Faraday. This is done by placing your data into a CSV file and sending it through the API.

📘Sample data

Don't have access to customer data just yet? No problem — grab our sample data from the Testing page.

Make a CSV

Since this tutorial is based on your leads and customers, your data source may be an export of your email subscriptions and orders, for example, but it could also be a list of users from your CRM or other marketing tools. You will need to format your data as a CSV. See Sending data to Faraday for examples and validation details.

Here's an example list of columns in a valid CSV:

  • customer ID or lead ID
  • first name
  • last name
  • address
  • city
  • state

But you could also (or alternatively) include:

  • email
  • phone
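Putting these together, a valid file with a header row might look like this (the column names and values below are purely illustrative; use whatever headers make sense for your data):

```csv
customer_id,first_name,last_name,address,city,state,email
1001,Jane,Doe,123 Main St,Burlington,VT,jane.doe@example.com
1002,John,Smith,456 Oak Ave,Portland,ME,john.smith@example.com
```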

🚧️Include a header row

Your CSV file should have a "header" row, but you can use any headers you like. We suggest using recognizable headers that make sense to you.

👍Additional columns are OK

There is no need to remove other columns if you are using a larger dataset that is convenient to export; just upload the whole thing!

Uploading your CSV

After preparing your CSV file, you are going to upload it using the API's upload endpoint.

Note that you will always upload your files to a subfolder underneath uploads. The below example uploads a local file named acme_orders.csv to a folder and file on Faraday at orders/file1.csv. You can pick whatever folder name and filename you want: we will use it in the next step. You can even upload multiple files with the same column structure into the same folder if that's easier — they'll all get merged together. This is especially useful if you want to update your model over time - for example, as new orders come in.

curl --request POST \
     --url https://api.faraday.ai/v1/uploads/orders/file1.csv \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer YOUR_API_KEY' \
     --header 'Content-Type: application/octet-stream' \
     --data-binary "@acme_orders.csv"

Next, we upload a different type of file, acme_leads.csv, to a separate folder and file at leads/file1.csv.

curl --request POST \
     --url https://api.faraday.ai/v1/uploads/leads/file1.csv \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/octet-stream" \
     --data-binary "@acme_leads.csv"

Repeated calls to the `/uploads` endpoint

Repeated calls to the same URL will overwrite any existing file at that location without warning (even if your data have changed). If you want to add a new file rather than overwrite an existing one, it is your responsibility to ensure the filename is unique.

Mapping your data

Once your file has finished uploading, Faraday needs to know how to understand it. You'll use Datasets to define this mapping.

📘All data across an account is used in modeling

Make sure that all of your configurations and data are up to date; we use as much of the available information as we can in order to build the best models. If you associate a value with the wrong date, so that the value is actually updated after the recorded date, models can cheat. For instance, if you gather 'favorite color' when a customer churns but associate it with their last transaction, 'favorite color' may incorrectly appear to be available as a predictor of churn, even though it is not actually useful for prediction.

If you're using the sample file, check out Testing for an example API call that includes the right field configuration.

curl --request POST \
     --url https://api.faraday.ai/v1/datasets \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
    "name": "orders_data",
    "identity_sets": {
        "customer": {
            "house_number_and_street": [
                "address"
            ],
            "person_first_name": "first_name",
            "person_last_name": "last_name",
            "city": "city",
            "state": "state"
        }
    },
    "output_to_streams": {
        "orders": {
            "data_map": {
                "datetime": {
                    "column_name": "date",
                    "format": "date_iso8601"
                },
                "value": {
                    "column_name": "total",
                    "format": "currency_dollars"
                }
            }
        }
    },
    "options": {
        "type": "hosted_csv",
        "upload_directory": "orders"
    }
}
'
curl --request POST \
     --url https://api.faraday.ai/v1/datasets \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
     "name": "leads_data",
     "identity_sets": {
          "lead": {
               "city": "city",
               "person_first_name": "first_name",
               "person_last_name": "last_name",
               "state": "state",
               "house_number_and_street": [
                    "address"
               ]
          }
     },
     "options": {
          "type": "hosted_csv",
          "upload_directory": "leads"
     },
     "output_to_streams": {
          "leads": {
               "data_map": {
                    "datetime": {
                         "column_name": "date",
                         "format": "date_iso8601"
                    }
               }
          }
     }
}
'

Let's break down the above example.

  • upload_directory — Here you are telling Faraday which files we're talking about by specifying the subfolder you uploaded your data to, e.g. orders or leads in our above example. If there are multiple files in this folder (and they all have the same structure), they will be merged together.
  • identity_sets — Here's where you specify how Faraday should recognize the people in each of your rows. Your data may have multiple identities per row, especially in lists of orders where you may have separate billing and shipping info. Our example above creates an arbitrary identity named customer in the orders dataset and one named lead in the leads dataset, mapping the first_name, last_name, address, city, and state columns from our CSVs to the fields Faraday expects. If you also have emails or phone numbers, it's important to include them to improve identity resolution. Faraday will always use the best combination of available identifiers to recognize people. Mapping options are available in Datasets.
  • output_to_streams — Here's where you tell Faraday how to recognize events in your data. We're calling our events orders, because that's how many companies define their customers' transactional behavior, but you can use any name you like, and one dataset may represent multiple event types. We recommend (but do not require) that you specify a datetime field; in the examples above this maps to the date column of each CSV. You can also include metadata about products involved in the event and a dollar value associated with the event, although that's not necessary or always relevant; we do this for the total column of orders but not for leads.

Repeated calls to the `/datasets` endpoint

Repeated calls with identical configurations will result in duplicate resources being created. This can cause problems with downstream models by introducing potentially uneven duplications in data.

For instance, suppose you had a folder of system_x_orders with one schema and another folder of system_y_orders with a different schema, both mapping into the orders stream. If you called the /datasets endpoint twice for system_y_orders, a customer's total lifetime orders count would increase by one for each system X order but by two for each system Y order. If you do this after building a model, it will result in inaccurate predictions.
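The arithmetic of that failure mode can be sketched as follows (the folder names and order counts are hypothetical):

```python
# Hypothetical illustration of duplicate dataset registration.
# Each dataset registered over a folder ingests that folder's events once,
# so registering system_y_orders twice counts its events twice.
registrations = {"system_x_orders": 1, "system_y_orders": 2}      # the duplicate
orders_per_system = {"system_x_orders": 5, "system_y_orders": 5}  # true counts

lifetime_orders = sum(orders_per_system[folder] * count
                      for folder, count in registrations.items())
print(lifetime_orders)  # 15, even though the customer really placed 10 orders
```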

You only need to call the /datasets endpoint once for an entire upload_directory of data. If you add a new week's CSV file (with a new filename but into an existing upload_directory) using the /uploads endpoint, you do not need to make another call to the /datasets endpoint.

Create your cohorts

Now you're going to use these identities and event data to formally define the groups of people needed for predictions. We need to know who has attained the event of interest (becoming a customer) and (optionally) when, as well as who among the leads is eligible to become a customer (optionally starting/ending when). Specifying the dates of events allows us to model the attainment event rate as a function of time-varying attributes (such as tenure, recency, and total purchases) and give you better predictions.

For this tutorial, you want to include all the people in the orders data you uploaded, creating a cohort from them. All you have to do is point to the orders stream you created above and give your cohort a name like "Customers." By default, when a cohort is specified from an event stream, it captures the first date in the stream; in this case, a person becomes a customer when their first purchase occurs.

Similarly, you need to create a cohort for leads, referencing the leads stream.

curl --request POST \
     --url https://api.faraday.ai/v1/cohorts \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
     "name": "Customer First Orders",
     "stream_name": "orders"
}
'
curl --request POST \
     --url https://api.faraday.ai/v1/cohorts \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
     "name": "Leads",
     "stream_name": "leads"
}
'

You'll need the UUID of the cohorts you just created in the next step, so copy them now!
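If you're scripting these calls, you can pull the UUID out of each JSON response. A minimal sketch in Python (the response shape beyond the id field is assumed here for illustration):

```python
import json

# A saved response body from POST /cohorts; the "id" field holds the
# cohort UUID (the other fields shown are illustrative).
response_body = '{"id": "11111111-2222-3333-4444-555555555555", "name": "Leads", "stream_name": "leads"}'

cohort_id = json.loads(response_body)["id"]
print(cohort_id)  # use as YOUR_LEADS_COHORT_ID in the next step
```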

Create your outcome

Now that you've formally defined your customer groups, it's time to move on to prediction. For this tutorial, we're going to create an outcome from your leads, which uses ML to build a model that predicts whether a given individual looks more like someone who will attain the event of interest within the next 30 days; in this case, a lead who becomes a customer within the next 30 days.

You will take the cohort UUIDs returned in the previous step and use them to make the following call to create an outcome:

curl --request POST \
     --url https://api.faraday.ai/v1/outcomes \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header "Content-Type: application/json" \
     --data '
{
     "name": "Lead Conversion",
     "attainment_cohort_id": "YOUR_CUSTOMERS_COHORT_ID",
     "eligible_cohort_id": "YOUR_LEADS_COHORT_ID"
}
'

When you create this outcome, Faraday starts building and validating the appropriate ML model behind the scenes. Remember to save the UUID you get back in your response.

Learn about your model

Once the model has finished building, we will generate an outcome model report for you. The report explains how we generated your model, how well your model performed, and more. You can use the call below to view the report:

curl --request GET \
     --url https://api.faraday.ai/v1/outcomes/YOUR_OUTCOME_ID/report.html \
     --header "Authorization: Bearer YOUR_API_KEY" \
     --header 'Accept: text/html'

Generate lead conversion predictions

Now that you have a model for lead conversion, you can predict what your future conversion will look like, and then retrieve those results.

Set up your scope

To do this, you will first create a Scope: this is how you tell Faraday which predictions you may want on which populations. You'll need one UUID from the resources you created:

  1. YOUR_OUTCOME_ID (the lead conversion outcome you created)
curl --request POST \
     --url https://api.faraday.ai/v1/scopes \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer YOUR_API_KEY' \
     --header 'Content-Type: application/json' \
     --data '
{
     "payload": {
          "outcome_ids": [
               "YOUR_OUTCOME_ID"
          ]
     },
     "population": {
          "cohort_ids": []
     },
     "name": "SCOPE_NAME",
     "preview": true
}
'

Rather than defining a new cohort to get predictions for, you have instead specified the inclusion cohort_ids (leaving this empty sets it to the entire US population). It is a common pattern to include your eligible cohort (in this case existing leads) and to exclude your attainment cohort (in this case existing customers), but for lead conversion, we can skip excluding the attainment cohort.

When this request succeeds, you'll get an ID for your scope that you will need later in this tutorial (referred to as YOUR_SCOPE_ID in example requests).

📘Demo scopes

When requesting a scope, you can explicitly set preview to true, although it's the default. This puts the scope in a preview mode that avoids billing charges by limiting its output.

Deploying predictions

Now it's time to download the results! The simplest way to do this is to retrieve them all in a single CSV file.

📘API targets

Depending on your use-case, you may wish to deploy to a lookup API so that you can retrieve predictions for specific individuals in real-time. See the quickstart for Retrieve real-time predictions for more information.

Add a target

First you'll add a Target to your scope with publication type hosted_csv.

curl --request POST \
     --url https://api.faraday.ai/v1/targets \
     --header 'Authorization: Bearer YOUR_API_KEY' \
     --header 'Content-Type: application/json' \
     --data '
{
     "name": "TARGET_NAME",
     "options": {
          "type": "hosted_csv"
     },
     "representation": {
          "mode": "hashed"
     },
     "scope_id": "YOUR_SCOPE_ID",
     "limit": {
          "method": "percentile",
          "outcome_id": "YOUR_OUTCOME_ID",
          "percentile_max": 100,
          "percentile_min": 1
     }
}
'

When this request succeeds, you'll get an ID for your target that you will need later in this tutorial (referred to as YOUR_TARGET_ID in example requests).

📘Publication versus replication targets

A publication target (e.g. "type": "hosted_csv" in the options block above) means that Faraday hosts your predictions for retrieval. Alternatively, Faraday can also copy your predictions to systems that you control. These types of targets are called replication targets and require a connection.

Check deployment status

Prior to downloading your CSV, check whether the resource (along with its dependencies) is ready:

curl --request GET \
     --url https://api.faraday.ai/v1/targets/YOUR_TARGET_ID \
     --header 'Accept: application/json' \
     --header 'Authorization: Bearer YOUR_API_KEY'

Retrieve your predictions

Once your deployment is ready, you can download the hosted CSV you created when you added your deploy target.

curl --request GET \
     --url https://api.faraday.ai/v1/targets/YOUR_TARGET_ID/download.csv \
     --header 'Accept: text/csv' \
     --header 'Authorization: Bearer YOUR_API_KEY' > my_local_file.csv
open my_local_file.csv

Looking at the response, you'll see that each US resident has a fdy_outcome_OUTCOME_ID_propensity_score and a fdy_outcome_OUTCOME_ID_propensity_percentile. The score is the raw output of the model and the percentile is computed with respect to the raw scores of the individuals defined in the lead cohort used to train the model.

🚧️Percentile Normalization

fdy_outcome_OUTCOME_ID_propensity_percentile is computed using the lead cohort provided to the outcome, which serves here as our benchmark. For an individual to be assigned a percentile of 100 (i.e. the top 1%), the individual's score has to be in the top 1% of the scores associated with the lead cohort.

For example, if an individual is assigned a percentile of 70 it means that this individual has a score higher than 69% of the lead cohort members.

These values measure the propensity of each person living in the USA to become a customer (with the baseline being established using the lead cohort) and you can now take business actions based on these predictions!
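As a rough sketch of what lead-indexed percentiles mean, the following computes a score's percentile relative to a set of lead-cohort scores under one simple convention (the scores are made up, and Faraday's exact binning and tie handling may differ):

```python
# Hypothetical raw model scores for ten members of the lead cohort.
lead_cohort_scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.60, 0.75, 0.90]

def lead_indexed_percentile(score: float) -> int:
    """Percent of lead-cohort scores that fall below the given score."""
    below = sum(1 for s in lead_cohort_scores if s < score)
    return round(100 * below / len(lead_cohort_scores))

# A US resident scoring 0.65 beats 8 of the 10 leads above.
print(lead_indexed_percentile(0.65))  # -> 80
```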

🚧️Preview mode

If a scope is in preview mode, you will only get a sample of the complete results back. This helps you validate the results you're getting and build your integrations before incurring charges.