Pipelines
In Pipelines, you'll take the building blocks of your predictions (your outcomes, personas, and cohorts) and create a prediction pipeline that you can plug directly into your data warehouse, cloud bucket, or, via managed deployment, your favorite martech software (ESP, CRM, etc.). Once the pipeline is complete, your predictions are kept up to date and delivered automatically to the destination you chose.
Getting started
Inside Pipelines, you'll find a list of your current pipelines if you have any, with columns for:
- Population: the population being targeted for the pipeline's predictions.
- Payload: the outcomes, cohorts, and/or persona sets included in the pipeline.
- Deployments: the target destination of the deployment.
- Status: whether the pipeline is ready, queued, or errored.
Creating a pipeline
- Select new pipeline in the upper right of the Pipelines list view.
- Next, select your payload for this pipeline. Your payload is a combination of your outcome, your persona sets, and other customer cohorts–all of which are used to customize your predictions.
  - Outcome: the business outcome you want for this prediction deployment, configured in Outcomes.
  - Persona set: the personas applied to this deployment. Each individual record on the deploying end of this pipeline will be assigned a persona.
  - Cohort: specific groups of customers based on criteria such as events and attributes, used as membership indicators in the pipeline.
Membership indicators, which you get by including cohorts in your payload, are useful for segmentation. For example, say you want to know who in your customer base has an income greater than $100k. Your pipeline's population to include would be your Customers cohort, and as part of your payload you would select a cohort of customers with an income greater than $100k. As a result, anyone flagged as a member of that cohort in your pipeline is a current customer with more than $100k in income.
If the outcome you select in your payload specifies an eligibility cohort, your pipeline is not restricted by that eligibility cohort. For example, if your outcome's eligibility cohort is Leads (e.g. in a lead scoring outcome), and your pipeline's population to include is Everyone, then everyone will be scored, not just your leads.
- Once your payload is selected, choose a population to include. This is the group of people you want to target with your predictions, such as Leads in the case of a traditional acquisition campaign.
- After you've targeted a population, optionally exclude a population. Any cohorts selected in this section will not be targeted with your predictions.
- With your criteria selected, click save pipeline. A loading bar will appear with a message indicating that your pipeline is building. Pipelines generally take a few hours to build, and you'll receive an email when yours is ready for use.
- When the pipeline is finished building, it will be disabled. You'll need to add a deployment before you can enable it; the next section explains how to add deployments.
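To make the relationship between population and payload concrete, here is a minimal Python sketch of the selection logic described above. It is an illustration only, not Faraday's implementation: the `build_pipeline_rows` function, the `memberships` mapping, and the cohort names `income_over_100k` and `Unsubscribed` are all hypothetical.

```python
def build_pipeline_rows(memberships, include, exclude=(), payload_cohorts=()):
    """Return one row per targeted person, with a flag per payload cohort.

    memberships: dict mapping a person ID to the set of cohorts they belong to
    (a hypothetical stand-in for your real customer data).
    """
    rows = []
    for person_id, cohorts in memberships.items():
        # Population to include: only members of this cohort are targeted.
        if include not in cohorts:
            continue
        # Population to exclude: members of any of these cohorts are dropped.
        if any(cohort in cohorts for cohort in exclude):
            continue
        row = {"person_id": person_id}
        # Payload cohorts do not filter anyone out; they only add a
        # membership indicator to each row.
        for cohort in payload_cohorts:
            row[cohort] = cohort in cohorts
        rows.append(row)
    return rows


memberships = {
    "a": {"Customers", "income_over_100k"},
    "b": {"Customers"},
    "c": {"Leads", "income_over_100k"},
    "d": {"Customers", "Unsubscribed"},
}
rows = build_pipeline_rows(
    memberships,
    include="Customers",
    exclude=["Unsubscribed"],
    payload_cohorts=["income_over_100k"],
)
for row in rows:
    print(row)
```

In this sketch only persons `a` and `b` are targeted, and only `a` carries a true indicator. That mirrors the $100k example: a flagged row is a current customer with more than $100k in income.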
Adding deployments
Deployments are the method through which Faraday users plug their pipelines, and therefore their predictions, back into their stack. Deployments can be configured to send to a Faraday-hosted CSV, as well as to data warehouses, cloud buckets, and your favorite ESPs, CRMs, etc. You'll find the deployment section within a pipeline, under the pipeline's definition.
To create a deployment:
- Click add in the appropriate selection under Deployment.
- After clicking add, a window will open to provide specific options that enable you to tailor the deployment to your liking.
Choosing your deployment's representation format
- Select your data format:
  - Hashed (default): Best for deploying audiences to ad platforms. Data is hashed, and not human-readable.
  - Referenced: Best for merging data back into your stack. Uses a reference key defined in your dataset's advanced options to identify unique rows. If a reference key is not defined, this option is unavailable.
  - Identified: Best for direct mail and canvassing campaigns. Data is unhashed and human-readable.
  - Aggregated: Best for geotargeted ad campaigns. Select this to see the number of people in each payload element (outcome, persona, cohort) within the area of the geographic type you select.
- Next, select whether you'd like machine friendly or human friendly column headers.
  - Machine friendly: Best for automated systems where consistent naming is relevant.
  - Human friendly: Best for convenient, easy-to-read interpretation. Human friendly column headers include the outcome name and prediction type, so they are instantly recognizable. This can make your predictions easier to identify when deploying to ESPs, CRMs, etc., where you'll want to quickly see a contact's persona or propensity score on their contact card.
- Click next to move on to advanced settings.
Choosing your advanced settings & finalizing deployments
- Choose what, if any, advanced settings you'd like to set for this deployment.
  - Filter: Filter by the persona sets, outcomes, and cohort memberships you selected for your pipeline's payload. You can select specific personas to target in the deployment. For example, to filter by your Married Mary persona, select its persona set, choose the "equal to" operator, and select Married Mary; only Married Marys will be included in the deployment. Filtering by an outcome lets you target a range of rows by percentile or score, so you can focus on only the people who matter most to you.
    - Percentile is an integer between 1 and 100 (inclusive) and refers to the percentile of the outcome score distribution. The number of individuals in each percentile varies; as a rough estimate, the top 10 score percentiles correspond to the top 10% of the population. For example, entering greater than or equal to 81 would keep roughly the top 20% of the population scored.
    - Score refers to the raw score of the outcome and is a decimal from 0 to 1. Include the leading zero and the decimal point when entering a score. For example, a score of .5 would be entered as 0.5.
  - Limit: If your deployment has a propensity outcome, you can limit your results to a top count or a bottom count of rows.
    - Only the top/bottom (count) exports an exact number of rows. Caveats to note:
      - This limit refers only to rows and not necessarily to individuals. For hashed targets in particular, there are likely to be 2-3 duplicate rows per person (one per email and physical address).
      - For larger pipeline sizes (20M+), the ordering is approximate and may not precisely represent the very top/bottom scoring individuals.
  - Structure: Under structure, you can rename and reorder columns. Renaming them can make it even more convenient to import your data into your activation platform. For ad platform deployments like LinkedIn, Facebook, and Google Ads, selecting the appropriate option in the dropdown lets you organize the file in a way that's convenient for upload to that platform.
    Column names don't allow spaces, so if you receive an error when saving, check that you don't have any spaces in renamed columns. Instead of "Faraday propensity score", try "faraday_propensity_score".
  - Connection-specific: In this last settings option, you'll see format settings for hosted CSV deployments, or settings specific to the connection if you're deploying back to your database. These are recommended only for advanced users and can safely be ignored otherwise.
- Click save to finish the deployment.
- When your deployment is complete, your pipeline will still be disabled by default, but you're now able to test your deployment with the test deployment button. Clicking test deployment will output 100 rows of your pipeline to the URL in the deployment.
- To keep your pipeline up to date automatically on a daily basis and enable the full results of your pipeline, click the enabled toggle in the upper right. It will display green when the pipeline is enabled, and your full results can be retrieved from the URL listed in the deployment or via the download button.
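The Filter, Limit, and Structure settings above can be pictured with a short Python sketch. This is an illustration under stated assumptions, not how Faraday implements them: the function names are hypothetical, and the rows are synthetic, with exactly one row per percentile (real percentiles hold varying numbers of individuals).

```python
import re


def filter_by_percentile(rows, minimum):
    """Keep rows whose score percentile is >= minimum (an integer, 1 to 100)."""
    if not isinstance(minimum, int) or not 1 <= minimum <= 100:
        raise ValueError("percentile must be an integer between 1 and 100")
    return [row for row in rows if row["score_percentile"] >= minimum]


def top_rows(rows, count):
    """Keep the `count` highest-scoring rows (rows, not unique individuals)."""
    return sorted(rows, key=lambda row: row["score"], reverse=True)[:count]


def to_column_name(label):
    """Turn a label into a name without spaces, suitable for a renamed column."""
    return re.sub(r"[^0-9a-z]+", "_", label.strip().lower()).strip("_")


# Synthetic data: one row per percentile, with score = percentile / 100.
rows = [{"score_percentile": p, "score": p / 100} for p in range(1, 101)]

top_fifth = filter_by_percentile(rows, 81)  # percentiles 81 through 100
best_three = top_rows(rows, 3)              # the three highest scores

print(len(top_fifth))
print([row["score_percentile"] for row in best_three])
print(to_column_name("Faraday propensity score"))
```

With this synthetic data, the percentile filter keeps 20 of the 100 rows, the limit keeps the rows at percentiles 100, 99, and 98, and the renamed column comes out as `faraday_propensity_score`.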
Understanding deployment columns
A deployment in Faraday will include various data points about your customers. Use the chart below as a guide when analyzing your deployments.
Column name | Definition | Additional info |
---|---|---|
recipient_id | Faraday's internal key | |
Email address of the individual in FIG | We have a database of hashed emails that will not be present in this column | |
full_name | A concatenated version of name fields/the individual's full name | We prefer to have first_name and last_name rather than full_name for best matching results |
first_name | First name of the individual | |
last_name | Last name of the individual | |
address | Physical address of the individual | |
city | City the individual resides in | |
state | State the individual resides in | |
zip | Zip code/postcode the individual resides in | |
zip4 | The last 4 digits of a nine-digit full zip code used by post offices | |
phone | Phone number of the individual in FIG | |
usps_move_update | Prefilled text to allow ease of use when sending the list to a mailhouse | e.g. "or current resident" |
location_code | Stays the same for the same address across time | |
owner_code | Stays the same for a last name at the same address across time | |
latitude | Latitude of the physical address of the individual | |
longitude | Longitude of the physical address of the individual | |
score | Absolute score (scale of 0.0-1.0) of the individual based on the model used | This field will only appear when a predictive model is used in the pipeline |
score_percentile | Relative rank of the individual's score among all values | e.g. "top 1%" |
persona | The persona that the individual is a member of | |
poi_name | Name of the point of interest closest to the individual's location | Only available if using Places to predict around a location |
poi_address | Physical address of the point of interest closest to the individual's location | Only available if using Places to predict around a location |
poi_city | City of the point of interest closest to the individual's location | Only available if using Places to predict around a location |
poi_state | State of the point of interest closest to the individual's location | Only available if using Places to predict around a location |
poi_zip | Zip code or postcode of the point of interest closest to the individual's location | Only available if using Places to predict around a location |
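As a rough guide to working with these columns, the following Python sketch parses a small CSV and averages `score` by `persona`. It rests on several assumptions: the sample rows are invented for illustration, the "Single Sam" persona is hypothetical, `score_percentile` is assumed to be stored as an integer, and the helper names `load_deployment` and `average_score_by_persona` are not part of any Faraday tooling. A real deployment file may contain more columns or different ones.

```python
import csv
import io
from collections import defaultdict

# Invented sample data using column names from the chart above.
SAMPLE_CSV = """first_name,last_name,score,score_percentile,persona
Ada,Example,0.91,97,Married Mary
Ben,Example,0.42,55,Married Mary
Cy,Example,0.77,88,Single Sam
"""


def load_deployment(text):
    """Parse CSV text into dicts, converting the numeric columns."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict(raw)
        row["score"] = float(raw["score"])
        row["score_percentile"] = int(raw["score_percentile"])
        rows.append(row)
    return rows


def average_score_by_persona(rows):
    """Return the mean score for each persona present in the rows."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["persona"]].append(row["score"])
    return {persona: sum(vals) / len(vals) for persona, vals in scores.items()}


deployment = load_deployment(SAMPLE_CSV)
for persona, avg in average_score_by_persona(deployment).items():
    print(f"{persona}: {avg:.3f}")
```

To read a downloaded file instead of the in-memory sample, pass the file's text contents to `load_deployment`.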
If you select aggregated as the deployment type, you'll find the columns below in your deployment:
Column name | Definition |
---|---|
county/metro/state/postcode | The aggregate area unit that was scored, selected in your pipeline's deployment (e.g. 05401 for postcode) |
count_fdy_outcome_propensity_score | The number of people that fall into the given metro, zip, county, etc based on the limit set in the pipeline's deployment |
avg_fdy_outcome_propensity_score | The average propensity score per aggregate area unit (e.g. county or zip) |
count_fdy_outcome_propensity_percentile | The number of people per percentile |
avg_fdy_outcome_propensity_percentile | The average propensity score percentile per aggregate area unit (e.g. county or zip) |
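One way to use an aggregated deployment is to rank areas by average propensity score and to compute a population-weighted overall average. The Python sketch below does this under stated assumptions: the sample values are invented (only the postcode 05401 appears in the chart above), the area column is assumed to be named `postcode`, and `rank_areas` and `overall_average` are hypothetical helpers.

```python
import csv
import io

# Invented sample data using column names from the chart above.
AGGREGATED_SAMPLE = """postcode,count_fdy_outcome_propensity_score,avg_fdy_outcome_propensity_score
05401,200,0.30
05402,50,0.60
05403,100,0.45
"""


def rank_areas(text, area_column="postcode"):
    """Parse aggregated CSV text and sort areas by average score, highest first."""
    areas = []
    for raw in csv.DictReader(io.StringIO(text)):
        areas.append({
            # Keep the area code as a string so leading zeros survive.
            "area": raw[area_column],
            "people": int(raw["count_fdy_outcome_propensity_score"]),
            "avg_score": float(raw["avg_fdy_outcome_propensity_score"]),
        })
    return sorted(areas, key=lambda area: area["avg_score"], reverse=True)


def overall_average(areas):
    """Average score across all areas, weighted by the number of people in each."""
    total_people = sum(area["people"] for area in areas)
    weighted = sum(area["people"] * area["avg_score"] for area in areas)
    return weighted / total_people


areas = rank_areas(AGGREGATED_SAMPLE)
for area in areas:
    print(area["area"], area["people"], area["avg_score"])
print(round(overall_average(areas), 4))
```

Weighting by the count column matters because a simple mean of the per-area averages would treat a postcode with 50 people the same as one with 200.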