Datasets is where you plug your customer data into Faraday, then organize it so that it can be used for predictions. Here, you'll organize your data into identity sets, events, and traits that can be used throughout the Faraday platform. By plugging your data into Faraday, you're able to use it to target specific desirable outcomes, such as purchases among leads.
Inside Datasets, you'll find a list of your current datasets, if you have any, with columns for:
- Source: the source type of this dataset, such as CSV.
- Row Count: the number of rows in the dataset.
- Identities: the number of unique identities, or people, in the dataset.
- Matches: the number of people in your dataset that Faraday found matches for.
- Match rate: the number of matches divided by the number of identities.
- Events: the event type of the dataset, such as orders or churns.
- Status: whether the dataset is ready, queued, or errored.
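As a quick illustration of the match rate column described above, the arithmetic can be sketched as follows. The function name and the sample counts are hypothetical and exist only for this example; they are not values reported by Faraday.

```python
def match_rate(matches: int, identities: int) -> float:
    """Return matches divided by identities (0.0 when there are no identities)."""
    if matches < 0 or identities < 0:
        raise ValueError("counts must be non-negative")
    if identities == 0:
        return 0.0
    return matches / identities


# Hypothetical counts, for illustration only.
rate = match_rate(matches=8_200, identities=10_000)
print(f"{rate:.1%}")  # 82.0%
```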
- Select new dataset in the upper right of Datasets.
- Next, choose to create your dataset from either a connection or a CSV. The connections listed here are pulled from Connections, which is where you can add new ones.
- Regardless of whether you chose a connection-based or CSV-based dataset, after clicking finish, you'll receive a notification that your dataset has been created, and you'll be moved to the edit dataset view where you can customize it.
Identity sets are used to help Faraday identify people in your data. With this information, you can create cohorts of your customers (or anyone else identified in your data) and outcomes to make predictions about these individuals.
To add an identity set:
- In the dataset's definition (default) tab, click add identity set.
- Give your identity set a name.
- Next, use the field in dataset column to match the properties that exist in your data with the Faraday property names in the left column.
Not all property fields are required, but email and address are the most useful for identifying people. The more fields you include, the more likely Faraday is to match the people in your data.
- Once you're done matching your properties, click finish to save the identity set. If you need to edit or delete the identity set at any point, click the three dots (...) on the right.
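To make the matching step above more concrete, here is a minimal sketch of what an identity set mapping amounts to: a lookup from property names to the columns in your file. The property labels, column names, and helper functions are all hypothetical and chosen for illustration; they are not Faraday's actual property names or API.

```python
import csv
import io

# Hypothetical mapping: property label -> column name in an imaginary CSV export.
IDENTITY_SET = {
    "first_name": "fname",
    "last_name": "lname",
    "street_address": "ship_addr",
    "email": "email_address",
    "phone": "phone_number",
}


def unmapped_properties(mapping, header):
    """Return property names whose mapped column is absent from the file header."""
    present = set(header)
    return [prop for prop, column in mapping.items() if column not in present]


def apply_mapping(mapping, row):
    """Rename one row's columns to property names, skipping blank values."""
    renamed = {}
    for prop, column in mapping.items():
        value = (row.get(column) or "").strip()
        if value:
            renamed[prop] = value
    return renamed


SAMPLE = "fname,lname,email_address,ship_addr\nAda,Lovelace,ada@example.com,12 Main St\n"
reader = csv.DictReader(io.StringIO(SAMPLE))
print(unmapped_properties(IDENTITY_SET, reader.fieldnames))  # ['phone']
print([apply_mapping(IDENTITY_SET, row) for row in reader])
```

A check like `unmapped_properties` can be useful before uploading, since it shows which identifying fields a file cannot supply.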
Events show Faraday how to recognize actions taking place in your data, such as purchases, renewals, click events, upsells, etc. Dates are often the most useful piece of data for events.
Event streams that you define in datasets are available for selection when creating cohorts, which are then used to create outcomes, which then go on to help build your predictive pipelines.
- In the dataset's definition (default) tab, click add event to get started, which will open the new event window.
- Next, choose whether to add the event to an existing event stream, or create a new one.
- If you're using this data to add onto an existing event stream, select the appropriate event stream from the dropdown.
- If you don't have an existing event stream, or want to create a new one, select create new event stream.
Unsure which option to select? Generally, if the new dataset you're creating contains event data that a previously-made dataset also includes, such as order or churn dates, you'll want to add this event to that existing event stream to keep your data clean.
- In either of the above cases, click next to be taken to the event definition screen.
- Select the timestamp property for this event from your data in the left column.
- Optionally, select the format of the timestamp in the right column. This is generally auto-detected and if it is, it won't be modifiable.
- Define the value of the event, if applicable.
- Optionally, map any relevant properties that exist in your data by entering a name, associated field in dataset, and format.
- Click finish to save the event. If you need to edit or delete the event at any point, click the three dots (...) on the right.
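The event definition steps above boil down to four choices: a timestamp column, its format, an optional value, and any extra properties. The sketch below shows how those choices could turn one raw row into an event record. The dictionary keys, column names, and the `to_event` helper are assumptions made for this example and do not reflect Faraday's internal representation.

```python
from datetime import datetime

# Hypothetical event definition mirroring the choices made in the new event window.
EVENT_DEFINITION = {
    "stream": "orders",
    "timestamp_column": "order_date",
    "timestamp_format": "%m/%d/%Y",  # strptime-style format, assumed for this sketch
    "value_column": "total",
    "properties": {"channel": "source"},  # property name -> field in dataset
}


def to_event(definition, row):
    """Build one event record from a raw row using the definition."""
    timestamp = datetime.strptime(
        row[definition["timestamp_column"]], definition["timestamp_format"]
    )
    value_column = definition.get("value_column")
    raw_value = row.get(value_column) if value_column else None
    value = float(raw_value) if raw_value else None
    properties = {
        name: row.get(column) for name, column in definition["properties"].items()
    }
    return {
        "stream": definition["stream"],
        "timestamp": timestamp,
        "value": value,
        "properties": properties,
    }


event = to_event(
    EVENT_DEFINITION,
    {"order_date": "03/15/2024", "total": "49.50", "source": "web"},
)
print(event["timestamp"].date(), event["value"])  # 2024-03-15 49.5
```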
Traits are interesting data points that can enhance the usefulness of your data in Faraday, but aren't used to identify a person or an event. Examples include whether a person owns or rents their home, their hobbies, and their income. These traits can be appended to pipelines, used to create cohorts, used for analysis, etc.
Traits are completely optional, and are generally an edge use case. If you're unsure whether or not you should include them, it's best to avoid doing so.
- In the dataset's definition (default) tab, click add trait to get started, which will open the new trait window.
- Next, give the trait a name.
- Lastly, choose the corresponding field in the dataset. For example, you may have a field in your data called category that lists the customer's first purchase category.
Once you've finished adding an identity set, event, and/or trait, click Save dataset to save it for use throughout Faraday.
Sometimes it might be beneficial for you to add additional data to a dataset. For example, your original dataset might be a manual upload of order data from the previous month, and you'd now like to append this month's order data.
If you've configured a dataset via connection to your data warehouse, it will automatically be kept up to date. As such, this section is focused on manual, CSV uploads.
- To start, you'll want to head to the dataset you'd like to configure, expand the advanced tab, and find replace all with latest file. By default, this setting is set to false, so each time you upload data via the below steps, the new file is merged into the dataset. If the value is changed to true, the entire dataset is replaced with a new file upload.
- Once in the data tab of your dataset, drag your new file to the upload prompt or click to open the file picker. When your additional file's upload is complete, it will appear in the files in dataset list and the dataset status banner at the top of the dataset (green for ready, red for error) will display the upload's refresh date.
Note: Your additional file upload must be in the same format as the data uploaded previously in this dataset. Columns in the new CSV must exactly match those in the original CSV.
- With your new data uploaded, you can now dig back into your predictive building blocks (your cohorts, outcomes, personas, and more) and make any required edits. For example, your newest upload may have included second purchases from customers who were in your first upload, so you can now jump into cohorts to create or update a repeat purchaser cohort.
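Because the columns in a new CSV must exactly match the original, it can help to compare headers before uploading. The sketch below does that, and also models the replace all with latest file setting in a deliberately simplified way. It assumes that "exactly match" means the same column names in the same order, and it treats a merge as appending rows; both are assumptions for illustration, and the function names are hypothetical.

```python
import csv
import io


def read_header(csv_text):
    """Return the header row of a CSV supplied as text (empty list if no rows)."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def columns_match(original_csv, new_csv):
    """True when both files have the same column names in the same order."""
    return read_header(original_csv) == read_header(new_csv)


def apply_upload(existing_rows, new_rows, replace_all_with_latest_file=False):
    """Simplified model of the setting: merge by default, replace when true."""
    if replace_all_with_latest_file:
        return list(new_rows)
    return list(existing_rows) + list(new_rows)


last_month = "customer_id,order_date,total\n1,2024-05-03,20.00\n"
this_month = "customer_id,order_date,total\n1,2024-06-11,35.00\n"
reordered = "order_date,customer_id,total\n2024-06-11,1,35.00\n"

print(columns_match(last_month, this_month))  # True
print(columns_match(last_month, reordered))   # False
```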
Faraday matches your customers into our database at the individual level, so the more information you have about each individual in your data, the more likely you are to have a good match rate. Date fields are extremely important when building predictive models. As an example, we like to know if someone is a customer, but more importantly, we need to know when they became a customer, or when they purchased a certain product, or took some other specific action. Often, many of the key date fields in your data might live in the orders table in your database.
| Field | Description |
|---|---|
| First name | Customer first name |
| Last name | Customer last name |
| Street address | Customer street address |
| Email | Customer email address |
| Phone | Customer phone number |
| Customer | The field in your data that determines a customer |
| Lead data | The field(s) in your data that determines a lead. Do you have various lead categories? What determines when a lead converts? Lead status? |
| Product data | Date of purchase, item purchased, price of item, product types, number of orders |
| Subscription data | Date the subscription started and/or ended |
| Customer ID | The field that will be used to match your predictions back to the appropriate customer in your stack (e.g. Salesforce ID) |
If you've already read our article on what data Faraday expects, you're well on your way to understanding what is needed for prediction modeling in Faraday's system.
However, to drive home these best practices, here are some hypothetical examples that show how we bridge the gap between the data you provide and the models you help design to predict your desired outcomes.
Even if you understand the concept of an event stream, it helps to see visual examples that show the shape of the data as it appears in a database or spreadsheet.
Faraday has 4 base-level data points we utilize when we're processing any particular stream of events that you give us:
- date (datetime)
- value (i.e., the monetary value of the event to your business)
- product (e.g., the product or category associated with the event)
- channel (e.g., "acquisition source")
Note: You're not limited to the above. Give us any other data points that are meaningful to your business outcome.
Example 1: item-level data
Example 2: customer-level data
Example 3: event-level data
Example 3, the event-level data image, is the key one to focus on. There are a couple of assumptions here:
The event examples above are based on orders. If your business doesn't specifically operate on orders, you can provide the same kind of data for any event stream that represents an individual's behavior in your system:
- Insurance policies started
- Emails clicked or bounced
We also assume your product set within the file is made up of 10-20 (max) easily-readable grouped categories. These are high-indexing across the historical events you provided, so ideally they have coverage across most of the events that have happened.
- Overall, you might have 1000 different products.
- These products may need to be mapped from SKUs or pattern-matched according to some rule you have:
- "Has the word 'deluxe' in the title."
- "SKUs beginning with 'AM-' are our armchairs."
If you need to map or group your products in a way that can't be described by a simple pattern, a "SKU mapping" spreadsheet can supplement your data. We will take your mapping spreadsheet and join it directly to your data as if you had provided it in the main dataset.
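The two grouping approaches described above, pattern rules and a SKU mapping spreadsheet, can be sketched together as follows. The rules are modeled on the two quoted examples; the SKUs, titles, category names, and the choice to check the explicit mapping before the patterns are all hypothetical and only meant to illustrate the idea.

```python
import re

# Hypothetical pattern rules: (compiled pattern, field to test, resulting category).
PATTERN_RULES = [
    (re.compile(r"^AM-"), "sku", "armchairs"),
    (re.compile(r"\bdeluxe\b", re.IGNORECASE), "title", "deluxe"),
]

# Hypothetical "SKU mapping" spreadsheet, loaded as SKU -> category.
SKU_MAPPING = {"TB-100": "tables", "LP-220": "lighting"}


def categorize(event):
    """Return a product category: explicit mapping first, then pattern rules."""
    sku = event.get("sku", "")
    if sku in SKU_MAPPING:
        return SKU_MAPPING[sku]
    for pattern, field, category in PATTERN_RULES:
        if pattern.search(event.get(field, "")):
            return category
    return "uncategorized"


events = [
    {"sku": "AM-014", "title": "Reading armchair"},
    {"sku": "TB-100", "title": "Oak table"},
    {"sku": "XX-001", "title": "Deluxe ottoman"},
    {"sku": "ZZ-999", "title": "Gift card"},
]
print([categorize(e) for e in events])
# ['armchairs', 'tables', 'deluxe', 'uncategorized']
```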
Example of SKU mapping
Why would Faraday be interested in all the metadata that accompanies a particular event record (i.e., value, channel, product, etc.)? Read the next section to learn more about how we use these features to roll up data by individual.
Many clients may come to the table providing data that is already "rolled up" or "aggregated", which does not provide the event-level data Faraday requests. While this type of data is useful for business analysts and business leadership to understand trends, patterns, and summary attributes about an individual, Faraday already has an automated system in place to do just this.
Example of rollup data:
Faraday asks for event-specific data because our prediction modeling system is built on individuals entering certain cohorts based on a date. Having only the first or last date of an event (as in the above screenshot) actually hinders our ability to model your customers.
Faraday will:
- Take in your data event by event, along with all other associated fields in the row.
- Match these events to known identities using our algorithm.
- Make assessments on meaningful, strong-signaled patterns present.
Rollups are therefore the way to represent the aggregation of a single field in an event stream, based on some window prior to, and relative to, the reference date provided in its definition. These can be leveraged through cohort membership and joined directly to those individuals to enrich the outcome. Common types of aggregations include (but are not limited to):
- UNION (distinct values)
- windowed DAYS FROM
which may be translated into things like:
- COUNT of orders from day 1 to day 90
- SUM of policy payments received from day 30 to day 60
- UNION of distinct browser types viewing pages last 7 days
- MAX value of investments from 284 to 365 days
- MIN value of payment (all time)
- DAYS FROM first event to last event
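To illustrate how aggregations like these reduce an individual's events to a single feature, here is a minimal sketch. It assumes that "day N" means N days before the reference date and that events after the reference date are ignored; the function names, event fields, and sample data are hypothetical and are not Faraday's implementation.

```python
from datetime import date


def select(events, reference_date, start_day=None, end_day=None):
    """Keep events on or before the reference date that fall inside the window."""
    chosen = []
    for event in events:
        days_before = (reference_date - event["date"]).days
        if days_before < 0:
            continue  # ignore anything after the reference date
        if start_day is not None and days_before < start_day:
            continue
        if end_day is not None and days_before > end_day:
            continue
        chosen.append(event)
    return chosen


def rollup(events, aggregation, field=None, **window):
    """Aggregate one field of one individual's events into a single feature."""
    chosen = select(events, **window)
    values = [e[field] for e in chosen if field and e.get(field) is not None]
    if aggregation == "COUNT":
        return len(chosen)
    if aggregation == "SUM":
        return sum(values)
    if aggregation == "MIN":
        return min(values) if values else None
    if aggregation == "MAX":
        return max(values) if values else None
    if aggregation == "UNION":
        return sorted(set(values))
    if aggregation == "DAYS FROM":
        dates = [e["date"] for e in chosen]
        return (max(dates) - min(dates)).days if dates else None
    raise ValueError(f"unknown aggregation: {aggregation}")


# Hypothetical events for a single individual.
EVENTS = [
    {"date": date(2024, 6, 20), "value": 40.0, "channel": "web"},
    {"date": date(2024, 5, 1), "value": 25.0, "channel": "store"},
    {"date": date(2023, 12, 1), "value": 100.0, "channel": "web"},
]
REF = date(2024, 6, 30)

print(rollup(EVENTS, "COUNT", reference_date=REF, start_day=1, end_day=90))         # 2
print(rollup(EVENTS, "SUM", "value", reference_date=REF, start_day=30, end_day=60))  # 25.0
print(rollup(EVENTS, "MAX", "value", reference_date=REF))                            # 100.0
print(rollup(EVENTS, "DAYS FROM", reference_date=REF))                               # 202
```

Each call returns one value per individual, which is the shape of feature that a rollup contributes to training or scoring data.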
The result of a particular rollup is a single feature for a household (specifically, an "individual") in either the training or scoring data. These serve as important first-party data characteristics that can be used to tune and/or cue your model to provide a greater level of specificity on behaviors you may not have even known existed within the data.
To delete a dataset, click the options menu (three dots) on the far right of the dataset you'd like to delete, then click delete. If the dataset is in use by other objects in Faraday, such as an event stream or trait, the delete dataset popup will indicate that you need to modify those in order to delete the dataset. Once there are no other objects using this dataset, you can safely delete it.
See the deletions documentation for the order of dependencies, or the order of deletion priority.