Create dataset
POST https://api.faraday.ai/v1/datasets
Create a new dataset
Authentication
Provide your API key in the Authorization header. You can find your API key in the Settings page of the dashboard.
Authorization: Bearer YOUR_TOKEN
Body
The dataset to create
connection_id (string<uuid>)
If this is a "retrieve" dataset, the UUID of a connection - see /connections for more detail.
Only a subset of connection types...
Example: "5e0dfa56-2d52-4c06-a870-bc79c71e86a3"
identity_providers (array[object])
Which identity providers to use for matching, in order of priority.
By default, all datasets will match on 'fig' data.
The dataset's match-rate can be boosted by adding other identity providers.
Please contact support to get access to this feature.
Example: [[{"provider":"fig"}]]
identity_sets
(Parameters used to POST a new value of the IdentitySets type.)
A mapping of {identity set name} (ex. shipping) -> {identity set object}.
Describes all the logical groupings of personally-identif...
Example:
{
"shipping1": {
"city": "shipping_address_city",
"house_number_and_street": [
"shipping_address_address1",
"shipping_address_address2"
],
"person_first_name": "shipping_address_first_name",
"person_last_name": "shipping_address_last_name",
"phone": "shipping_address_phone",
"postcode": "shipping_address_zip",
"state": "shipping_address_state"
},
"shipping2": {
"freeform_address": "shipping_address",
"person_first_name": "shipping_address_first_name",
"person_last_name": "shipping_address_last_name",
"phone": "shipping_address_phone"
}
}
incremental_column (string)
A column specifying a date associated with a record.
Ideally incremental_column SHOULD be set to make data loading more efficient.
Ideally ALSO set upsert_columns to ensure that data is not dupl...
Example: "updated_at"
name
An identifying name for this dataset.
options
Dataset connection options
output_to_streams (dictionary[string, object])
(Parameters used to POST a new value of the OutputToStreams type.)
Describes how to transform the dataset into one or more streams.
Streams typically represent events. They can have multiple data...
Example:
{
"orders": {
"data_map": {
"channel": "referring_site",
"datetime": "processed_at",
"product": {
"column": "skus",
"format": "list_comma_separated"
},
"value": "total_line_items_price"
}
}
}
output_to_streams_array (array[object])
An array-based approach to transforming datasets into streams. This structure allows multiple columns from the same dataset to map to the same stream, each with their own property configurations.
Unl...
output_to_traits (dictionary[string, object])
(Parameters used to POST a new value of the OutputToTraits type.)
A mapping of trait name to trait definition, where the key is what the trait will be called in Faraday's system.
Traits are charac...
preview (boolean)
A dataset in preview mode will only detect columns and produce a data preview, but not ingest the data.
Defaults to undefined, which is equivalent to false.
Example: true
privacy (string)
Currently supported:
- 'suppress' - data can be used for modeling but will be excluded from pipelines and deployments (do not contact)
- 'delete' - data can not be used for modeling and will be ex...
Example: "suppress"
Allowed values: suppress, delete
reference_key_column (string)
Deprecated: use reference_key_columns instead
The name of the column that references an ID from an external system.
Setting this enables export of data via /targets that is keyed on this field.
Example: "customer_id"
Pattern: ^[_a-zA-Z0-9][_a-zA-Z0-9 :/-]*$
reference_key_columns (array[string])
The names of columns that reference IDs from an external system.
Setting this enables export of data via /targets that is keyed on this field.
Example: ["customer_id"]
upsert_columns (array[string])
Also known as the "primary key" of the dataset. A column or set of columns that uniquely identify an input row.
Ideally upsert_columns SHOULD be set so that data is not duplicated in the dataset.
...
Example: ["id"]
Responses
201: The dataset was successfully created.
archived_at (string<date-time>)
If not null, this resource will no longer receive updates, but will still be visible.
connection_id (string<uuid>)
If this is a "retrieve" dataset, the UUID of a connection - see /connections for more detail.
Only a subset of connection types...
Example: "5e0dfa56-2d52-4c06-a870-bc79c71e86a3"
created_at
When this resource was created.
detected_columns (array[object])
An array of columns
Example:
[
{
"data_type": "text",
"is_nullable": false,
"name": "id"
},
{
"data_type": "long",
"is_nullable": true,
"name": "amount"
}
]
enrichment (dictionary[string, object])
A mapping of enrichment sources (like FIG) to their enrichment metadata
id
A unique ID for this resource.
Example: "8cd2dcf6-f2b3-4318-b8b3-eb19ab18d29d"
identified_count (integer)
The number of unique people identified in this dataset.
This can be different from the row_count, for example, in a table of orders.
The same person can order multiple things, so there are more rows t...
identity_providers (array[object])
Which identity providers to use for matching, in order of priority.
By default, all datasets will match on 'fig' data.
The dataset's match-rate can be boosted by adding other identity providers.
Please contact support to get access to this feature.
Example: [[{"provider":"fig"}]]
identity_sets
A mapping of {identity set name} (ex. shipping) -> {identity set object}.
Describes all the logical groupings of personally-identifiable information present in the dataset.
Identity set objects map...
incremental_column (string)
A column specifying a date associated with a record.
Ideally incremental_column SHOULD be set to make data loading more efficient.
Ideally ALSO set upsert_columns to ensure that data is not dupl...
Example: "updated_at"
last_read_input_at (string<date-time>)
The last time this resource's input was read.
last_updated_config_at (string<date-time>)
The last time this resource's configuration was updated. If this is more recent than last_updated_output_at, the resource will be rebuilt.
last_updated_output_at (string<date-time>)
The last time this resource successfully built.
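The rebuild rule described for last_updated_config_at can be checked on the client side. The following is a minimal sketch, assuming the response has already been parsed into a Python dict and that timestamps are ISO 8601 strings with a trailing `Z`, as in the example response. The helper names `parse_ts` and `will_rebuild` are hypothetical and not part of the Faraday API.

```python
from datetime import datetime
from typing import Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp such as '2024-01-01T12:00:00Z'; None stays None."""
    if value is None:
        return None
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def will_rebuild(dataset: dict) -> bool:
    """True when the configuration is newer than the last successful build."""
    config_at = parse_ts(dataset.get("last_updated_config_at"))
    output_at = parse_ts(dataset.get("last_updated_output_at"))
    if config_at is None or output_at is None:
        # Assumption: the reference does not say what a missing timestamp
        # means, so this sketch declines to guess and reports False.
        return False
    return config_at > output_at
```

A caller might use `will_rebuild(dataset)` to decide whether to keep polling before reading counts such as row_count.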
managed (boolean)
A managed dataset requires special configuration from a Faraday admin, and is read-only.
Example: true
matched_count (integer)
Deprecated: Use enrichment instead.
The number of identified people in this dataset that Faraday found a match for in its data.
This will only be displayed if the dataset built successfully.
merge_datasets (array[object])
List of merge datasets using this dataset as a source.
name
An identifying name for this dataset.
options
Dataset connection options
output_all_columns_as_traits
If specified, all columns that are not excluded will be output as traits.
output_to_streams may not be specified when setting this parameter.
output_to_streams (dictionary[string, object])
Describes how to transform the dataset into one or more streams.
Streams typically represent events. They can have multiple dataset sources and each dataset can be used to populate multiple streams....
output_to_streams_array (array[object])
An array-based approach to transforming datasets into streams. This structure allows multiple columns from the same dataset to map to the same stream, each with their own property configurations.
Unl...
output_to_traits (dictionary[string, object])
A mapping of trait name to trait definition, where the key is what the trait will be called in Faraday's system.
Traits are characteristics about people, that are unrelated to particular events.
Whe...
preview (boolean)
A dataset in preview mode will only detect columns and produce a data preview, but not ingest the data.
Defaults to undefined, which is equivalent to false.
Example: true
privacy (string)
Currently supported:
- 'suppress' - data can be used for modeling but will be excluded from pipelines and deployments (do not contact)
- 'delete' - data can not be used for modeling and will be ex...
Example: "suppress"
Allowed values: suppress, delete
reference_key_column (string)
Deprecated: use reference_key_columns instead
The name of the column that references an ID from an external system.
Setting this enables export of data via /targets that is keyed on this field.
Example: "customer_id"
Pattern: ^[_a-zA-Z0-9][_a-zA-Z0-9 :/-]*$
reference_key_columns (array[string])
The names of columns that reference IDs from an external system.
Setting this enables export of data via /targets that is keyed on this field.
Example: ["customer_id"]
row_count (integer)
The total number of rows in this dataset.
This will only be displayed if the dataset built successfully.
Example: 10000
sample (object)
If supported by the connection, a sample of the data.
status
The current state of this resource and any updates.
Example: "pending"
Allowed values: new, starting, running, ready, error
status_changed_at (string<date-time>)
When the status of this resource was last updated.
status_error (string)
If this resource has status == "error", this will contain an error message.
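The status and status_error fields can be combined into a single human-readable summary. The sketch below assumes the response is already parsed into a dict. The grouping of new, starting, and running as "in progress" is an assumption of this example, since the reference lists the values without defining them, and `summarize_status` is a hypothetical helper. Note that the example value "pending" is not among the listed values, so the sketch reports unlisted values instead of failing.

```python
# Assumption: these three listed values mean the dataset is still being built.
IN_PROGRESS = {"new", "starting", "running"}


def summarize_status(dataset: dict) -> str:
    """Turn the status fields of a dataset response into a short message."""
    status = dataset.get("status")
    if status == "error":
        # status_error carries the message when status == "error".
        message = dataset.get("status_error") or "no message provided"
        return f"error: {message}"
    if status == "ready":
        return "ready"
    if status in IN_PROGRESS:
        return f"in progress ({status})"
    # Covers values outside the documented list, such as "pending".
    return f"unrecognized status: {status!r}"
```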
updated_at
When this resource was last updated.
updates (array[object])
A list of updates including how many rows were added.
If the dataset updates incrementally, these rows are added to the previous total. If the dataset is overwritten upon every ingestion, then these rows will be the new total row count.
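The two cases can be worked through with a small calculation. This is a sketch only: the `incremental` flag is a hypothetical parameter supplied by the caller, because this section does not say how a response indicates which mode applies, and `total_rows` is not part of the Faraday API.

```python
from typing import List


def total_rows(updates: List[dict], incremental: bool) -> int:
    """Derive the current row total from a list of update records."""
    if not updates:
        return 0
    if incremental:
        # Incremental datasets: each update adds to the previous total.
        return sum(update["rows_added"] for update in updates)
    # Overwritten datasets: the most recent update is the new total.
    # ISO 8601 strings in the same format sort chronologically as text.
    latest = max(updates, key=lambda update: update["datetime"])
    return latest["rows_added"]


updates = [
    {"datetime": "2021-10-05T14:48:00.000Z", "rows_added": 123},
    {"datetime": "2021-10-06T14:48:00.000Z", "rows_added": 32},
]
print(total_rows(updates, incremental=True))   # 155
print(total_rows(updates, incremental=False))  # 32
```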
Example:
[
{
"datetime": "2021-10-05T14:48:00.000Z",
"rows_added": 123
},
{
"datetime": "2021-10-06T14:48:00.000Z",
"rows_added": 32
}
]
upsert_columns (array[string])
Also known as the "primary key" of the dataset. A column or set of columns that uniquely identify an input row.
Ideally upsert_columns SHOULD be set so that data is not duplicated in the dataset.
...
Example: ["id"]
400: The request was invalid.
A Faraday error code.
Some possible values include:
Generic HTTP Errors
- BAD_REQUEST: The request could not be validated.
- FORBIDDEN: You do not have permission to access the specified resour...
Example: "ERROR_TYPE"
Allowed values: BAD_REQUEST, FORBIDDEN, MAX_RESOURCES_REACHED, INTERNAL_SERVER_ERROR, INVALID_AUTHORIZATION, NOT_FOUND, MALFORMED_API_KEY, MISSING_API_KEY, EXPIRED_API_KEY, VALIDATION_FAILED, CONFLICT
A unique ID for this error. Please include this in bug reports.
Example: "082f9513-901c-4308-8081-902a8fe22d7e"
validationErrors (array[object])
JSON Schema validation errors, if any.
401: No API key was supplied.
A Faraday error code.
Some possible values include:
Generic HTTP Errors
- BAD_REQUEST: The request could not be validated.
- FORBIDDEN: You do not have permission to access the specified resour...
Example: "ERROR_TYPE"
Allowed values: BAD_REQUEST, FORBIDDEN, MAX_RESOURCES_REACHED, INTERNAL_SERVER_ERROR, INVALID_AUTHORIZATION, NOT_FOUND, MALFORMED_API_KEY, MISSING_API_KEY, EXPIRED_API_KEY, VALIDATION_FAILED, CONFLICT
A unique ID for this error. Please include this in bug reports.
Example: "082f9513-901c-4308-8081-902a8fe22d7e"
validationErrors (array[object])
JSON Schema validation errors, if any.
403: Access to this resource was forbidden.
A Faraday error code.
Some possible values include:
Generic HTTP Errors
- BAD_REQUEST: The request could not be validated.
- FORBIDDEN: You do not have permission to access the specified resour...
Example: "ERROR_TYPE"
Allowed values: BAD_REQUEST, FORBIDDEN, MAX_RESOURCES_REACHED, INTERNAL_SERVER_ERROR, INVALID_AUTHORIZATION, NOT_FOUND, MALFORMED_API_KEY, MISSING_API_KEY, EXPIRED_API_KEY, VALIDATION_FAILED, CONFLICT
A unique ID for this error. Please include this in bug reports.
Example: "082f9513-901c-4308-8081-902a8fe22d7e"
validationErrors (array[object])
JSON Schema validation errors, if any.
404: The requested resource ID was not found.
A Faraday error code.
Some possible values include:
Generic HTTP Errors
- BAD_REQUEST: The request could not be validated.
- FORBIDDEN: You do not have permission to access the specified resour...
Example: "ERROR_TYPE"
Allowed values: BAD_REQUEST, FORBIDDEN, MAX_RESOURCES_REACHED, INTERNAL_SERVER_ERROR, INVALID_AUTHORIZATION, NOT_FOUND, MALFORMED_API_KEY, MISSING_API_KEY, EXPIRED_API_KEY, VALIDATION_FAILED, CONFLICT
A unique ID for this error. Please include this in bug reports.
Example: "082f9513-901c-4308-8081-902a8fe22d7e"
validationErrors (array[object])
JSON Schema validation errors, if any.
500: An internal server error occurred.
A Faraday error code.
Some possible values include:
Generic HTTP Errors
- BAD_REQUEST: The request could not be validated.
- FORBIDDEN: You do not have permission to access the specified resour...
Example: "ERROR_TYPE"
Allowed values: BAD_REQUEST, FORBIDDEN, MAX_RESOURCES_REACHED, INTERNAL_SERVER_ERROR, INVALID_AUTHORIZATION, NOT_FOUND, MALFORMED_API_KEY, MISSING_API_KEY, EXPIRED_API_KEY, VALIDATION_FAILED, CONFLICT
A unique ID for this error. Please include this in bug reports.
Example: "082f9513-901c-4308-8081-902a8fe22d7e"
validationErrors (array[object])
JSON Schema validation errors, if any.
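The response codes and error codes documented above can be gathered into lookup tables for client-side handling. This is a minimal sketch: the names `STATUS_MEANINGS`, `LISTED_ERROR_CODES`, `describe_status`, and `is_listed_error_code` are hypothetical, and because the reference introduces the error codes with "Some possible values include", the set here may not be exhaustive.

```python
# HTTP status codes and descriptions as listed for this endpoint.
STATUS_MEANINGS = {
    201: "The dataset was successfully created",
    400: "The request was invalid.",
    401: "No API key was supplied.",
    403: "Access to this resource was forbidden.",
    404: "The requested resource ID was not found.",
    500: "An internal server error occurred.",
}

# Faraday error codes as listed in the error responses.
LISTED_ERROR_CODES = frozenset({
    "BAD_REQUEST", "FORBIDDEN", "MAX_RESOURCES_REACHED",
    "INTERNAL_SERVER_ERROR", "INVALID_AUTHORIZATION", "NOT_FOUND",
    "MALFORMED_API_KEY", "MISSING_API_KEY", "EXPIRED_API_KEY",
    "VALIDATION_FAILED", "CONFLICT",
})


def describe_status(http_status: int) -> str:
    """Return the documented meaning of a status code, or flag it as undocumented."""
    return STATUS_MEANINGS.get(http_status, f"Undocumented status {http_status}")


def is_listed_error_code(code: str) -> bool:
    """True when the code appears in the list documented for this endpoint."""
    return code in LISTED_ERROR_CODES
```

Keeping the tables as plain data makes it easy to log the documented description next to the unique error ID that bug reports should include.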
Request snippet
curl -X POST 'https://api.faraday.ai/v1/datasets' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"connection_id": "5e0dfa56-2d52-4c06-a870-bc79c71e86a3",
"identity_providers": [
[
{
"provider": "fig"
}
]
],
"identity_sets": {
"shipping1": {
"city": "shipping_address_city",
"house_number_and_street": [
"shipping_address_address1",
"shipping_address_address2"
],
"person_first_name": "shipping_address_first_name",
"person_last_name": "shipping_address_last_name",
"phone": "shipping_address_phone",
"postcode": "shipping_address_zip",
"state": "shipping_address_state"
},
"shipping2": {
"freeform_address": "shipping_address",
"person_first_name": "shipping_address_first_name",
"person_last_name": "shipping_address_last_name",
"phone": "shipping_address_phone"
}
},
"incremental_column": "updated_at",
"name": "string",
"options": {
"table_name": "string (pattern: ^[_a-zA-Z][a-zA-Z0-9_]+$)",
"type": "aws_aurora_mysql"
},
"output_to_streams": {
"orders": {
"data_map": {
"channel": "referring_site",
"datetime": "processed_at",
"product": {
"column": "skus",
"format": "list_comma_separated"
},
"value": "total_line_items_price"
}
}
},
"output_to_streams_array": [
{
"properties": {
"key": {
"column_name": "age_in_years",
"decode": {
"cast": "string",
"map": {
"a": 1,
"b": 2
},
"sql": "SELECT * FROM table"
},
"recode": {
"map": {
"a": 1,
"b": 2
},
"sql": "SELECT * FROM table"
},
"value": "2025-10-01T00:00:00.000Z"
}
},
"stream_id": "eefb0735-6ad6-4611-a832-40bab2968353",
"stream_name": "attribute_assertion_figv2_age"
}
],
"output_to_traits": {
"key": {
"column_name": "skus",
"format": "currency_cents",
"null_values": [
"string"
],
"transformation_table": {
"key": "string"
},
"value": true
}
},
"preview": true,
"privacy": "suppress",
"reference_key_column": "customer_id",
"reference_key_columns": [
"customer_id"
],
"upsert_columns": [
"id"
]
}'
Example response
{
"archived_at": "2024-01-01T12:00:00Z",
"connection_id": "5e0dfa56-2d52-4c06-a870-bc79c71e86a3",
"created_at": "2024-01-01T12:00:00Z",
"detected_columns": [
{
"data_type": "text",
"is_nullable": false,
"name": "id"
},
{
"data_type": "long",
"is_nullable": true,
"name": "amount"
}
],
"enrichment": {
"key": {
"any": 123,
"person": 123,
"residence": 123
}
},
"id": "8cd2dcf6-f2b3-4318-b8b3-eb19ab18d29d",
"identified_count": 0,
"identity_providers": [
[
{
"provider": "fig"
}
]
],
"identity_sets": {
"key": {
"address_line_1": "shipping_address_address1",
"address_line_2": "shipping_address_address2",
"city": "shipping_address_city",
"email": "email_address",
"email_hash": "email_hash",
"freeform_address": "shipping_address",
"house_number_and_street": [
"shipping_address_address1",
"shipping_address_address2"
],
"person_first_name": "shipping_address_first_name",
"person_full_name": "shipping_address_full_name",
"person_last_name": "shipping_address_last_name",
"phone": "shipping_address_phone",
"postcode": "shipping_address_zip",
"state": "shipping_address_state"
}
},
"incremental_column": "updated_at",
"last_read_input_at": "2024-01-01T12:00:00Z",
"last_updated_config_at": "2024-01-01T12:00:00Z",
"last_updated_output_at": "2024-01-01T12:00:00Z",
"managed": true,
"matched_count": 0,
"merge_datasets": [
{
"dataset_id": "5e0dfa56-2d52-4c06-a870-bc79c71e86a3",
"join_column": "id"
}
],
"name": "string",
"options": {
"table_name": "string (pattern: ^[_a-zA-Z][a-zA-Z0-9_]+$)",
"type": "aws_aurora_mysql"
},
"output_all_columns_as_traits": {
"exclude": [
"id"
],
"include": [
"id"
]
},
"output_to_streams": {
"key": {
"classic": true,
"conditions": [
{
"_eq": "string",
"_gt": 0,
"_gte": 0,
"_in": [
"string"
],
"_lt": 0,
"_lte": 0,
"_matches": "string",
"_neq": "string",
"_nin": [
"string"
],
"_nnull": true,
"_null": true,
"column_name": "string",
"optional": true
}
],
"data_map": {
"datetime": {
"column_name": "skus",
"format": "currency_cents"
}
},
"stream_id": "eefb0735-6ad6-4611-a832-40bab2968353"
}
},
"output_to_streams_array": [
{
"properties": {
"key": {
"column_name": "age_in_years",
"decode": {
"cast": "string",
"map": {
"a": 1,
"b": 2
},
"sql": "SELECT * FROM table"
},
"recode": {
"map": {
"a": 1,
"b": 2
},
"sql": "SELECT * FROM table"
},
"value": "2025-10-01T00:00:00.000Z"
}
},
"stream_id": "eefb0735-6ad6-4611-a832-40bab2968353",
"stream_name": "attribute_assertion_figv2_age"
}
],
"output_to_traits": {
"key": {
"column_name": "skus",
"format": "currency_cents",
"null_values": [
"string"
],
"transformation_table": {
"key": "string"
},
"value": true
}
},
"preview": true,
"privacy": "suppress",
"reference_key_column": "customer_id",
"reference_key_columns": [
"customer_id"
],
"resource_type": "datasets",
"row_count": 10000,
"sample": {},
"status": "pending",
"status_changed_at": "2024-01-01T12:00:00Z",
"status_error": "string",
"updated_at": "2024-01-01T12:00:00Z",
"updates": [
{
"datetime": "2021-10-05T14:48:00.000Z",
"rows_added": 123
},
{
"datetime": "2021-10-06T14:48:00.000Z",
"rows_added": 32
}
],
"upsert_columns": [
"id"
]
}
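The curl request snippet can be mirrored in Python with only the standard library. This is a sketch under stated assumptions: `build_create_dataset_request` is a hypothetical helper, the dataset name "orders" is illustrative, and the payload is a subset of the documented fields chosen for brevity, since this section does not state which fields are required. The network call is left commented out so that the example has no side effects.

```python
import json
import urllib.request

API_URL = "https://api.faraday.ai/v1/datasets"


def build_create_dataset_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a dataset."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The API key goes in the Authorization header as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Illustrative payload; values are taken from the examples in this reference,
# except for "name", which is made up for this sketch.
payload = {
    "name": "orders",
    "connection_id": "5e0dfa56-2d52-4c06-a870-bc79c71e86a3",
    "incremental_column": "updated_at",
    "upsert_columns": ["id"],
    "reference_key_columns": ["customer_id"],
    "preview": True,
}

request = build_create_dataset_request("YOUR_API_KEY", payload)

# To send the request, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     dataset = json.load(response)
#     print(dataset["id"], dataset["status"])
```

Separating request construction from sending keeps the payload easy to inspect or log before any data is submitted.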