Databricks Delta Sharing

Create a connection between Faraday and Databricks Delta Sharing so that the lakehouse data Faraday uses to make predictions is always up to date.

In this tutorial, we'll show you how to:

  • Connect your Databricks Delta Sharing account to Faraday using a connection.

Let's dive in.

  1. You'll need a Faraday account — signup is free!

Prerequisites

You'll need the following details to create your connection to Databricks Delta Sharing:

  • Credentials JSON (required, text): The Delta Sharing credentials JSON file contents. Download this file from the activation link provided by Databricks, paste its contents here, and then delete the file from your local computer.
  • Share name (required, text): The name of the Delta Sharing share.
  • Schema name (required, text): The schema name within the Delta Sharing share.

Granting access

First, you'll need to grant Faraday access to your Databricks Delta Sharing account.

Databricks Delta Sharing is the simplest and most secure way to share lakehouse data with Faraday. Faraday does NOT interact with Databricks compute and does NOT need Databricks credentials. Instead, you share data using Delta Sharing, which provides Faraday with read-only access to specific Delta tables through a secure REST endpoint.

How it works

Delta Sharing works by giving recipients (like Faraday) access to signed URLs for the underlying Parquet files representing your shared table. This is a read-only pattern that requires no Databricks compute resources.

Faraday will:

  1. Parse and securely store the Delta Sharing credentials you provide
  2. Read your shared data via signed file URLs
  3. Copy that data into Faraday-controlled GCS storage
  4. Load that data into BigQuery for analysis
  5. Support incremental refresh using your table's watermark column

Setup process

1. Create a recipient in Databricks

In your Databricks workspace:

  1. Navigate to Delta Sharing in the UI
  2. Create a new Recipient for Faraday
  3. Generate an activation link for this recipient
  4. Copy the activation link

2. Share your data

Important: Delta Sharing only works with external tables, not Databricks managed tables. Ensure your tables are stored as external tables before attempting to share them.

  1. Create or select a Share in Databricks
  2. Add the external tables you want Faraday to access to this share
  3. Grant the Faraday recipient access to the share

3. Download the credentials file

  1. Open the activation link from step 1 in your browser
  2. The browser will download a credentials file (typically named config.share)
  3. Open the downloaded file in a text editor
  4. Copy the entire JSON contents of the file
  5. Delete the downloaded credentials file from your computer for security

The credentials file will look like this:

{
  "shareCredentialsVersion": 1,
  "bearerToken": "xxxxxxxxxxx",
  "endpoint": "https://oregon.cloud.databricks.com/api/2.0/delta-sharing/metastores/...",
  "expirationTime": "9999-12-30T23:59:59.960Z"
}
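Before pasting the contents into Faraday, it can help to sanity-check them. The following is a minimal sketch, assuming the file contains the fields shown in the sample above; the function name `check_share_credentials` and the specific checks are illustrative and not part of any Faraday or Databricks tooling.

```python
import json
from datetime import datetime, timezone

# Fields taken from the sample credentials file shown above.
REQUIRED_KEYS = {"shareCredentialsVersion", "bearerToken", "endpoint"}


def check_share_credentials(raw: str) -> dict:
    """Parse credentials JSON text and check the fields from the sample file."""
    creds = json.loads(raw)
    if not isinstance(creds, dict):
        raise ValueError("credentials must be a JSON object")
    missing = REQUIRED_KEYS - creds.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not creds["endpoint"].startswith("https://"):
        raise ValueError("endpoint should be an https:// URL")
    expiry = creds.get("expirationTime")
    if expiry is not None:
        # A trailing "Z" means UTC; rewrite it so older Python versions can parse it.
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires_at <= datetime.now(timezone.utc):
            raise ValueError("credentials have expired")
    return creds
```

If the check raises, the activation link may need to be regenerated in Databricks before you create the connection.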

4. Create your Faraday connection

In Faraday, create a new Databricks Delta Sharing connection with:

  • Credentials JSON: Paste the entire JSON contents from the credentials file
  • Share name: The name of the share you granted the Faraday recipient access to
  • Schema name: The schema within the share that contains your tables

Faraday will parse the credentials JSON and securely store the endpoint and bearer token.

5. Create your Faraday dataset

When creating a dataset, you'll specify:

  • Table name: The name of the table within the schema you want to ingest

Data loading patterns

Snapshot load

On initial ingestion, Faraday will:

  1. Retrieve signed Parquet URLs from Delta Sharing
  2. Copy those files into Faraday's GCS staging area
  3. Load them into BigQuery
  4. Store the snapshot metadata for future incremental updates
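To make step 1 more concrete, here is a rough sketch of how a recipient could ask a Delta Sharing server for a table's files and pull the signed URLs out of the reply. It assumes the open Delta Sharing protocol's table query endpoint and its newline-delimited JSON response, in which each data file arrives as a `file` record with a `url` field. The helper names are hypothetical, and this is not a description of Faraday's internal implementation.

```python
import json
from urllib.parse import quote


def query_url(endpoint: str, share: str, schema: str, table: str) -> str:
    """Build the table-query URL (assumed shape from the open protocol)."""
    parts = [quote(p, safe="") for p in (share, schema, table)]
    return "{}/shares/{}/schemas/{}/tables/{}/query".format(
        endpoint.rstrip("/"), *parts
    )


def signed_file_urls(ndjson_body: str) -> list:
    """Extract signed file URLs from a newline-delimited JSON response body."""
    urls = []
    for line in ndjson_body.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        if "file" in record:  # other records carry protocol and table metadata
            urls.append(record["file"]["url"])
    return urls
```

A recipient would send a POST to the URL from `query_url` with the bearer token in the `Authorization` header, then download each returned URL directly from cloud storage, which is why no Databricks compute is involved.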

Incremental load

For ongoing updates, use Faraday's incremental column feature at the dataset level. This requires a monotonically increasing watermark column in your table (such as updated_at).

Faraday will:

  1. Track the last watermark value loaded
  2. On each refresh, query only rows where watermark_column > last_watermark
  3. Merge the new/updated rows into BigQuery
  4. Update the last watermark value

Note: This pattern handles inserts and updates. Deletes require customer-provided tombstone rows or a stable soft-delete flag.
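The watermark pattern can be illustrated with a small in-memory sketch. It assumes rows are dictionaries, that each row has a unique key, and that the destination is a plain dictionary standing in for the BigQuery table; the function name `incremental_merge` is hypothetical and this is not Faraday's actual code.

```python
def incremental_merge(target, source_rows, key, watermark_column, last_watermark):
    """Merge rows newer than last_watermark into target; return the new watermark."""
    new_watermark = last_watermark
    for row in source_rows:
        value = row[watermark_column]
        if last_watermark is not None and value <= last_watermark:
            continue  # already loaded on a previous refresh
        target[row[key]] = row  # insert a new key or overwrite an updated row
        if new_watermark is None or value > new_watermark:
            new_watermark = value
    return new_watermark


# Usage: the first refresh loads everything, the second only picks up changes.
table = {}
rows = [
    {"id": 1, "updated_at": 10, "email": "a@example.com"},
    {"id": 2, "updated_at": 20, "email": "b@example.com"},
]
watermark = incremental_merge(table, rows, "id", "updated_at", None)
rows.append({"id": 1, "updated_at": 30, "email": "a2@example.com"})
watermark = incremental_merge(table, rows, "id", "updated_at", watermark)
```

Because the merge only ever sees rows whose watermark has advanced, a row that simply disappears from the source is never noticed, which is why deletes need tombstone rows or a soft-delete flag.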

Delta Sharing hierarchy

Delta Sharing uses a three-level hierarchy:

  • Share: A collection of schemas shared with recipients
  • Schema: A collection of tables within a share
  • Table: The actual data table

In Faraday:

  • Share is the share you grant the recipient access to; its name is specified at the connection level
  • Schema is specified at the connection level
  • Table is specified at the dataset level
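As an illustration of how the three levels combine, the sketch below models a shared table as a small data class. The class and the example names are hypothetical. The `client_url` format follows the convention used by the open-source delta-sharing Python client for addressing a table (profile file path, then `#`, then `share.schema.table`), and is included only for readers who want to inspect a share themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedTable:
    share: str   # the share granted to the Faraday recipient in Databricks
    schema: str  # set on the Faraday connection
    table: str   # set on the Faraday dataset

    def qualified_name(self) -> str:
        """Return the three-level name share.schema.table."""
        return f"{self.share}.{self.schema}.{self.table}"

    def client_url(self, profile_path: str) -> str:
        """Return a table URL in the delta-sharing client's '<profile>#<name>' form."""
        return f"{profile_path}#{self.qualified_name()}"


# Two datasets on one connection differ only in the table name.
customers = SharedTable("acme_share", "crm", "customers")
orders = SharedTable("acme_share", "crm", "orders")
```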

Security model

Delta Sharing provides secure, read-only access without exposing Databricks credentials:

  • The bearer token in the credentials file (downloaded via the activation link) grants access only to tables you've explicitly shared
  • Access is time-limited and can be revoked at any time from Databricks
  • Faraday never receives your Databricks login credentials or compute access
  • The bearer token is stored securely in Faraday's secrets vault and is never logged or displayed

Faraday suggests that you use an unguessable string somewhere in the path to your data. This helps avoid what is called the confused deputy problem.

For example, let's say you were using S3. Instead of naming an S3 bucket s3://faraday-acme/, name it s3://faraday-acme-pwiiprz162ez. This makes it impractical for malicious actors to guess the name and request that Faraday import data from it into their account. The same logic applies to any path that is used to locate data, such as a share or schema name.
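One way to produce such a name is with a cryptographically secure random suffix. The snippet below is a small sketch using Python's standard `secrets` module; the function name, the 12-character length, and the lowercase-plus-digits alphabet are arbitrary choices made for illustration.

```python
import secrets
import string

# Lowercase letters and digits are safe in most bucket, share, and schema names.
ALPHABET = string.ascii_lowercase + string.digits


def unguessable_name(prefix: str, length: int = 12) -> str:
    """Append a random suffix drawn from a cryptographically secure source."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{prefix}-{suffix}"


# Produces something shaped like faraday-acme-pwiiprz162ez (different on every call).
name = unguessable_name("faraday-acme")
```

With 36 possible characters in each of 12 positions, guessing the suffix by trial and error is not realistic.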

Additional notes

  1. External tables only: Delta Sharing only supports external tables. Databricks managed tables cannot be shared via Delta Sharing. If you need to share managed table data, you must first convert them to external tables or create external table copies of your data.

  2. No compute costs: Delta Sharing doesn't use Databricks compute, so there are no compute charges for Faraday's access.

  3. Multiple tables: You can create multiple Faraday datasets from the same connection, each pointing to a different table in the same schema.

  4. Schema scope: If you need to access tables in multiple schemas, create separate Faraday connections for each schema.

  5. Credentials security: The credentials file contains a bearer token that grants access to your shared data. Always delete the downloaded file after copying its contents into Faraday. Never share this file or store it in an insecure location.

Connecting

API via cURL

Use a POST /connections request:

curl https://api.faraday.ai/connections --json '{
  "name": "Databricks Delta Sharing",
  "options": {
    "type": "databricks-delta-sharing",
    "credentials_json": "...",
    "share_name": "...",
    "schema_name": "..."
  }
}'
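The same request can be assembled from Python. This is a sketch under two assumptions that the cURL sample above does not confirm: that `credentials_json` carries the credentials file contents as a single JSON-encoded string, and that the API key is sent as a bearer token in the `Authorization` header. The function name and the `api_key` parameter are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.faraday.ai/connections"


def build_connection_request(api_key, credentials_text, share_name, schema_name):
    """Build (but do not send) the POST /connections request."""
    payload = {
        "name": "Databricks Delta Sharing",
        "options": {
            "type": "databricks-delta-sharing",
            # Assumption: the raw file contents go in as one string;
            # json.dumps below takes care of escaping the inner quotes.
            "credentials_json": credentials_text,
            "share_name": share_name,
            "schema_name": schema_name,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            # Assumption: bearer-token authentication with your Faraday API key.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the body with `json.dumps` avoids hand-escaping the quotes inside the credentials, which is easy to get wrong when pasting them into a shell command.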

Wait briefly while Faraday establishes your connection. It shouldn't take long.

Your new connection is now ready to use.