How to use Google Bard in BigQuery (quickstart)
Google just released LLM support inside of BigQuery. Here's how to start using it today.
Google just announced you can use Bard in BigQuery to do Generative AI at scale. That means you can run your queries right inside of your data warehouse, where your data lives. No more rate limits, HTTP clients, error messages, etc.
Well, it's not exactly Bard... it's actually PaLM 2, called text-bison
in Vertex AI land.
It works as "one shot" prompts - where you give it context inside of the prompt itself. It won't remember what you tell it, so it's safe to use on biz data. You can use Vertex AI API to fine-tune your model and I'm expecting this to be available via SQL soon.
First, create a US-based dataset: (or use one you already have... or another region that supports LLM)
bq mk --dataset --location us --project_id=$PROJECT_ID $DATASET_ID
Next, create a BigQuery connection: (there is no way to do this in SQL as far as I know)
$ bq mk --connection --connection_type=CLOUD_RESOURCE --location us --project_id=$PROJECT_NAME
Connection 78793318712.us.40833e7c-1c99-440f-911b-c553cf8485f1 successfully created
Next, see what service account is associated with that connection: (look at the end of the table)
$ bq show --connection 78793318712.us.40833e7c-1c99-440f-911b-c553cf8485f1
Connection 78793318712.us.40833e7c-1c99-440f-911b-c553cf8485f1
name friendlyName description Last modified type hasCredential properties
------------------------------------------------------ -------------- ------------- ----------------- ---------------- --------------- -----------------------------------------------------------------------------------------------
78793318712.us.40833e7c-1c99-440f-911b-c553cf8485f1 25 Jun 17:35:01 CLOUD_RESOURCE False {"serviceAccountId": "bqcx-78793318712-abcd@gcp-sa-bigquery-condel.iam.gserviceaccount.com"}
Next, give the service account associated with that connection access to Vertex AI:
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member='serviceAccount:bqcx-78793318712-abcd@gcp-sa-bigquery-condel.iam.gserviceaccount.com' \
--role='roles/aiplatform.user'
Next, create a remote model inside of BigQuery: (use the connection ID you got above, and note that I'm naming the model my_llm_1
)
CREATE OR REPLACE MODEL
`$PROJECT_ID.$DATASET_ID.my_llm_1`
REMOTE
WITH CONNECTION `projects/$PROJECT_ID/locations/us/connections/40833e7c-1c99-440f-911b-c553cf8485f1`
OPTIONS (remote_service_type ='cloud_ai_large_language_model_v1')
Then you should be able to run LLM queries:
SELECT
*
FROM
ML.GENERATE_TEXT(
MODEL `$PROJECT_ID.$DATASET_ID.my_llm_1`,
(
SELECT
"Say hello world" AS prompt
),
STRUCT()
)
But the killer app for LLM in the data warehouse is to run prompts across tables full of data. Here's how I tried sentiment analysis for some financial products:
CREATE TABLE `$PROJECT_ID.$DATASET_ID.my_outputs` AS (
SELECT
my_id,
ml_generate_text_result["predictions"][0]["content"] AS sentiment_analysis,
FROM
ML.GENERATE_TEXT(
MODEL `$PROJECT_ID.$DATASET_ID.my_llm_1`,
(
SELECT
-- return my id so I can join this back to my data
my_id,
CONCAT(
"Please provide an entity sentiment analysis of bonds, annuities, and CDs in the following text: ",
text_content
) AS prompt,
-- here's the magic: running a prompt for every row of `my_inputs` table
FROM `$PROJECT_ID.$DATASET_ID.my_inputs`
),
-- tell the LLM not to be super creative, but just return the highest probability results always
STRUCT(
0 AS temperature,
1 AS top_k
)
)
)
Here's the final production query that uses some regex parsing to extract structured data from the LLM response:
CREATE TABLE `$PROJECT_ID.$DATASET_ID.my_outputs` AS (
SELECT
my_id,
CASE
-- if it gives back raw JSON, use it
WHEN LEFT(JSON_VALUE(ml_generate_text_result["predictions"][0]["content"]), 1) = '{'
THEN SAFE.PARSE_JSON(JSON_VALUE(ml_generate_text_result["predictions"][0]["content"]))
-- if it gives back JSON inside a markdown block, use it
ELSE SAFE.PARSE_JSON(REGEXP_EXTRACT(JSON_VALUE(ml_generate_text_result["predictions"][0]["content"]), r'```json([^`]+)```'))
END AS sentiment_analysis,
FROM
ML.GENERATE_TEXT(
MODEL `$PROJECT_ID.$DATASET_ID.my_llm_1`,
(
SELECT
my_id,
CONCAT(
"Please provide, as a JSON object with only entity and sentiment, an entity sentiment analysis of bonds, annuities, and CDs in the following text: ",
-- remove backticks that might interfere with parsing response
REGEXP_REPLACE(text_content, r'`', "")
) AS prompt,
FROM `$PROJECT_ID.$DATASET_ID.my_inputs`
),
STRUCT(
0 AS temperature,
1 AS top_k
)
)
)
I'm hoping it will get easier to extract structured results soon, using jsonformer for example.
Thanks to the BigQuery ML team for implementing this. I think it's the obvious place for LLM!
Ready for easy AI?
Skip the ML struggle and focus on your downstream application. We have built-in sample data so you can get started without sharing yours.