The Python Oracle

How to run a BigQuery query in Python

This video explains
How to run a BigQuery query in Python

--

Become part of the top 3% of developers by applying to Toptal
https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Industries in Orbit Looping

--

Chapters
00:00 Question
00:48 Accepted answer (Score 15)
01:17 Answer 2 (Score 2)
01:36 Answer 3 (Score 1)
02:03 Answer 4 (Score 1)
02:45 Thank you

--

Full question
https://stackoverflow.com/questions/4500...

Accepted answer links:
[BigQuery Python client lib]: https://github.com/GoogleCloudPlatform/g...
[https://googlecloudplatform.github.io/go...]: https://googlecloudplatform.github.io/go...
[BigQuery Python client tutorial]: https://cloud.google.com/bigquery/docs/r...

Answer 4 links:
[https://googleapis.github.io/google-clou...]: https://googleapis.github.io/google-clou...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #googlebigquery

#avk47



ACCEPTED ANSWER

Score 17


You need to use the BigQuery Python client library; something like this should get you up and running:

from google.cloud import bigquery

client = bigquery.Client(project='PROJECT_ID')
query = "SELECT...."

# Destination for the query results
dataset = client.dataset('dataset')
table = dataset.table(name='table')

# Run the query as an asynchronous job, overwriting the destination table
job = client.run_async_query('my-job', query)
job.destination = table
job.write_disposition = 'WRITE_TRUNCATE'
job.begin()

https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery-usage.html

See the current BigQuery Python client tutorial.
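Note that the snippet above uses the run_async_query API from early releases of google-cloud-bigquery. As a minimal sketch of the same job against a reasonably recent release of the client library (the project, dataset, and table names here are placeholders):

from google.cloud import bigquery

client = bigquery.Client(project='PROJECT_ID')

# Write the results to a destination table, replacing any existing contents
job_config = bigquery.QueryJobConfig(
    destination='PROJECT_ID.dataset.table',  # recent releases accept a full table ID string
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

job = client.query("SELECT ...", job_config=job_config)  # starts the job
job.result()  # blocks until the query finishes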




ANSWER 2

Score 5


Here is another way, using a service-account JSON key file:

>>> from google.cloud import bigquery
>>>
>>> CREDS = 'test_service_account.json'
>>> client = bigquery.Client.from_service_account_json(json_credentials_path=CREDS)
>>> job = client.query('select * from dataset1.mytable')
>>> for row in job.result():
...     print(row)
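If the query needs runtime values, the same client also supports parameterized queries, which avoids building SQL via string formatting. A sketch under the same credentials setup, assuming a hypothetical integer column id on dataset1.mytable:

from google.cloud import bigquery

client = bigquery.Client.from_service_account_json('test_service_account.json')

# Pass values as query parameters rather than interpolating them into the SQL
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter('min_id', 'INT64', 100),  # hypothetical column/value
    ]
)
job = client.query(
    'select * from dataset1.mytable where id >= @min_id',
    job_config=job_config,
)
for row in job.result():
    print(dict(row))  # each Row also supports row['col'] and row.col access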



ANSWER 3

Score 3


This is a good usage guide: https://googleapis.github.io/google-cloud-python/latest/bigquery/usage/index.html

To run a query and write its results to a table:

from google.cloud import bigquery

client = bigquery.Client()
dataset_id = 'your_dataset_id'  # placeholder dataset ID

job_config = bigquery.QueryJobConfig()

# Set the destination table for the query results
table_ref = client.dataset(dataset_id).table("your_table_id")
job_config.destination = table_ref
sql = """
    SELECT corpus
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY corpus;
"""

# Start the query, passing in the extra configuration.
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location="US",
    job_config=job_config,
)  # API request - starts the query

query_job.result()  # Waits for the query to finish
print("Query results loaded to table {}".format(table_ref.path))



ANSWER 4

Score 1


I personally prefer querying using pandas:

# BQ authentication
import pandas as pd
import pydata_google_auth

SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
]

credentials = pydata_google_auth.get_user_credentials(
    SCOPES,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as over SSH or with Google Colab.
    auth_local_webserver=True,
)

MY_PROJECT_ID = 'my-project-id'  # placeholder project ID
query = "SELECT * FROM my_table"

data = pd.read_gbq(query, project_id=MY_PROJECT_ID, credentials=credentials, dialect='standard')
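For non-interactive environments where the browser-based user flow above is not available, pandas can be pointed at a service-account key instead. A sketch, assuming a hypothetical key file named service_account.json and a placeholder project ID:

import pandas as pd
from google.oauth2 import service_account

# Hypothetical key file path; use your own service-account JSON key
credentials = service_account.Credentials.from_service_account_file(
    'service_account.json',
    scopes=['https://www.googleapis.com/auth/cloud-platform'],
)

data = pd.read_gbq(
    "SELECT * FROM my_table",
    project_id='my-project-id',  # placeholder project ID
    credentials=credentials,
    dialect='standard',
)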