Why won't this duckdb query of s3/parquet data save 'EXPLAIN ANALYZE' profiling info?

95 Views Asked by At

(UPDATED 3/10)

Based on this duckdb docs page on profiling, I would have thought that my code snippet below should save a json file of profiling/timing stats to a query_profile.json, which I should be able to use to generate an html file with python -m duckdb.query_graph query_profile.json

However, my code below (reproducable as it just hits a public s3 bucket, though you'll need your own aws creds in your own .env file) does not produce such a query_profile.json file:

import duckdb
import s3fs
from dotenv import dotenv_values

# load environment variables from .env file
ENV = dotenv_values(".env")

# Configurable query params
TAXI_COLOR = "yellow"
YEAR = 2023
PROFILE = True

# where to save result (data) locally
dbfile = 'taxi_data.duckdb'

# where to save profiling results
profile_file = 'query_profile.json'

# Define the S3 glob pattern to match the desired parquet files
s3_glob_path = f"s3://nyc-tlc/trip data/{TAXI_COLOR}_tripdata_{YEAR}*.parquet"

# query the s3 parquet data using duckdb
with duckdb.connect(database=dbfile) as con:
    
    # load extension required for reading from s3
    con.execute("INSTALL 'httpfs';")
    con.execute("LOAD 'httpfs';")

    # Set the AWS credentials to access the S3 bucket
    con.execute("SET s3_region='us-east-1';")
    con.execute(f"SET s3_access_key_id = '{ENV['AWS_ACCESS_KEY_ID']}';")  
    con.execute(f"SET s3_secret_access_key = '{ENV['AWS_SECRET_ACCESS_KEY']}';")  

    # Enable profiling and save the profiling results directly to a file
    con.execute(f"SET profiling_output='{profile_file}'")
    con.execute("SET profiling_mode='detailed'")

    # Execute the query to load and save the data directly to the specified DuckDB file
    tablename = f'{TAXI_COLOR}_tripdata_{YEAR}'
    ea = "EXPLAIN ANALYZE " if PROFILE else ""
    query = f"""{ea}CREATE OR REPLACE TABLE {tablename} AS
                SELECT * FROM read_parquet(['{s3_glob_path}'])
            """
    print(query)
    con.execute(query)

print(f"Data saved to {dbfile} as {tablename}")
print(f"Profiling results saved to {profile_file}")
1

There are 1 best solutions below

0
Tom Ebergen On

I think the issue here is that when you run

con.execute(f"SET profiling_output='{profile_file}'")
con.execute("SET profiling_mode='detailed'")

DuckDB will output the profiling information for SELECT/UPDATE/DELETE statements into the file. If you run an EXPLAIN or EXPLAIN ANALYZE query, the profiling information will not be written to the desired file, but will rather show up in the result. If you run a straight SELECT/UPDATE/DELETE query, then the profiling information is written to the file. and the result is the result of the query.

Let me know if that helps.