Crux dataset Bigquery - Query for Min/Avg/Max LCP, FID and CLS

690 Views Asked by At

I have been exploring the Crux dataset in big query for last 10 days to extract data for data studio report. Though I consider myself good at SQL, as I have mostly worked with oracle and SQL server, I am finding it very hard to write queries against this dataset. I started from this article by Rick Viscomi, explored the queries on his github repo but still unable to figure it out.

I am trying to use the materialized table chrome-ux-report.materialized.metrics_summary to get some of the metrics but I am not sure if the Min/Avg/Max lcp (in milliseconds) for a time period (month for example) could be extracted from this table. What other queries could I possibly try which requires less data processing. (Some of the queries that I tried expired my free TB of data processing on big query).

Any suggestion, advise solution, queries are more than welcome since the documentation about the structure of the dataset and queries against it is not very clear.

1

There are 1 best solutions below

2
Betjens On

For details about the fields used on the report you can check on the main documentation for the chrome ux report specially on the last part with data format which shows the dimensions and how its interpreted as show below:

Dimension
origin                          "https://example.com"
effective_connection_type.name  4G
form_factor.name                "phone"
first_paint.histogram.start     1000
first_paint.histogram.end       1200
first_paint.histogram.density   0.123

For example, the above shows a sample record from the Chrome User Experience Report, which indicates that 12.3% of page loads had a “first paint time” measurement in the range of 1000-1200 milliseconds when loading “http://example.com” on a “phone” device over a ”4G”-like connection. To obtain a cumulative value of users experiencing a first paint time below 1200 milliseconds, you can add up all records whose histogram’s “end” value is less than or equal to 1200.

For the metrics, in the initial link there is a section called methodology where you can get information about the metrics and dimensions of the report. I recommend going to the actual origin source table per country and per site and not the summary as the data you are looking for can be obtained there. In the Bigquery part of the documentation you will find samples of how to query those tables. I find this relatable:

SELECT
    SUM(bin.density) AS density
FROM
    `chrome-ux-report.chrome_ux_report.201710`,
    UNNEST(first_contentful_paint.histogram.bin) AS bin
WHERE
    bin.start < 1000 AND
    origin = 'http://example.com'

In the example above we’re adding all of the density values in the FCP histogram for “http://example.com” where the FCP bin’s start value is less than 1000 ms. The result is 0.7537, which indicates that ~75.4% of page loads experience the FCP in under a second.

About query estimation cost, you can see estimating query cost guide on google official bigquery documentation. But using this tables due to its nature consumes a lot of processing so filter it as much as possible.