I'm trying to implement a nested query to pull no more than one sample per log, and I think I know how to implement its components separately:
- Query a set of logs that contain data relevant to my analysis:
SELECT
runs.object_type as object_type,
runs.name as log_name,
from project.runs.latest_runs
WHERE object_type = "ROCKET"
group by object_type, log_name
This results in a list of log names, e.g. "log_name_2021_09_01", "log_name_2021_09_03" etc.
- Query no more than one event with a specific condition from a single known log:
SELECT
object.path_meters as pos,
object_speed as speed,
log.run as run_name,
FROM project.events.last30days
WHERE log.run = "log_name_2021_10_01"
AND object.speed > 0.0
LIMIT 1
The above query returns no more than one sample for the specified log.
How can I combine these queries to pull samples from a set of logs that is returned by Query 1, and at the same time there should be no more than one sample per log?
Update:
Let's say a DB contains three logs:
- log_name_2021_09_01. The associated object_type to the log is ROCKET. The log contains 100k data samples: 90k of them have object.speed = 0.0, 10k of them have speed > 0.0.
- log_name_2021_09_02. The associated object_type to the log is CAR. The log also contains 100k samples with a similar proportion to log 1.
- log_name_2021_09_03. The associated object_type to the log is ROCKET. The log also contains 100k samples with a similar proportion to log 1.
I'm only interested in logs with the object type ROCKET. Two logs correspond to this condition: log_name_2021_09_01 and log_name_2021_09_03. These log names can be obtained by query 1 depicted above. I'd like to pull only one sample point (with speed > 0) from each of the two logs. That is, in the end I'd like to have a query that returns two samples: one from log_name_2021_09_01 and one from log_name_2021_09_03.
You question omits actual example data, so we're forced to infer much from your description. It is strongly recommended that your questions include sample data and the results you'd want from that sample data. This allows us both a concrete example to base our understanding on, and provides a test-set for us to use when developing an answer.
That said, the following should be something like what you're looking for... (For each log, it only selects the one row with the highest
pos.)For a more exact answer, please give: