I am trying to subset flights table without the use of nested queries using sqldf library in R. I have calculated median and average values using one query and stored it into different tables. In the end, I am trying to apply WHERE condition to solve but I am not able to understand why the JOIN function is behaving the way it is. I have two questions.
- How is it able to perform INNER JOIN without two keys specified after ON argument?
- It is basically putting median and average value in all the rows after JOINS. Why?
Can anyone explain what is going on here?
library(sqldf)
average = sqldf('SELECT AVG(distance) AS avg FROM flights')
median = sqldf('SELECT MEDIAN(distance) AS med FROM flights')
result = sqldf(
"SELECT flights.*
FROM flights
INNER JOIN average_q1 ON flights.distance
INNER JOIN median_q1 ON flights.distance
WHERE distance BETWEEN median_q1.med AND average_q1.avg"
)
1) The code in the question is not reproducible (see posting instructions at the top of the r tag page) so we will use the following:
giving:
2) This can be done using a window function which avoids the second table:
3) The aggregation can be combined right into the SQL statement like this:
4) This alternative is a different way of combining the SQL statements using a
withclause, also known as a Common Table Expression (CTE)